Posts tagged ‘metadata’

digiKam, the perfect tool for Wikimedia Commons photographers?

From your computer to Commons using digiKam

For several years now, I have been a photographer for Wikimedia Commons. Commons is actually the reason why I became interested in photography, and why I bought a DSLR camera and various lenses. And I have ended up taking a lot of pictures.

The problem when you take a lot of pictures is that the time you need to process and upload them grows quickly. Besides, if you want to upload your pictures to several platforms (say, Commons, Flickr and your blog), you have to deal with the specifics of each website. It would be much simpler and faster to describe, tag, classify and geotag the pictures only once (using an appropriate batch-editing tool) and then to have each website extract the metadata, or to use an upload tool that extracts them automatically.

Wikimedia Commons

So, on the one hand, you have Wikimedia Commons, a repository of freely-licensed multimedia files (photos, maps, diagrams, sounds, videos, etc.). It is a project of the Wikimedia Foundation, and the files uploaded to it can be used across all Wikimedia projects in all languages, including the famous encyclopedia Wikipedia.

Wikimedia Commons contains over 5 million files, the overwhelming majority of which have been contributed by volunteers. Such a large number of files makes it mandatory to properly:

  • describe what each work is about
  • identify the author and the copyright status of each work
  • classify each work in order to be able to find it later and to manage such a large collection.

Besides these requirements, it is also considered a good practice to geotag your pictures (if applicable), i.e. to indicate the geographical location of where the picture was taken.

digiKam

On the other hand, there is digiKam, a FLOSS “digital asset management” application whose main features allow you to annotate, classify and catalogue a large media library, although it provides some very handy editing and post-processing tools as well. digiKam uses the KIPI plugins, which provide handy features such as batch editing or export modules to a variety of websites, including Flickr, Picasa and Facebook. The 4th beta of digiKam 1.0 was recently released and the official 1.0 release is planned for late October this year.

I won’t go through digiKam’s impressive list of features; instead, I will focus on those that make digiKam a very powerful tool for Wikimedia photographers.

The perfect combination?

So let’s take a quick look at the features digiKam offers that would be handy for people who want to contribute their pictures to Wikimedia Commons:

Commons requires or recommends… | digiKam offers…
------------------------------- | ---------------
copyright information (author and license) | full metadata read/write support, including Author and Copyright fields
multilingual descriptions for each file | multilingual descriptions since digiKam 1.0, with appropriate language codes
classification with hierarchical categories | classification with hierarchical tags
geotagging [1] | geotagging (automatically using GPS data or manually using Google Earth)

So, what’s missing? Well, you can do a lot of stuff with digiKam to prepare your pictures before they’re uploaded to Commons. Yet, you still have to upload them manually, or using an external tool such as Commonist. And even if you have spent days annotating your pictures, describing them in several languages, geotagging them, etc., well, all this information remains in the image metadata. Neither Commonist nor MediaWiki knows how to extract these metadata meaningfully (yet); as a consequence, you have to redo all the work when uploading each file to Wikimedia Commons. The Ford Foundation recently awarded a $300,000 grant to the Wikimedia Foundation in order to improve the usability of the upload process. I expect some work will be done regarding the extraction of metadata from files. Yet, when uploading a large number of files, it would be more convenient to have a standalone desktop upload tool like Commonist. Or…

What would really be neat is an integrated export module to Wikimedia Commons in digiKam; not only would it serve as a batch upload tool, but it would also read all the metadata and create the file description pages accordingly. A few weeks ago, I contacted the main developer of digiKam in order to discuss the possibility of implementing such an export module. He showed interest in this project even if he didn’t have the time to code it himself. As a consequence, I have initiated a roadmap and brainstorming page for the plugin.
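To give an idea of what "creating the file description pages accordingly" could mean, here is a minimal Python sketch. It is only an illustration, not the plugin's design: the function name and the dictionary keys are hypothetical, and the template names ({{Information}}, {{own}}, {{Location}}, language templates such as {{en|…}}) reflect my understanding of Commons conventions and should be checked against the current template documentation.

```python
def build_description_page(meta):
    """Build Commons-style wikitext from a metadata dictionary.

    `meta` is a hypothetical structure; a real export module would fill it
    from the image's EXIF/IPTC/XMP fields.
    """
    # One language template per description, e.g. {{en|A bridge}}.
    descriptions = "".join(
        "{{%s|%s}}" % (lang, text)
        for lang, text in sorted(meta["descriptions"].items())
    )
    lines = [
        "{{Information",
        "|description=" + descriptions,
        "|date=" + meta["date"],
        "|source={{own}}",
        "|author=" + meta["author"],
        "}}",
    ]
    # Geotag, only if the picture has coordinates (assumed decimal degrees).
    if "latitude" in meta and "longitude" in meta:
        lines.append("{{Location|%s|%s}}" % (meta["latitude"], meta["longitude"]))
    # License template, then one category per tag.
    lines.append("{{%s}}" % meta["license"])
    lines.extend("[[Category:%s]]" % tag for tag in meta["tags"])
    return "\n".join(lines)


example = {
    "descriptions": {"en": "A bridge", "fr": "Un pont"},
    "date": "2009-08-15",
    "author": "Example photographer",
    "license": "cc-by-sa-3.0",
    "latitude": 43.6,
    "longitude": 1.44,
    "tags": ["Bridges"],
}
print(build_description_page(example))
```

The point of the sketch is that every field comes from data the photographer has already entered once; nothing needs to be retyped at upload time.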

Feel free to contact me if you’re interested in helping implement such a plugin.

Notes

  1. I recently noticed a JavaScript tool allowing users to geotag themselves, using a map from OpenStreetMap. This is exactly the kind of tool I would like to have for pictures; I can’t understand why it exists for users, but not for media files.

Ten features that would dramatically improve Wikimedia Commons

Logo of Wikimedia Commons

About two years ago, I said “Commons may be the next coolest project, as soon as developers find the time to improve its usability to make it more user-friendly”. Sadly, Wikimedia Commons hasn’t evolved much in terms of usability since then. MIT’s Technology Review recently published an article about upcoming improvements to the management of video content on Wikipedia and Wikimedia websites. I heard a lot of people say: “Good, but what about pictures?” Some technical improvements described by the Technology Review will be useful for both images and videos, such as the media and upload wizard currently being developed by Michael Dale. However, Wikimedia Commons still needs many little (or big) features that would dramatically improve its user-friendliness.

Browsing & reusing

  1. Automatic localization: Websites such as Wikimedia Commons and meta-wiki host content in various languages and have a multilingual audience. These multilingual wikis should automagically detect the locale of the user’s browser and use it as the interface language, especially for unregistered users. As for users with an account, their browser’s locale should be set as the default language in their preferences.
  2. Usage-centric page layout: It’s all very nice to know that a given image is a “retouched picture” or that a given diagram was “made using Inkscape”. But I think what most users want to know is: how to use the picture (in Wikimedia projects or elsewhere) and how to download it (at the best resolution available). Many people use the right-click-save-as method to save pictures from the Internet. If they do that on Commons, they will only save the low-resolution preview. There should be a big “Download high-res” button, as well as snippets of code to embed a file with proper attribution.
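Regarding automatic localization (item 1), browsers already announce the user's preferred languages in the HTTP Accept-Language header. The following Python sketch shows one way a server could pick an interface language from it. The function name and the set of supported languages are hypothetical, and this is not how MediaWiki does (or would do) it.

```python
def pick_language(accept_language, supported, default="en"):
    """Return the best supported language for an Accept-Language header value."""
    candidates = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            candidates.append((-quality, position, tag))
    # Highest quality first; header order breaks ties.
    for _, _, tag in sorted(candidates):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # "fr-ch" falls back to "fr"
        if primary in supported:
            return primary
    return default


print(pick_language("fr-CH, fr;q=0.9, en;q=0.8", {"en", "fr", "de"}))
```

For registered users, the same function could be used once, at account creation, to pre-fill the language preference.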

Metadata

Full metadata support is the cornerstone of many other features. EXIF is probably the best-known type of metadata, but there are also others such as IPTC or XMP.

  1. Pull metadata from files on upload: this idea is not a new one, yet it hasn’t been implemented. A fair number of photographers add a lot of metadata to their files (author, description, copyright information, geotags, keywords, etc.), and it is extremely cumbersome to have to redo all the work by hand during the upload.
  2. Store metadata in a database to make search and attribution easier, especially: description, license, media type (photo, diagram, map, etc.). It should be connected to the MediaWiki API to allow for easy extraction of these data.
  3. Push metadata to files on download: In the field of publishing, storing credit information directly into the file’s metadata is strongly recommended and is a standard practice to avoid losing track of it.
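To illustrate why storing metadata in a database (item 2) makes search easier, here is a small Python sketch using an in-memory SQLite database. The table layout and function names are hypothetical and have nothing to do with MediaWiki's actual schema; the sketch only shows that once the fields are stored separately, filtering by media type or license becomes a simple query.

```python
import sqlite3


def create_store():
    """Create an in-memory database with one row per file (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_metadata ("
        " name TEXT PRIMARY KEY, description TEXT, author TEXT,"
        " license TEXT, media_type TEXT)"
    )
    return conn


def add_file(conn, name, description, author, license_name, media_type):
    """Store the metadata of one file."""
    conn.execute(
        "INSERT INTO file_metadata VALUES (?, ?, ?, ?, ?)",
        (name, description, author, license_name, media_type),
    )


def find_files(conn, media_type=None, license_name=None):
    """Return file names matching the given filters, sorted by name."""
    clauses, params = [], []
    if media_type is not None:
        clauses.append("media_type = ?")
        params.append(media_type)
    if license_name is not None:
        clauses.append("license = ?")
        params.append(license_name)
    sql = "SELECT name FROM file_metadata"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY name"
    return [row[0] for row in conn.execute(sql, params)]


conn = create_store()
add_file(conn, "Bridge.jpg", "A bridge", "Alice", "cc-by-sa-3.0", "photo")
add_file(conn, "Map.svg", "A map", "Bob", "cc-by-sa-3.0", "map")
add_file(conn, "Cat.jpg", "A cat", "Alice", "cc-by-3.0", "photo")
print(find_files(conn, media_type="photo", license_name="cc-by-sa-3.0"))
```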

Related open bugs

  • bugzilla:6672: EXIF orientation not used (rotation from digital cameras)
  • bugzilla:3361: Image author, description, and copyright data saved in EXIF fields
  • bugzilla:16956: Show IPTC metadata on image description page
  • bugzilla:657: Pull copyright metadata from files on upload
  • bugzilla:11484: Include ISO rating in abbreviated exif metadata.
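As an illustration of the first bug above: digital cameras usually store the picture unrotated and record how the camera was held in the EXIF Orientation tag. The Python sketch below maps the non-mirrored tag values to the clockwise rotation needed for display. The names are hypothetical, reading the tag from the file is left out, and mirrored orientations (values 2, 4, 5 and 7) are deliberately not handled by the rotation lookup.

```python
# EXIF Orientation values without mirroring, mapped to the clockwise
# rotation (in degrees) needed to display the stored image upright.
ROTATION_CW = {1: 0, 3: 180, 6: 90, 8: 270}


def rotation_for(orientation):
    """Return the clockwise rotation in degrees, or None if not handled here."""
    return ROTATION_CW.get(orientation)


def displayed_size(width, height, orientation):
    """Return (width, height) as displayed once the orientation is applied.

    Orientation values 5 to 8 transpose the image, so width and height swap.
    """
    if orientation in (5, 6, 7, 8):
        return height, width
    return width, height


# A landscape sensor image taken with the camera held vertically.
print(rotation_for(6), displayed_size(4000, 3000, 6))
```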

Editing

  1. Built-in basic editing features (lossless rotate, crop) and ability to save under another name (i.e. for crops). Similarly, a built-in geocoding feature using OpenStreetMap. Geocoding images means attaching geographic information about the place where the work was made. This may be made easier by the current initiative to integrate OpenStreetMap with Wikimedia projects. And of course it should save the coordinates as metadata.
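Saving coordinates as metadata involves a small conversion: a map gives decimal degrees, whereas EXIF stores GPS positions as degrees, minutes and seconds plus a reference letter (N/S for latitude, E/W for longitude). Here is a minimal Python sketch of the conversion in both directions; the function names are hypothetical, and actually writing the values into the file is left to a metadata library.

```python
def decimal_to_dms(value, positive_ref, negative_ref):
    """Convert decimal degrees to (degrees, minutes, seconds, ref)."""
    ref = positive_ref if value >= 0 else negative_ref
    value = abs(value)
    degrees = int(value)
    minutes_float = (value - degrees) * 60
    minutes = int(minutes_float)
    seconds = (minutes_float - minutes) * 60
    return degrees, minutes, seconds, ref


def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees, minutes, seconds and a reference letter to decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative in decimal notation.
    if ref in ("S", "W"):
        value = -value
    return value


latitude = decimal_to_dms(48.8583, "N", "S")
longitude = decimal_to_dms(-1.5, "E", "W")
print(latitude, longitude)
```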

Rating

  1. Some sort of community-managed rating feature; as someone said elsewhere, “Commons is a depository, and depositories are expected to host lots of junk”. A rating feature would allow the best of Commons to be presented first during the search, and junk to be presented last.

Searching

With currently more than 4.6 million files (and counting), it is becoming increasingly important to improve the search features of Wikimedia Commons.

  1. An “advanced search” feature similar to Flickr’s. It should be possible to search by media type, by license, and to add toggles such as “safe mode” (explicit content) or “personality rights”.
  2. Multilingual search: Files on Commons are organized in hierarchical categories, using English as lingua franca. If you want to find a file, you have to search in English. I imagine it is possible to use some dictionary (coupled to the language detection) to give good results for a search in any language.
  3. Google-Images-friendliness. A lot of people use Google Images to find pictures, but images from Wikimedia Commons rarely appear in these results (unless they are used on a Wikipedia page).
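The dictionary idea behind multilingual search (item 2) can be sketched in a few lines of Python. The dictionary below is a tiny hand-written, hypothetical example; a real one would have to be far larger and would probably be built from existing translations rather than typed by hand.

```python
# Hypothetical dictionary: for each language, search terms mapped to the
# English category names used on Commons.
CATEGORY_DICTIONARY = {
    "fr": {"pont": "Bridges", "chat": "Cats"},
    "de": {"brücke": "Bridges", "katze": "Cats"},
}


def translate_query(query, language):
    """Replace known words by their English category; keep unknown words as-is."""
    terms = CATEGORY_DICTIONARY.get(language, {})
    return [terms.get(word.lower(), word) for word in query.split()]


print(translate_query("Brücke Toulouse", "de"))
```

The `language` argument would come from the detection of the user's locale, so that the user never has to say which language the query is in.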

Note: All these ideas are given from a user point of view; their technical feasibility has to be assessed by a MediaWiki-literate developer.