When it comes to indexing and describing the content of videos, the key may just be to ask your users for help. At least, that seems to be the gist of a new study published this week by five researchers at VU University Amsterdam, who analyzed around 420,000 user-generated tags collected during a video tagging project of the Netherlands Institute for Sound and Vision. The results show not only that most of these tags were useful, but also that they can complement professional classification systems.
The Netherlands Institute for Sound and Vision has been testing video tagging with a special online game called Waisda that pairs up users and lets them compete by tagging the same video simultaneously, awarding points for both speed and accuracy of tags. The VU University researchers wanted to know how many of these tags are actually real words, as opposed to just mumbo-jumbo, and how these tags differ from the language used by professionals.
The idea of crowd-sourced tag-based taxonomies, sometimes also called “folksonomies,” became popular a few years ago on the heels of the success of Flickr and Delicious (s yhoo). Evangelists of tagging, such as Clay Shirky, have long claimed this kind of crowd-sourcing is the only economically feasible way to deal with the massive amounts of information generated online.
However, the once-obligatory Web 2.0 tag clouds have fallen out of fashion in recent years. The use of tags has become much more focused, and has, in some cases, been completely replaced by other types of filtering. Netflix (s NFLX), for example, relies heavily on user-contributed ratings instead of tagging to personalize its offering. The service does use a defined set of tags to categorize content, but doesn’t allow users to add their own tags.
Hulu, on the other hand, allows its users to tag any of its videos, and YouTube (s GOOG) also encourages creators to tag their videos. The study suggests these sites can benefit greatly from tags, especially because tags capture descriptions and perspectives that professional cataloguers tend to miss.
The researchers compared user-submitted tags with a thesaurus used by professionals to catalog audio-visual works. The result: Only eight percent of the tags submitted by users were also terms used by professional cataloguers. That means 92 percent of the tags used by average people don’t match the way professionals talk about this type of content. So if you want to help people like you and me find content, it helps to speak our language.
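The study's headline number is an overlap figure between two vocabularies. A minimal sketch of that comparison might look like the following; the lowercase matching and the sample tags are assumptions for illustration, since the paper's exact matching procedure isn't described here.

```python
def thesaurus_overlap(user_tags, thesaurus_terms):
    """Fraction of distinct user tags that also appear in a professional
    thesaurus (toy, case-insensitive comparison)."""
    tags = {t.lower() for t in user_tags}
    terms = {t.lower() for t in thesaurus_terms}
    return len(tags & terms) / len(tags)

# Hypothetical example: 2 of 5 distinct user tags match the thesaurus.
overlap = thesaurus_overlap(
    ["windmill", "canal", "lol", "bike", "oranje"],
    ["windmill", "canal", "bicycle"],
)
print(overlap)  # 0.4
```

In the study, the equivalent figure came out at roughly 0.08, i.e. eight percent.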
The study also points to some simple mechanisms for making tags more reliable: The Waisda game encourages users to enter common tags, awarding points every time a player enters a tag that's also used by a competitor. This mechanism adds a layer of verification: terms used by more than one player are more likely to be relevant to the content than tags entered by just one person.
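That verification rule reduces to a simple aggregation: a tag counts as verified for a video once more than one player has entered it. Here is a minimal sketch under an assumed data model of `(player, video, tag)` events; Waisda's actual schema and scoring (including any time window on matches) are not published in the article.

```python
from collections import defaultdict

def verify_tags(tag_events):
    """Mark a (video, tag) pair as verified when entered by more than
    one distinct player. Hypothetical data model for illustration."""
    players_per_tag = defaultdict(set)
    for player, video, tag in tag_events:
        players_per_tag[(video, tag)].add(player)
    return {key: len(players) > 1 for key, players in players_per_tag.items()}

events = [
    ("alice", "v1", "windmill"),
    ("bob",   "v1", "windmill"),  # second player -> "windmill" is verified
    ("alice", "v1", "canal"),     # only one player -> unverified
]
verified = verify_tags(events)
print(verified)  # {('v1', 'windmill'): True, ('v1', 'canal'): False}
```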
However, the researchers were skeptical of narrowing the focus too much by relying only on these kinds of verified tags: “Our study shows that this approach would exclude many potentially useful tags,” they write, adding that most of the unverified terms still made sense to someone out there. The team took tags that were neither verified nor found in the professional thesaurus or a common dictionary, entered them into Google, and got results for 82 percent of these searches.
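The researchers' procedure amounts to a filtering pipeline: only tags that survive every offline check get sent to a search engine. A rough sketch, with the function name, sample tags, and case-insensitive matching all being illustrative assumptions rather than the study's actual code:

```python
def needs_web_check(tags, verified, thesaurus, dictionary):
    """Return the tags that pass none of the offline checks --
    the ones the researchers would feed into a web search."""
    known = (
        {t.lower() for t in verified}
        | {t.lower() for t in thesaurus}
        | {t.lower() for t in dictionary}
    )
    return [t for t in tags if t.lower() not in known]

# Hypothetical example: only the last tag would require a web search.
leftover = needs_web_check(
    tags=["windmill", "fixie", "grachtenpand"],
    verified=["windmill"],
    thesaurus=[],
    dictionary=["fixie"],
)
print(leftover)  # ['grachtenpand']
```

In the study, 82 percent of the tags left over after this filtering still returned Google results.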
In other words: What you may see as a misspelled or otherwise useless tag may be someone else’s way to describe and ultimately discover your content.