Wikidata, a centralized repository for structured facts and Wikimedia’s first big new project in seven years, is now feeding the foundation’s main project, Wikipedia.
The Wikidata project was kicked off around a year ago by the German chapter of Wikimedia, which is still steering its gradual development. For Wikipedia, the advantage is simple and powerful: if there’s a central, machine-readable source for facts, such as the population of a city, then any update to that data can be instantly reflected across all the articles in which the facts are included.
To posit a morbid example: a singer may have dozens or even hundreds of language versions of her Wikipedia entry and, if she were to die, the addition of a date of death to the Wikidata database would immediately propagate across all those versions, with no need to manually update each one (yes, I can also see how this might go horribly wrong).
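To make the propagation idea concrete, here is a minimal sketch of my own, assuming a toy in-memory fact store rather than anything resembling Wikidata's actual implementation. The item name `example_singer`, the dates and the two article templates are all hypothetical placeholders.

```python
# Toy model of a shared fact store; NOT how Wikidata itself is implemented.
from string import Template

# Central store: one record per item, shared by every language edition.
# The item and its dates are made-up placeholders.
facts = {"example_singer": {"born": "1950", "died": None}}

# Each language edition keeps its own wording but no copy of the data.
templates = {
    "en": Template("Born $born. $death"),
    "de": Template("Geboren $born. $death"),
}
death_phrases = {
    "en": Template("Died $died."),
    "de": Template("Gestorben $died."),
}


def render(item_id, lang):
    """Build the article sentence by reading the central record at render time."""
    record = facts[item_id]
    death = ""
    if record["died"] is not None:
        death = death_phrases[lang].substitute(died=record["died"])
    return templates[lang].substitute(born=record["born"], death=death).strip()


before = {lang: render("example_singer", lang) for lang in templates}

# A single edit to the central record...
facts["example_singer"]["died"] = "2013"

# ...shows up in every language version on the next render.
after = {lang: render("example_singer", lang) for lang in templates}
```

The design point is that the language editions hold wording only. Because none of them stores its own copy of the date, there is nothing to fall out of sync, and equally nothing to stop a bad central edit from appearing everywhere at once.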
Indeed, Wikidata is now being used as a common data source for all 286 Wikipedia language versions. The under-development “item” page for Russia gives a sense of what Wikidata looks like in practice.
But the really interesting thing with Wikidata is that it’s not just for Wikipedia. Although it’s worth remembering that its API is still under development, the database can be used by anyone, as it is published under a Creative Commons Zero (CC0) public domain dedication. Here’s how Wikidata project director Denny Vrandečić put it in a statement:
“It is the goal of Wikidata to collect the world’s complex knowledge in a structured manner so that anybody can benefit from it, whether that’s readers of Wikipedia who are able to be up to date about certain facts or engineers who can use this data to create new products that improve the way we access knowledge.”
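For engineers wondering what using this data might look like, here is a rough sketch that reads an item's label through the MediaWiki-style `wbgetentities` action. Treat the details as assumptions: the API is still under development, so parameters and response shape may change, and I'm assuming `Q159` is the identifier of the Russia item. The helper names and the User-Agent string are mine, not part of any official client.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint of the Wikidata web API.
API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_entity_url(item_id, languages=("en",)):
    """Return a wbgetentities URL asking only for the item's labels."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "props": "labels",
        "languages": "|".join(languages),
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_label(payload, item_id, lang="en"):
    """Pull one label out of a decoded wbgetentities response."""
    return payload["entities"][item_id]["labels"][lang]["value"]


def fetch_label(item_id, lang="en"):
    """Perform the network request (hypothetical User-Agent string)."""
    request = Request(
        build_entity_url(item_id, (lang,)),
        headers={"User-Agent": "wikidata-article-example/0.1"},
    )
    with urlopen(request) as response:
        return extract_label(json.load(response), item_id, lang)
```

Calling `fetch_label("Q159")` needs network access. The URL building and the parsing are kept separate so they can be checked offline against a saved response.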
There are already some pretty cool (if bare-bones) examples of what people can do with Wikidata. One is GeneaWiki, which is trying to map the family relationships between famous people (the first and so far only example is that of the Bach family), while a Tree of Life project is trying to put together a viable, Wikidata-based “taxonomy of all life”.
It’s worth noting that the initial funding for Wikidata’s development has come from Google, the Gordon and Betty Moore Foundation, and the Allen Institute for Artificial Intelligence. Ultimately, Wikidata is precisely the sort of venture that is needed to feed the nascent semantic web and AI movement.
It’s far from the only venture in this space – I’d also recommend keeping a close eye on Google’s Knowledge Graph, which powers Google Now, and Wolfram|Alpha, which partly powers Siri – but all these (often intertwined) projects are essentially trying to do the same thing: to turn facts into something that machines can understand.
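What "something that machines can understand" means in practice is usually some form of subject, property, value statement. The sketch below is a generic illustration of that idea using the Bach family mentioned earlier. The `Statement` type and `match` helper are hypothetical names of my own, and this is not the actual data model of Wikidata, Knowledge Graph or Wolfram|Alpha.

```python
from typing import List, NamedTuple, Optional


class Statement(NamedTuple):
    """One fact, broken into parts a program can compare and filter."""
    subject: str
    prop: str
    value: str


statements = [
    Statement("Johann Sebastian Bach", "father", "Johann Ambrosius Bach"),
    Statement("Carl Philipp Emanuel Bach", "father", "Johann Sebastian Bach"),
    Statement("Wilhelm Friedemann Bach", "father", "Johann Sebastian Bach"),
    Statement("Russia", "capital", "Moscow"),
]


def match(subject: Optional[str] = None,
          prop: Optional[str] = None,
          value: Optional[str] = None) -> List[Statement]:
    """Return every statement that agrees with the fields that were given.

    A field left as None acts as a wildcard.
    """
    pattern = (subject, prop, value)
    return [
        s for s in statements
        if all(want is None or want == got for want, got in zip(pattern, s))
    ]


# "Who has Johann Sebastian Bach as a father?" becomes a pattern, not a text search.
children = sorted(s.subject for s in match(prop="father", value="Johann Sebastian Bach"))
```

Once facts are stored this way, a question turns into a pattern with blanks in it, which is the kind of query a family-tree tool or a search engine can answer without parsing any prose.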
And that, in conjunction with advances in natural language processing and machine learning, will ultimately help us converse with machines. These are the building blocks of artificial intelligence and the future of search, and Wikidata’s very permissive license should act as an open invitation to anyone dabbling in this space.