Wikipedia is now drawing facts from the Wikidata repository, and so can you

Wikidata, a centralized structured data repository for facts and Wikimedia’s first big new project in the last 7 years, is now feeding the foundation’s main project, Wikipedia.

The Wikidata project was kicked off around a year ago by the German chapter of Wikimedia, which is still steering its gradual development. For Wikipedia, the advantage is simple and powerful — if there’s a central, machine-readable source for facts, such as the population of a city, then any update to that data can be instantly reflected across all the articles in which the facts are included.

To posit a morbid example: a singer may have dozens or even hundreds of language versions of her Wikipedia entry and, if she were to die, the addition of a date of death to the Wikidata database would immediately propagate across all those versions, with no need to manually update each one (yes, I can also see how this might go horribly wrong).

Indeed, Wikidata is now being used as a common data source for all 286 Wikipedia language versions. Here’s the under-development “item” page for Russia, if you want to see what Wikidata looks like in practice.


But the really interesting thing with Wikidata is that it’s not just for Wikipedia – although it’s worth remembering that its API is still under development, the database can be used by anyone as it is published under a Creative Commons 0 public domain dedication. Here’s how Wikidata project director Denny Vrandečić put it in a statement:

“It is the goal of Wikidata to collect the world’s complex knowledge in a structured manner so that anybody can benefit from it, whether that’s readers of Wikipedia who are able to be up to date about certain facts or engineers who can use this data to create new products that improve the way we access knowledge.”
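For engineers curious about the “so can you” part, here is a minimal Python sketch of reading an item from Wikidata. It assumes the MediaWiki `wbgetentities` API action at `https://www.wikidata.org/w/api.php` and uses `Q159`, the item ID for Russia. The helper names (`build_entity_url`, `fetch_entity`, `extract_label`), the User-Agent string and the hand-written `SAMPLE_RESPONSE` are illustrative and not part of any official client. Since the API is still under development, treat the response shape as an assumption that may change.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint of the Wikidata web API.
API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_entity_url(item_id, language="en"):
    """Build a wbgetentities URL asking for one item's labels and descriptions."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "props": "labels|descriptions",
        "languages": language,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def fetch_entity(item_id, language="en"):
    """Fetch the item over HTTP and return the decoded JSON (needs network access)."""
    # The User-Agent value is a placeholder; replace it with your own contact details.
    request = Request(
        build_entity_url(item_id, language),
        headers={"User-Agent": "wikidata-sketch/0.1 (example)"},
    )
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_label(response, item_id, language="en"):
    """Return the item's label in the given language, or None if it is missing."""
    entity = response.get("entities", {}).get(item_id, {})
    label = entity.get("labels", {}).get(language)
    return label["value"] if label else None


# Hand-written, trimmed-down example of the assumed response shape (not live data).
SAMPLE_RESPONSE = json.loads(
    '{"entities": {"Q159": {"id": "Q159", '
    '"labels": {"en": {"language": "en", "value": "Russia"}}}}}'
)

if __name__ == "__main__":
    print(build_entity_url("Q159"))
    print(extract_label(SAMPLE_RESPONSE, "Q159"))
```

To query the live service instead of the sample, call `extract_label(fetch_entity("Q159"), "Q159")`. Because the data is published under a public domain dedication, the result can be reused without attribution requirements.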

There are already some pretty cool (if bare-bones) examples of what people can do with Wikidata. One is GeneaWiki, which is trying to map the family relationships between famous people (the first and so far only example is that of the Bach family), while a Tree of Life project is trying to put together a viable, Wikidata-based “taxonomy of all life”.

It’s worth noting that the initial funding for Wikidata’s development has come from Google, the Gordon and Betty Moore Foundation, and the Allen Institute for Artificial Intelligence. Ultimately, Wikidata is precisely the sort of venture that is needed to feed the nascent semantic web and AI movement.

It’s far from the only venture in this space – I’d also recommend keeping a close eye on Google’s Knowledge Graph, which powers Google Now, and Wolfram|Alpha, which partly powers Siri – but all these (often intertwined) projects are essentially trying to do the same thing: to turn facts into something that machines can understand.

And that, in conjunction with advances in natural language processing and machine learning, will ultimately help us converse with machines. These are the building blocks of artificial intelligence and the future of search, and Wikidata’s very permissive license should act as an open invitation to anyone dabbling in this space.

8 Responses to “Wikipedia is now drawing facts from the Wikidata repository, and so can you”

  1. Only so long as the info at Wikidata is kept to undisputed facts can it work for the betterment of all. My concern is that once it becomes a factual source, some with the power to do so may start branching out and adding content that is not undisputed, including things like politics, societal issues and even sciences like the hotly debated climate change and/or carbon footprint. Regardless of where you land on that argument, it’s clear that both are far from undisputed. If Wikidata gains widespread acceptance and gets to the point where people assume anything in it is accurate, then it risks being abused, and no guarantee from those who own/manage it now can ensure this won’t happen. Even if the Wikidata controllers digitally promise under penalty of law, that would not preclude the case where a third party comes in and takes over. This is how many a contract between individuals and service/content providers has been undone.

  2. Honk Tonk

    Wikidata is great for easy access by machines, it will certainly enable many developers to easily build software on top of it. This has nothing to do with AI though.

    “Wikidata is precisely the sort of venture that is needed to feed the nascent semantic web and AI movement”.

    You should take a look at AI and e.g. Google these days – they look at blogs and whatever human readable, not-for-machines text they can find and extract semantic information. You will never be able to let machines access all human knowledge, by having humans enter it all in a way that a machine can understand it, we just don’t want to do that. It has to be the other way around: Machines will have to understand the way we communicate.

    I really have mixed feelings about this Wikidata thing because, first and foremost, not all data in Wikipedia are accurate. Second, if Google Translate and other translation sites cannot accurately translate a phrase (or form a coherent one), do you think automatically translating it to another language would be wise?

    However, if we overlook all that, Wikidata is a great tool to ease the task of disseminating updates, thus giving all users more accurate information. Also, Wikipedia, consistently ranked among the top 6 most visited websites by Alexa, does need to consistently update its content, given the large and diverse set of users relying on it.

    Your thoughts on this?

    The Wikimedia Foundation's first major new project in 7 years is now feeding the biggest project in that stable, Wikipedia itself. But anyone can take structured data from Wikidata, due to its open license.