Three reasons why the Semantic Web has failed


The semantic web is the vision of a web of interconnected data and meaning. This global web of knowledge would be something computers could understand and therefore provide us with a new frontier of information retrieval and intelligent agents.

After two decades of failed attempts, semantic web has become a dirty word with investors and consumers. So what exactly went wrong? Why are we still so far away from the web of data? Here’s my take on it.

The web of Obsoledge

Most attempts at creating a knowledge repository have involved converting “expert knowledge” into a web of data. The result is an inherently boring web of data. Google’s Knowledge Graph promotional video is a great example of how boring this web can be. “Let’s say you’re searching for Renaissance Painters”…. Really? Who searches for that?

More accessible technology is causing an explosion of information. This has the effect of making the shelf-life of knowledge shorter and shorter. Alvin Toffler has – in his seminal book Revolutionary Wealth – coined the term Obsoledge to refer to this increase of obsolete knowledge.

If we want to create a web of data we need to expand our definition of knowledge to go beyond obsolete knowledge and geeky factoids. I really don’t care what Leonardo DaVinci’s height was or which Nobel prize winners were born before 1945. I care about how other people feel about last night’s Breaking Bad series finale. How did they find the ending? What other series or movies might I enjoy based on those experiences?

We are living in the Now. The Now is eating ever greater quantities of our attention. It’s drowning out the obsolete past. Human attention, sentiment and emotion are key elements to today’s information age. They cannot be ignored. They need to be at the very core of any web of data.

Documents are dead

Deriving structured information from Wikipedia documents – a common practice – is fundamentally flawed. Not only does this create a web of boring facts, it assumes that documents are the source of knowledge somehow. They’re not. They are only a small sliver of the stuff that matters. And it’s the underlying conversation and activity that matters.

There is a sea change happening in the web and how we use it. It’s an evolution to a second phase of the web – the real-time web, or what I call the “Stream.” In the Stream, the focus is on messages not web pages. These vast amounts of messages are generated by social interaction, by conversation, by attention, by ideas, by little chunks of thought unleashed into a gigantic stream of data.

This also changes the way machines communicate with each other. Machines are still programmed by humans, and humans – especially programmers – are going to be lazy. They will use the easiest, most pragmatic way to get machines to communicate. They aren’t going to spend days learning complicated RDF or OWL specs. They will use simple communication using JSON. And all the cool kids have abandoned XML.
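To make that gap concrete, here is a minimal sketch in Python of what a developer might send as plain JSON, next to the same message with a JSON-LD-style "@context" that maps each key to an identifier (JSON-LD being a JSON serialization of RDF). Everything specific here is an assumption for illustration: the message fields, the example.org vocabulary URL and the with_context helper are hypothetical and do not come from any real schema or product.

```python
import json

# Hypothetical stream message, as a developer might send it in plain JSON.
plain_message = {
    "user": "alice",
    "text": "That Breaking Bad finale was perfect",
    "sentiment": "positive",
}

# Illustrative vocabulary base; an assumption, not a real published schema.
EXAMPLE_VOCAB = "http://example.org/vocab#"


def with_context(message: dict) -> dict:
    """Return a copy of message with a JSON-LD-style @context added.

    Each key of the message is mapped to an IRI under EXAMPLE_VOCAB, which is
    the extra step that turns ad hoc keys into shared, linkable terms.
    """
    context = {key: EXAMPLE_VOCAB + key for key in message}
    return {"@context": context, **message}


linked_message = with_context(plain_message)

print(json.dumps(plain_message, indent=2))
print(json.dumps(linked_message, indent=2))
```

The payload itself is unchanged in the second form; the only difference is the added mapping, and that mapping is the part someone has to agree on and maintain.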

Information should be pushed, not pulled

One less obvious problem is one of information retrieval. For the past two decades we’ve gotten so used to keyword search that google became an actual verb. Unfortunately, keyword search is now fundamentally broken. The more information is out there, the worse keyword search performs.

Advanced query systems like Facebook’s Graph Search or Wolfram Alpha are only marginally better than keyword search. Even conversation engines like Siri have a fundamental problem. No one knows what questions to ask.

We need a web in which information (both questions and answers) finds you based on how your attention, emotions and thinking interconnects with the rest of the world.

Meet the synaptic web

Keyword search is broken and we’re drowning in an unstoppable stream of information. The need for a next generation of information retrieval is now higher than ever. Is the semantic web going to be that next paradigm? I don’t think so. Not unless we radically revisit what a ‘web of data’ means.

It’s time to ditch the old paradigms of documents, knowledge and keyword search. We live in a world of big data, real-time streams and human emotions. It’s time for a revolution in information retrieval. We need a web that’s dynamic and centered around humans. A web in which data flows in a smarter way. A web that understands you and makes the proper data find you. This web doesn’t look like a database or a graph. It’s a web that’s intelligent, dynamic and sometimes chaotic. It’s the digital equivalent of the human brain. I call it the Synaptic Web.

Dominiek ter Heide is the CTO and co-founder of Bottlenose, which combines big data technologies with specialized data mining to make sense out of streams.

47 Comments

Gary

You can keep your Miley Cyrus twerk engine. I will stick with Renaissance painters.

MCD

BobDC has it in a nutshell.

[We] don’t want vendors like you deciding what to send [us].

By the way, Dominiek, when will you change your company’s name to SkyNet?

Jacqui

So you are not interested in facts and the past. But is that not where we go wrong? Just living for today means we make the same mistakes again and again… Ahh yes, now I see…

Bob

“Let’s say you’re searching for Renaissance Painters”…. Really? Who searches for that?

I search for info about renaissance painters frequently. I also read Gigaom.

Is Dominiek culturally stunted, or does he just assume that everyone in the world is interested in exactly the same things he’s interested in?

One other comment:
As long as there are entire industries (e.g. medicine, law, and business consulting) where the production of a document is directly tied on a 1:1 basis with getting paid, then the current “paradigms of documents, knowledge and keyword search” aren’t going anywhere. You can certainly add new knowledge paradigms which may be ideal for some other businesses such as marketing or financial services. You can also provide supplementary sources of data for those businesses that are built around existing document-based structures, but the document isn’t going to be “ditched” in our lifetimes.

Matthias Samwald

That’s what happens when you want to have an opinion about science/technology, but actually have no interest in science/technology besides pirating Breaking Bad episodes and watching funny cat pictures.

Kiki

What a lot of assumptions you make!

It’d be interesting to see a discussion about information and how we can use/retrieve/share/organize it more effectively that isn’t undergirded by the scent of a sales pitch (“my company does this”, etc.), not that I can fault you for trying. But it is rather transparent; having an agenda skews an objective assessment of something and causes me to consider your opinion with less trust.

I’m with bencomp on not wanting everything I search for or do tracked and spit back to me in the guise of better information (since its context is my own input). There are so many temptations in wanting every bit of information faster, sooner, more focused–but doesn’t it boil down to a bit of impatience on our part? Is it really that difficult to find things on the web, or is there simply a more economically-driven reason to demand more, more, more, faster, faster, faster?

In theory, I am all for finding new ways to slice and inspect information I’ve retrieved efficiently and quickly. I’ve worked in libraries and make my living as a writer, so information is my stock-in-trade, and yet this idea does not excite me…yet. I love the name, Synaptic Web is very clever, but I question whether I want my data finding me. My inbox is full of stuff already finding me; the sites I use are specific to my trade; my iPhone finds me what I need when I’m on the go; when I want to be “social”, I know where to go; I have some trouble finding info I need on occasion, but feel I can work at it a bit and get what I need. Maybe because I grew up with encyclopedias, this still seems like an amazing tool to have sitting on my lap.

Sumanta

Linked Open Data is still in research and there is high optimism regarding it. As with the Internet in the early 90s, people are still not entirely aware of the extensive capabilities of these concepts. When you haven’t yet discovered the capability of a concept, how can you tell if it is dead or not? Again, I don’t agree with the idea that what’s happening “now” is what’s worth looking for. What is happening now is important; I may want to know what’s happening in “Breaking Bad”, but that’s a fad, temporary information, not knowledge that I can use. And people search for knowledge, or the interdependency of knowledge, not useless information.

George

Very interesting article and really insightful viewpoints. Apart from the social sphere of the web, it’s also the interpretation that each human adds to make sense so as for the semantic web to be meaningless for the web but meaningful for its user.

Alexandre Passant

“people care not about knowledge graphs but about the people and current events happening in their social graphs.”

Well, a very good way to understand what people in your social circles are talking about is to use knowledge graphs as background data. We’ve just written about it in the music discovery context: http://blog.seevl.fm/2013/11/03/knowledge-graphs-for-discovery/

So no, the Semantic Web is not dead. It might simply be time for people to understand what it’s really about, and that its potential for understanding the Web as a Social Machine is huge.

Venkateshprasanna

I’m glad that there are so many comments that disagree with the content already. I landed here through Nova Spivack’s tweet about the article, and it looks like a desperate attempt to market a new buzzword and nothing more.

Shallow statements like the ones quoted below, with no attempts made to substantiate such claims irrespective of their correctness, will not hold much water.

>> “After two decades of failed attempts, semantic web has become a dirty word with investors and consumers.”

>> “The result is an inherently boring web of data. ”

>> ““Let’s say you’re searching for Renaissance Painters”…. Really? Who searches for that?”

>> “And all the cool kids have abandoned XML.”

>> “Unfortunately, keyword search is now fundamentally broken. The more information is out there, the worse keyword search performs.”

>> “Keyword search is broken and we’re drowning in an unstoppable stream of information.”

>> “We are living in the Now. The Now is eating ever greater quantities of our attention. It’s drowning out the obsolete past. Human attention, sentiment and emotion are key elements to today’s information age.”

Despite a hemorrhage of social network status updates, most of which make little or no sense in the broader context of the Web and of human knowledge in general, we still have a lot of content that is substantial in its depth and more useful than the claimed “Obsoledge”. In fact, if some information is becoming obsolete quickly, it does not even qualify as “knowledge” by definition. Knowledge stems from a well processed and researched set of concepts that stand the test of time, and leads to the emergence of the next set of concepts with that as foundation. Buzzwords and an unstoppable stream that drowns us don’t constitute knowledge. Knowledge also represents ideas, thoughts and arguments that are unaffected by sentiment and emotions.

>> “I really don’t care what Leonardo DaVinci’s height was or which Nobel prize winners were born before 1945.”

>> “I care about how other people feel about last night’s Breaking Bad series finale. How did they find the ending? What other series or movies might I enjoy based on those experiences?”

This section, as it has come out, reflects only your preferences. That does not mean that they can be generalized. If da Vinci’s height is an example being considered to scoff at the vision of the web of data and its possibilities, then it is either ignorance or an attempt at misleading people with inappropriate examples, while there are plenty of scenarios where connecting the dots like that has remarkable benefits.

What you / Nova choose to call “Now” and “Stream” don’t need new buzzwords either. The audience now understands the effect social media is having on the Web itself, with more and more people coming forward and creating information on the Web than ever before. “Big data” and “data science” have gone from being mere buzzwords a few years back to representing a class of problems and showcasing a new set of possibilities for solving them, built on the underlying foundation of statistics and machine learning in combination with distributed computing and service oriented architectures.

>> “We need a web in which information (both questions and answers) finds you based on how your attention, emotions and thinking interconnects with the rest of the world.”

>> “A web that understands you and makes the proper data find you.”

Recommender Systems are everywhere today, and there are success stories in the e-commerce domain about their ability to improve sales, and in many other domains too. But they cannot serve all of my knowledge needs. There are also concerns about the effect it might have when the search engines start deciding what is useful for this person in India as against that person in Russia. We are at a stage where the concepts of Information Retrieval, Machine Learning, Recommendation Systems, Statistics, Semantic Web and many more are converging towards solutions that might continue to make a difference (hopefully in ways that are acceptable to the knowledge needs of humanity) incrementally. It is time for everyone working in this domain to see how new thoughts towards any of these fairly broad areas of (computer) science align with the current knowledge and where improvements can be brought in, and focus on adding value there, instead of spending energy coining a new buzzword and trying to profess how that differs fundamentally from any concept known to all of mankind till date.

Twine’s inability to catch the world’s attention does not necessarily mean that there is no hope for information retrieval, semantic web, etc. It just means that we need to keep plugging away and bring about more innovation and real world case studies (which are out there aplenty) to showcase the possibilities, understand the shortcomings, tune further, and continue to build on top of them.

I.

Whatever the shortcomings of Semantic Web might be, the emphasis here is on “boring” data, and a number of buzzwords that flow in a not better defined stream. Sorry but, while many of the semantic web aims have always left me a bit cold, and while many of the widespread practices have made me wonder and sometimes scream /Why/, _Why_, WHY? with various kinds of emphasis added, this talk of “boring” and “I don’t care” leaves me with the feeling that I’m reading a very weak counterargument.
A lot of people out there find coding “boring”. They don’t convince me either.
Do you have a more substantial, better described alternative to the semantic web? (Yes I’ve read about the synaptic web. I stopped at “neuronization”)

mashermack

“I really don’t care what Leonardo DaVinci’s height was or which Nobel prize winners were born before 1945. I care about how other people feel about last night’s Breaking Bad series finale”
I stopped reading there because of how much blood pumped into my head. That’s the most ignorant thing I have read in the last 10 years. Stay in your ignorance.

kurtcagle

I think that Dominiek has confused dead with “mostly dead”.

I’ve been working with Semantics regularly for about six years. In most cases, that use has come about in being able to identify and automate the connection of metadata resources within organizations, and lately my clients include a few of the largest companies in the world, along with Federal agencies and universities. RDF in the classical sense hasn’t taken off yet because the primary users of it have traditionally been purists – academics that are heavily focused on inferencing and deep logical modeling – rather than the pragmatists who are focused on finding the relationships between different processes or data systems within their organization.

Once you get away from the acronyms and the four and five syllable words, start talking about the advantages that semantic solutions have from a business perspective, business people ARE interested. This is happening now, because a lot of companies that have been building out databases are now discovering that their databases need databases just to manage their metadata. This is an area where semantics fits wonderfully, but you have to get away from the TLAs and much of the more abstruse academic baggage that surrounds the subject.

dhwood

No, no. We discovered that context is hard.

We would like to be able to record context, including data types, units of measure, who claimed this stuff was true (and when), where the data was collected.

Context requires materialization of assumptions. These assumptions may not even be understood by the creator of the data at the time of creation. This is a major impediment to generalized data sharing on the Web.

This is perhaps why RDF has had a slow uptake: The data we seek is simply not always available and, where available, is not always economically extractable.

This is, I think, the real issue with SemWeb uptake.

However, note that the amount of RDF content on the Web is still growing exponentially. Research into how to tie it together and use it is ongoing, as with the recent Open Data Institute hubs and intent to formalize the loose collection of Linked Open Data.

Semantic search has been a direct result of Semantic Web research, such as Google Knowledge Graph. Schema.org, supported by all major search engines worldwide, deals natively with RDFa 1.1 and RDFa 1.1 Lite, both native Semantic Web serialization formats. GMail is now supporting export of content in JSON-LD format, another native SemWeb format. JSON-LD uptake is large and very international.

So, be cautious before saying that the Semantic Web has somehow failed. I certainly don’t see it that way.

dhwood

No, no. We discovered that context is hard.

We would like to be able to record context, including data types, units of measure, who claimed this stuff was true (and when), where the data was collected.

Context requires materialization of assumptions. These assumptions may not even be understood by the creator of the data at the time of creation. This is a major impediment to generalized data sharing on the Web.

This is perhaps why RDF has had a slow uptake: The data we seek is simply not always available and, where available, is not always economically extractable.

This is, I think, the real issue with SemWeb uptake.

However, RDF content on the Web is still growing exponentially. Someone likes it.

Note also that semantic search is really quite popular, as with Google Knowledge Graph. Schema.org (backed by all major search engines) supports RDFa 1.1 and RDFa 1.1 Lite, both of which are native SemWeb serialization languages. Similarly, GMail now supports JSON-LD, another native SemWeb serialization language. It is clearly incorrect to say that the Semantic Web has “failed”. Instead, it is morphing rapidly as research points out the useful and hard bits.

james

The absolutism wrt the Semantic Web failing is an extremist view. The WWW was intended to evolve into the Semantic Web, in Berners-Lee’s initial efforts. Furthermore, the Semantic Web initiative, per se, started abt 13 years ago (rather than 2 decades ago) and is still evolving imo. One of the main aims behind SW was to get ppl to annotate everything and to link these annotations to ontologies. The current research behind linked data and linked open data is precisely targeting this linking component. The fact that Google, Microsoft etc resorted to using schema.org, embracing the idea of having online vocabularies (ontologies) available to mitigate some issues re the semantics, is enough proof that the SW has not failed. Facebook’s use of RDFa is further proof of this.

bobdc

“Information should be pushed, not pulled”: The idea of “push” over the web was a big buzzword in the late nineties, and it failed. People want to retrieve the information that they want to retrieve; they don’t want vendors like you deciding what to send them (“a web that understands you”? Oh, please), despite any new buzz phrases you make up to claim that your system knows what the end users really want.

bobdc

“Information should be pushed, not pulled”: The idea of “push” over the web was a big buzzword in the late nineties, and it failed. People want to retrieve the information that they want to retrieve; they don’t want vendors like you deciding what to send them, despite any new buzz phrases you make up to claim that your system knows what the end users really want. (Judging from the date on http://synaptify.com/?p=613680, it looks like your buzz phrase has had three years to catch on, and it obviously hasn’t.)

David Byrden

“pushing” information to me could be effective only if the pusher knew MUCH more about me than I’m comfortable with.

Madlyb

I agree with your comments on Lazy Consumption and Curation Latency, but mining social for more than relationships and interaction is basically worthless because you get the sentiment without the underlying bias that drives that sentiment…in essence you *are* judging the book by its cover and that is not just fuzzy data, it is dangerous data.

chrisboothroyd

Well, one look at Schema.org and I’d have to say the author here is just another buzzword hack. Synaptic Web – come on! Let’s see you move some TVs, Events or Deals with that. Just because Nova couldn’t make it pay doesn’t mean the rest of us are doomed!

bencomp

I’d like to disagree with most of the article. Your argument “the Semantic Web has failed” does not follow from your “reasons”.
Sure, I’m pretty familiar with the Semantic Web and able to understand RDF (really, it’s not impossible to understand) and (most of) OWL, but that is not why I think a Synaptic Web can live next to a Semantic Web. To start: wouldn’t it be great for your streaming web interpreters to be presented with structured information next to unstructured text? Let it live on top of the Semantic Web (and the rest of the Web).

Do you want to exclude facts from knowledge? I, too, couldn’t care less about Leonardo da Vinci’s height, but if I see the Mona Lisa in Paris, I might want to know what else he painted and did and where I can see that. You need boring facts for that. Boring, but useful facts.
For human consumption “messages” are only part of knowledge. Take science for example. Science doesn’t only live in conversation; loads of scientific knowledge is transferred in documents.

The Semantic Web doesn’t depend on XML. Or JSON – although JSON-LD is gaining lots of ground. Human end users shouldn’t need to see raw facts in any text format, only developers. Turtle is the easiest to read and write by hand, I think, but eventually programmers will do that just as rarely as they read and write JSON.

We’re still a long way from having phones that measure brain activity to decipher our thoughts before they become pieces of knowledge consisting of concepts and, err, facts about things we do, want, and feel. In light of my privacy, I’d like my phone to not push my thoughts and activities to the Synaptic Web. It could ask specific questions to the Web that I would like answered, but those questions are likely to be based around concepts, time and place (“what museums are open around here tomorrow?”). That almost works and looks like keyword search.

I like the vision of a Synaptic Web (I heard the term for the first time today), but the claim that the Semantic Web has failed because people actually want a Synaptic Web was not proven today.

Linda

“Really? Who searches for that?”

People who aren’t social media crack addicts, like every online journalist under the sun.

charger

The author of this article isn’t a journalist, but the CTO of a company that has a vested interest in making money from people who aren’t all that into ‘knowledge’. Because that group of people is easily the majority of people on the planet, and also the easiest to convince to part with their money, it is a sound business strategy.

Eric Alterman

Love this article. To my mind (and yours) “knowledge” is the contextualization of ideas that may relate to each other in a myriad of ways. Human intelligence learns by contextualizing what it observes. “This is a lot like that, therefore I can predict the behavior of that”–I may never have seen a brick flying towards my head but I know to duck because I’ve seen other objects flying toward my head. Search is an algorithmic contextualization of information that presents a list of items approximately related to each other–it is implicit contextualization. But in a world where both structured and unstructured information can be generated by any app, any person and any “thing”, implicit web search begins to look pretty lame. The problem goes well beyond disambiguation. There are simply too many possible contexts for any idea. My view of the “Synaptic Web” mirrors how I imagine the brain to work: millions (billions, trillions) of contextual streams, with each stream defined by a highly specific context that is explicitly defined (not implicitly defined by natural language processing). A given stream may have structure (“eco-friendly homes for sale in Brooklyn”) or no structure. Another stream may have real-time processing rules that may filter and route information from that stream to many tributary streams defined in even more granular ways. The innumerable streams of this “Synaptic Web” can each be individually shared by thousands of apps and people, with permissions for each stream that define who can add information to a given stream and who may simply read that stream. He who creates a stream defines such permission rules, its purpose, its data structure, its audience–the more open, the better. This Synaptic Web is a real-time “data exchange” that allows the contextual “flow” of information with the ability to trigger actions when certain pieces of information are identified in particular streams (e.g. when it’s time to “duck”).
The Web, algorithmic search and activity streams are all training wheels for this new form of contextualization. My company, Flow, has been building this architecture for almost three years, with initial products available at Flow.net and iFlow.com. But what we want to do now is to open up our “Synaptic Web” to individuals, developers, researchers, businesses, internet of things…information streams defined and shared by anyone for any purpose. I’m looking forward to hearing from those intrigued by this new model of knowledge.

graus

If you mean the semantic web as in RDF/linked data, it has imho failed because of reliance on human editors. We need semantic search to enable a “true” semantic web, i.e. going from unstructured documents to some form of structured representation (be it RDF or any other fancy format). And documents are dead? They’ve changed, what you call messages equals documents, imo :). And finally, extracting and mining “boring” Wikipedia-style facts makes a lot of sense in plenty of fields/use-cases.

Chris

We want obsoledge, and we want interactions. I am not sure we want the web to care about what is good for us. Personally, I don’t!

dotpeople

> We need a web in which information (both questions and answers) finds you based on how your attention, emotions and thinking interconnects with the rest of the world.

This requires pervasive surveillance of attention, emotions and thinking.

Does pervasive observation change the observed human’s identity/goals/behavior?
