
Summary:

While Powerset unquestionably has some interesting and valuable semantic search technology, there are other semantic search engines that produce equally meaningful and relevant results.
In this post, we compare Powerset results with those of a demo implementation from one such search engine, Cognition Technologies. And we compare them both with the current gold standard in web search, Google.

Powerset, which implements semantic search, recently released a public beta based on the limited data set of Wikipedia. But while there is no question that Powerset has some interesting and valuable semantic search technology — many of their demo queries produce meaningful summary pages and reference pages with information extracted from Wikipedia content — there are other semantic search engines that produce equally meaningful and relevant results.

In this post, we compare Powerset results with those of a demo implementation from one such search engine, Cognition Technologies. And we compare them both with the current gold standard in web search, Google (again, limited to the Wikipedia data set).

Example 1: Powerset

There are some classes of queries in which Powerset shines, particularly those that involve extracting concepts or aggregating data from a given data set.

For example, check out the beautifully presented results for the following queries that extract key information the user is looking for and provide it in summary format:

“military intelligence”

“teams in the NFL”

Example 2: Cognition Technologies

On the other hand, there are other types of queries — especially where hardcore semantic parsing is involved — where the Powerset algorithms get confused, and Cognition gives better results:

“rare wildlife of the Amazon”

“football players who went to jail”

Example 3: Google

There are still queries (especially when semantic parsing is not involved) for which Google’s results are much better than those of either Powerset or Cognition:

“helicopter carrier Iwo Jima class”

Here, surprisingly, Google has the best results. Powerset has related results, Cognition gets totally confused, but Google nails it!

Disambiguation

One area where both Powerset and Cognition improve on Google is the disambiguation of query terms. This is always a significant issue for search engines; for example, when a user types in the keyword Java, does she mean the island, the programming language, or the coffee?

Google has recently tried some experiments in this area, but these new search engines go one better.

When Powerset sees an ambiguous topic, it uses tabs to provide both sets of results:

Cognition handles it in a different way, by letting the user select from among different semantic meanings for each term:

User Impact

For most common searches, Google search works just fine. We’ve all gotten used to the ubiquitous “keyword-ese,” currently the universal language of web search. With Google’s unlimited resources, comprehensive index and formidable prowess in finding relevant results using the PageRank algorithm, it’s going to be difficult for any other search engine to match those results. Users may have to work just a little bit harder for unusual queries or specialized searches, but most users will accept that trade-off in return for using their familiar and beloved search engine. Indeed, the word Google has come to represent web search in the same way that the word Xerox had once come to symbolize the process of photocopying.

Future Competition

So what can Powerset (and Cognition) do to gain traction and capture users?

In their recent book, “The Innovator’s Solution,” Clayton Christensen and Michael Raynor discuss how upstart companies challenging market leaders and entrenched incumbents can position new technologies for a reasonable chance of success. One approach that they believe is guaranteed to fail is when these smaller upstarts try to make evolutionary improvements to get and stay ahead of the major players.

Instead, they suggest shaping the new technology into a disruptive innovation, along either of the following two major axes:

1. New-market strategy: Leveraging the innovation to attract users who do not typically participate in using the product or service, and thus growing the market as a whole.

2. Low-end strategy: If there are price-sensitive, over-served users who would be willing to trade some of the advanced functionality in return for a lower price point, then the smaller players have an opportunity to enter the market — that is, if they can figure out a way to make a profit.

In other words, the new players entering the market have to find profitable business opportunities in segments of the market that are not attractive to market leaders.

Using this model, it is apparent that a strategy of challenging Google head-on for control of the mainstream web search market has little hope of success, regardless of the new technologies or search innovations that are applied. Google would have no choice but to fight back with everything it’s got to catch up to or leapfrog this “better search” alternative.

Similarly, since Google search is free for users, there is really no viable low-end strategy, no way to outdo the existing search leader by offering a lower price point.

What about non-participant users? Practically everyone online already uses a web search engine (with Google being the overwhelming favorite). However, Google search follows a specific, consistent set of guidelines: simplicity of UI, speed of response, and relevance based on incoming links. These design parameters take top priority over all other considerations.

By challenging these assumptions, we can discover new use cases in search that are underserved (or not served at all) by Google. Some examples include:

1. UI Simplicity: Google’s minimal UI is trivially simple to use and ideal for a one-size-fits-all model, but it may be less than optimal for complex semantic searches. As Alex Iskold points out in his recent article on the myth and reality of semantic search, a richer user interface would allow power users to express semantically-rich search queries and get back better results. Notably, Powerset and Cognition excel at these types of queries.

2. Speed: For some types of advanced searches, users might be willing to wait, perhaps even as long as a day, in order to get back semantically complex results. Imagine a software agent that acts as a virtual search assistant – once the user specifies a query with multiple levels of complexity and dependency, the agent goes off and returns the next day with a list of possible results/options. Queries that require the coordination of complex tasks fall into this category, such as planning a trip that requires coordinating air travel, hotel and car, and minimizing the cost of the whole trip while taking some additional factors into consideration.

3. Relevance: Although all the mainstream search engines use similar criteria to evaluate relevance (mainly, the evidence of incoming links), other relevance algorithms are certainly feasible and may work better for certain classes of queries. Social relevance is an obvious example; reputable premium content is another.
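To make "relevance based on incoming links" concrete, here is a minimal sketch of the textbook PageRank iteration in Python. It is an illustration only, not Google's production ranking: the `pagerank` function, the damping factor of 0.85, the iteration count and the four-page `toy_web` graph are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration (illustrative sketch only).

    links maps each page name to the list of pages it links to.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: everything links to "hub"; nothing links to "c".
toy_web = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}

scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the page with the most incoming links ranks highest and the page with none ranks lowest, regardless of what any page actually says. That is the gap that social relevance or reputation-based signals would aim to fill.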

This post is in no way meant to discredit Powerset — they’re in early beta and are doing a fine job of building semantic search. Instead, the examples above clearly demonstrate that the jury is still out on semantic search; other search engines are also contenders in this space, and the race is far from won.

Nitin Karandikar writes about Web 2.0, Internet search and the semantic web on his blog, Software Abstractions.


  1. The Iwo Jima example had results very similar to Google’s, albeit in a slightly different order. Incidentally, phrasing the query in a slightly more natural way generates much better results:

    http://www.powerset.com/explore/go/Iwo-Jima-class-helicopter-carrier

  2. It would be interesting to observe the results for the same query across the different search engines.

  3. I hate the term “semantic search”. It’s just search using a different approach to the problem, albeit a rather cool one, or maybe we should call Google “link search” or something like that.

    One needs to point out that the Powerset results really shine when it searches Freebase. The underlying structure really helps. So the question is that as we add more structure and appropriate markup (RDFa, microformats) to web pages, does the way we search change?

    I am still not convinced that NLP works on general text, at least not well enough or fast enough.

  4. Daniel Tunkelang Sunday, June 8, 2008

    Here’s a nice example for a shoot-out: “Who founded a software company in Massachusetts?” I like that query because it is not answered by a single Wikipedia page (the closest I found is http://en.wikipedia.org/wiki/Category:Companies_based_in_Massachusetts).

    Powerset: http://www.powerset.com/explore/go/Who-founded-a-software-company-in-Massachusetts%3F
    Cognition: http://wikipedia.cognition.com/?num=10&from_val=1&to_val=10&f=simple&sf=130&win=0&fld=-1&search=Who+founded+a+software+company+in+Massachusetts%3F&Submit=&window=0&positional=1&select=select&d=wikipedia1&d=wikipedia2&d=wikipedia3&d=wikipedia4&d=wikipedia5&d=wikipedia6
    Google: http://www.google.com/search?q=site%3Aen.wikipedia.org+Who+founded+a+software+company+in+Massachusetts%3F

    Decide for yourselves, but I give Google the win here, since it at least returns pages containing answers to the question, even if those answers don’t come back in the snippets.

    I’m not a Google fanboy–I make a living beating Google in the enterprise, and I believe that some of what we are doing applies to the open web. But I’m not convinced that today’s natural language “semantic search” options are the answer.

    More at my blog: http://thenoisychannel.blogspot.com/

  5. The Semantic Chimera : Beyond Search Sunday, June 8, 2008

    [...] GigaOM has a very good essay about semantic search. What I liked was the inclusion of screen shots of results of natural language queries–that is, queries without Boolean operators. Two systems indexing Wikipedia are available in semantic garb: Cognition here and Powerset here. (Note: there is another advanced text processing company called Cognition Technologies whose url is http://www.cognitiontech.com. Don’t confuse these two firms’ technologies.) GigaOM does a good job of making posts findable, but I recommend navigating to the Web log immediately. [...]

  6. This was interesting…please keep up your coverage of the “semantic web”. I have my doubts but I’m curious to see what tools are being developed.

  7. Well. I just had to try it myself :)

    Search 1. “Which team was third in the European basketball championship 2007?”
    Search 2. “eurobasket 2007 bronze”

    For Google, site:wikipedia.org was added to each query.

    Search 1. Cognition – basically nothing. Powerset – keyword search, rather than semantic search (not much different than standard wikipedia search, btw!). Google – 3rd result for “Eurobasket 2007” page, 6th result is the answer.

    Search 2. Cognition – 2 results, but #1 is the coach name. Powerset – gives something more interesting, with lots of Lithuanian basketball players and includes women’s result as well. Google #1 result pretty much says it all… Wikipedia standard does not even have a clue what I mean :)

    Conclusion. Google is by far number one and looks unbeatable. Powerset is interesting, but I have a slight suspicion they’re cheating a bit. If they’re not – they need to improve a little, but they have a chance. Cognition is useless.

    This is just one search string, of course…

  8. What about taking a different perspective on these companies and Google as a whole? Instead of looking at Google as a search technology to be beaten, perhaps it’s more accurate to say that Google’s business is more closely tied to advertisers buying keywords so that their ads appear relevantly on publishers’ sites (including Google’s search site). While most of Google’s revenue comes from AdWords on its own site, affiliates don’t do nearly as well. One reason for this is the poor relevance that AdSense gets you when keywords are the basis for matching. Now, what if one could apply Powerset or Cognition to the problem of determining what an article is about? Then, how about setting up some heuristics for determining how the “aboutness” of an article should be related to the “aboutness” of an ad?

    If these companies focused less on the public-facing Web search objective and more on the relevance issue for advertisers and publishers, I believe they would indeed be applying the Innovator’s Solution paradigm to attacking Google where it counts, in its advertising business. Still, it will be tough to get to the critical mass of advertisers and sites that Google has attained, but I believe Google is more exposed there than in the search service.

  9. Between the Lines mobile edition Monday, June 9, 2008

    [...] GigaOm: Powerset vs. Cognition: A Semantic Search Shoot-out [...]
