
Summary:

Google recently launched a new search indexing system called Caffeine, which it says produces results that are 50 percent “fresher.” Why? Because Google needed to respond faster in a world that has become increasingly real-time. Not just because it wanted to, but because it had to.

When Google announced the launch of its new search index, code-named Caffeine, some observers may have wondered why the world’s leading search engine would even need to do such a thing. Doesn’t Google already control a gigantic proportion of the search business, as well as its Siamese twin, the phenomenally lucrative search-related keyword advertising business? So why the pressure to beef up the search index? The biggest clue is in the name of the new system: Caffeine. Just as coffee drinkers hope they can work faster with a jolt of the chemical, so Google needed to respond faster to a world that’s become increasingly real-time. Not just because it wanted to — because it had to.

Google first announced it was working on an update to the index last August. Coincidentally or not, that was right around the time Microsoft and Yahoo announced their search partnership, whereby the software giant’s Bing search engine would power the results at all of Yahoo’s properties. But while Microsoft’s search results have been getting more competitive with Google’s over the past few years, competing with the software company wasn’t the main impetus behind Google’s desire to re-engineer its index.

The biggest push came from the simple fact that the web is speeding up all around us, thanks largely to the skyrocketing popularity of social media sites like Twitter and Facebook, as well as other real-time publishing tools (such as PubSubHubbub). As the central library through which many people gather online information, Google had to speed up in order to catch up. I looked at the background behind these changes and their implications in an article for GigaOM Pro, which you can find here (subscription required).

Without getting too technical about the changes, Google’s previous indexing system accumulated large batches of updates for websites and pages, which were “crawled” (by the engine’s automated search bots) every few weeks to detect changes. One consequence was that none of the pages in a batch could be reached by searchers until the entire batch had finished processing. That meant large quantities of results were older than they should have been, in some cases by several weeks, even though newer versions were already sitting in the update. So Google decided to make smaller but more frequent updates to the index, meaning that in aggregate there would be more fresh results. In fact, the company says that Caffeine’s results are 50 percent fresher than those from the previous system (Stacey has a closer look at the new search indexing technology in this post).
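For readers who like to see the difference in code, here is a toy sketch of the two approaches described above. It is not Google’s implementation: the class names `BatchIndex` and `IncrementalIndex`, the `crawl`, `publish` and `search` methods, and the example URL are all made up for illustration, and “search” is reduced to looking a page up by its address.

```python
from dataclasses import dataclass, field


@dataclass
class BatchIndex:
    """Hypothetical batch-style index: crawled updates wait in a pending
    pool and only become searchable when the whole batch is published."""
    live: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)

    def crawl(self, url, content):
        # The update is recorded but held back from searchers.
        self.pending[url] = content

    def publish(self):
        # The entire batch goes live at once, then the pool is emptied.
        self.live.update(self.pending)
        self.pending.clear()

    def search(self, url):
        return self.live.get(url)


@dataclass
class IncrementalIndex:
    """Hypothetical incremental index: each small update is searchable
    as soon as it has been crawled."""
    live: dict = field(default_factory=dict)

    def crawl(self, url, content):
        self.live[url] = content

    def search(self, url):
        return self.live.get(url)


url = "example.com/news"  # placeholder address, not a real crawl target
batch = BatchIndex(live={url: "old story"})
incremental = IncrementalIndex(live={url: "old story"})

# Both indexes see the same fresh version of the page.
batch.crawl(url, "new story")
incremental.crawl(url, "new story")

print(batch.search(url))        # old story (batch not yet published)
print(incremental.search(url))  # new story

batch.publish()
print(batch.search(url))        # new story (only after the batch completes)
```

The gap between crawling a page and publishing the batch is where the stale results come from; shrinking each update shrinks that gap.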

The critical thing Google realized is that, while search results that are a few days old might have been fine even a year or two ago, the web has become far more real-time than it has ever been before — thanks to the volumes of status updates, photos and other information coming from social networks such as Facebook and Twitter. Facebook has more than 500 million users, many of whom are posting updates, links and photos multiple times a day, and Twitter recently said that the social network sees more than 65 million messages posted every day. That kind of deluge of information places increasing pressure on a search engine like Google to become more real-time in its results. Please see the full report here.

Post and thumbnail photos courtesy of Flickr user khrawlings


  1. so what do you think: will the road be littered with “real time search” startups within 12 months? (e.g. everybody using twitter api’s) will investors walk away from these startups or drop follow-on rounds?

  2. “…competing with the software company wasn’t the main impetus behind Google’s desire to re-engineer its index. The biggest push came from the simple fact that the web is speeding up all around us”

    I don’t think the two are as unrelated as the article makes them sound. Bing announced real-time results from Twitter in October of last year, even before Google launched their “latest results” box that appears on the search results page for current topics. Google’s trying to stay ahead, but it doesn’t look like there’s a lack of competition.

    1. Yes, that’s a good point, Niraj — competition with Bing is definitely part of the picture.

  3. It’s bad for the future!

  4. Real-time May Be Nice For Search Engines, But What About Personal Lives? Monday, July 19, 2010

    [...] “real-time,” whether we like it or not. Just as Google and Microsoft’s Bing are upgrading their search indexes to make them more real time by capturing things as they occur, instead of hours or even days later, we are being forced to [...]
