These days, we are all swimming in a sea of data. And the traditional method of search — the one that involves first finding, then consuming the information — isn’t going to work for much longer. The company that can figure out a way to cut through the noise for us stands to make a boatload of money.

It could be my advancing years – I seriously doubt that – but every morning I dread turning on my MacBook Pro, fearful of the data deluge that it will bring and the day-long struggle to find the information I need to get things done that will ensue. And I’m not the only one caught in this quicksand-like avalanche of digital data.

According to market research firm comScore, in May the total number of Internet searches conducted in the U.S. alone was about 10.7 billion, up nearly 18 percent from 9.1 billion searches in May 2007. Those numbers make clear that we’re all searching for more information. What they don’t make clear is that often we don’t find what we’re looking for, and so end up trying again and again.

The problem is that there’s too much data coming online too quickly, and the traditional method of search that involves first finding and then consuming the information is not going to work for much longer. There just won’t be enough time for us to do that and still have a life. It’s a problem, and therefore solving it is an opportunity — a very big opportunity.

Earlier this week I was going through the digital detritus that has accumulated on my computer when I stumbled upon an old slide presentation made by Google back when it was still tiny. One slide estimated that by 2002, there would be about 500 million searches a day and between 3 billion and 8 billion web pages. Now those numbers seem so last century, because every day the amount of information online continues to grow at an exponential rate. It’s nearly impossible to calculate the exact number of web pages that are out there, but a good yardstick would be data from Netcraft, which tracks the number of servers on the Internet and says that the number of active domains almost quadrupled from 2002 to 2007. The total number of web sites at the end of April stood at over 162 million.

Many of these new sites are courtesy of Web 2.0 technologies that have allowed for the easy creation of digital data. Blogs, social networks, RSS feeds, Flickr feeds, Twitter messages, video clips…the data just keeps growing and growing, much like the proverbial Energizer bunny. And the problem of data overload is going to get even bigger as devices such as the 3G iPhone, with their fast wireless connections, make the on-the-go creation and sending of videos, messages and photos to our friends even easier.

If someone can become the Dolby of the web — remove the noise and give us clear sound — then they are going to make a lot of money. And when I say sound, I mean data that is truly useful. But that would just be the start.

Pip Coburn, who runs his own investment firm in New York, recently pointed out that “It’s not data that’s important, but what you do with it.” A good example of that would be a tiny startup called Summize, which is reportedly being acquired by Twitter.

Summize has come up with a clever way of peering through Twitter’s vast data stream and finding out what’s hot, where and how. The results are essentially keywords — topic-, person- or location-based — and thus can be used to show contextual advertising next to the pages that show these results. In other words, Summize has developed an ability to monetize conversations without being intrusive.
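
To make the idea concrete, here is a minimal sketch of how one might spot what’s hot in a stream of short messages. It is not Summize’s actual method, which isn’t described here; the stop-word list, the `trending_terms` function and the sample messages are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real service would need a far larger one.
STOP_WORDS = {"the", "a", "an", "is", "at", "to", "of", "and", "in", "on", "for", "i", "my"}

def trending_terms(messages, top_n=3):
    """Return the top_n terms that appear in the most messages."""
    counts = Counter()
    for message in messages:
        # Count each term once per message so one chatty user cannot dominate.
        words = set(re.findall(r"[a-z0-9#@']+", message.lower()))
        counts.update(words - STOP_WORDS)
    # Sort by count (descending), then alphabetically, so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]

# Made-up sample messages.
sample = [
    "Waiting in line for the iPhone at the Apple store",
    "iPhone line is around the block",
    "Coffee first, then the iPhone line",
]
print(trending_terms(sample, top_n=2))  # [('iphone', 3), ('line', 3)]
```

The keywords that come out of a count like this are what an ad system could match against, which is the sense in which the results can carry contextual advertising.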

The possibilities of what a similar service could do with this data are endless. Imagine a service that would scroll through all the Flickr photos and Twitter messages and marry them to data on the Internet, such as the locations of nearby mass transit stations, Starbucks, movie theaters and grocery stores.

All this information would show up on your phone, but you would only see the options in, say, a 100-meter radius that could be increased by zooming out. It would be the ultimate mash-up of various web data sources offered to you as an application, and such applications would make it possible to find, consume and share information — without even trying. 
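
The radius idea is simple enough to sketch. The snippet below is only an illustration under assumed data: the place names and coordinates are made up, and `distance_m` and `nearby` are hypothetical helpers, not part of any real service. It uses the standard haversine formula to keep only the places within a given distance of the user, and "zooming out" just means passing a larger `radius_m`.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def nearby(places, lat, lon, radius_m=100):
    """Return (name, distance) pairs within radius_m of (lat, lon), closest first."""
    hits = []
    for name, place_lat, place_lon in places:
        d = distance_m(lat, lon, place_lat, place_lon)
        if d <= radius_m:
            hits.append((name, d))
    return sorted(hits, key=lambda hit: hit[1])

# Made-up coordinates, for illustration only.
places = [
    ("coffee shop", 37.7750, -122.4190),
    ("transit station", 37.7755, -122.4194),
    ("movie theater", 37.7800, -122.4194),
]
me = (37.7749, -122.4194)

print([name for name, _ in nearby(places, *me)])                 # 100-meter view
print([name for name, _ in nearby(places, *me, radius_m=1000)])  # zoomed out
```

With these sample points, the 100-meter view shows the coffee shop and the transit station, and the movie theater appears only after zooming out.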

Almost like serendipity! How’s that for a business model?

This post was originally published on BusinessWeek.com.


  1. Well, first you would have to know what Information is, or how and when data becomes Information. If you take it a step further you get into Knowledge, how it relates to Information and how it’s used. And soon you end up writing an equation for consciousness, because this all has to fit together.
    Just throwing terms around, believing we all share the same definition, is a stretch to say the least, and it is repeated by all the so-called Information workers out there.
    BTW, trying to get this even close to right based on a Boolean system or algorithm will also be a stretch, or as far as I can see, impossible.
    Anyway, the next thing will be talk about Intelligent systems, but if you have solved the equations for the above problems you will realize there is no such thing (Intelligence).
    Ok, I’m grumpy today.

  2. I don’t give a damn about what’s hot…I just want to get answers to my searches (often very niche) without fluff.

  3. Brian de Haaff Monday, July 14, 2008

    Yes. Add to this the fact that most people spend at least 8 hours a day at a job trying to make decisions based on data that is unique to their business (and not searchable). Yet, we continue to build and use systems at work that constrain data like the early catalogue systems of the Web. At least in our “personal lives” we can use search engines like Google to try and cut through the clutter. The challenge is that business runs on infrastructure and data and there is an increasing amount of it. There is much work to be done in Search for sure, especially in the enterprise. A few months ago we posted a blog on what we thought the Perfect Search Engine for business would look like – https://community.paglo.com/blog_topic/index/57-perfect-search

    Brian de Haaff, Paglo

  4. Spot on…turning data into information is no longer sufficient. We need actionable insights that are contextual. It is well within reach, even using today’s technology, to accomplish that, but at some point one has to deal with behavioral and personally identifiable info. Would people make the trade-off between richer information and some degree of loss of privacy? Should that be regulated, or would users be comfortable with a “do no evil” corporate mission?

  5. Brian Houston Monday, July 14, 2008

    To me your article brings to mind aggregation and meta sites.

    I don’t necessarily want to plow through umpteen different interfaces to get the information contained therein. Sites like PopURLs and (the egregious copy) alltop.com make a decision about what sites you will want grouped and bring it all into a single interface.

    Our site ScrapeUp.com attempts to do the same thing with video aggregators.

  6. That’s interesting! It never occurred to me how confusing it will be to classify those sites as they would be pulling from as many other sites as possible.

    Still, the filters would be the determining factor in how close the info comes to being relevant to the query. I guess a few hits that are close to the need would be deemed more valuable than too many that only touch what we may call ‘border’ answers.

    Funny, as I seem to be talking about a similar concept called ‘network’. Or maybe the meaning of ‘rich’.


  7. Stacey Higginbotham Tuesday, July 15, 2008

    Isn’t the context you’re calling for part of the hope for the semantic web? Once those ontologies and databases are set up isn’t that the first level filter that determines which data belongs together? The second filter would be some type of recommendation engine that could learn your individual preferences?

  8. Summize and most of the services that focus on tracking one or two media don’t really help when it comes to parsing social media activity as a whole. The conversational, one-to-many nature of social media means you need discovery tools that not only bring back results from across the ecosystem but also give you extensive analysis capabilities: who is talking, where are they, are they positive or negative, and how influential are they? These demographic, sentiment and authority measures help sort the bewildering array of conversations out there.
    Internet = Information
    Social Media = Communication
    Two very different things.

  9. @Stacey
    I don’t know. The semantic web reminds me a lot of OO programming.
    Problem is, the real world is not as clean. Data is messy, incomplete and inconsistent, and its meaning can change over time and across cultural boundaries.
    I don’t think it has great advantages over keyword search. Might be good for some things but in general it’s just hype.

  10. Twitter-Summize Deal Confirmed – GigaOM Tuesday, July 15, 2008

    [...] I outlined in my post last Monday, and yesterday, I think it is a super smart move by Twitter, and if they play their cards right is going to pay [...]
