Arthur van Hoff helped architect Java at Sun, co-founded Marimba, and engineered the application platform at TiVo. Now he’s identifying trends in Twitter messages at The Ellerdale Project. Is that a fitting project for such a big brain? In this video interview recently shot at the GigaOM office, van Hoff explains why real-time search is one of the most “intellectually challenging” things he’s ever done.
Menlo Park, Calif.-based Ellerdale’s ultimate goal is to help “solve ambiguity in the world,” van Hoff says, by understanding timing, geo-location and sentiment, finding patterns and discovering topics hidden within opaque links. “I love big data,” he says. The artificial intelligence methods behind such projects have been around for 30 years, but only now are the computing power and the volume of relevant data available to make them work, says van Hoff.
“Real-time is very overwhelming,” says van Hoff. “It’s literally drinking from the firehose. We’re trying to make that palatable.” Ellerdale is one of the 15 to 20 companies that Twitter has granted access to its so-called Firehose, the real-time stream of all public tweets. The company also uses data from RSS feeds and hopes to add other sources such as Facebook updates and Google Wave posts.
Compared to a massive search engine like Google, Ellerdale has bitten off just a small sliver of the net: what happens in real time. That still amounts to 20 to 30 million pages crawled per day, all timely, relevant and added to Ellerdale’s dataset within seconds. The dataset is growing rapidly; the company has indexed 8 billion of the 16 billion tweets sent in total. (That’s because Ellerdale only got access to the Twitter Firehose in February and had more limited access before that.)
Ellerdale started two years ago as a semantic search engine but now aims to be a data provider. It has given a dozen partners access to its API to test applications in financial services, sentiment mining for marketing, and advertising research. Van Hoff says he hopes one application will soon pull ahead so the company can build a coherent strategy and raise a financing round on favorable terms, on top of the angel funding it already has from investors including Ron Conway and Roger Sippl.
The as-yet unconquered challenge of real-time search is making a business out of it. Or perhaps there’s an easier answer: Search on Twitter itself has grown 33 percent in the last three months, but it still has a lot of room for improvement, and Twitter could simply acquire Ellerdale, as it did Summize in 2008. However, van Hoff is hardly the only big brain in real-time search; the teams at companies like Topsy, Collecta and OneRiot are also very impressive.