Tweetbeat, which launched earlier this week, is one of several startups trying to filter and make sense of the real-time stream of data that comes from Twitter and other social networks. However, founder Anand Rajaraman says Tweetbeat has something those other services don’t: four years’ worth of semantic analysis and taxonomic structure built up by Kosmix, out of which the service emerged. Tweetbeat is effectively a front end for the data analysis algorithms and processes that Rajaraman, who teaches data mining at Stanford, and his team developed and have now applied to the Twitter firehose.
“It was like we were waiting for this real-time data flow to come along so we could apply our semantic filter to it,” Rajaraman said in an interview with me at TechCrunch’s Disrupt conference. “The volume of data is continually increasing, but most search services are still dealing at the level of keywords.” The software algorithms that Kosmix has developed, Rajaraman says, “understand real-time data at a more granular level, because of the taxonomy that we built.”
Kosmix has built topic pages based on this taxonomy, as well as a site called RightHealth that the Tweetbeat founder says is the second-largest health site in the U.S. and provides a lot of the company’s revenue. The database the company has built “is over 10 million topics, and there’s an order of magnitude more relationships between them,” Rajaraman says. This helps Tweetbeat filter content from the Twitter firehose, which the company got access to six months ago, just in time to trial its filtering technology during the World Cup. That trial allowed users to tune into streams based on specific teams and even players.
The company has raised several rounds of financing totalling $53 million from venture backers such as Accel Partners and Time Warner, with the last round in 2008. “We don’t really need to raise any more money,” the Kosmix founder said. Prior to founding Kosmix, Rajaraman co-founded Junglee, a database venture that was acquired by Amazon in 1998. While he finished his PhD thesis, he also ran a seed-stage fund called Cambrian Ventures (please see disclosure below) that invested in a number of startups such as Kaltix, a data-focused company started by several Stanford PhDs that created a personalized search system and was later acquired by Google.
Embedded below is a video interview I did with Rajaraman at Disrupt.
Disclosure: Cambrian Ventures is an investor in Giga Omni Media, the parent company of the GigaOM blog network.