Tweetbeat, which launched earlier this week, is one of several startups trying to filter and make sense of the real-time stream of data that comes from Twitter and other social networks. However, founder Anand Rajaraman says Tweetbeat has something those other services don’t: four years’ worth of semantic analysis and taxonomic structure built up by Kosmix, out of which the service emerged. Tweetbeat is effectively a front end for the data analysis algorithms and processes that Rajaraman, who teaches data mining at Stanford, and his team developed and have now applied to the Twitter firehose.
“It was like we were waiting for this real-time data flow to come along so we could apply our semantic filter to it,” Rajaraman said in an interview with me at TechCrunch’s Disrupt conference. “The volume of data is continually increasing, but most search services are still dealing at the level of keywords.” The software algorithms that Kosmix has developed, Rajaraman says, “understand real-time data at a more granular level, because of the taxonomy that we built.”
Kosmix has built topic pages based on this taxonomy, as well as a site called RightHealth that the Tweetbeat founder says is the second-largest health site in the U.S. and provides a lot of the company’s revenue. The database the company has built “is over 10 million topics, and there’s an order of magnitude more relationships between them,” Rajaraman says. This helps Tweetbeat filter content from the Twitter firehose, which the company got access to six months ago, just in time to test its filtering technology during the World Cup. That trial allowed users to tune into streams based on specific teams and even players.
Tweetbeat does real-time semantic analysis of the tweet stream based on the content of the messages, but it also weighs the authority of the people posting them, using a proprietary ranking algorithm that measures a person’s influence within specific topic areas (how many times they get retweeted and by whom, for example). The service then ranks the most talked-about topics and gives users the ability to drill down into them. You can pause a stream, and you can also use a slider to go back in time and see the stream of conversation at a specific point, which Rajaraman describes as “like Tivo for the real-time web.”
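Tweetbeat's ranking algorithm is proprietary and is not described in detail, so the following Python sketch swaps in a well-known stand-in: a PageRank-style score computed over the retweets within a single topic. The function name `topic_authority`, the `damping` and `rounds` parameters, and the sample data are assumptions made for illustration. It shows one way that both "how many times" and "by whom" could feed into an influence score.

```python
from collections import defaultdict


def topic_authority(retweets, damping=0.85, rounds=20):
    """PageRank-style influence scores for ONE topic (illustrative only).

    retweets: list of (retweeter, author) pairs. Each retweet passes a
    share of the retweeter's score to the author, so a retweet from an
    influential user counts for more than one from an unknown user.
    """
    users = {user for pair in retweets for user in pair}
    if not users:
        return {}

    # Number of retweets each user made, used to split their score.
    out_count = defaultdict(int)
    for retweeter, _author in retweets:
        out_count[retweeter] += 1

    base = (1 - damping) / len(users)
    score = {user: 1.0 / len(users) for user in users}
    for _ in range(rounds):
        nxt = {user: base for user in users}
        for retweeter, author in retweets:
            nxt[author] += damping * score[retweeter] / out_count[retweeter]
        score = nxt
    return score


# Hypothetical retweets about one topic: (retweeter, author).
world_cup = [("a", "c"), ("b", "c"), ("c", "x"), ("a", "y")]
scores = topic_authority(world_cup)
for user in sorted(scores, key=scores.get, reverse=True):
    print(user, round(scores[user], 4))
```

In this sample, `x` and `y` are each retweeted once, but `x` ends up with the higher score because the retweet came from `c`, who is widely retweeted on the same topic. Running the function separately per topic keeps influence topic-specific, as the article describes.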
The company has raised several rounds of financing totalling $53 million from venture backers such as Accel Partners and Time Warner, with the last round in 2008. “We don’t really need to raise any more money,” the Kosmix founder said. Prior to founding Kosmix, Rajaraman co-founded Junglee, a database venture that was acquired by Amazon in 1998. While he finished his PhD thesis, he also ran a seed-stage fund called Cambrian Ventures (please see disclosure below) that invested in a number of startups such as Kaltix, a data-focused company started by several Stanford PhDs that created a personalized search system and was later acquired by Google.
Embedded below is a video interview I did with Rajaraman at Disrupt.
Disclosure: Cambrian Ventures is an investor in Giga Omni Media, the parent company of the GigaOM blog network.