Tweetbeat, a new service that launched this week from Kosmix, uses the company’s filtering algorithm and semantic-analysis engine to make sense of the Twitter firehose, and then allows viewers to zoom in on topics, pause the live-stream and even rewind to view past events.


Tweetbeat, which launched earlier this week, is attempting what several other startups are also trying to do: filter and make sense of the real-time stream of data that comes from Twitter and other social networks. However, founder Anand Rajaraman says Tweetbeat has something those other services don’t have: four years’ worth of semantic analysis and taxonomic structure built up by Kosmix, out of which the service emerged. Tweetbeat is effectively a front end for the data-analysis algorithms and processes that Rajaraman, who teaches data mining at Stanford, and his team developed and have now applied to the Twitter firehose.

“It was like we were waiting for this real-time data flow to come along so we could apply our semantic filter to it,” Rajaraman said in an interview with me at TechCrunch’s Disrupt conference. “The volume of data is continually increasing, but most search services are still dealing at the level of keywords.” The software algorithms that Kosmix has developed, Rajaraman says, “understand real-time data at a more granular level, because of the taxonomy that we built.”

Kosmix has built topic pages based on this taxonomy, as well as a site called RightHealth that the Tweetbeat founder says is the second-largest health site in the U.S. and provides a lot of the company’s revenue. The database the company has built “is over 10 million topics, and there’s an order of magnitude more relationships between them,” Rajaraman says. This helps Tweetbeat filter content from the Twitter firehose, which the company got access to six months ago. That was just in time for a trial of its filtering technology during the World Cup, which allowed users to tune into streams based on specific teams and even players.

Tweetbeat does real-time semantic analysis on the content of the tweets in the stream, but it also weighs the authority of the people posting them, using a proprietary ranking algorithm that measures a person’s influence within specific topic areas (how many times they get retweeted and by whom, for example). The service then ranks the most talked-about topics and gives users the ability to drill down into them. You can pause a stream, and you can also use a slider to go back in time and see the conversation at a specific point, which Rajaraman describes as “like TiVo for the real-time web.”
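Kosmix has not published how its ranking works, so the following is only a toy sketch of the general idea described above, not the company’s algorithm. It assumes each tweet has already been assigned a topic, scores authors by how often they are retweeted within that topic, weights topic volume by that score, and uses a timestamp lookup to mimic the rewind slider. The `Tweet` type, the function names and the `1 + authority` weighting are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tweet:
    timestamp: int                    # seconds, on any consistent clock
    author: str
    topic: str                        # assumed output of a semantic classifier
    text: str
    retweet_of: Optional[str] = None  # author being retweeted, if any


def topic_authority(tweets):
    """Count, per topic, how many times each author was retweeted."""
    scores = defaultdict(lambda: defaultdict(int))
    for t in tweets:
        if t.retweet_of is not None:
            scores[t.topic][t.retweet_of] += 1
    return scores


def rank_topics(tweets, authority):
    """Order topics by volume, weighting each tweet by 1 + its author's
    retweet count in that topic (an illustrative weighting only)."""
    totals = defaultdict(float)
    for t in tweets:
        totals[t.topic] += 1 + authority.get(t.topic, {}).get(t.author, 0)
    return sorted(totals, key=lambda topic: (-totals[topic], topic))


def stream_as_of(tweets, when):
    """Return the tweets posted at or before `when`.
    Expects `tweets` to be sorted by timestamp."""
    timestamps = [t.timestamp for t in tweets]
    return tweets[:bisect_right(timestamps, when)]


sample = [
    Tweet(100, "alice", "world cup", "Goal!"),
    Tweet(110, "bob", "world cup", "RT @alice Goal!", retweet_of="alice"),
    Tweet(120, "carol", "world cup", "RT @alice Goal!", retweet_of="alice"),
    Tweet(130, "dave", "health", "Flu season tips"),
    Tweet(140, "alice", "world cup", "Full time"),
]

authority = topic_authority(sample)
ranked = rank_topics(sample, authority)   # most talked-about topic first
rewound = stream_as_of(sample, 120)       # the stream as it stood at t=120
```

A production system working on the full firehose would need streaming data structures and a far richer notion of influence (who is doing the retweeting, for instance), but the shape of the computation is the same: score authors per topic, rank topics, and index the stream by time.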

The company has raised several rounds of financing totalling $53 million from venture backers such as Accel Partners and Time Warner, with the last round in 2008. “We don’t really need to raise any more money,” the Kosmix founder said. Prior to founding Kosmix, Rajaraman co-founded Junglee, a database venture that was acquired by Amazon in 1998. While he finished his PhD thesis, he also ran a seed-stage fund called Cambrian Ventures (please see disclosure below) that invested in a number of startups such as Kaltix, a data-focused company started by several Stanford PhDs that created a personalized search system and was later acquired by Google.

Embedded below is a video interview I did with Rajaraman at Disrupt.

Disclosure: Cambrian Ventures is an investor in Giga Omni Media, the parent company of the GigaOM blog network.

  1. There have been many other similar services that basically do the same thing. What makes this one more relevant or last longer remains to be seen. The data history may give them a head start, but the acid test is always how often people use the tools that are out there.


