Swift River: Trying to Filter the Social Web Firehose

Anyone who has tried to track dozens of Twitter streams, hundreds of Facebook updates or thousands of blog posts simultaneously knows that the social web can be an intimidating, never-ending ocean of information, one that constantly threatens to swamp us. (There’s a reason Twitter and other social networks call the APIs they use to provide all their data a “firehose.”) A startup called Swift River is one of a number of new services that are trying to find ways of filtering and understanding that ocean in real time, by using “semantic web” technologies.

The software behind Swift River was originally developed for Ushahidi, a web-based information service that was created in 2008 to help aid groups track text messages, Twitter posts and other real-time data about the political unrest in Kenya (a video of co-founder Erik Hersman describing the genesis of the project at the TED 2009 conference is embedded below). Ushahidi has since been used in other disaster areas, including Haiti after the earthquake earlier this year, to help emergency workers pool information about survivors, victims needing rescue and the like. One of the principles behind the project is that social-media tools can spread information much faster than traditional communication methods.

One problem that the project confronted in Haiti, however, was how to process, understand, categorize and verify those huge quantities of data coming in from multiple sources. How could Ushahidi workers know, for example, that a Twitter message or text from someone was actually valid — in other words, that it was coming from a place and/or a person that would be likely to know what was actually happening? Volunteers who were tracking messages related to the Haiti earthquake “got more than 100,000 reports in four days,” Swift River developer Jonathan Gosier told the BBC, and after two weeks volunteers had only managed to process half the messages.

That’s when Gosier and the other developers of Swift River realized they needed a semantic processing engine to help shoulder the load. The software they designed is based on the idea that by comparing messages and information from a variety of sources about an event, the system can build an understanding of which are credible and which are not. As the company’s website describes it:

One of the strengths of crowdsourcing is the ability to collect a high volume of information from highly diverse channels like Twitter, email, news sites, blogs, and SMS. SwiftRiver acts as the verifying filter for these different channels and is possible precisely because of the volume of information available from these sources. The more information generated, the more the community interacts with it, and the easier it becomes to identify mutually trusted sources.

The software uses a combination of natural-language processing, machine-learning systems and what Swift River calls “veracity algorithms.” The system is designed to reveal authoritative sources, while suppressing noise such as “duplicate content, irrelevant cross-chatter and inaccuracies.” When users start monitoring a topic, they choose the feeds they want to track — whether they are blogs, Twitter feeds, SMS messages, etc. Over time, Swift River assigns a score to each source, determined partly by the user and partly by the system’s algorithms, and that score changes as more information is taken in. The software’s natural-language computation continually tags the content by extracting relevant keywords, so that it can be indexed properly.
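Swift River has not spelled out how its “veracity algorithms” work, so the following is only an illustrative sketch of the general idea described above, not the company’s implementation. It assumes a source score that blends a user-supplied rating with a system-computed track record, a simple normalization step to suppress duplicates, and frequency-based keyword tagging. Every name here (`Source`, `user_rating`, `drop_duplicates`, `extract_keywords`, the stopword list and the 50/50 weighting) is hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "in", "on",
             "at", "to", "is", "are", "for", "near"}


def normalize(text: str) -> str:
    """Lowercase the text and keep only letters and digits, space-separated."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def drop_duplicates(messages: list[str]) -> list[str]:
    """Keep the first copy of each message, comparing normalized text."""
    seen = set()
    unique = []
    for message in messages:
        key = normalize(message)
        if key not in seen:
            seen.add(key)
            unique.append(message)
    return unique


def extract_keywords(text: str, limit: int = 3) -> list[str]:
    """Tag a message with its most frequent non-stopword terms."""
    words = [w for w in normalize(text).split() if w not in STOPWORDS]
    return [word for word, _ in Counter(words).most_common(limit)]


@dataclass
class Source:
    """A feed being monitored (assumed structure, for illustration only)."""
    name: str
    user_rating: float = 0.5  # trust assigned by the user, from 0 to 1
    confirmed: int = 0        # reports later corroborated
    refuted: int = 0          # reports later shown to be wrong

    def system_score(self) -> float:
        # Smoothed share of confirmed reports; 0.5 when nothing is known yet.
        return (self.confirmed + 1) / (self.confirmed + self.refuted + 2)

    def score(self, user_weight: float = 0.5) -> float:
        # Blend the user's judgment with the source's track record.
        return (user_weight * self.user_rating
                + (1 - user_weight) * self.system_score())


if __name__ == "__main__":
    radio = Source("local_radio", user_rating=0.8)
    print(radio.score())      # score before any reports are checked
    radio.confirmed += 3
    print(radio.score())      # score rises as reports are corroborated
    print(drop_duplicates(["Bridge is down!", "bridge is DOWN", "Need water"]))
    print(extract_keywords("Water needed near the bridge, water is low"))
```

In this sketch the score moves as more information comes in: each corroborated or refuted report shifts the system half of the score, while the user’s own rating keeps a fixed share of the total. A production system would need far more than this, including fuzzy duplicate detection and real natural-language processing.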

Swift River will be releasing a beta next month. The company says that it sees news organizations using the software for the same purpose for which it was developed: to make sense of large quantities of information in real time, such as during a disaster. Gosier described the project in a post for UX magazine earlier this year.

Among the other companies that have been working on mining the social web for meaning is The Ellerdale Project, which was acquired last week by Flipboard — the startup founded by former TellMe CEO Mike McCue that launched a much talked-about content reader for the iPad. Arthur van Hoff of The Ellerdale Project described what the group was trying to do in a video interview with Liz Gannes recently at GigaOM’s office in San Francisco.

Whether efforts like Swift River can help us make sense of the ocean of social information in which we’re swimming remains to be seen, but we need all the help we can get. Here’s a video of Ushahidi founder Erik Hersman at TED 2009:


Post and thumbnail photos courtesy of Flickr user Jurvetson