
Summary:

Researchers are busy trying to use Twitter to predict everything from disease outbreaks and financial markets to elections and even revolutions. New research from Topsy Labs shows that Twitter can provide a window into events like the Arab Spring. But can it predict what will happen?


There’s been a lot of talk recently about Twitter trending topics, and how they fail to reflect evolving events such as the Occupy Wall Street movement (although some argue that this is mainly the fault of our inflated expectations rather than Twitter’s algorithms). Despite those setbacks, there is an emerging industry aimed at using the tweetstreams of millions of people to help predict the future in some way: disease outbreaks, financial markets, elections and even revolutions. According to new research released today by Topsy Labs, which runs one of the few real-time search engines with access to Twitter’s historical data, watching those streams can provide a window into breaking news events. But can it predict what will happen?

The theory behind all of this Twitter-mining is that the network has become such a large-scale, real-time information delivery system (handling more than a quarter of a billion messages every day, according to CEO Dick Costolo at the recent Web 2.0 conference) that it should be possible to analyze those tweets and find patterns that produce some kind of collective intelligence about a topic. It’s the same idea that drives companies to do “data mining” on their customers’ behavior, or compels Google and Facebook to track your browsing activity in the hope that they can generate some aggregate information that will be of value, and predict what you might be interested in.

Predicting markets and the spread of disease

One of the first attempts at doing this with Twitter appeared last year, when a team of researchers published a report comparing sentiment extracted from Twitter with the movements of the Dow Jones Industrial Average. The study said that its system could predict the index’s daily up and down moves with 87-percent accuracy, and within months a hedge fund called Derwent Capital Markets launched a fund that it said would make stock and fund trades based on a similar kind of Twitter analysis (so far it seems to be doing pretty well).
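To make the idea concrete, here is a minimal Python sketch of one simple way to check whether a daily sentiment score leads an index: correlate the sentiment on day t with the index change a few days later. This is not the method the study used; the function names and the toy numbers below are assumptions made up purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(sentiment, index_change, lag):
    """Correlate sentiment on day t with the index change on day t + lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(sentiment, index_change)
    return pearson(sentiment[:-lag], index_change[lag:])


# Toy, made-up series (NOT real market or Twitter data).
toy_sentiment = [0.2, 0.5, 0.1, 0.7, 0.4, 0.6]
toy_index_change = [0.0, 0.3, 0.8, 0.1, 1.0, 0.5]
print(lagged_correlation(toy_sentiment, toy_index_change, lag=1))
```

A high lagged correlation on historical data is only a hint, not proof of predictive power: a real test would have to hold out data the model has never seen.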

Medical researchers have also been trying to use Twitter trends and analysis to predict the outbreak or spread of disease, in much the same way that Google came up with Google Flu Trends, which tracks searches for terms associated with the flu — data that seems to correlate fairly well with actual outbreaks of the flu. Two researchers from Johns Hopkins University recently released a study that looked at more than two billion tweets and analyzed them for medical information, and said that this could be a useful tool for researchers and medical staff.

Could Twitter have predicted revolution in Egypt?

In one of the research reports the company released today, Topsy Labs looked at tweets related to the recent Arab Spring revolutions in Tunisia, Egypt and elsewhere in the Middle East, and tried to correlate the rising and falling trends in hashtags such as #iran, #egypt and #yemen with actual events such as the suicide of Mohammed Bouazizi in Tunisia. Bouazizi was the 26-year-old food vendor whose death crystallized for many dissidents the problems in their country and the need to take action. Twitter was a key tool for raising awareness of these revolutions, and Topsy’s data shows a high correlation between actual events and Twitter activity around those topics.
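As a rough illustration of what correlating hashtag trends with events can mean in practice, the sketch below flags days on which a hashtag’s volume jumps well above its recent average, then checks how many known event days fall on or near those spikes. It is an assumption about one simple approach, not a description of Topsy’s analysis; the function names, thresholds and counts are all hypothetical.

```python
def find_spikes(daily_counts, window=3, factor=2.0):
    """Return the day indices whose count exceeds `factor` times the
    average of the previous `window` days."""
    spikes = []
    for day in range(window, len(daily_counts)):
        baseline = sum(daily_counts[day - window:day]) / window
        if baseline > 0 and daily_counts[day] > factor * baseline:
            spikes.append(day)
    return spikes


def share_of_events_near_spikes(event_days, spike_days, tolerance=1):
    """Fraction of event days that lie within `tolerance` days of a spike."""
    if not event_days:
        return 0.0
    hits = sum(
        1 for event in event_days
        if any(abs(event - spike) <= tolerance for spike in spike_days)
    )
    return hits / len(event_days)


# Made-up daily tweet counts for one hashtag (NOT real data).
counts = [10, 12, 11, 50, 48, 13, 12, 11, 90]
spikes = find_spikes(counts)
print(spikes)
print(share_of_events_near_spikes([3, 6], spikes))
```

Note that a spike on the same day as an event shows that Twitter reflects the event, which is a weaker claim than saying it predicted it.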

Topsy also looked at what it called the “share of voice” or influence and reach that one specific Twitter user gained over a short period of time: Sohaib Athar, the Pakistani programmer who live-tweeted the U.S. military raid on Osama bin Laden’s compound without even realizing it. According to Topsy’s data, when he first began posting, Athar had very little exposure — he wasn’t being followed or retweeted by many people, and those he was being followed by didn’t have much reach (meaning they weren’t followed by or retweeted by many people either). But that all changed over the next 24 hours:

[A]s his tweets were retweeted and mentioned more than 30,000 times, his exposure grew to a whopping 82.68 million unique tweets within 21 hours. As his tweets became more interesting to the Twittersphere, his exposure and influence grew dramatically. He went from 0 to 20 million in under 10 hours and over 82 million in just under 30 hours.
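The article does not spell out how Topsy computes exposure, so the following Python sketch rests on a loud assumption: exposure is taken to be the running total of follower counts of the distinct accounts that retweet or mention a user, with no correction for followers shared between accounts. The function name and the sample records are hypothetical.

```python
def cumulative_exposure(retweets):
    """Build a timeline of (hour, total_exposure).

    `retweets` is an iterable of (hour, account, follower_count) tuples.
    Each account's followers are counted once, the first time it appears.
    Overlap between different accounts' followers is ignored (assumption).
    """
    seen_accounts = set()
    total = 0
    timeline = []
    for hour, account, followers in sorted(retweets):
        if account not in seen_accounts:
            seen_accounts.add(account)
            total += followers
        timeline.append((hour, total))
    return timeline


# Made-up retweet records (NOT real data): (hour, account, followers).
sample = [
    (1, "small_account", 100),
    (2, "local_reporter", 5000),
    (2, "small_account", 100),   # repeat retweeter, not counted twice
    (5, "big_news_outlet", 20000),
]
for hour, total in cumulative_exposure(sample):
    print(hour, total)
```

Under this assumption a single retweet from a widely followed account moves the total far more than many retweets from small accounts, which is consistent with the sudden jump described in the quote.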

Topsy’s research certainly shows how quickly a single individual can become hugely influential in a very short space of time, and the correlation of Twitter data with events in Egypt and Tunisia is also interesting. But could someone have predicted that Egypt was going to break into open revolution based on the activity Topsy recorded? Perhaps — which is why the U.S. government’s Intelligence Advanced Research Projects Activity unit or IARPA is looking at using data from social media like Twitter and Facebook as part of its intelligence gathering.

The research that Topsy did is far from conclusive, however. In particular, the company didn’t apply any filters based on language or a Twitter user’s location to its analysis — which means that many of the tweets could have come from outside Egypt and Tunisia — and it didn’t try to use any influence-ranking to determine connections between those who were tweeting about the topic (as researcher Kovas Boguta did to produce this fascinating visualization). But it shows what could be done with that kind of data, and it is likely just the start of an ongoing attempt to understand the giant collective consciousness that is Twitter.
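For readers wondering what a language or location filter might look like, here is a minimal Python sketch. It assumes each tweet is a dictionary with hypothetical `lang` and `country` keys; real tweet records may use different field names, and many tweets carry no location at all.

```python
def filter_tweets(tweets, languages=None, countries=None):
    """Keep only tweets whose language and country are in the allowed sets.

    A filter set to None is skipped. Tweets missing a field that is being
    filtered on are dropped, since their origin cannot be confirmed.
    """
    kept = []
    for tweet in tweets:
        if languages is not None and tweet.get("lang") not in languages:
            continue
        if countries is not None and tweet.get("country") not in countries:
            continue
        kept.append(tweet)
    return kept


# Made-up tweets (NOT real data); "lang" and "country" are assumed keys.
sample_tweets = [
    {"text": "#egypt update", "lang": "ar", "country": "EG"},
    {"text": "#egypt in the news", "lang": "en", "country": "US"},
    {"text": "#tunisia", "lang": "fr", "country": "TN"},
    {"text": "#egypt", "lang": "en"},  # no location attached
]
print(len(filter_tweets(sample_tweets, countries={"EG", "TN"})))
```

Dropping tweets without location data is a conservative choice: it shrinks the sample, but it avoids counting outside observers as participants.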

Post and thumbnail photos courtesy of Flickr users totalAldo and timetrax


  1. ChristianCalais Wednesday, October 19, 2011

    Correction: “suicide of Mohammed Bouazizi in Egypt.” Mohammed Bouazizi was Tunisian and committed suicide in Tunisia, not Egypt.
    http://en.wikipedia.org/wiki/Mohamed_Bouazizi

  2. “Prediction is very difficult, especially about the future.” – Niels Bohr

    Twitter and other social venues bring interesting information, but they’re better used to complement other signals. For instance, at http://www.ValuValu.com we use social data to better identify stock market reversals, but most of our trading is based on momentum.

  3. This is very interesting. Unfortunately, I do not believe that any algorithm can predict the future, because the human mind is still an unknown quantity. You can guesstimate, but you will not get accurate data because of the disconnect between what we think and what we do.

  4. M. Edward Borasky Wednesday, October 19, 2011

    The Twitter Firehose, properly mined, can certainly deliver “intelligence”, if not predictions. Twitter’s publicly-posted Trending Topics, on the other hand, are spammed to the point of total irrelevance.

  5. This is why Google can’t afford to let Google+ fail. http://ike4.me/ote

  6. Interesting point. I think there is another company out there, RecordedFuture, a Google company, that is trying to do the same.

  7. We’ve done similar things in the past. We accurately predicted the outcome of the last UK election (http://simpleweb.co.uk/2010/sarcasm-is-the-lowest-form-of-wit-or-is-it/) using simple semantic analysis, i.e. not just basic positive, negative and neutral sentiment, and not just on Twitter, I might add. “Predicting the future” is not really accurate as a statement, although it can most definitely appear that way. It is a lot like Siri, really: it appears much cleverer than it actually is. Through learning algorithms, pattern matching and crowdsourced input and test data, this area will become more and more accurate and challenge existing systems. We have some very interesting developments coming in this area soon.

  8. Given the many articles about “big data”, this will be a rich vein of data and it will be mined. Having run across some interesting semantic analysis on a TED presentation, I can certainly foresee this being utilized. Will it be 100% accurate for everything? No. Will it be a significant factor along with some others? Certainly. It is easy to imagine a sentiment indicator as a companion to a stock trading program, as someone has already mentioned.

  9. Twitter data predicts what happens on Twitter and not necessarily the rest of the world.

  10. Q3 technologies Thursday, October 20, 2011

    These are surely some far-reaching implications of social media. Twitter has also recently become the second most popular search engine after Google, with its vast ocean of information!

