Predicting Twitter popularity is all about probability

Tweets have the power to decimate markets, but they also have users and companies seeing dollar signs. Given Twitter's huge marketing, political, and social mobilization potential, how can you predict which tweets will get more views, and which will go viral? A new study presents a statistical model that attempts to estimate the popularity of tweets, and thus how memes spread.

Starting with 52 “root” tweets from users both famous and obscure, the researchers first analyzed the dynamics of retweeting, like the speed and spread of a tweet from a user to followers and then their followers. The researchers, from the University of Washington, MIT, and Penn, used the Twitter API to collect all the retweet information and found that most retweets occurred within one hour of the original tweet. Not surprisingly, they also found that root tweets are retweeted more than the retweets themselves.

They then plugged the important variables (number of followers, retweet speed, retweets of other tweets) into a Bayesian model, a statistical approach that uses prior evidence (the root tweets) to calculate how the retweet graph evolves. They experimented with feeding the model different amounts of prior evidence to see how much was needed to make an accurate prediction. Using only 10 percent of the retweets to guide the model, they were able to predict retweet time and volume with reasonable accuracy, and the error decreased as they included more retweet data. The average retweet time was only 4.4 minutes.
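To make the "more evidence, less error" idea concrete, here is a minimal sketch of Bayesian updating in Python. It is not the researchers' model: it assumes, purely for illustration, that retweet delays follow an exponential distribution with an unknown rate, and it places a Gamma prior on that rate. The function names, the prior values, and the simulated data are all hypothetical; only the 4.4-minute figure comes from the study, and it is used here simply as the "true" mean of the simulated delays.

```python
import random


def posterior_params(delays, prior_shape=2.0, prior_rate=10.0):
    """Update a Gamma(shape, rate) prior on the exponential retweet rate.

    Assumption (not from the study): delays are i.i.d. exponential.
    The Gamma prior is conjugate, so the update is simple addition.
    """
    return prior_shape + len(delays), prior_rate + sum(delays)


def posterior_mean_delay(delays, prior_shape=2.0, prior_rate=10.0):
    """Posterior mean of the average delay (1 / rate), in minutes.

    For a Gamma(shape, rate) distribution on the rate, the mean of
    1 / rate is rate / (shape - 1), which requires shape > 1.
    """
    shape, rate = posterior_params(delays, prior_shape, prior_rate)
    return rate / (shape - 1.0)


def demo(true_mean=4.4, n=1000, seed=0):
    """Feed the model growing fractions of simulated delays."""
    rng = random.Random(seed)
    # Simulated delays in minutes; a stand-in for real retweet data.
    delays = [rng.expovariate(1.0 / true_mean) for _ in range(n)]
    for fraction in (0.1, 0.5, 1.0):
        # A random subset of the delays, not the earliest retweets.
        subset = delays[: int(fraction * n)]
        estimate = posterior_mean_delay(subset)
        print(f"{fraction:>4.0%} of data: estimated mean delay "
              f"{estimate:.2f} min (error {abs(estimate - true_mean):.2f})")


if __name__ == "__main__":
    demo()
```

With no data at all, the estimate is just the prior guess (10 minutes under these made-up defaults); as more delays are observed, the data increasingly outweigh the prior. A real retweet model would also have to handle the fact that early observations are biased toward fast retweeters, which this sketch sidesteps by sampling delays at random.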


Throwing more information into the prediction engine (like whether a particular follower has a large number of followers of his or her own) could improve the accuracy. Their model was thrown off, it seems, by a few anomalous tweets with a very rapid onset and termination of retweets that didn’t follow the same pattern as the other tweets. (Though they don’t identify who sent those tweets, my bet is on @KimKardashian, whose followers’ actual and predicted retweet timecourse is pictured above.) The researchers didn’t even consider the time of day a tweet was posted, nor its content; there is likely huge potential in mining those domains to learn what kind of tweet, posted when, leads to trending.

With the abundance of the Twitterverse open to developers via API, this study represents just the tip of the iceberg in predicting tweeting behavior, something that startups like Blab are busily pursuing. It also shows that robust methods like Bayesian statistics can predict if a tweet has any retweet life left, and thus whether it can gather more eyeballs and clicks, something that is sure to prove very lucrative.

6 Responses to “Predicting Twitter popularity is all about probability”

  1. Blab is an interesting study. However, the world is constantly changing and so is the Twitterverse. Data on what it takes to trend on Twitter today may be out of date next month or even next week.

  2. This seems to imply we’ll get to a point where we’ll know what to tweet and when, based on the type of response we’re likely to get. I’m trying to look at this from a brand’s perspective. Does that mean that in the future they’ll know the optimal time to tweet an offer and what to say about it? As Amanda says, there’s a way to go on this study yet, by adding in many more factors like follower volume, following volume of re-tweeters, time of day, what’s trending and the noise that creates as a distraction. Just telling us that re-tweets are more likely to happen in the first hour after the tweet isn’t really insight, is it? Isn’t it a product of the majority of us just ‘scrolling to the top’ to read the latest tweets, because we don’t have time to read them all?

    As for the Blab monetization plan – good question. Perhaps media outlets would be interested to know what stories are more likely to be ‘big’ in order to get bragging rights on being the first to report on them through mass media channels? Not just the big media outlets either – it may be interesting for companies like Newsflare as well (which incentivises the public to provide reporting footage at ‘newsworthy events’). So for me it’s about paying for the right to break a story. We seem to have this growing obsession with the ‘here and now’, so anything that gets ahead of that is probably worth something to those who want to be seen to be on the pulse. It just depends on how accurate it will be.