The Internet is one of the first places many people go to research symptoms or illnesses they have (or think they might have), and social networks like Twitter have also become a hotbed of symptom-sharing and health advice. But can doctors and other researchers discover any useful information from all of this background noise? Two researchers at Johns Hopkins University say they can — with a little bit of effort. They analyzed more than two billion tweets for health-related terms and say their research shows Twitter can be a valuable source of public-health information about a wide range of ailments.
The study, entitled “A Model for Mining Public Health Topics from Twitter” (the research is available as a PDF), started with two billion tweets that were posted to the network between May 2009 and October 2010. The two researchers, Mark Dredze of the university’s Human Language Technology Center and doctoral student Michael J. Paul, then used a software algorithm to pick out approximately 1.5 million health-related messages by searching for a variety of terms related to medical issues and illnesses.
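A term-based first pass like the one described above can be sketched in a few lines. This is not the researchers' code. The term list below is a hypothetical, illustrative stand-in, because the study's actual vocabulary isn't given here. The sketch simply keeps any tweet that mentions one of the terms as a whole word, ignoring case:

```python
import re

# Hypothetical, illustrative term list; NOT the vocabulary used in the Johns Hopkins study.
HEALTH_TERMS = ["flu", "fever", "cough", "allergies", "headache", "antibiotics", "sore throat"]

# One case-insensitive pattern with word boundaries, so "flu" matches "Flu" but not "Fluent".
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in HEALTH_TERMS) + r")\b",
    re.IGNORECASE,
)


def is_health_related(text: str) -> bool:
    """Return True if the text mentions any term from HEALTH_TERMS."""
    return PATTERN.search(text) is not None


def filter_health_tweets(tweets: list[str]) -> list[str]:
    """Keep only the tweets that mention at least one health term."""
    return [tweet for tweet in tweets if is_health_related(tweet)]


if __name__ == "__main__":
    sample = [
        "Stuck in bed with the flu again",
        "Fluent in three languages now",
        "Taking antibiotics for this cough",
        "Great game last night",
    ]
    print(filter_health_tweets(sample))
    # prints ['Stuck in bed with the flu again', 'Taking antibiotics for this cough']
```

Keyword matching alone will also catch jokes and figures of speech that only sound medical. At this scale it works best as a cheap first filter that narrows billions of tweets down to a set small enough for closer analysis.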
The researchers didn’t record the names or personal details of any of the users who posted the information, but did record their locations when available. Said Dredze:
Our goal was to find out whether Twitter posts could be a useful source of public health information. We determined that indeed they could. In some cases, we probably learned some things that even the tweeters’ doctors were not aware of, like which over-the-counter medicines the posters were using to treat their symptoms at home.
Although the study was intended primarily as a “proof of concept” to show that filtering information from Twitter could produce valuable data, the researchers said they uncovered some intriguing patterns about everything from allergies and the flu to ailments such as cancer, obesity and depression. And because many people posted details of the medications they were using to treat themselves, the researchers discovered that a number of users were taking antibiotics for the flu, even though the flu is caused by a virus and antibiotics don’t work against it. That kind of misuse could be a public-health problem.
The Johns Hopkins study isn’t the first attempt to pull public-health information out of the online activity of millions of users. One of the earliest examples was Google’s Flu Trends, which showed there was some predictive value in watching near-real-time searches for information about the flu and its symptoms. When this data was mapped geographically, it showed in some cases how influenza was spreading, as more and more people started using Google to search for information about the illness. Other studies have also focused on using Twitter to track and model flu symptoms.
The Centers for Disease Control and Prevention (CDC) has been monitoring some of these tools for several years, including working with Google Flu Trends (whose team has published some of its research), and others have been trying to apply the same approach to different medical issues. HealthMap is a site designed to monitor, and potentially predict, disease outbreaks, and has been working with the CDC to track health trends. The site recently launched an early-warning system aimed at tracking diseases that move between animals and people.
Meanwhile, a startup called Sickweather is trying to apply these same tools to local health issues, by using data from Twitter and Facebook to create “weather maps” of illnesses and predict where medical issues might arise.
This is part of a trend we’ve talked a lot about at GigaOM: the rise of “big data” and the potential for mining massive data sets for valuable information, whether from Twitter or from the clickstream of people using Google search. The ability to watch human behavior in near real time and detect patterns has huge implications across a wide variety of disciplines. And if we continue sharing information at the rate Facebook CEO Mark Zuckerberg thinks we will, there will be plenty of data to choose from.