Researchers Mine Your Web Data for Profits and the Public Good

The most valuable use for the ever-increasing amount of information we put online through our Facebook pages, our Flickr accounts and even our web searches may not be targeted advertising (though that may be the most profitable use), but public science and research. Inspired by an article today in the New Scientist about researchers at Cornell using geotagged Flickr photos to make maps of the world, I started thinking about how our web-based content may be used for more than selling diet pills and travel packages. Google (s GOOG) is using information on symptom searches by IP address to track the flu (something we can all relate to as the swine flu continues its advance), while in a more obvious example, Harvard is following a class of Facebook users throughout their college careers to track social interactions. As Google’s approach to tracking flu cases shows, the more information we put online, the more opportunity there is to reshuffle that data in ways that lead to new understandings. On the web, where we perceive ourselves to be alone or are driven solely by self-interest (such as finding a better job or correctly diagnosing our symptoms), we may offer up more accurate and honest assessments of ourselves, helping eliminate the problem of people lying to researchers to look better than they are.

Conducting this sort of research has privacy implications because anonymized data can still lead back to a person, and our very honesty on the web can make people squeamish about seeing that data used. There’s also a heavy selection bias toward technophiles and first-world countries, since a mere 23 percent of the world’s population used the Internet last year, according to a report out in March from the United Nations. You can see that bias clearly in the results of the Flickr research: the Apple (s aapl) store is the fifth most photographed monument in New York City, and tags that are variations of “sxsw” account for the third through fifth most popular monuments in Austin, Texas.

However, such research benefits might also be one of the best arguments for getting the rest of the country (and world) online. If we can devise ways to aggregate this information while respecting privacy, then uncovering crucial flu trends, or using LinkedIn data to show companies where the largest pool of a certain class of worker lives, represents just the beginning of what we can do.