The tricky business of acting on live data before it’s too late


Credit: GDELT

For all the talk about big data and how it can help us track down needles in haystacks, there’s still a lot of work to do when it comes to issues like public health. When a successful intervention might require a timeline of minutes or hours rather than days, it takes a mighty keen eye to monitor lots of needles in lots of haystacks and, more importantly, to spot new and important ones as they pop up.

We’ve been following news out of the Global Database of Events, Languages and Tones (GDELT) project for the past several months, and it’s very impressive as a tool for historical analysis of the world’s happenings. It ingests and indexes real-time streams from news sources around the world, and now includes hundreds of millions of data points spanning the past 35 years. It has been used for all sorts of analyses so far, ranging from tracking the spread of terrorist groups to comparing the activity patterns of today’s political uprisings with those of decades past.

A map of the Ebola outbreak, as of early July, using GDELT. Source: GDELT

But in a blog post published on Saturday, GDELT project leader Kalev Leetaru points out a major limitation of the database: It’s only as useful as the scope of the data it includes and the analysts using it. Using the Ebola outbreak as an example, he explains how GDELT actually ingested an international news article referencing the Guinean government’s concern over hemorrhagic fever one day before Harvard’s HealthMap issued an alert based on local social media activity. But without someone monitoring for that type of news in that part of the world, the single reference was very easy to miss.

Indeed, GDELT, like HealthMap, only picked up on the growing epidemic a day later, as mentions ramped up across news and social media alike. Leetaru suggests some technical improvements, including broader machine-translation capabilities, that might help GDELT serve as a better real-time alerting system by letting it track even hyper-local sources within remote countries. The more events it picks up, and the earlier it picks them up, the harder they are for anyone paying attention to miss.

Source: Dataminr

An expanded coverage footprint certainly would be a big help, both for GDELT and for commercially available services such as Dataminr. Dataminr monitors activity on Twitter and, when it identifies a meaningful situation taking place, sends alerts to journalists, government agents and first responders. It has already helped identify breaking news domestically but, as with GDELT, Dataminr could be even more valuable if it were able to monitor more social networks and more languages around the world.

However, if we want to make progress in responding to potential emergencies or other situations, we also need more people buying into the promise of these types of databases and alerting systems. More news sources should help GDELT, Dataminr and other services identify more trends, but each new signal is still just a needle in a haystack that’s expanding as well. Databases can ingest more data faster and algorithms can identify more trends faster, but reacting faster means we need more people paying attention to what the data is saying, as it’s being said.

For more thoughts from GDELT’s Leetaru on the promise of collecting and analyzing so much data, check out this Structure Show podcast interview with him.

3 Comments

Amit Sheth

To convert data into alerts, and even further into “actionable information,” one needs to incorporate a model (ontology, background knowledge) of the domain being served. For example, if you add knowledge of the terms/concepts and relationships that are important in a health crisis, the system will be able to go beyond statistical information processing and identify more meaningful information useful to health crisis responders or public health responders. Look at the “Need-to-Rescue” specialization built on top of Twitris (whose general analysis can be overwhelming to those who need to focus on specific tasks, such as coordinating rescues as tweets bearing SOS information arrive). For more, see the last two bullets in this story:
http://news.oneindia.in/india/digital-volunteers-use-social-media-for-rescue-efforts-during-jammu-floods-1524180.html

jjj

Mildly related: just yesterday I was annoyed that Google can’t translate the search terms and show results in all languages. It would add a bit of complexity, but their data would be more valuable.
