The files J. Edgar Hoover kept are nothing compared to the data collected by the Republican and Democratic campaigns for the coming 2012 elections. Thanks to tools such as Hadoop and Hive, campaigns are now able to aggregate their many databases and then query them to discover who will vote, how best to reach people and even whether you would respond better to a tweet or an email.
Speaking at a panel on big data and politics at South by Southwest in Austin, five D.C. data geeks explained some of the plans for data in the 2012 election, the challenges associated with data and how data is changing politics.
The panel started with a breakdown of the hot trends in data collection and analytics for this campaign cycle. The big trends appear to be unstructured data analysis (the Obama campaign calls this Project Dreamcatcher) and breaking down the silos between different data stores (called Project Narwhal by the Obama campaign). There were Republicans on the panel, too, but the strategies there don’t have cool code names yet, or at least none that anyone mentioned.
Using big data to break down barriers
Complementing these two trends in data analysis is the idea of collecting far more data, mostly by being able to tie voter information the parties already have with those voters’ online personas. If you are a registered Republican, the party has data on you, but it might not know who you are on Twitter or Facebook.
Politicians (or the data nerds working with them) really want to know this, and are using tools such as Facebook apps to get access to data through Facebook’s Connect program. However, Alex Lundry, VP and director of research at TargetPoint Consulting, noted that Facebook is very specific about what data a campaign can have and download. Still, he said, canvassing social media allows for a “seamless integration of your online profile and your offline voter file.”
So for 2012, we can expect campaigns to make use of aggregated structured data from their websites, apps, records of volunteers canvassing and other traditional collection methods. They will also be collecting and analyzing unstructured data from interviews conducted with voters, social media and other sources to get a sense of how the public feels about issues. At the same time, they will try to get a more complete picture of the voter by merging offline and online identities.
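To make the idea of merging offline and online identities concrete, here is a minimal sketch in Python. It is not any campaign's actual system: the record types, the field names and the choice of a normalized email address as the matching key are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoterRecord:
    """Hypothetical row from an offline voter file."""
    voter_id: str
    name: str
    email: str
    party: str


@dataclass
class OnlineProfile:
    """Hypothetical row collected from a website or app signup."""
    email: str
    twitter_handle: Optional[str] = None


def normalize_email(email: str) -> str:
    # Assumption: email is the shared key; trim and lowercase before matching.
    return email.strip().lower()


def merge_identities(voters, profiles):
    """Left-join voters to online profiles on normalized email."""
    by_email = {normalize_email(p.email): p for p in profiles}
    merged = []
    for v in voters:
        p = by_email.get(normalize_email(v.email))
        merged.append({
            "voter_id": v.voter_id,
            "name": v.name,
            "party": v.party,
            # None means no online persona has been matched yet.
            "twitter_handle": p.twitter_handle if p else None,
        })
    return merged
```

Every voter stays in the result whether or not a match is found, which mirrors the situation described above: the party has the voter file, but may not know who that voter is online.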
So what will this mean for the campaign?
With terabytes of data at their fingertips, the campaigns are preparing to get more complete results and build better models of their voters. This means better targeting of campaigns to the voter and, hopefully, more signups on websites, more email addresses collected, more money raised and more votes won.
Additionally, it could help candidates figure out where to schedule speeches and meet-and-greets. Once at those speeches, candidates can target their messages either to make sure supporters actually vote or to move undecided voters to their side.
Aggregating tweets and other real-time data, plus the ability to batch process data cheaply using Hadoop and commodity hardware, means campaigns are able to build new models of voter behavior on a daily, weekly or monthly basis instead of once or twice during a campaign.
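The kind of batch job described here can be sketched in plain Python as a map step followed by a reduce step. This is only an illustration of the MapReduce pattern, not Hadoop's API, and the record layout (day, issue, sentiment score) is a hypothetical one chosen for the example.

```python
from collections import defaultdict


def map_phase(records):
    """Emit ((day, issue), sentiment) pairs, one per scored message."""
    for day, issue, sentiment in records:
        yield (day, issue), sentiment


def reduce_phase(pairs):
    """Average the sentiment scores that share the same (day, issue) key."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of scores, count]
    for key, sentiment in pairs:
        totals[key][0] += sentiment
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical input: sentiment already scored as +1 or -1 per message.
records = [
    ("2012-03-10", "economy", 1),
    ("2012-03-10", "economy", -1),
    ("2012-03-11", "economy", 1),
]
daily_sentiment = reduce_phase(map_phase(records))
```

Because each day's records reduce independently, a job like this could be rerun every night on new data, which is what would let a campaign refresh its picture of voters daily rather than once or twice a cycle.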
Mobile apps and bringing it all together
Cooler ideas about gathering data also emerged. Campaigns will not only attempt to mine your social media. Kristen Soltis, VP at The Winston Group, a polling organization that does work for the Gingrich campaign, hopes to use mobile apps to change the way polling is done. The death of landlines has hit the polling industry hard, but Soltis notes that polling is still far more accurate than looking at tweets and other online options to measure sentiment. However, adding a mobile app and getting a representative sample of users to download it could help pollsters adapt their business to the mobile age, as well as give politicians the ability to refine their polls and make them more representative and less binary.
Still, Soltis believes online sentiment analysis will get better over time and become more important as data scientists figure out how to use the results from online media to make more accurate predictions. That, combined with the use of mobile apps, has the power to change politics even more, as candidates can respond to changes in sentiment in real time.
To me, all this raises the question of how this could change politics, or, more accurately, the rhetoric of a campaign. And as we enter into what pundits claim will be the “billion-dollar election,” I’m eager to see what the data will do. For more on data and how it will change our lives, come to our Structure:Data event March 21 and 22 in New York.
Images courtesy of the Romney campaign and the White House.