
Summary:

The Sunlight Foundation, a non-profit aimed at showing how corporate interests influence government, released a pretty sweet tool for citizens and big data nerds on Monday. Looking at Capitol Words, it’s easy to see how big data and cheap computing could improve government accountability.


The Sunlight Foundation, a non-profit aimed at showing how corporate interests influence government, released a pretty sweet tool for citizens and big data nerds on Monday. The tool, called Capitol Words, tracks which legislators say certain phrases, and how often, in an effort to show how those phrases enter and influence the political debate. Capitol Words is one of those tools that gives us a glimpse of how cheap computing and better data analytics can change business as usual in politics.

Already, the combo of cheap computing and big data is changing how retail firms set prices, offering insights into healthcare and helping investors maximize rental income, so the notion that it could lead to more government transparency isn’t all that crazy. In the case of Capitol Words and the Sunlight Foundation, the goal is to analyze what legislators say on the floor of the House and Senate, parsing the text published daily in the Congressional Record to track how an idea filters through a political party, a region or a debate.

Tom Lee, the Director of Sunlight Labs at the Sunlight Foundation, said in an interview that the amount of data isn’t huge (about 50 or 60 gigabytes a day) but the text does need to be parsed before it can be made into something useful. So the Sunlight Foundation has developed algorithms and techniques for working with the data, many of which it releases on GitHub. It does the calculating and analysis on Amazon’s Elastic MapReduce service and then uses Solr, an open-source search platform, to process people’s queries against the records. The database supporting the tool has upwards of 20 million records.
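To make the parse-then-count idea concrete, here is a minimal sketch in Python of how phrase counts per legislator could be produced in a map/reduce style. It is an illustration only, not the Sunlight Foundation’s code: the record layout, the sample speeches and the function names (`tokenize`, `ngrams`, `map_speech`, `reduce_counts`) are all assumptions, and it runs in a single process rather than on Elastic MapReduce. For simplicity, phrases are allowed to span sentence boundaries.

```python
import re
from collections import Counter
from itertools import chain

# Hypothetical records: (date, speaker, text). The layout and the sample
# text are assumptions for illustration, not Congressional Record data.
SPEECHES = [
    ("2011-11-14", "Rep. A", "We must create jobs. Job creators need certainty."),
    ("2011-11-14", "Sen. B", "The job creators in my state need relief."),
    ("2011-11-15", "Rep. A", "Job creators are waiting on this bill."),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined phrase."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def map_speech(record, n=2):
    """Map step: emit ((speaker, phrase), 1) for each phrase in one speech."""
    _date, speaker, text = record
    return [((speaker, phrase), 1) for phrase in ngrams(tokenize(text), n)]


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each (speaker, phrase) key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


counts = reduce_counts(chain.from_iterable(map_speech(r) for r in SPEECHES))
print(counts[("Rep. A", "job creators")])
```

Because each speech is mapped independently and the reduce step only sums, the same logic could in principle be spread across many machines, which is the property a service like Elastic MapReduce relies on.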

“The speech used by legislators is used to advance causes and manipulate the public,” Lee said. “And how their speech is similar or different can show how particular terms originate from some political messaging memo.”

Lee said the original version of the project in 2008 ran the search and the data parsing in parallel, but that approach was too compute-intensive and didn’t allow for the richness of results the project can offer today by splitting the two steps up. However, he didn’t rule out returning to running the job in parallel eventually as the data stores become larger and queries become more complex.
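The benefit of splitting the two steps can be sketched as follows: a batch step builds an index once, and each query afterwards is a cheap lookup that never touches the raw text. This is a hypothetical Python illustration, not the project’s actual design (which uses Solr for the query side); the record layout and the names `build_index` and `timeline` are assumptions.

```python
from collections import defaultdict

# Hypothetical output of the parsing step: (date, speaker, phrase).
PARSED = [
    ("2011-11-14", "Rep. A", "job creators"),
    ("2011-11-14", "Sen. B", "job creators"),
    ("2011-11-15", "Rep. A", "job creators"),
    ("2011-11-15", "Sen. B", "death tax"),
]


def build_index(records):
    """Batch step: run once over the parsed data, ahead of any query."""
    index = defaultdict(lambda: defaultdict(int))
    for date, _speaker, phrase in records:
        index[phrase][date] += 1
    return index


def timeline(index, phrase):
    """Query step: look up per-day counts for a phrase, sorted by date."""
    return sorted(index.get(phrase, {}).items())


INDEX = build_index(PARSED)
print(timeline(INDEX, "job creators"))
```

The cost of parsing is paid once in `build_index`, so the query path stays fast no matter how often a phrase is looked up.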

The Sunlight Foundation makes its findings and data available via a JSON API so others can build on them. It’s also hoping to expand beyond floor speeches to politicians’ appearances on talk shows and other venues. It hopes to create other services by tying these political sound bites to its repository of funding data, which tracks which lobbying groups and individuals politicians accept money from. It has over a terabyte-and-a-half of data on hand to work from.

And for those eagerly watching how our government attempts to become more transparent and share data, the Foundation is also working with the Government Printing Office, which publishes the Congressional Record, to get the document into a more web-friendly, structured format. That would help the Capitol Words project become more useful and help others build their own data analytics based on what’s said in Congress. Right now, much of the esoteric (and somewhat stilted) debate reaches the general public mostly when The Daily Show mocks it. While Jon Stewart may be funny, he doesn’t offer the ability to track ideas over time or in any broad fashion.

“For us, this is trying to expand the way Sunlight tracks influence,” Lee said. “We track the way the money flows around Washington and it’s not enough. The ways the system is affected are too subtle and deliberate, so we’re making an investment in tracking not just the flow of money but also the flow of ideas.” And when you’re trying to track something as nebulous as ideas, analyzing a lot of data using cheap compute is perhaps the only way for a non-profit to do it.
