
Summary:

The explosion of “big data”–much of it in complex and unstructured formats–has presented companies with a tremendous opportunity to leverage their data for better business insights through analytics. Here are examples of how big data analysis occurs in the real world.

Toy soldiers line up for battle

In the past year, big data has emerged as one of the most closely watched trends in IT. Organizations today generate more data in a single day than existed on the entire Internet as recently as 2000. The explosion of “big data”–much of it in complex and unstructured formats–has presented companies with a tremendous opportunity to leverage their data for better business insights through analytics.

Wal-Mart was one of the early pioneers in this field, using predictive analytics to better identify customer preferences on a regional basis and stock its branch locations accordingly. It was an incredibly effective tactic that yielded strong ROI and allowed the company to separate itself from the retail pack. Other industries took notice of Wal-Mart’s approach (and the success it gleaned from processing and analyzing its data) and began to employ similar strategies.

While data analytics was once considered a competitive advantage, it’s increasingly being seen as a necessity for enterprises–to the point that those that aren’t employing some kind of analytics are considered to be at a competitive disadvantage. Driven by the rise of modern statistical languages like R, there’s been a surge in enterprises hiring data analysts–which has in turn given rise to the larger data science movement. Data is a huge asset for enterprises, and they’re beginning to treat it accordingly.

For all the talk about the need to effectively analyze your data, though, there’s been relatively little written about how organizations are using data to achieve actionable results. With that in mind, here are five use cases involving analyses of large data sets that brought about valuable new insight:

  • NYU Ph.D. student conducts comprehensive analysis of Wikileaks data for greater insight into the Afghanistan conflict: Drew Conway is a Ph.D. student at New York University who also runs the popular, data-centric Zero Intelligence Agents blog. Last year, he analyzed several terabytes’ worth of Wikileaks data to determine key trends around U.S. and coalition troop activity in Afghanistan. Conway used the R statistics language first to sort the overall flow of information in the five Afghanistan regions, categorized by type of activity (enemy, neutral, ally), and then to identify key patterns from the data. His findings gave credence to a number of popular theories on troop activity there–that there were seasonal spikes in conflict with the Taliban, and that most coalition activity stemmed from the “Ring Road” that surrounds the capital, Kabul, to name a few. Through this work, Conway helped the public glean additional insight into the state of affairs for American troops in Afghanistan and the high degree of combat they experienced there. (A hedged R sketch of this kind of roll-up by region, category and month appears after this list.)
  • International non-profit organization uses data science to confirm Guatemalan genocide: Benetech is a non-profit organization that has been contracted by the likes of Amnesty International and Human Rights Watch to address controversial geopolitical issues through data science. Several years ago, they were contracted to analyze a massive trove of secret files from Guatemala’s National Police that were discovered in an abandoned munitions depot. The documents, of which there were over 80 million, detailed state-sanctioned arrests and disappearances that occurred during the country’s decades-long civil conflict between 1960 and 1996. There had long been whispers of a genocide against the country’s Mayan population during that period, but no hard evidence had previously emerged to verify these claims. Benetech’s scientists drew a random sample of the documents and analyzed its content for details on missing victims from the conflict. After exhaustive analysis, Benetech was able to come to the grim conclusion that genocide had in fact occurred in Guatemala. In the process, they were able to give closure to grieving relatives who had wondered about the fate of their loved ones for decades. (A toy example of estimating population-level figures from a random sample follows the list.)

 

  • Statistician develops innovative metrics tracking for baseball players, gains widespread recognition and a job with the Boston Red Sox: Bill James (he of Moneyball fame) is a well-known figure in the worlds of both baseball and statistics at this point, but that has not always been the case. James, a classically trained statistician and avid baseball fan, began publishing research in the early 1970s that took a more quantitative approach to analyzing the performance of baseball players. His work focused on providing specific metrics that could empirically support or refute specific claims about players, be it the number of runs they contributed in a given season or how their defensive abilities added to or detracted from a team’s success. James’ approach became known as sabermetrics and has since expanded to incorporate a wide range of quantitative analyses for measuring baseball performance. Over time, sabermetrics has gained wide recognition in baseball, to the point that it’s now employed by all 30 Major League Baseball teams for tracking player metrics. In 2003, James was named Senior Advisor of Baseball Operations by the Boston Red Sox, a position he holds to this day. (His basic “Runs Created” formula is sketched after the list.)

 

  • U.S. government uses R to coordinate disaster response to BP oil spill: In the early days of last year’s Deepwater Horizon disaster, the rate of oil flow from the spill was of primary concern; estimating it accurately was key to coordinating the scale and scope of the U.S. government’s response to the emergency. The National Institute of Standards and Technology (NIST) was charged with making sense of the varying estimates that existed from both BP and independent third parties. To do so, NIST used the open source R language to run an uncertainty analysis that harmonized the estimates from various sources to come up with actionable intelligence around which disaster response efforts could be coordinated. (One standard way of combining such estimates is sketched after the list.)

 

  • Medical diagnostics company analyzes millions of lines of data to develop first non-intrusive test for predicting coronary artery disease: CardioDX is a relatively small, Palo Alto, Calif.-based company that performs genomic research. One of its major initiatives over the past several years was developing a predictive test that could identify coronary artery disease in its most nascent stages. To do so, researchers at the company analyzed over 100 million gene samples to ultimately identify the 23 primary predictive genes for coronary artery disease. The resulting test, known as the “Corus CAD Test,” was recognized as one of the “Top Ten Medical Breakthroughs of 2010” by TIME Magazine. (A toy gene-screening sketch rounds out the examples after the list.)
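
To make the Wikileaks example a little more concrete, here is a minimal, hypothetical R sketch of that kind of roll-up: event reports tagged with a region, an activity category and a date, summarized into monthly counts so that seasonal spikes stand out. The column names and data are invented for illustration and are not Conway’s actual code or data.

    # Simulated event log; all values are made up for illustration.
    set.seed(1)
    events <- data.frame(
      region   = sample(c("North", "South", "East", "West", "Capital"), 5000, replace = TRUE),
      category = sample(c("enemy", "neutral", "ally"), 5000, replace = TRUE),
      date     = as.Date("2006-01-01") + sample(0:1460, 5000, replace = TRUE)
    )
    events$month <- format(events$date, "%m")

    # Count events per region, activity category and calendar month;
    # seasonal spikes would show up as consistently higher counts in
    # certain months when compared across years.
    counts <- as.data.frame(xtabs(~ region + category + month, data = events))
    head(counts[order(-counts$Freq), ])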
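
The Benetech work relied on far more sophisticated statistical machinery than can be shown here, but the basic idea of inferring population-level figures from a random sample of documents can be sketched with invented numbers: suppose 2,000 of the roughly 80 million documents are sampled and hand-coded, and 37 of them record a disappearance.

    N <- 80e6    # total documents in the archive (figure from the article)
    n <- 2000    # hypothetical sample size
    x <- 37      # hypothetical number of sampled documents recording a disappearance

    # Estimated proportion with an exact binomial confidence interval,
    # then scaled up to the full archive.
    p_hat <- x / n
    ci <- binom.test(x, n)$conf.int
    round(c(estimate = p_hat * N, lower = ci[1] * N, upper = ci[2] * N))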
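
For the sabermetrics example, Bill James’ basic “Runs Created” formula is a compact illustration of the kind of empirical metric his work introduced: runs created = (hits + walks) × total bases / (at-bats + walks). The player line below is made up.

    # Basic Runs Created: (H + BB) * TB / (AB + BB)
    runs_created <- function(H, BB, TB, AB) (H + BB) * TB / (AB + BB)

    # Hypothetical season line: 185 hits, 70 walks, 320 total bases, 600 at-bats
    runs_created(H = 185, BB = 70, TB = 320, AB = 600)  # roughly 122 runs created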
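
The article does not describe NIST’s actual methodology, but one standard way to harmonize independent estimates that carry different uncertainties is an inverse-variance weighted mean, which gives more weight to the more precise estimates. The flow-rate figures and standard errors below are invented.

    est <- c(25000, 40000, 60000)   # hypothetical flow estimates (barrels per day)
    se  <- c(8000, 10000, 15000)    # hypothetical standard errors for each estimate

    # Weight each estimate by the inverse of its variance and combine.
    w <- 1 / se^2
    combined    <- sum(w * est) / sum(w)
    combined_se <- sqrt(1 / sum(w))
    c(estimate = combined, std_error = combined_se)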
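
Finally, the CardioDX item gives no detail on the company’s modeling, so the sketch below only illustrates the general shape of that kind of work: screen many candidate genes for association with a disease outcome and carry the strongest predictors into a final test. All data here are simulated.

    set.seed(7)
    n_patients <- 200
    n_genes    <- 50
    expr <- matrix(rnorm(n_patients * n_genes), nrow = n_patients,
                   dimnames = list(NULL, paste0("gene", 1:n_genes)))

    # Outcome driven by a handful of "true" genes in this simulation.
    risk    <- expr[, 1] + 0.8 * expr[, 2] - 0.6 * expr[, 3]
    disease <- rbinom(n_patients, 1, plogis(risk))

    # Univariate screen: rank genes by the p-value of a per-gene logistic model.
    pvals <- apply(expr, 2, function(g) {
      summary(glm(disease ~ g, family = binomial))$coefficients[2, 4]
    })
    head(sort(pvals))   # the top-ranked genes would feed a final multivariate model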

These are but a few brief examples of the exciting work that’s being undertaken in the rapidly growing discipline of data science. More and more, data analysis is being relied on to provide context for critical business decisions, a trend that promises to increase as data sets grow larger and more complex and scientists continue to push the limits of statistical innovation.

David Smith is vice president of community at Revolution Analytics, a company founded in 2007 to foster R analytics by creating programs to make it easier for data scientists to analyze large amounts of data.

  1. Seeing as the owners of the Boston Red Sox have just bought Liverpool, a club in the UK that is one of our rivals, I am really hoping that their use of Big Data there was a fluke not to be repeated!!

    1. Kirk Mettler Tuesday, July 19, 2011

      I have met with the analytics guys at the Red Sox. They will use the same concepts at Liverpool and they will get better. It is how they do business.

  2. “More and more, data analysis is being relied on to provide context for critical business decisions…” – well, I think this is a good thing as long as the “data” they rely on is real.

  3. FYI, NIST stands for National Institute of Standards (not Science) and Technology.

  4. Rohan Verma Monday, July 18, 2011

    Always interesting to see real-world examples of the power of Big Data

  5. But what will happen when big data is used not just for data mining, but actually to run a company’s systems? Instead of having a number of different systems in use, a single big data implementation would run everything.
    That will make big data even more relevant and mainstream.

  6. There were two users of big data before everyone else caught on: geophysics, the way oil companies predict where they might find oil based on the analysis of massive amounts of seismic data, and the HEPP (High Energy Particle Physics) crowd at places like CERN, where they generate massive data in tiny fractions of a second and then analyze the Dickens out of it to discover new particles, physical constants, dark matter, and the like.

  7. I could not agree more about the impact of Big Data, both for corporations and for consumers. I think that the abundance of data will also have a tremendous impact on how internet users interact with data. Just check out companies such as Junar, BuzzData or Factual, among others.

  8. Irshad Raihan Wednesday, July 20, 2011

    Great article. “Big Data” is a bit of a misnomer because it implies that the challenge is size, when really size is only one aspect of it. Read more at http://h30507.www3.hp.com/t5/Around-the-Storage-Block-Blog/The-Big-Data-discussion-continues-It-s-the-data-silly/ba-p/95719

  9. 5 Real-world uses of #BigData http://t.co/rJIwU5oL #DecisionsBasedOnAnalysis #Analytics #Data #SharingKnowledge


Comments have been disabled for this post