While it doesn’t produce the kind of instant name recognition that some other software platforms do, you’ve probably already used Hadoop many times. Hadoop is a free, open-source software framework designed to run on clusters of computers and to mine huge data sets very quickly. It was inspired by components of Google’s search platform, particularly MapReduce, and is best known for powering fast, distributed searches at sites including Yahoo! and Facebook. But Hadoop’s transition from powering search tools at websites to many other types of applications is already well underway.
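The MapReduce idea that Hadoop borrows from Google can be sketched in a few lines. The word-count example below is purely illustrative: it runs the map, shuffle, and reduce phases in a single Python process, whereas Hadoop distributes each phase across the machines of a cluster.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

# In Hadoop, each phase would run in parallel on many nodes;
# here the same pipeline runs sequentially for illustration.
docs = ["Hadoop powers search", "search powers the web"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts)  # e.g. {'hadoop': 1, 'powers': 2, 'search': 2, 'the': 1, 'web': 1}
```

Because the map and reduce functions are independent of where their inputs live, the same logic scales from one laptop to thousands of nodes, which is the core of Hadoop's appeal.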
As Cloudera CEO Mike Olson recently told GigaOM in an interview, “Hadoop is going to find potential markets in any industry where there are large data sets that need complex analysis.” Olson, whose company commercially supports Hadoop and recently raised $6 million in Series B venture funding to continue its efforts, said point-of-sale data and genomics data are promising areas for Hadoop. He also pointed to analyzing large data sets in the bioinformatics and pharmaceutical industries as emerging opportunities. The common thread between these industries is that they’re fueled by the production and analysis of huge data sets, sometimes petabytes in size. Hadoop shines at making quick work of analyzing giant data sets.
In the pharmaceutical industry, for example, companies drive their drug pipelines through drug discovery, which often requires modeling drugs on supercomputers using huge numbers of inputs and combinations. Officials in the pharmaceutical industry are often vocal about how they could speed up discovery of meaningful, sometimes life-saving drugs if they just had faster supercomputers. What if the software doing the analysis became much faster, instead? That’s part of the promise of the Hadoop platform.
Web search was one of the first big data sets for which high-speed analysis seemed valuable — it’s bread and butter for companies like Google and Yahoo. But increasingly, companies are recognizing that there are large data sets everywhere. Developer and open-source enthusiast Reuven Lerner has suggested some additional applications in which large data sets are involved and Hadoop’s ability to do fast queries and analysis might shine. These include translation of large texts from one language to another, price calculations based on large price catalogs, stock history analysis, and more.