When Microsoft and VMware latch onto a technology, you know it’s for real. VMware is now pushing a project called Spring Hadoop that lets developers use the popular Spring Java framework to write big data applications atop Apache Hadoop. It’s estimated that Spring has a developer base of more than 5 million, and giving them the ability to write Hadoop applications could be a big deal both for Hadoop adoption and for Spring’s stickiness.
Hadoop, of course, is the Apache Software Foundation project for storing and processing large volumes of unstructured data, often referred to as “big data.” The project gained popularity within large web companies such as Yahoo and Facebook that had to find a way to deal with and analyze mountains of user data in the form of logfiles, photos, clickthroughs and other new formats. Hadoop was inspired by Google’s work on the MapReduce parallel-processing framework and its distributed Google File System.
There is now a whole ecosystem of companies selling Hadoop-based products, from commercial distributions to application-specific analytics software, and many mainstream organizations are using it to analyze their own data. However, the Hadoop MapReduce framework is notoriously difficult to program, which has helped keep mainstream developers away from the valuable data stored within Hadoop.
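To give a rough sense of the map and reduce steps a developer has to express, here is a minimal in-memory word count in plain Java. It is only a sketch of the programming model: it does not use the Hadoop API or Spring Hadoop, it runs in a single JVM rather than across a cluster, and the class and method names (`WordCountSketch`, `map`, `reduce`, `count`) are hypothetical, chosen for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a single-process imitation of the MapReduce model,
// not code written against Hadoop or Spring Hadoop.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle and reduce phases: group the pairs by word and sum their counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Driver: run the map step over every line, then reduce all emitted pairs.
    static Map<String, Integer> count(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("big data big jobs", "big clusters")));
    }
}
```

In a real Hadoop job, the same two functions would be written against Hadoop’s own mapper and reducer types and wired together with job configuration, input and output paths, and cluster settings, which is the kind of setup work that frameworks layered on top of Hadoop aim to reduce.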
According to VMware’s press release:
Key aspects of Spring Hadoop include:
- Support for configuration, creation and execution of MapReduce, Streaming, Hive, Pig and Cascading jobs via the Spring container
- Comprehensive HDFS data access support through JVM scripting languages (Groovy, JRuby, Jython, Rhino, etc.)
- Declarative configuration support for HBase
- Dedicated Spring Batch support for developing powerful workflow solutions incorporating HDFS operations and all types of Hadoop jobs
- Support for use with Spring Integration that provides easy access to a wide range of existing systems using an extensible event-driven pipes and filters architecture
- Powerful Hadoop configuration options and a templating mechanism for client connections to Hadoop
- Declarative and programmatic support for Hadoop Tools, including FsShell and DistCp
Spring Hadoop is also open source, available under the Apache 2.0 license. More information on programming Hadoop jobs using Spring is available on the SpringSource blog.
This isn’t the first time VMware has given Spring developers access to new types of data stores. In 2010, VMware bought GemStone for its low-latency distributed data grid technology. That’s now part of the Spring Data project, which also features tie-ins to NoSQL databases MongoDB, Riak, Redis and Neo4j.
Making Hadoop accessible to more developers and, thus, more application types will go a long way toward making it the de facto big data platform for future applications, a development we’ll discuss in more detail at our Structure:Data conference March 21-22 in New York.