Cloudera has acquired a data-visualization startup called DataPad in order to bring DataPad’s employees, rather than its technology, into its fold. News of the acquisition first hit on Monday when DataPad users, myself included, received an email noting an acquisition and the service’s impending shutdown. On Tuesday, VentureBeat reported a rumor that Cloudera was the mystery buyer, which Gigaom has since confirmed.
However, more important than the acquisition of a fledgling startup (DataPad launched publicly in May with $1.7 million in seed funding) is what this seems to say about Cloudera’s plans for putting its Hadoop technologies into the hands of more users. Hadoop is often dinged as being designed with engineers in mind, hence the mad rush over the past few years to build applications on top of it. Cloudera knows that one way to broaden Hadoop’s user base is to reach out to data scientists and developers in languages they understand.
Cloudera’s embrace of Apache Spark as a framework for running a majority of future big data jobs speaks to this strategy as well. Users don’t just like Spark because it’s faster than MapReduce; they also like it because it’s easy to program and supports the Java, Scala and Python languages. This will be especially beneficial for projects such as Cloudera Oryx, a set of machine learning libraries that is currently being rewritten on top of Spark and is almost certain to be adopted primarily by data science types.
None of this should be surprising considering the billions of dollars in play in the commercial Hadoop market. Cloudera, Hortonworks, MapR, Pivotal and more are all trying to win over as many users as they can for their respective flavors of Hadoop and general big data infrastructure. Spreading the cheerleading base beyond IT staff and systems architects, to include the people actually developing applications and doing data analysis within the company, is a good way to help ensure your stuff is the stuff that gets used.
Feature image courtesy of Shutterstock user fivespots.