Cloudera bought DataPad because data scientists need tooling, too


Cloudera has acquired a data-visualization startup called DataPad in order to bring DataPad’s employees, rather than its technology, into its fold. News of the acquisition first hit on Monday when DataPad users, myself included, received an email noting an acquisition and the service’s impending shutdown. On Tuesday, VentureBeat reported a rumor that Cloudera was the mystery buyer, which Gigaom has since confirmed.

However, more important than the acquisition of a fledgling startup (DataPad launched publicly in May with $1.7 million in seed funding) is what this seems to say about Cloudera’s plans for putting its Hadoop technologies into the hands of more users. Hadoop is often dinged as being designed with engineers in mind, hence the mad rush over the past few years to build applications on top of it, and Cloudera knows that one way to broaden its audience is to reach out to data scientists and developers in languages they understand.

Wes McKinney. Source: Wes McKinney on Twitter

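DataPad co-founder Wes McKinney created Pandas, the popular Python library for data analysis. Elsewhere in the Hadoop ecosystem, Mortar Data wrapped the MapReduce Java commands in a mixture of Pig and Python, and Cloudera released a Python client for its Impala SQL-on-Hadoop engine.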

Cloudera’s embrace of Apache Spark as a framework for running a majority of future big data jobs speaks to this strategy, as well. Users don’t just like Spark because it’s faster than MapReduce; they also like it because it’s easy to program and supports the Java, Scala and Python languages. This will be especially beneficial for projects such as Cloudera Oryx, a set of machine learning libraries that is currently being rewritten on top of Spark and is almost certain to be adopted primarily by data science types.

None of this should be surprising considering the billions of dollars in play in the commercial Hadoop market. Cloudera, Hortonworks, MapR, Pivotal and more are all trying to win over as many users as they can for their respective flavors of Hadoop and general big data infrastructure. Spreading the cheerleading base beyond IT staff and systems architects, to include the people actually developing applications and doing data analysis within the company, is a good way to help ensure your stuff is the stuff that gets used.

Feature image courtesy of Shutterstock user fivespots.

1 Comment

Peter Fretty

Supporting the data scientist is one of the aspects that is far too often overlooked as organizations hope to glean benefits from everything they collect. Unfortunately as a recent IDG SAS survey shows, few can afford to do so since most lack the capabilities to highly perform in many of the key tasks crucial in data analytic environments.
