Summary:

Concurrent, the company providing the Cascading data workflow API, has raised a $900,000 seed round to capitalize on the newfound excitement around Hadoop. Cascading is an open-source API for creating and running data workflows atop Hadoop clusters.

Concurrent, the company providing the Cascading data workflow API, has raised a $900,000 seed round to capitalize on the newfound excitement around Hadoop. The funding came from Rembrandt Venture Partners, True Ventures (see disclosure below) and several angel investors.

Cascading, which Concurrent Founder and CEO Chris Wensel created, is an open-source API for creating and running data workflows atop Hadoop clusters. It’s an alternative to MapReduce, the standard framework for writing Hadoop applications, as well as Hive, the Facebook-created Apache project that provides data warehouse features for Hadoop environments. The Concurrent web site describes Cascading like this:

The processing API lets the developer quickly assemble complex distributed processes without having to “think” in MapReduce. And to efficiently schedule them based on their dependencies and other available meta-data.

Concurrent has been around since 2007, but only now is there enough activity around Hadoop and big data to justify putting much effort into building new products and hiring a team of engineers, said Wensel.

Certainly, Hadoop is at its pinnacle right now, with EMC, MapR and Hortonworks all making very public entrances into the distribution space lately to join incumbents such as Cloudera, IBM and Amazon Web Services (with Elastic MapReduce). Now that companies are comfortable with the prospect of Hadoop, and possibly using it to some degree, Wensel thinks they’re ready to start hearing about MapReduce alternatives.

Looking forward, Wensel thinks there’s an opportunity to expand Cascading support beyond Hadoop distributions (it’s currently certified for Apache Hadoop, MapR, EMC and Elastic MapReduce) and into new Hadoop-based “forks, derivatives and re-imaginings” that gain enough traction. Longer term, he sees an opportunity for a common API to support analytic workflows across a variety of distributed systems, Hadoop-based or not.

In the near future, though, Cascading users can look forward to version 2.0 in the fall, which will bring a number of significant improvements, among them the ability to use system memory for faster analysis of small datasets. Wensel also said Concurrent plans to create products complementary to the Cascading framework that will help monitor workflows and let users make better decisions by giving them more insight.

Although Concurrent’s seed funding is relatively small compared with some of the other big data investments we’ve seen lately, it’s significant. I predicted in my second-quarter wrap-up for GigaOM Pro that we’ll start seeing more investment in higher-level Hadoop tools, and Cascading is one of them.

With the distribution layer locked down, there’s plenty of room for alternative data-processing frameworks such as Cascading and turnkey analytics products such as Zettaset, which just raised $3 million itself, to steal some of the spotlight and make it easier to take advantage of Hadoop’s parallel-processing prowess.

Disclosure: Concurrent is backed by True Ventures, a venture capital firm that is an investor in the parent company of this blog, Giga Omni Media. Om Malik, the founder of Giga Omni Media, is also a venture partner at True.

  1. Cascading is great, and it’s super-simple to use. Check out Cascading Multi-tool; it’s a great set of tools to make life easier on Hadoop.

  2. Amr Awadallah Tuesday, July 26, 2011

    Congrats to Chris W and co, keep up the great work.

  3. The Hadoop “market” is way oversaturated.
