Nodeable gives Hadoop a real-time boost with StreamReduce

Just as complaining about how slow Hadoop is has started to become a popular pastime, along comes a company that says it can help solve the problem. Nodeable (see disclosure), the company that launched last year as a Twitter-for-systems-management play, has shifted its business strategy and now offers a cloud service for processing and analyzing streams of data in real time. Its new flagship service, called StreamReduce, is built atop Twitter’s open source Storm framework and acts as Hadoop’s faster, nimbler front-end partner.

To understand how StreamReduce works, it’s helpful to take a look back at how it came to be. As it turns out, Nodeable Founder and CEO Dave Rosenberg told me, the company realized quickly it needed to do something different if it wanted to add value in the systems management space, and that something was analytics. Rather than just produce a stream of tweet-like alerts to sysadmins, Nodeable would actually alert them to anomalies and emerging patterns that might signify a bigger problem to come. In doing that, Rosenberg said, the company realized it had actually created a real-time complement for Hadoop.

Fed up with systems management (it’s hard to do a repeatable cloud service in that space, and no one wants to pay for systems management, Rosenberg said), Nodeable, with support from its customers and investors, decided StreamReduce was its real business. “Between Cloudera [whose CEO Mike Olson is on Nodeable’s board] and Hortonworks, it took us five minutes to find more customers willing to pay for this than we thought we could find managing AWS,” Rosenberg said.

It works similarly to the original Nodeable product, and the UI will be familiar to legacy users, but the use cases are as broad as customers’ imaginations: clickstream analysis, systems monitoring, fraud detection, you name it. Essentially, users define the metrics they want to monitor, everything hits StreamReduce as a JSON file, and the system analyzes it and delivers alerts around counts, patterns and anomalies in real time. Once that’s done, it can feed data into Hadoop for more in-depth batch analysis later on.
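To make the counts-and-anomalies idea concrete, here is a minimal Python sketch of that kind of monitoring. It is not StreamReduce's API or implementation, which the article does not describe. The class name `MetricMonitor`, the optional `count` field in the JSON event, the window history length and the three-standard-deviation threshold are all assumptions chosen for illustration.

```python
import json
from collections import deque
from statistics import mean, pstdev


class MetricMonitor:
    """Hypothetical monitor: counts JSON events per time window for one
    metric and flags windows whose count deviates sharply from recent ones."""

    def __init__(self, history=20, threshold=3.0):
        self.counts = deque(maxlen=history)  # counts of recent closed windows
        self.threshold = threshold           # allowed deviation, in std devs
        self.current = 0                     # running count for the open window

    def record(self, raw_event):
        """Parse one JSON event and add it to the open window's count."""
        event = json.loads(raw_event)
        # Assumed event shape: an optional "count" field, defaulting to 1.
        self.current += int(event.get("count", 1))

    def close_window(self):
        """Close the open window; return an alert dict, or None if normal."""
        count, self.current = self.current, 0
        alert = None
        if len(self.counts) >= 5:  # wait for some history before judging
            baseline = mean(self.counts)
            spread = pstdev(self.counts)
            if spread > 0 and abs(count - baseline) / spread > self.threshold:
                alert = {"count": count, "baseline": baseline}
        self.counts.append(count)
        return alert
```

In use, a caller would invoke `record()` for each incoming event and `close_window()` on a timer (say, once a minute), forwarding any non-`None` result as an alert. A production system built on Storm would spread this work across many workers; the sketch only shows the per-metric logic.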

One beta user, which happens to be a major web retailer, has been using StreamReduce to try to determine why customers are abandoning their online shopping carts. Its tactic was to analyze shopping cart abandonment against slow-loading product images from Amazon S3 and negative comments on Twitter, and to look for any correlations. Doing all this after the fact using only Hadoop wouldn’t be much use while the problems were occurring.
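As a rough illustration of what checking for such a correlation could look like, the Python sketch below computes a Pearson correlation coefficient between two equally long series, for example per-minute cart abandonments and per-minute average image load times. The article does not say which statistical method the retailer or StreamReduce actually uses, so the choice of Pearson correlation and the function name `pearson` are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric series.

    Returns a value between -1 and 1, or 0.0 if either series is constant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)
```

A value near 1 over a recent window would suggest that abandonments rise together with image load times. It would not by itself show that one causes the other.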

For the web-stack aficionados out there, StreamReduce runs in the Amazon Web Services cloud, using a collection of AWS tools as well as MongoDB and Amazon’s DynamoDB. However, Rosenberg said, MongoDB (the most mature NoSQL option at the time Nodeable started building) might get swapped for Cassandra later this year. For Nodeable’s high-scale use case, MongoDB just isn’t the right fit.

Disclosure: Nodeable is backed by True Ventures, a venture capital firm that is an investor in the parent company of this blog, Giga Omni Media. Om Malik, the founder of Giga Omni Media, is also a venture partner at True.