
Summary:

HPCC Systems has released the open source code of its data-processing software that it's positioning as a better version of Hadoop. The code is available on GitHub, and it marks the commencement of HPCC Systems' quest to build a community of developers underneath Hadoop's expansive shadow.


HPCC Systems, the division of LexisNexis Risk Solutions dedicated to big data, has released the open source code of the data-processing-and-delivery software it's positioning as a better version of Hadoop. The High Performance Computing Cluster code is available on GitHub, and it marks the commencement of HPCC Systems' quest to build a community of developers underneath Hadoop's expansive shadow.

“We’re now really open source,” LexisNexis CTO Armando Escalante told me, responding to early criticism that the company was dragging its feet on releasing the code. He said he’s excited, but nervous because the code is now exposed to reviews and comments after years of operating privately within LexisNexis Risk Solutions.

The HPCC architecture includes the Thor Data Refinery Cluster and the Roxie Rapid Data Delivery Cluster. As I explained when covering the HPCC Systems launch in June, "Thor — so named for its hammer-like approach to solving the problem — crunches, analyzes and indexes huge amounts of data a la Hadoop. Roxie, on the other hand, is more like a traditional relational database or database warehouse that even can serve transactions to a web front end." Both tools leverage the company's Enterprise Control Language (ECL), which Escalante describes as easier, faster and more efficient than Hadoop MapReduce.
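To give a sense of the declarative style Escalante is contrasting with MapReduce, here is a hypothetical ECL sketch (not taken from HPCC's documentation — the file name and field names are invented for illustration) that counts word occurrences, a task that in Hadoop would typically require writing a mapper and a reducer:

```
// Hypothetical ECL sketch: declarative word count on a Thor cluster.
// '~tutorial::words' and the Word field are illustrative assumptions.
Words := DATASET('~tutorial::words', {STRING Word}, THOR);

// Group by Word and count occurrences in each group.
WordCounts := TABLE(Words, {Word, UNSIGNED Cnt := COUNT(GROUP)}, Word);

// Emit results, most frequent words first.
OUTPUT(SORT(WordCounts, -Cnt));
```

The point of the comparison: the programmer declares the transformation, and the platform decides how to distribute the work across the cluster, rather than the programmer hand-coding map and reduce phases.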

Aside from the open source Community version, HPCC Systems also offers a paid Enterprise version of the HPCC product. The core code is the same, Escalante explained, with the major differences being additional enterprise-grade capabilities such as management tools, plus support and services.

It will be a tall order to displace Hadoop — which has growing vendor, project and developer ecosystems — but Escalante is confident HPCC can do it. According to Escalante, Hadoop needs a large community because it's a growing project, whereas HPCC is already mature because it has been serving large customers for a decade. It's like trying to evolve a microbe into a human being instead of just starting with a human being right off the bat. The challenge, he thinks, will be spreading that message to web startups already sold on and experienced with Hadoop.

However, Escalante doesn’t think most enterprises are locked into Hadoop at this point, if they’ve even used it at all. And with its track record and Enterprise Edition features, HPCC is arguably more geared toward enterprises anyhow. For companies spending big money on traditional hardware systems, Escalante says HPCC has to look even better.

“We haven’t killed Hadoop [yet] … but we have killed mainframes,” he explained. By mainframes, he means all the remnant legacy data centers, such as large, expensive storage systems, data warehouses and OLAP systems. Because of Roxie’s capabilities running on commodity hardware, Escalante said LexisNexis was able to get rid of millions of dollars worth of legacy gear. As large enterprises’ data volumes keep growing, he said, they’ll have to pay through the nose to buy traditional systems big enough to handle the load.

With Hadoop, companies must maintain separate data warehouse environments, although startups such as Hadapt and, to some degree, Platfora aim to change that.

HPCC Systems, as well as Microsoft with its Dryad project, has an outside chance to steal some of Hadoop’s thunder with developers, but as Escalante acknowledged, its best chance is probably with large customers that will be moved by its enterprise readiness. HPCC Systems is touting Sandia National Laboratories and the Georgia Tech Research Institute as two big-data-savvy users already sold on HPCC, and Escalante promises some big-name customer wins in the next few months.

Feature image courtesy of Flickr user opensourceway.

  1. So what you mean to say is that LexisNexis is getting nervous about the possibilities of startups in a garage having the opportunity to run with Hadoop on, say, AWS, and creating a threat to LexisNexis. Great news, it would be cool to see someone give them a run for their money!

  2. Cloud computing is huge. Just look at the numbers and all this open sourcing will make sudden sense.

    http://statspotting.com/2011/05/cloud-computing-statistics-how-big-is-the-cloud-exactly/

  3. Nice one, HPCC is cool stuff indeed and IMO it will be serious competition to hadoop. Big data is all the rage these days. I did a cool video the other day on deploying HPCC to the cloud on ubuntu using ensemble. You can check it out here: http://cloud.ubuntu.com/2011/08/crunching-bigdata-with-hpcc-and-ensemble/ You can also read all the gory details at: http://cloud.ubuntu.com/2011/08/hpcc-with-ubuntu-server-and-ensemble-2/

  4. Is putting this out into the open a good idea? For me it is great, and I can tailor some of my thought processes around what I see there. Still, it is not realtime, which is my concern at present. Best, George Ingram

    1. Flavio Villanustre Thursday, September 29, 2011

      Roxie, the HPCC massive data delivery engine, handles real-time query delivery. Roxie can deliver query responses in sub-second predictable latencies to thousands of concurrent users (depending, of course, on the size of the cluster and the complexity of the queries).

