Pivotal open sources its Hadoop and Greenplum tech, and then some

Pivotal CEO Paul Maritz. Credit: Jakub Mosur

Pivotal, the cloud computing and big data company that spun out from EMC and VMware in 2013, is open sourcing its entire portfolio of big data technologies and is teaming up with Hortonworks, IBM, GE, and several other companies on a Hadoop effort called the Open Data Platform.

Rumors about the fate of the company’s data business have been circulating since a round of layoffs began in November, but, according to Pivotal, the situation isn’t as dire as some initial reports suggested.

There is a lot of information coming out of the company about this, but here are the key parts:

  • Pivotal is still selling licenses and support for its Greenplum, HAWQ and GemFire database products, but it is also releasing the core code bases for those technologies as open source.
  • Pivotal is still offering its own Hadoop distribution, Pivotal HD, but has slowed development on core components of MapReduce, YARN, Ambari and the Hadoop Distributed File System. Those four pieces are the starting point for a new association called the Open Data Platform, which includes Pivotal, GE, Hortonworks, IBM, Infosys, SAS, Altiscale, EMC, Verizon Enterprise Solutions, VMware, Teradata and “a large international telecommunications firm,” and which promises to build its Hadoop technologies using a standard core of code.
  • Pivotal is working with Hortonworks to make Pivotal’s big data technologies run on the Hortonworks Data Platform, and eventually on the Open Data Platform core. Pivotal will continue offering enterprise support for Pivotal HD, although it will outsource to Hortonworks support requests involving the guts of Hadoop (e.g., MapReduce and HDFS).

Sunny Madra, vice president of the data and mobile product group at Pivotal, said the company has a relatively successful big data business already — $100 million overall, $40 million of which came from the Big Data Suite license bundle it announced last year — but suggested that it sees the writing on the wall. Open source software is a huge industry trend, and he thinks pushing against it is as fruitless as pushing against cloud computing several years ago.

“We’re starting to see open source pop up as an RFP within enterprises,” he said. “. . . If you’re picking software [today] . . . you’d look to open source.”

The Pivotal Big Data Suite.

Madra pointed to Pivotal’s revenue numbers as proof the company didn’t open source its software because no one wanted to pay for it. “We wouldn’t have a $100 million business . . . if we couldn’t sell this,” he said. Maybe, but maybe not: Hortonworks isn’t doing $100 million a year, but word was that Cloudera was doing it years ago (on Tuesday, Cloudera did claim more than $100 million in revenue in 2014). Depending on how one defines “big data,” companies like Microsoft and Oracle are probably making much more money.

However, there were some layoffs late last year, which Madra attributed to consolidation of people, offices and efforts rather than a failing business. Pivotal wanted to close some global offices, bring the data and Cloud Foundry teams under the same leadership, and focus its development resources on its own intellectual property around Hadoop. “Do we really need a team going and testing our own distribution?” he asked, along with troubleshooting it, certifying it against technologies and all that goes along with that.

EMC first launched the Pivotal HD Hadoop distribution, as well as the HAWQ SQL-on-Hadoop engine, with much ado just over two years ago.

The deal with Hortonworks helps alleviate that engineering burden in the short term, and the Open Data Platform is supposed to help solve it over a longer period. Madra explained the goal of the organization as Linux-like, meaning that customers should be able to switch from one Hadoop distribution to the next and know the kernel will be the same, just like they do with the various flavors of the Linux operating system.

Mike Olson, Cloudera’s chief strategy officer and founding CEO, offered a harsh rebuttal to the Open Data Platform in a blog post on Tuesday, questioning the utility and politics of vendor-led consortia like this. He simultaneously praised Hortonworks for its commitment to open source Hadoop and bashed Pivotal on the same issue, and wrote, among other things, of the Open Data Platform: “The Pivotal and Hortonworks alliance, notwithstanding the marketing, is antithetical to the open source model and the Apache way.”

The Pivotal HD and HAWQ architecture

Much of this has been open sourced or replaced.

As part of Pivotal’s Tuesday news, the company also announced additions to its Big Data Suite package, including the Redis key-value store, RabbitMQ messaging queue and Spring XD data pipeline framework, as well as the ability to run the various components on the company’s Cloud Foundry platform. Madra actually attributes a lot of Pivotal’s decision to open source its data technologies, as well as its execution, to the relative success the company has had with Cloud Foundry, which has always involved an open source foundation as well as a commercial offering.

“Had we not had the learnings that we had in Cloud Foundry, then I think it would have been a lot more challenging,” he said.

Whether or not one believes Pivotal’s spin on the situation, though, the company is right in realizing that it’s open source or bust in the big data space right now. They have different philosophies and strategies around it, but major Hadoop vendors Cloudera, Hortonworks and MapR are all largely focused on open-source technology. The most popular Hadoop-ecosystem technologies, including Spark, Storm and Kafka, are open source, as well. (CEOs, founders and creators from many of these companies and projects will be speaking at our Structure Data conference next month in New York.)

Pivotal might eventually sell billions of dollars’ worth of software licenses for its suite of big data products (there’s certainly a good story there if it can align the big data and Cloud Foundry businesses into a cohesive platform), but it probably has reached its plateau without having an open source story to tell.

Update: This post was updated at 12:22 p.m. PT to add information about Cloudera’s revenue.

1 Comment

Hari Sekhon

Mike Olson is correct that this announcement adds little value – the market wants a standard open source platform, which is the gap in the market Hortonworks fills regardless. ODP is simply a way of big vendors using HDP but renaming it ODP to avoid admitting they can’t compete with Hortonworks. Cloudera can compete with Hortonworks today because they were first and have a great platform, but this may shift the balance of power in the long run. Cloudera should open source and contribute their remaining components to Apache to avoid being excluded from the standard architecture of the new world order, since Cloudera engineering is very good and it would be a shame to see so many good engineers’ work become a dead end.
