Open source software is about to blow up the database industry, and Hadoop is the nitroglycerin. This is not a good thing for the legacy software vendors that have built that industry over the past few decades. Facing an ever-mounting attack on their business models, database stalwarts have to hunker down and protect the status quo, or try to ride the open source shockwaves into the next generation of enterprise software.
At a broad scale, the advent of open source software as a legitimate option for enterprise workloads has been a long time coming. Linux helped lead the charge during the 1990s and early 2000s, and now there’s open source software everywhere you look, from the operating systems up to the applications. Some large IT shops, including those at buttoned-up institutions such as Goldman Sachs, make a product’s open source status a primary consideration in choosing what new technologies to adopt.
NoSQL makes its market
Perhaps nowhere are the disruptive effects of open source software felt as strongly as in the database industry — and they go a lot deeper than MySQL. Sure, that relational database management system has been hugely popular, but it has never really been considered the cream of the crop in terms of capabilities. Historically, if you wanted “enterprise” features, you went with Oracle Database, IBM DB2 or Microsoft SQL Server. For specialty workloads, you could choose from any number of other proprietary systems.
But then something happened — something called NoSQL. People can quibble about the label, but its effects have been undeniable. Largely open source, NoSQL databases aren’t eliminating relational databases in many cases, but they’re certainly capturing a lot of new applications that deal in fast, big and/or semi-structured data. Web companies were the early adopters, but we’ve also covered NoSQL deployments at places including Disney (it has deployed both MongoDB and Cassandra), MetLife and Comcast, among others.
For certain legacy applications, NoSQL databases are displacing relational and other proprietary databases. Travel conglomerate Orbitz recently replaced its Oracle Coherence implementation with Couchbase, claiming significant improvements in performance and reductions in cost. LinkedIn has built a system called Espresso that’s supposed to remove the last bastion of proprietary technology — its Oracle Database installation — from the company’s data environment. LinkedIn plans to open source Espresso later this year.
Embracing change to keep customers happy
The big question for legacy database vendors, then, is how to react to the threat open source technologies pose to their lucrative software-licensing businesses. The prudent approach appears to be publicly embracing and courting the NoSQL community, particularly the popular MongoDB technology. IBM is pushing MongoDB as a standard for next-generation applications and is building strong connections between it and the company’s existing database offerings. Microsoft is using its Windows Azure cloud computing platform to lure MongoDB developers as well as traditional SQL Server developers.
Oracle is taking a different tack. It’s actually offering its own open source Oracle NoSQL Database that’s based on the key-value BerkeleyDB technology the company acquired years ago. Oracle doesn’t appear to have a clear strategy around MongoDB, however, even though MongoDB creator 10gen is publicly targeting Oracle.
Whatever the NoSQL strategy, though, the message from the incumbents seems clear: We know users want these capabilities and we’ll support them. Users building certain applications on MongoDB or putting certain data there might mean less licensing revenue, but integration with existing technologies means users don’t have to go elsewhere to find a vendor who’ll support all they want to do.
Hadoop wants it all
But it’s Hadoop that’s really shaking things up in database land. That’s because while the NoSQL movement is a collection of largely disparate, open-source technologies that might or might not directly challenge big-ticket database products, Hadoop is a movement unto itself. And it wants to engulf every piece of data in its path.
Hadoop is not just a platform for running batch MapReduce jobs anymore, but is fast taking on additional capabilities such as interactive queries, enterprise search and stream processing. There are graph databases and HBase built atop Hadoop, and that’s just the beginning. A company like Cloudera, once considered a strategic partner for analytic database and data warehouse vendors, suddenly looks a lot more like a competitor as it continues its evolution into a full-fledged data-management vendor.
Take, for example, a recent quote by Cloudera Co-founder and (now) Chief Strategy Officer Mike Olson when I asked him about falling revenues at data warehouse pioneer Teradata: “It is true to say folks are looking at what they’re running on Teradata and rationalizing those decisions. … [They’re trying to] concentrate first-class spend on a first-class workload.” Translation: Hadoop as a bit bucket and general-purpose data platform can handle a lot of workloads (analytic SQL among them) for a lot less money than the legacy technologies.
If it’s not Hadoop distribution vendors directly, it’s higher-level startups such as Platfora, Continuuity, Datameer and others that are trying to make Hadoop a better platform for building applications and running serious analytics. Partnerships with popular third-party applications such as Tableau and Splunk make it even easier to make use of data indiscriminately dumped into Hadoop.
Legacy vendors are concerned because Hadoop signals a future where customers might not want to buy a dozen specialized systems for a dozen different workloads. Hadoop lets users consolidate all their data in one place and, as the platform becomes more capable, analyze that data where it resides. As Olson said, the data that really needs Teradata, Exadata or Netezza will still go there, but the rest can probably stay in Hadoop.
If you can’t beat ’em …
And those legacy vendors are reacting quickly and decisively. I’ve heard from some reliable folks that Microsoft tried to buy Hortonworks for a few hundred million dollars a while back. The most recent rumor is new Hadoop entrant Intel offering $700 million for Hortonworks. Think about that: two companies who’ve faced antitrust lawsuits in recent memory trying to buy up a company based entirely around open source technology.
Hortonworks can refuse those offers because Hadoop has so much potential and so much demand. Companies such as Microsoft, Teradata and Rackspace are willing to pay Hortonworks big money to help them build their own Hadoop product offerings. Teradata — purveyor of big, expensive appliances custom-built for analytics — is now offering a reference architecture for Hadoop running on commodity Dell servers and even is in the business of helping customers deploy the Hortonworks Data Platform on whatever machines they choose. Microsoft, which famously spurned Hadoop (HBase, specifically) as the foundation of its Bing platform and was recently pitching its own big data stack, has seen the light in the form of customer demand for Hadoop on Windows and needed help building it.
EMC-VMware spinoff Pivotal has hundreds of engineers dedicated to its Hadoop-meets-analytic-database technology. IBM is building out a full complement of Hadoop offerings, as well. They can either accept Hadoop and the lower margins it might bring, or they can risk losing lots of deals altogether.
It’s no wonder that Hortonworks already has raised $100 million, half of which came via a $50 million round announced earlier this week. Or that Cloudera — with a new CEO on board — is, I’m told, eyeing up an initial public offering. For what it’s worth as a comparison (if Cloudera were to go public even within the next two years), it took Oracle nearly 10 years to go public after launching in 1977; Cloudera turned 5 years old on June 27.
No, legacy database companies and technologies aren’t going away anytime soon, but it’s hard to see how their revenue streams don’t undergo a serious shift in direction over the years to come. At some point, it might be only the most-demanding, most mission-critical applications and data that really require high-end proprietary software. The rest? Well, there’s something open source for that.
Feature image courtesy of Shutterstock user Fer Gregory.