
Prepare for change! This is not your father’s database industry

Open source software is about to blow up the database industry, and Hadoop is the nitroglycerin. This is not a good thing for the legacy software vendors that have built that industry over the past few decades. Facing an ever-mounting attack on their business models, database stalwarts have to hunker down and protect the status quo, or try to ride the open source shockwaves into the next generation of enterprise software.

At a broad scale, the advent of open source software as a legitimate option for enterprise workloads has been a long time coming. Linux helped lead the charge during the 1990s and early 2000s, and now there’s open source software everywhere you look, from the operating systems up to the applications. Some large IT shops, including at buttoned-up institutions such as Goldman Sachs, make a product’s open source status a primary consideration in choosing what new technologies to adopt.

NoSQL makes its market

Perhaps nowhere are the disruptive effects of open source software felt as strongly as in the database industry — and they go a lot deeper than MySQL. Sure, that relational database management system has been hugely popular, but it has never really been considered the cream of the crop in terms of capabilities. Historically, if you wanted “enterprise” features, you went with Oracle Database, IBM DB2 or Microsoft SQL Server. For specialty workloads, you could choose from any number of other proprietary systems.

But then something happened — something called NoSQL. People can quibble about the label, but its effects have been undeniable. Largely open source, NoSQL databases aren’t eliminating relational databases in many cases, but they’re certainly capturing a lot of new applications that deal in fast, big and/or semi-structured data. Web companies were the early adopters, but we’ve also covered NoSQL deployments at places including Disney (it has deployed both MongoDB and Cassandra), MetLife and Comcast, among others.

For certain legacy applications, NoSQL databases are displacing relational and other proprietary databases. Travel conglomerate Orbitz recently replaced its Oracle Coherence implementation with Couchbase, claiming significant improvements in performance and reductions in cost. LinkedIn has built a system called Espresso that’s supposed to remove the last bastion of proprietary technology — its Oracle Database installation — from the company’s data environment. LinkedIn plans to open source Espresso later this year.

The NoSQL databases — largely open source — are no joke. Source: 451 Group

Embracing change to keep customers happy

The big question for legacy database vendors, then, is how to react to the threat open source technologies pose to their lucrative software-licensing businesses. The prudent approach appears to be publicly embracing and courting the NoSQL community, particularly the popular MongoDB technology. IBM is pushing MongoDB as a standard for next-generation applications and is building strong connections between it and the company’s existing database offerings. Microsoft is using its Windows Azure cloud computing platform to lure MongoDB developers as well as traditional SQL Server developers.

Oracle is taking a different tack. It’s actually offering its own open source Oracle NoSQL Database that’s based on the key-value BerkeleyDB technology the company acquired years ago. Oracle doesn’t appear to have a clear strategy around MongoDB, however, even though MongoDB creator 10gen is publicly targeting Oracle.

Whatever the NoSQL strategy, though, the message from the incumbents seems clear: We know users want these capabilities and we’ll support them. Users building certain applications on MongoDB or putting certain data there might mean less licensing revenue, but integration with existing technologies means users don’t have to go elsewhere to find a vendor who’ll support all they want to do.

Hadoop wants it all

But it’s Hadoop that’s really shaking things up in database land. That’s because while the NoSQL movement is a collection of largely disparate, open-source technologies that might or might not directly challenge big-ticket database products, Hadoop is a movement unto itself. And it wants to engulf every piece of data in its path.

It’s not just a platform for running batch MapReduce jobs anymore, but is fast taking on additional capabilities such as interactive queries, enterprise search and stream processing. There are graph databases and HBase built atop Hadoop, and that’s just the beginning. A company like Cloudera, once considered a strategic partner for analytic database and data warehouse vendors, suddenly looks a lot more like a competitor as it continues its evolution into a full-fledged data-management vendor.

Cloudera’s Mike Olson at Structure: Data 2012
(c) 2012 Pinar Ozger

Take, for example, a recent quote by Cloudera Co-founder and (now) Chief Strategy Officer Mike Olson when I asked him about falling revenues at data warehouse pioneer Teradata: “It is true to say folks are looking at what they’re running on Teradata and rationalizing those decisions. … [They’re trying to] concentrate first-class spend on a first-class workload.” Translation: Hadoop as a bit bucket and general-purpose data platform can handle a lot of workloads (analytic SQL among them) for a lot less money than the legacy technologies.

If it’s not Hadoop distribution vendors directly, it’s higher-level startups such as Platfora, Continuuity, Datameer and others that are trying to make Hadoop a better platform for building applications and running serious analytics. Partnerships with popular third-party applications such as Tableau and Splunk make it even easier to make use of data indiscriminately dumped into Hadoop.

Legacy vendors are concerned because Hadoop signals a future where customers might not want to buy a dozen specialized systems for a dozen different workloads. Hadoop lets users consolidate all their data in one place and, as the platform becomes more capable, analyze that data where it resides. As Olson said, the data that really needs Teradata, Exadata or Netezza will still go there, but the rest can probably stay in Hadoop.

If you can’t beat ’em …

And those legacy vendors are reacting quickly and decisively. I’ve heard from some reliable folks that Microsoft tried to buy Hortonworks for a few hundred million dollars a while back. The most recent rumor is new Hadoop entrant Intel offering $700 million for Hortonworks. Think about that: two companies who’ve faced antitrust lawsuits in recent memory trying to buy up a company based entirely around open source technology.

Hortonworks can refuse those offers because Hadoop has so much potential and so much demand. Companies such as Microsoft, Teradata and Rackspace are willing to pay Hortonworks big money to help them build their own Hadoop product offerings. Teradata (purveyor of big, expensive appliances custom-built for analytics) is now offering a reference architecture for Hadoop running on commodity Dell servers and is even in the business of helping customers deploy the Hortonworks Data Platform on whatever machines they choose. Microsoft, which famously spurned Hadoop (HBase, specifically) as the foundation of its Bing platform and was recently pitching its own big data stack, has seen the light in the form of customer demand for Hadoop on Windows and needed help building it.

EMC-VMware spinoff Pivotal has hundreds of engineers dedicated to its Hadoop-meets-analytic-database technology. IBM is building out a full complement of Hadoop offerings, as well. They can either accept Hadoop and the lower margins it might bring, or they can risk losing lots of deals altogether.

That top layer will only expand.

It’s no wonder that Hortonworks already has raised $100 million, half of which came via a $50 million round announced earlier this week. Or that Cloudera — with a new CEO on board — is, I’m told, eyeing up an initial public offering. For what it’s worth as a comparison (if Cloudera were to go public even within the next two years), it took Oracle nearly 10 years to go public after launching in 1977; Cloudera turned 5 years old on June 27.

No, legacy database companies and technologies aren’t going away anytime soon, but it’s hard to see how their revenue streams don’t undergo a serious shift in direction over the years to come. At some point, it might be only the most-demanding, most mission-critical applications and data that really demand high-end proprietary software. The rest? Well, there’s something open source for that.

Feature image courtesy of Shutterstock user Fer Gregory.

4 Responses to “Prepare for change! This is not your father’s database industry”

  1. Jean Michel LeTennier

    Hello There

    Introduction to NEW SCIENCE…

    Imagine being able to do anything your mind can imagine, well now it can..
    The 6th Normal Form is not a TERM.. it is a GOAL.. the “holy Grail” if you will of data management and more importantly “Information” management.. Imagine being able to Store IDEAS as opposed to disconnected bits of data..

    A brief “INCORRECT” comment on WIKI.. “A relvar R [table] is in sixth normal form (abbreviated 6NF) if and only if it satisfies no nontrivial join dependencies at all — where, as before, a join dependency is trivial if and only if at least one of the projections (possibly U_projections) involved is taken over the set of all attributes of the relvar [table] concerned.[Date et al.][”

    The TRUE definition of 6th Normal form is OBJECT Database.. where each and every piece of information/data is ATOMIC in nature and can be associated with any other piece of data/information, and thus NO restrictions.. NO constraints.. NO tables, NO Rows, NO VIEWS, NO CUBES… the correct term is “ASSOCIATIVE DATABASE” or Information system as it is in 3 dimensions by default.. and technically (N) dimensions

    the advantages are thus:
    100x SQL/ROW/TABLE speed
    1/3 disk space
    1 EXABYTE capacity – Single instance storage.. (no piece of data is ever duplicated)
    Security – Un-hackable – there is nothing to hack into
    NO QUERIES – we use filtering
    NO TABLES – thus no indexing to worry about
    Automated data aggregation – as many sources as required..

    and that is just the start.. ;-)

    let me know if you are interested in seeing it.. ? as a scientist. I think you would find it fascinating.. basically a 10 year old can now be taught to build data warehouses. ;-)

    send me your external email and I will send you more info if you like, and yes this is going to market as we speak ..

  2. I couldn’t agree more with the points you make. Consider this however. NoSQL may initially have been adopted by web companies but increasingly EVERY company is a web company. By that I mean that data-centric web and mobile applications are increasingly important to virtually EVERY business and many industries are being disrupted by new business models and approaches that have web and mobile applications at their core. Think about what is happening in the education industry (Coursera, Khan), the rental car industry (Zipcar), the CRM/HCM industries (Salesforce, Workday), even the retail industry where companies like Walmart (enterprise company) have to compete against the likes of Amazon (web company). These web and mobile applications have vastly different requirements than those of the past and developers and operations people are quickly coming to the conclusion that NoSQL technologies are a better fit for these requirements. The NoSQL adoption that started with “web companies” is very rapidly spreading to “enterprises” for whom web and mobile application strategies are increasingly crucial. 15 years from now most of the applications in the world will be of this type, and NoSQL will grow right along with this transition. This is why we are seeing a rapidly increasing number of Oracle, IBM, and Microsoft customers switching to NoSQL/Couchbase.

    If you are interested, you can read more about my thoughts on this at