Weekly Update

Webscale Databases: Is Open Source Really Necessary?

I read with interest this week an article suggesting that when it comes to deploying databases – or any infrastructural pieces, really – at web scale, many large sites opt to “go cheap, go custom or go home.” Given their unique needs, this credo makes sense, but I wonder if the companies following it aren’t making more work for themselves than is necessary. As once-uncommon requirements become commonplace, proprietary products are emerging to address them. Might the resources spent developing open-source projects or building tools from scratch not become extraneous if companies could buy solutions that would work just fine?

Isn’t it plausible that a proprietary vendor – Oracle, let’s say – could launch a webscale database or analytics solution that would do the trick for a company like Facebook? If there’s one thing Larry Ellison knows better than relational databases, it’s how to make a buck. Hypothetically speaking, Oracle (especially if staffed with former Sun and Cloudera engineers) could offer database and data-analysis solutions that could save a company like Facebook from having to act like a software company itself. It certainly hasn’t hesitated to buy its way into alternative markets in the past.

Look at its 2007 purchase of in-memory-data-grid vendor Tangosol, whose product now is called Oracle Coherence. The online transaction-processing (OLTP) industry was moving to lower-latency, more-scalable architectures than Oracle’s tiered database could provide, so Oracle swooped in and bought the market leader. On the analytics front, there already are murmurings (from one reporter, at least) that Oracle might snatch up Cloudera to buy its way into the Hadoop market. This possibility isn’t so outlandish considering that Oracle bought Cloudera CEO Mike Olson’s previous company, Sleepycat Software (and still offers its open-source and non-SQL Berkeley DB).

Another consideration is where web companies draw the line regarding commercial solutions: Is an open-source but subscription-based vendor like Red Hat out of the question? Red Hat is working on at least two projects – Cloud Filesystem and Infinispan – that speak directly to the data needs of webscale companies. And speaking of file systems, the market for webscale file systems is filling up fast, thanks in part to startup companies like Appistry, which this week launched its CloudIQ Storage product, complete with a Hadoop edition. What about the growing list of startup memcached vendors, like NorthScale, built from the ground up to meet the data-serving needs of demanding web applications? What about the aforementioned Cloudera, should it decide to follow its proprietary plans? Certainly, no one can say that the teams behind these companies don’t understand the needs of web-based companies like Facebook or Twitter.

I’m not suggesting that Facebook et al. are wrong in their open-source-or-homegrown-only approaches, or that there is a glut of proprietary products on the market, only that it’s not inconceivable that vendors could meet their needs. In the name of integration and customization, maybe this means non-open-source vendors will have to open up certain parts of their code, or maybe it means standards must emerge to prevent lock-in. Whatever the case, though, this isn’t the early days of the web, and web companies should not unnecessarily commit themselves to rolling their own code.

As eBay’s Paul told Computerworld back in 2006, “For the long run, we don’t believe that building IT tools and management tools is our core competency. Today, it is something we have to do because we have no choice. In the long run, we would rather be buying off-the-shelf solutions.”

Question of the week

Which webscale IT vendors might find homes within large web companies like Facebook?