Can Facebook or Twitter Spin Off the Next Hadoop?

Like most people, I suspect, I wasn’t too surprised to find out that Hadoop-focused startup Karmasphere has secured a $5 million initial funding round. After all, if Hadoop catches on like the evidence suggests it will, Karmasphere’s desktop-based Hadoop-management tools could pay off investors many times over. In some ways, though, the fact that Hadoop is mature enough to inspire commercial products means it’s yesterday’s news. Now, I’m wondering, which open-source, big-data-inspired product will be the next to launch a wave of startups and drive tens of millions in VC spending?

Big data has narrowed the gap between the needs of bleeding-edge web companies, their offspring and even traditional businesses. Hadoop has caught on across industry boundaries as an analytics tool for unstructured data sets, and it seems logical that other web-based tools will catch on in other parts of the data layer. In my weekly column over at GigaOM Pro (sub req’d) today, I took a look at the potential for Cassandra, which grew out of Facebook, and Gizzard, Twitter’s ill-named big-data baby.

Given its growing popularity and expanding functionality, Cassandra right now seems like a prime candidate. Rackspace has taken over its development reins, and it has found varied applications within Digg, Twitter, Reddit, Cloudkick and Cisco, to name a few. This diversity illustrates Cassandra’s versatility; it’s not just for the social media crowd. Furthermore, Cassandra graduated to a top-level Apache project in February, signifying the quality of the work done on it thus far and, most likely, a groundswell of new developers.

Twitter’s newly open-sourced Gizzard tool seems to have promise, as well. By eliminating some pain from the often difficult sharding process, Gizzard makes it easier to build and manage distributed data stores that can handle ultra-high query volumes without getting bogged down. Like Google, Yahoo and Facebook before it, Twitter has played a role in evolving how we use the web, and software developed within its walls should be a hot commodity for present and future Twitter-inspired sites and products.

Which do you think will take off?

Read the full article here.

Photo courtesy Flickr user zzzack

7 Responses to “Can Facebook or Twitter Spin Off the Next Hadoop?”

  1. Sorry, don’t call me stupid, but your article is not very easy to understand! OK, I’ll stop exaggerating (I quite got it, in fact), but here’s the point I’m trying to make: Think about someone who’s just familiar with Facebook, Digg, Twitter and maybe Cisco digging into your article. They’ll try to figure out the meaning of all this by going through names like Hadoop, Reddit, Cloudkick, Gizzard, Karmasphere and Apache server. It would only make Cassandra more obscure in their mind. Believe me! We need a SIMPLE definition (in 1 phrase) of the following: 1) What is Hadoop? 2) How does this concept apply to Facebook and Twitter?

  2. Even if Hadoop does “catch on like the evidence suggests it will” (debatable), will investors really make much money on this? Hadoop is free and super-niche. You think there’s a big market for paid tools or support?

    • I think there’s potentially a very big market for paid Hadoop tools. Cloudera has been getting interest (and users) across a wide range of industries, and apparently has seen enough demand to be planning a proprietary offering. IBM is involved in at least six engagements for its Hadoop-based BigSheets product, too. And while the number isn’t too high, customers are in big-money institutions like pharma and financial services. Not to mention how Hadoop support is increasingly tied into large-scale analytics and data-warehousing products.

      These are industries and customers that don’t mind paying for high-performance tools, so why should Hadoop be any different? There are free cluster-management tools, too, but cluster software companies still make plenty of sales in these industries. Commercial solutions just make life easier.

      And although it’s early, Hadoop potentially has a far broader market base than do other tools of its ilk. Most businesses will never need high-performance computing tools, for example, but everyone is getting slammed with increasing data volumes, and everyone can make use of free data stores made available on the web. If they want to build scalable file systems and/or run analyses against these data volumes and don’t want to learn Hadoop from scratch, commercial solutions are ideal. I think there is a lot of value to be derived from Hadoop-based tools, especially if the complexity is hidden from the user.

      Short question, long answer.