Weekly Update

Don’t hold your breath for a single big data stack

The industry is eager to coin a LAMP stack equivalent for big data, but that’s not going to happen, and here’s why.

LAMP is the popular open-source software stack for creating Web apps and Web servers. It includes Linux, Apache, MySQL and PHP, and many of the largest Web applications in the world run on it today.

In the big data world, meanwhile, there are literally dozens of different open-source projects, each taking a shot at solving one piece of the big data stack. Hadoop, HBase, Cassandra, Hive, Pig, Sqoop, Oozie and ZooKeeper are just a few.

The commercial world would like it if everyone would settle on a standard way of doing things instead of creating an endless number of projects under the Apache Software Foundation. That way, it could sell more product, as customers would be less worried about buying the wrong thing.

But unfortunately big data is way too varied and complex to support a single architecture. And LAMP, by the way, was itself an oversimplification. Users had many options at each layer: Postgres instead of MySQL, for example, or Python or Ruby instead of PHP.

That said, there is value in the analogy: several open-source technologies at different layers that work well together and have traction. The difference today, versus when the LAMP stack was created, is that there are many more technologies to choose from, and they can be dynamically mashed together as we need them.

If you’re interested in this topic and other big data trends, come along to GigaOM’s Structure:Data conference in New York City next week, on March 21 and 22.

Question of the week

Do you think there will be a single big data stack?