Summary:

Open source data warehousing models have many advantages, among them the ability to scale horizontally and cheaply, but traditional warehousing techniques have their strengths as well, said Vipul Sharma, principal software engineer and engineering manager at Eventbrite, at Structure:Data.

Vipul Sharma of Eventbrite at Structure:Data 2012

(c) 2012 Pinar Ozger. pinar@pinarozger.com

Open source data warehousing models have many advantages, among them the ability to scale horizontally and cheaply, but traditional warehousing techniques have their strengths as well, said Vipul Sharma, principal software engineer and engineering manager at Eventbrite, at GigaOM’s Structure:Data. Eventbrite itself manages terabytes of event and user information and uses a combination of MySQL and Hadoop.

Open source solutions like the Hadoop distributed file system and the HBase NoSQL database are justifiably the hot platforms in data warehousing, but MySQL and other more traditional enterprise data warehousing tools may still be optimal if a company has a good understanding of where it needs to scale and has a big focus on security and reporting.

Hadoop’s open source community still hasn’t worked out the security kinks, and building dashboards around the reporting tools in Hadoop and HBase is quite difficult, Sharma said. Security and reporting in enterprise data warehouse platforms are much better developed, he said.

Sharma also cautioned companies that they must be willing to commit manpower to support their Hadoop clusters; otherwise, they’re better off using a traditional warehouse solution.

“Since [Hadoop and HBase] are very new, the entire code base has to mature a lot,” Sharma said. “So you end up having to do a lot of debugging yourself. You end up getting into the code base to understand what’s going on. … If you’re not willing to invest in a team around it, then probably Hadoop is not the best for you because the ecosystem is still maturing.”

  1. The Hadoop and EDW worlds are merging: there are ‘traditional enterprise’ data warehouse tools and there are next-generation analytic tools.

    Arguably Netezza was the first DB optimized for analytics. Then came row-column hybrids like Aster Data and Greenplum. Then came true column-stores like Vertica and InfiniDB, which are an order-of-magnitude faster.

    Over the next year or two, look for the Hadoop and column-store worlds to become even more tightly integrated.

    @akibalogh

  2. Eventbrite? You mean the company whose servers couldn’t handle the load when selling tickets to a small festival (Googa Mooga)?
