Open source data warehousing models have a lot of advantages, among them the ability to scale horizontally and cheaply, but traditional warehousing techniques have their strengths as well, Vipul Sharma, principal software engineer and engineering manager at Eventbrite, said at GigaOM’s Structure:Data. Eventbrite itself manages terabytes of event and user information using a combination of MySQL and Hadoop.
Open source solutions like the Hadoop Distributed File System and the HBase NoSQL database are justifiably the hot platforms in data warehousing, but MySQL and other more traditional enterprise data warehousing platforms may still be optimal if a company has a good understanding of where it needs to scale and has a big focus on security and reporting.
Hadoop’s open source community still hasn’t worked out the security kinks, and building dashboards around reporting tools in Hadoop and HBase is quite difficult, Sharma said. Security and reporting in enterprise data warehouse platforms are much better developed, he said.
Sharma also cautioned companies that they must be willing to commit manpower to support their Hadoop clusters; otherwise, they’re better off using a traditional warehouse solution.
“Since [Hadoop and HBase] are very new, the entire code base has to mature a lot,” Sharma said. “So you end up having to do a lot of debugging yourself. You end up getting into the code base to understand what’s going on. … If you’re not willing to invest in a team around it, then probably Hadoop is not the best for you because the ecosystem is still maturing.”