Summary:

HBase is a great option for developing big data applications, but it’s not necessarily easy to use. WibiData is addressing this by open sourcing a portion of its predictive analytics infrastructure that adds structure to data, to be followed eventually by a whole HBase development framework called Kiji.

Kiji resides in the lower left section

WibiData, the Hadoop-based user analytics startup from Cloudera co-founder Christophe Bisciglia, has open sourced part of its software stack that’s designed to make it easier for developers to build big data apps on the HBase NoSQL database. Called KijiSchema, the technology is a Java API for adding schema to data flowing into HBase so that applications needing to analyze the data can actually know something about it.

As WibiData product manager Devjit Chakravarti told me during a recent call, KijiSchema essentially “takes the ‘No’ out of NoSQL.” What he means is that although NoSQL databases such as HBase are lauded in part because they can store unstructured data and don’t require rigid rules for data formatting like relational databases do, having some structure is actually necessary once you want to do meaningful analysis on it. That’s why some commercial products, such as Drawn to Scale’s Spire and Splice Machine’s Splice SQL Engine, already have built functional SQL databases on top of HBase.

Kimball speaking at Structure: Data in 2012. (c) 2012 Pinar Ozger. pinar@pinarozger.com

“If you can’t store data in an organized way, you can’t analyze it effectively,” WibiData Co-Founder and CTO Aaron Kimball explained. KijiSchema isn’t part of WibiData’s secret sauce around predictive analytics for user data, he added, but nothing gets done without it.

Here’s how Kimball, in a blog post announcing the project, describes the way KijiSchema manages data:

“KijiSchema gives developers the ability to easily store both structured and unstructured data within HBase using Avro serialization. It supports a variety of rich schema features, including complex, compound data types, HBase column key and time-series indexing, as well as cell-level evolving schemas that dynamically encode version information.

“KijiSchema promotes the use of entity-centric data modeling, where all information about a given entity (user, mobile device, ad, product, etc.), including dimensional and transaction data, is encoded within the same row. This approach is particularly valuable for user-based analytics such as targeting, recommendations, and personalization.”
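To make the entity-centric idea concrete, here is a minimal Python sketch of the data layout Kimball describes. It is not the KijiSchema API (which is a Java API built on Avro and HBase); the `EntityTable` class, the column names and the `schema_version` field are hypothetical, chosen only to illustrate one row per entity, time-stamped cells within a column, and a version tag stored alongside each cell.

```python
class EntityTable:
    """Toy in-memory stand-in for an entity-centric table (illustrative only)."""

    def __init__(self):
        # entity id (row key) -> column name -> list of (timestamp, schema_version, value)
        self._rows = {}

    def put(self, entity_id, column, timestamp, value, schema_version=1):
        """Write one cell; every fact about an entity lands in that entity's row."""
        cells = self._rows.setdefault(entity_id, {}).setdefault(column, [])
        cells.append((timestamp, schema_version, value))
        # Keep the newest cell first so a column reads as a time series.
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def latest(self, entity_id, column):
        """Return the most recent value in a column, or None if nothing is stored."""
        cells = self._rows.get(entity_id, {}).get(column, [])
        return cells[0][2] if cells else None

    def history(self, entity_id, column):
        """Return (timestamp, value) pairs for a column, newest first."""
        cells = self._rows.get(entity_id, {}).get(column, [])
        return [(ts, value) for ts, _version, value in cells]


table = EntityTable()
# Dimensional data and transaction data for the same user share one row.
table.put("user:42", "info:name", 1000, "Ada")
table.put("user:42", "activity:clicks", 1001, {"page": "/home"})
# A later cell written under a newer (hypothetical) schema version adds a field.
table.put("user:42", "activity:clicks", 1005,
          {"page": "/cart", "referrer": "email"}, schema_version=2)

print(table.latest("user:42", "activity:clicks"))
print(table.history("user:42", "activity:clicks"))
```

Because everything about `user:42` sits in a single row, a targeting or recommendation job can fetch one row and see both the profile and the full click history, which is the property the quote highlights.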

The coolest part for HBase developers or prospective HBase developers, however, might be that KijiSchema isn’t just code but is already pre-packaged and ready to deploy. WibiData has created what it calls the Kiji BentoBox, “a fully-functional HBase mini-cluster with KijiSchema on your machine with minimal configuration in under 15 minutes,” which is available for download on GitHub.

KijiSchema is also part of a broader Kiji framework for HBase that WibiData plans to open source over the next year or so. People perceive HBase as being complicated to set up and having a steep learning curve, Kimball said, and his team wants to make it more accessible and lower the barrier for getting started. The ultimate goal is to make the types of HBase applications that folks at Facebook, eBay and other large web shops are building something that any developer can build.

WibiData’s Omer Trajman, formerly VP of technology solutions at Cloudera, describes the ultimate Kiji framework as being akin to what the Spring framework is for Java. Despite its complexity, “there are also tens of thousands of developers who have been able to figure [HBase] out,” he said, but learning it might take weeks of intensive training on the low-level guts of the Hadoop Distributed File System and other stuff. Why learn to build an enterprise Java application from scratch, Trajman asked, when you can just use Spring?
