Summary:

Although it’s still a work in progress, 0xdata thinks it has the answer to the problem of doing advanced statistical analysis at scale: Build on HDFS for scale, use the widely known R programming language and hide it all under a simple interface.


There’s a trend afoot in the big data space to turn data science from black magic into child’s play, and one of the newest companies trying to pull off this technological alchemy is 0xdata. The bootstrapped startup, pronounced “hexadata,” is the brainchild of former DataStax engineer and Platfora co-founder SriSatish Ambati, and it’s trying to blend Hadoop, R and Google BigQuery into the ultimate tool for statistical analysis. Scientists, data analysts or whoever ultimately uses the product need only be experts in their domains, not in statistics.

At its core, 0xdata’s flagship product, called H2O, is a statistical analysis engine that uses the Hadoop Distributed File System (HDFS) as its storage platform, but the goal is to make it as simple as using a Google service such as BigQuery. Users will interact with H2O via a simple web-search-like bar and standard R statistical-analysis syntax, but H2O will run machine-learning algorithms behind the scenes. Alternatively, users can call out to H2O from Microsoft Excel or the RStudio integrated development environment using a REST API.

Although BigQuery is a SQL service hosted by Google, 0xdata follows a similar theory on simplicity.

However they choose to leverage the product, Ambati said, the scale of the underlying data and the complexity of running advanced analysis are details that need to be hidden. It’s the same theory that underlies Platfora, the company Ambati co-founded last year with his former DataStax colleague Ben Werther, although their approaches appear to be different. Whereas Platfora is trying to disrupt the data warehouse market by building a next-generation user experience atop Hadoop, 0xdata is trying to change the way users interact with popular statistical software such as R.

But either way, Ambati says of new data-analysis products, “[There are] no bragging rights for making it simple. If you don’t do that, you won’t be able to go forward.”

0xdata is also putting a focus on speed, both in terms of how fast it processes data and how fast it lets users react. Google search changed our thinking around how many questions people can ask successively, Ambati explained, and data analysts should have the same experience. That’s why H2O provides approximate results at every step in the analysis process. Rather than wait for the entire job to run and the exact results to be computed, users can get a general idea of the results, and if those are completely outside the expected range, kill the job and start over sooner.
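To make the idea of approximate results concrete, here is a minimal Python sketch of progressive estimation with early termination. It is not H2O code and does not reflect how 0xdata implements the feature; the function names, the chunked input and the three-standard-error cutoff are all assumptions chosen for illustration.

```python
import math


def progressive_mean(chunks):
    """Yield (rows_seen, running_mean, standard_error) after each chunk.

    Uses Welford's online algorithm, so no chunk has to be kept in memory.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for chunk in chunks:
        for x in chunk:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
        if n > 1:
            stderr = math.sqrt(m2 / (n - 1) / n)
        else:
            stderr = float("inf")
        yield n, mean, stderr


def run_with_early_stop(chunks, expected_low, expected_high, z=3.0):
    """Stop as soon as the estimate is clearly outside the expected range.

    The z=3.0 cutoff is an illustrative assumption, not a recommendation.
    """
    result = {"status": "finished", "rows": 0, "estimate": None}
    for n, mean, stderr in progressive_mean(chunks):
        result["rows"] = n
        result["estimate"] = mean
        clearly_low = mean + z * stderr < expected_low
        clearly_high = mean - z * stderr > expected_high
        if clearly_low or clearly_high:
            result["status"] = "killed"
            break
    return result
```

Called with a stream of data chunks and a plausible range for the answer, the sketch reports an estimate after every chunk and abandons the job once the estimate is clearly off, which is the kind of fast feedback loop Ambati describes.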

But it will be a while before the public gets a chance to see whether H2O lives up to its promises. Ambati said the product is just four months into development and won’t have its first set of algorithms available for another few months. His team of eight engineers has “built a lot of cool stuff,” but now it needs to round out the process and turn its code for H2O into an actual product.

Still, having decided to tackle data as a system, Ambati and his team are having a lot of fun. “We are live-and-die-with-infrastructure people,” he said, but for a bunch of folks who spent a lot of time learning math, it’s like going back to their days as computer science students.

Feature image courtesy of Shutterstock user Bruce Rolff.
