“Hadoop is stupid.” There you have it, from the mouth of Precog Founder and CEO John De Goes. No, he’s not suggesting companies abandon the world’s de facto big data platform, just that they avoid a whole lot of complexity by using Hadoop where it’s useful — relatively simple analysis of very large data sets — and give something else a try for other analytic tasks. Something like his company’s new cloud service, perhaps.
Having built the data environment for LivingSocial in his prior life, De Goes speaks from experience about the complexity of doing big data. Right now, he explained, companies have to use an array of disparate tools with overlapping capabilities (Hadoop, relational databases, data warehouses and business intelligence software, to name a few) and then cobble them together into a system that tries to be everything to everyone. Analytics becomes an engineering problem, new tools are forced to fit into old systems and everything suffers because of it.
Precog, on the other hand, is a custom-built platform designed to make child’s play of complex analytics. It accepts data from anywhere (Hadoop, databases, APIs, you name it), then lets users enrich the data and analyze it using either a REST API or the company’s interactive development environment, called Labcoat. The latter is based on an open source programming language called Quirrel and includes a tutorial on the language that covers relatively advanced functions such as sentiment analysis, predictive models and machine learning.
No Hadoop here
It would be easy to mischaracterize Precog as a company trying to do away with Hadoop, but that’s not quite the case. Hadoop is great at performing simple analyses in a batch process, De Goes said, and it’s a fantastic data store if you have loads of data (Precog will gladly ingest data from Hadoop). He just sees a problem with the vision of Hadoop as the foundation for all big data efforts, upon which all other tools are built and with which all other products must integrate.
People might want a kitchen-sink type of platform, he said, but big data, like life, is about tradeoffs. For example, he noted, no single device has emerged to dominate our personal computing needs, so we’re still carrying around tablets, smartphones and laptops because each excels at certain jobs.
Precog COO Carr cites other startups’ reliance on Hadoop as a major distinction between Precog and many of the companies trying to democratize data science skills. “There’s very good money to be made making incremental improvements on the status quo,” he said, and that’s what others are trying to do by essentially putting a friendly veneer on top of Hadoop and other open source projects, such as R for statistical analysis.
The uphill battle
You can’t fault their chutzpah, but betting against Hadoop as a viable big data platform isn’t necessarily wise. It has a large community of developers, and investors and software vendors alike have spent a lot of money on projects to turn Hadoop into more than just MapReduce. Yes, there’s a little anti-Hadoop sentiment floating around, but Silicon Valley is still full of really smart people with designs on productizing Hadoop and its ecosystem of open source projects even further.
It’s a bit crazy to think that a 2011 TechStars alum that just entered public beta, is currently effective only on single-terabyte data sets and is headquartered 1,000 miles from Silicon Valley, in Boulder, Colo., will be the one to shepherd the world toward a new big data platform. Heck, even Microsoft bent to the will of the people and abandoned its Dryad big-data platform in favor of Hadoop.
But then again, the Precog team seems to like a challenge. “We’re swinging for the fences,” COO Carr said. “We’re not trying to hit that good, solid double.”