6 reasons why 2012 could be the year of Hadoop
Hadoop gets plenty of attention from investors and the IT press, but it’s very possible we haven’t seen anything yet. All the action of the last year has just set the stage for what should be a big year full of new companies, new users and new techniques for analyzing big data. That’s not to say there isn’t room for alternative platforms, but with even Microsoft abandoning its competitive effort and pinning its big data hopes on Hadoop, it’s difficult to see the project’s growth slowing down.
Here are six big things Hadoop has going for it as 2012 approaches.
1. Investors love it
Cloudera has raised $76 million since 2009. Newcomers MapR and Hortonworks have raised $29 million and $50 million (according to multiple sources), respectively. And that’s just at the distribution layer, which is the foundation of any Hadoop deployment. Up the stack, Datameer, Karmasphere and Hadapt have each raised around $10 million, and then there are newer funded companies such as Zettaset, Odiago and Platfora. Accel Partners has started a $100 million big data fund to feed applications utilizing Hadoop and other core big data technologies. If anything, funding around Hadoop should increase in 2012, or at least cover a lot more startups.
2. Competition breeds success
Whatever reasons companies had for not using Hadoop should be fading fast, especially when it comes to operational concerns such as performance and cluster management. This is because MapR, Cloudera and Hortonworks are in a heated competition to win customers’ business. Whereas Cloudera and Hortonworks utilize open-source Apache Hadoop code for their distributions, MapR is pushing them on the performance front with its semi-proprietary version of Hadoop. This means an increased pace of innovation within Apache, and a major focus on management tools and support to make Hadoop easier to deploy and monitor. These three companies have lots of money, and it’s all going toward honing their offerings, which makes customers the real winners.
3. What learning curve?
Aside from the improved management and support capabilities at the distribution layer, those aforementioned up-the-stack companies are starting to make Hadoop easier to use. Already, Karmasphere and Concurrent are helping customers write Hadoop workflows and applications, while Datameer and IBM are among the companies trying to make Hadoop usable by business users rather than just data scientists. As more Hadoop startups begin emerging from stealth mode, or at least releasing products, we should see even more innovative approaches to making analytics child’s play, so to speak.
4. Users are talking
It might not sound like a big deal, but the shared experiences of early Hadoop adopters could go a long way toward spreading Hadoop’s utility across the corporate landscape. It’s often said that knowing how to manage Hadoop clusters and write Hadoop applications is one thing, but knowing what questions to ask is something else altogether. At conferences such as Hadoop World, and on blogs across the web, companies including Walt Disney, Orbitz, LinkedIn, Etsy and others are telling their stories about what they have been able to discover since they began analyzing their data with Hadoop. With so many use cases now public, future adopters should have an easier time knowing where to get started and what types of insights they might want to go after.
5. It’s becoming less noteworthy
This point is critical, actually, to the long-term success of any core technology: at some point, it has to become so ubiquitous that using it is no longer noteworthy. Think about relational databases in legacy applications: everyone knows Oracle, MySQL or SQL Server is lurking beneath the covers, but no one really cares anymore. We’re hardly there yet with Hadoop, but we’re getting there. Now, when you come across applications that involve capturing and processing lots of unstructured data, there’s a good chance they’re using Hadoop to do it. I’ve already come across a couple of companies that don’t bring up Hadoop unless they’re prodded, because they’re not interested in talking about how their applications work, just the end result of better security, targeted ads or whatever it is they’re doing.
6. It’s not just Hadoop
If Hadoop were just Hadoop (that is, Apache MapReduce and the Hadoop Distributed File System), it still would be popular. But the reality is that it’s a collection of Apache projects that include everything from the SQL-like Hive query language to the NoSQL HBase database to the machine-learning library Mahout. HBase, in particular, has proven popular on its own, including at Facebook. Cloudera, Hortonworks and MapR all incorporate the gamut of Hadoop projects within their distributions, and Cloudera recently formed the Bigtop project within Apache, which is a central location for integrating all Hadoop-related projects within the foundation. The more use cases Hadoop as a whole addresses, the better it looks.
Disclosure: Concurrent is backed by True Ventures, a venture capital firm that is an investor in the parent company of this blog, Giga Omni Media. Om Malik, the founder of Giga Omni Media, is also a venture partner at True.
Still early for Hadoop. Go to any job-board and contrast keyword search “Hadoop” with anything or pick Apache, Ruby, whatever. Hadoop is coming but ’12 feels early.
It’s definitely early, but I really do think the stage is set for a groundbreaking year. Apart from the reasons I listed, there’s also the large-vendor support: IBM, Microsoft, EMC, Oracle, NetApp, Dell … Hadoop won’t hit 100 percent penetration or anything next year, but I think we’ll really see the market and the core technologies shape up into something enduring.
I guess it also depends on the company’s appetite for new technology. My company is still struggling with small-data performance on Oracle-like systems. They are scared of trying new systems until there is not much risk. I would love to see Hadoop rocking in the coming few years.
Despite all the investors’ love, I haven’t seen it loved so much by the developer communities. But you are right: lately I have heard some good things about deploying applications that require large clusters on Hadoop. It’s kind of a mixed reaction from my end.
While Hadoop solves a real analytics problem…there is a danger that Hadoop is getting over-hyped and the VC investor mania gets too ahead of the technology capabilities. We have seen this way too many times in the past. In 2012, the Hadoop ecosystem needs to mature, stabilize and deliver for it to be more than a great science project. The good news is that big vendors are killing their competitive offerings and basically embracing Hadoop.
2012 being the “Year of Hadoop” doesn’t necessarily mean that it becomes ubiquitous. But it is a reasonable expectation that during 2012 it’ll ramp so far up the J curve that its trajectory towards achieving ubiquity becomes reliably plottable.
Derek, while your point about the current hype of Hadoop is valid, some of the other metrics that you seem to be using to define success are just consequences of that hype, which creates the dangerous potential for circular logic.
Hadoop is still vastly immature, requires too many moving parts from other projects and third parties to create something remotely usable in a critical production environment, and it’s expensive to operate and maintain, requiring an army of Java MapReduce developers (not the typical Java developer, by the way) and significant external instrumentation to make it reliable enough. Its real-time data delivery capabilities are non-existent (although HBase could be a beginning, it is by no means comparable to a full-fledged RDBMS when it comes to capabilities).
You should really take a look at the LexisNexis HPCC system: a very mature, production-ready, data-intensive ETL system (Thor) built around the beautiful ECL high-level, data-oriented language. It comes with all the required instrumentation out of the box to run in 24/7 critical operations, and it has an extremely efficient, horizontally scalable, shared-nothing real-time delivery system (Roxie) that allows for highly sophisticated queries (comparable to the best RDBMS systems in the market). What is there not to like about it?
Flavio
Flavio,
I’m well aware of HPCC Systems. In fact, I’m doing a webinar with Armando on Wednesday. But as good as HPCC might be — and I’ve heard good things — Hadoop has all the momentum right now, and a large ecosystem in place.
Early, but we’re in an age of big data, which will accelerate its support and success. 100% penetration isn’t necessary – but getting more and more big players using it will push things along nicely for Hadoop and its users.
The server team here at Kik is loving it – loving it so much we’re looking for a sys engineer. Check it out – http://www.kik.com/careers or email erika @ kik . com
I have been really struck by how strong the broader third-party ecosystem support is: from the emergence of specialized Hadoop solutions companies like ours (Think Big Analytics) through to NoSQL, MPP database, ETL and BI vendors, the enterprise integration needed to make Hadoop work is coming together very quickly. We are seeing a lot of companies kicking off production projects as the start of a data platform strategy.
Nowadays there is very little difference between early and late, as the infrastructure for onboarding is available immediately with the cloud.
Technologies like AWS MR, along with reachable (big) data in the cloud, make it even more feasible.
Now it only takes business users who feel that the huge data has some value, and a short implementation time to try it out.
This has already started, led by bio ($), social ($$$) and user engagement ($$$$$), with an enterprise adoption path alongside private cloud infrastructure. For maturity, it requires a few common packaged use cases where business users can learn and invest, rather than depending on a critical path for implementation.
Hadoop as it stands today does not solve all problems; it is good for a select set of problems for which a traditional DB may not be best suited. The problem set that Hadoop and MapReduce solve would be analytics of “Big Data” (very large data sets: hedge funds, metadata tagging, biomedical).
Reading the point/counter-point here reminds me of what people said about Linux in around 1997.
“Not quite yet”, “Too many moving parts”, “Needs to mature”, “Don’t count out $LEGACYPLAYER just yet…”
Sound familiar?
There’s also the question of “Year of…” for whom?
Developers? System Admins/Architects? Salespeople? VCs? Pundits who like to write “This is the year of $MYFAVORITETECHNOLOGY!” articles?
If you are a technical person, this seems like the right time to pick up something like Hadoop, so that in two or three years when the chattering classes say “No, for sure this time, this really is The Year of Hadoop” and recruiters start carpet bombing job boards with positions requiring “2-3 years working with Hadoop”, you’ll actually have it.