4 Comments

Summary:

It’s no secret that Facebook stores a lot of data in Hadoop, but how it keeps that data available whenever it needs it isn’t necessarily common knowledge. Today at the Hadoop Summit Facebook Engineer Andrew Ryan highlighted that solution, which Facebook calls AvatarNode.

avatarnode

It’s no secret that Facebook stores a lot of data — 100 petabytes, in fact — in Hadoop, but how it keeps that data available whenever it needs it isn’t necessarily common knowledge. Today at the Hadoop Summit, however, Facebook Engineer Andrew Ryan highlighted that solution, which Facebook calls AvatarNode. (I’m at Hadoop Summit, but didn’t attend Ryan’s talk; thankfully, he also summarized it in a blog post.)

For those unfamiliar with the availability problem Facebook solved with AvatarNode, here’s the 10,000-foot explanation: The NameNode service in Hadoop’s architecture handles all metadata operations with the Hadoop Distributed File System, but it also just runs on a single node. If that node goes down, so does, for all intents and purposes, Hadoop because nothing that relies on HDFS will run properly.

As Ryan explains, Facebook began building AvatarNode about two years ago (hence its James Cameron-inspired name) and it’s now in production. Put simply, AvatarNode replaces the NameNode with a two-node architecture in which one acts as a standby version if the other goes down. Currently, the failover process is manual but, Ryan writes, “we’re working to improve AvatarNode further and integrate it with a general high-availability framework that will permit unattended, automated, and safe failover.”

AvatarNode isn’t a panacea for Hadoop availability, however. Ryan notes that only 10 percent of Facebook’s unplanned downtime would have been preventable with AvatarNode in place, but the architecture will allow Facebook to eliminate an estimated 50 percent of future planned downtime.

Facebook isn’t the only company to solve this problem, by the way. Appistry (which has since changed its business focus) released a fully distributed file system a couple years ago, and MapR’s Hadoop distribution also provides a highly available file system. In Apache Hadoop version 2.0, which underpins the latest version of Cloudera’s distribution, the NameNode is also eliminated as a single point of failure.

You’re subscribed! If you like, you can update your settings

  1. Avatar was a creation by James Cameron? Are you really that dense?

  2. Jason Edwards Thursday, June 14, 2012

    Why would they develop something that scales to that level with just a single point of failure and then leave it to end users to provide a work around? Surely this would be feature requested and developed as a core feature?

  3. Q3 technologies Thursday, June 14, 2012

    Amazing to see this technology is full use. Ryan’s done a great job at the conference clearly.

    1. Would you even have a clue what Ryan is talking about or are you leaving this comment just for the sake of it?

Comments have been disabled for this post