Netflix open sources its Hadoop manager for AWS


Netflix runs a lot of Hadoop jobs on the Amazon Web Services cloud computing platform, and on Friday the video-streaming leader open sourced the software it uses to simplify running them. Called Genie, it’s a RESTful API that lets developers launch new MapReduce, Hive and Pig jobs and monitor longer-running jobs on transient cloud resources.
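
To make the idea of launching a job through a RESTful API concrete, here is a minimal sketch in Python of what building such a request could look like. The host, URL path and JSON field names are hypothetical placeholders chosen for illustration, not Genie’s documented schema, and the request is only constructed, never sent.

```python
import json
import urllib.request

# Hypothetical values: the host, path and field names below are illustrative
# placeholders, not Genie's documented API.
GENIE_URL = "http://genie.example.com/jobs"


def build_job_request(job_type, user, script_uri, base_url=GENIE_URL):
    """Build (but do not send) a POST request describing a Hadoop job."""
    payload = {
        "jobType": job_type,    # e.g. "hive", "pig" or "hadoop"
        "userName": user,       # who is submitting the job
        "script": script_uri,   # e.g. a query file kept in object storage
    }
    return urllib.request.Request(
        base_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_job_request("hive", "analyst", "s3://example-bucket/query.hql")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

In a real client, the request would be sent with `urllib.request.urlopen(request)`, and the response would presumably carry a job identifier that could be polled to monitor a longer-running job.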

In the blog post detailing Genie, Netflix’s Sriram Krishnan spells out what Genie is and is not. Essentially, Genie is a platform as a service running on top of Amazon’s Elastic MapReduce Hadoop service. It’s part of a larger suite of tools that handles everything from diagnostics to service registration.

Genie is not a cluster manager, nor is it a workflow scheduler for building ETL processes (e.g., processing unstructured data from a web source, adding structure and loading it into a relational database system). Netflix uses a product called UC4 for workflow scheduling, but it built the other components of the Genie system itself.

[Image: Genie architecture diagram]

Netflix first discussed Genie in January, when it showed off the company’s overall Hadoop architecture within the AWS cloud. While Genie is near the top of the overall stack, the foundation is interesting as well. Rather than maintaining a massive set of instances (or multiple separate sets) running the Hadoop Distributed File System, Netflix uses Amazon’s S3 object storage service as its big data bit bucket, so all of its Hadoop jobs access a common, reliable data store.

[Image: Netflix’s Hadoop architecture on AWS]

As with the rest of Netflix’s numerous open source projects on top of AWS (it runs the entire streaming business on the platform), it’s hard to gauge how much traction they’ll pick up or what kinds of products they might inspire. Netflix Cloud Architect Adrian Cockcroft has told me he’s fielding inquiries from quite a few large companies and organizations that essentially want to build their own internal version of Netflix’s cloud platform as a service. Smaller companies are adopting these tools, too, although it can be difficult to track exactly who is accessing the code from GitHub and what they’re doing with it.

AWS might get inspired to build on the Netflix code, or at least take a lesson from it. In the Hadoop space alone, Elastic MapReduce is a pretty low-level service, but Netflix’s Genie makes it more akin to higher-level offerings such as Altiscale, Qubole, Infochimps, Continuuity and Mortar Data. AWS might be fine selling standard Lego blocks, as Cockcroft described most AWS services (in fact, some of the aforementioned services run on AWS), but there’s a lot of money to be made selling those Star Wars kits that add polish to the original.
