Hadoop vendor MapR and data center automation startup Mesosphere have created an open source technology called Myriad, which is supposed to make it easier to run Hadoop workloads on top of the popular Mesos cluster-management software. More specifically, Myriad allows the YARN resource scheduler — the linchpin of Hadoop 2.0 that lets the platform run processing frameworks other than MapReduce — to run on top of Mesos, effectively creating an auto-scaling Hadoop cluster that’s relatively future-proof.
“Before, you had to make a choice, and now you can just run YARN on Mesos,” explained Mesosphere founder and CEO Florian Leibert. “… I think the goal here is to have more workloads in a shared environment.”
What he means is that companies will no longer have to run Hadoop on one set of resources while running web servers, Spark and any number of other workloads on separate resources managed by Mesos. Essentially, all of these things will now be available as data center services residing on the same set of machines. Mesos has always supported Hadoop as a workload type, and companies including Twitter and Airbnb have taken advantage of this, but YARN has appeal as the default resource manager for newer distributions of Hadoop because it’s designed specifically for that platform and, well, is one of the foundations of those newer distributions.
With Myriad, YARN can still manage the resource allocation to Hadoop jobs, while Mesos handles other tasks as well as the task of scaling out the YARN cluster itself. So instead of the current state of affairs, where YARN clusters are statically defined and new nodes must be manually configured, Mesos can spin up new YARN nodes automatically based on the policies in place and the available resources of the cluster.
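To make the idea of policy-driven scale-out concrete, here is a minimal sketch in Python. It is not Myriad's code and does not use the Mesos or YARN APIs; the names `Offer`, `Policy` and `plan_scale_out`, and the policy fields themselves, are hypothetical and chosen only for illustration. It assumes the cluster manager presents a list of available resources per host and that a policy defines how big a YARN node is and how many may run at once.

```python
import math
from dataclasses import dataclass


@dataclass
class Offer:
    """Hypothetical description of free resources on one host."""
    host: str
    cpus: float
    mem_mb: int


@dataclass
class Policy:
    """Hypothetical scale-out policy (not a real Myriad config)."""
    node_cpus: float          # CPUs reserved for one YARN node
    node_mem_mb: int          # memory reserved for one YARN node
    containers_per_node: int  # assumed capacity of one YARN node
    max_nodes: int            # upper bound on YARN nodes in the cluster


def plan_scale_out(offers, policy, running_nodes, pending_containers):
    """Return the hosts on which new YARN nodes should be started.

    New nodes are planned only while work is waiting, the policy cap
    has not been reached, and a host has enough free resources.
    """
    if pending_containers <= 0:
        return []

    # How many extra nodes the waiting work would need.
    needed = math.ceil(pending_containers / policy.containers_per_node)
    # How many extra nodes the policy still allows.
    headroom = max(policy.max_nodes - running_nodes, 0)
    target = min(needed, headroom)

    launches = []
    for offer in offers:
        if len(launches) >= target:
            break
        fits = (offer.cpus >= policy.node_cpus
                and offer.mem_mb >= policy.node_mem_mb)
        if fits:
            launches.append(offer.host)
    return launches


if __name__ == "__main__":
    offers = [Offer("a", 4, 8192), Offer("b", 1, 1024), Offer("c", 8, 16384)]
    policy = Policy(node_cpus=2, node_mem_mb=4096,
                    containers_per_node=4, max_nodes=3)
    print(plan_scale_out(offers, policy, running_nodes=1,
                         pending_containers=10))
```

In this toy version, YARN's job is reduced to a single number (the count of waiting containers), while the outer loop stands in for the cluster manager deciding where new nodes can fit. The real division of labor between the two schedulers is more involved than this.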
Mesosphere engineer Adam Bordelon said Myriad works now and that eBay and Twitter have been testing it out. eBay actually contributed quite a lot to the first version of the code. However, he noted, Myriad is still early in its development and needs quite a few more features, including some around security.
“I imagine within a month or two,” he said, “it should be in production somewhere.”
Despite the fact that two commercial companies are driving Myriad at this point, Bordelon said the goal is definitely to build a community around the project. It’s currently hosted in the Mesosphere GitHub repository, but the team is working on a proposal to make it an Apache Incubator project.
“It is definitely a community effort,” he said.
Jim Scott, MapR’s director of enterprise strategy and architecture, said that Hadoop was pitched in part as a tool for eliminating data silos. However, he added, “As we start to see those data silo walls come down, we’re starting to see other walls come up.” One of those walls is the relegation of Hadoop to its own dedicated cluster far away, logically at least, from everything else.
“This is the enabling function, in my mind,” he said, “that makes it so people can tear that wall down.”
MapR CEO John Schroeder will be among many speakers talking about the evolution of Hadoop and big data architectures at our Structure Data conference in New York next month. Others include Cloudera CEO Tom Reilly, Hortonworks CEO Rob Bearden, Google VP of Infrastructure Eric Brewer, Databricks CEO Ion Stoica and Amazon Web Services GM of Data Science Matt Wood.
And for more on Mesos, Mesosphere and why they have some engineers so excited, check out our May 2014 Structure Show podcast interview with Mesosphere CEO Leibert.