The big data world is full of small, scrappy startups using their ingenuity to build complex systems out of open source software, but the Walt Disney Company is not one of them. Here’s what goes into building a big data platform in a Fortune 100 company.

Disney is a massive company, but when it comes to its big data platform, the entertainment conglomerate looks a lot like a startup. Kind of, that is. By the sheer power of its will (and ingenuity), a small team has been able to craft a large custom platform out of Hadoop, NoSQL databases and other open-source technologies. But for better or for worse, doing big data at such a large company means playing by a different set of rules.

When it came to putting a big data platform in place, Arun Jacob, director of data solutions in the Disney Technology Solutions & Services group, told a room at the IE Group Big Data Innovation conference in Boston on Thursday that Disney chose to build something from scratch rather than buy software from a large vendor. Cost certainly played a role, but really it was flexibility that made the decision.

Reduce, reuse, recycle

In order to provide the most value to the company, Disney’s big data platform has to be everything to everyone, which it turns out is a tall order. Initially, Jacob said, “We treated ourself like a small consulting organization and we had something to sell.” When a division wanted to use the platform for a particular function, Jacob would say yes and then get busy actually figuring out how to build it.

Architecturally, it’s all about being able to recompose the path data takes through the platform and the components that are used for each particular purpose, or being able to easily replace pieces altogether if something better comes along. The Disney platform has a foundation of Hadoop, Cassandra and MongoDB complemented by a suite of other tools for particular use cases. The operations team uses the platform to view, analyze and index error messages, while another division runs a recommendation engine on top of it. Application developers get the high-throughput, low-latency data access they need, while the analytics team has the higher-latency data access it requires.
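To make the idea of a recomposable data path concrete, here is a minimal sketch in Python. It is not Disney's code: the stage names, record fields and the `compose` helper are all hypothetical, chosen only to show how the same small stages can be rearranged or swapped for different consumers.

```python
from typing import Callable, Iterable, Iterator, List

Record = dict
# A stage takes a stream of records and returns a new stream.
Stage = Callable[[Iterable[Record]], Iterable[Record]]


def compose(stages: List[Stage]) -> Stage:
    """Chain stages into one path; swapping a stage means editing the list."""
    def run(records: Iterable[Record]) -> Iterable[Record]:
        for stage in stages:
            records = stage(records)
        return records
    return run


def only_errors(records: Iterable[Record]) -> Iterator[Record]:
    # Hypothetical filter stage: keep error-level events only.
    for r in records:
        if r.get("level") == "ERROR":
            yield r


def tag(source: str) -> Stage:
    # Hypothetical stage factory: annotate each record with its origin.
    def stage(records: Iterable[Record]) -> Iterator[Record]:
        for r in records:
            yield {**r, "source": source}
    return stage


def to_index_doc(records: Iterable[Record]) -> Iterator[Record]:
    # Hypothetical stage: shape records for a search index.
    for r in records:
        yield {"id": r["id"], "text": r["message"], "source": r["source"]}


# Two paths built from the same parts, for two different consumers.
ops_path = compose([only_errors, tag("ops"), to_index_doc])
analytics_path = compose([tag("analytics")])

events = [
    {"id": 1, "level": "ERROR", "message": "disk full"},
    {"id": 2, "level": "INFO", "message": "job finished"},
]
ops_docs = list(ops_path(events))
analytics_rows = list(analytics_path(events))
```

Because each path is just a list of stages, replacing one component with something better that comes along is a one-line change rather than a rewrite.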

However, although Jacob wanted to keep costs down with open source software, he did have a luxury that most startups don’t — a budget for outsourcing and the occasional product. When he needed support with a Hadoop cluster, he could call Cloudera. When an implementation of Solandra (an open source search engine built atop Solr and Cassandra) tipped over under the weight of Disney’s scale, he bought the enterprise edition of DataStax’s Cassandra-based product (Solandra’s creator had since taken a job with DataStax and was expanding upon Solandra’s capabilities in DataStax Enterprise).

Flexibility isn’t free

The Solandra incident actually underscores the tradeoffs that come when you use free open-source software and don’t reach for the checkbook at any sign of trouble. “You pay for [open-source projects] late at night, you pay for them by learning to run them, you pay for them by reading people’s source code who even if you could read it, it still doesn’t make any sense,” Jacob said. But those things can be overcome if you’re willing to put in the time.

And at a company the size of Disney, those problems, and a whole lot more, have to be overcome. For example, Jacob explained, you can fudge your way around things like fault tolerance, high availability and security when you’re standing up a deployment, but you do have to figure out a way to achieve those things eventually.

Ready for mass consumption

You also have to make systems built on open-source software consumable by everyone who needs to use them. That means it’s not enough to just build a scalable and stable system; the system also has to be easy enough for thousands of internal developers of all types and all skill levels to use. In a six-person startup, Jacob said, it’s easy enough for everyone to just learn Hadoop in a month and then start using it, but that’s not the case in a large enterprise.

So his team made it easy.

To “remove the excuses” for business users not loading their data into the system, the team built a custom user interface that users just need to point at their files. (Disney’s platform is growing at 5TB a day, and there are still many other types of data it needs to house, Jacob said.) Because they’ve built wrappers around the technology, Jacob’s team doesn’t talk about Hadoop and MongoDB to internal users, only about analytics and queries. The team also built client frameworks in a bunch of programming languages so developers can interact with the platform without writing RESTful API calls.
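A client framework of this kind might look roughly like the following Python sketch. Everything here is an assumption made for illustration: the `PlatformClient` class, the `/datasets/...` URL layout and the JSON payloads are hypothetical, since the article does not describe Disney's actual API. The point is only that callers speak in terms of datasets and queries while the HTTP details stay inside the wrapper.

```python
import json
from typing import Any, Callable, List, Optional
from urllib import request as urlrequest
from urllib.parse import urlencode

# A transport sends (method, url, body) and returns the raw response bytes.
Transport = Callable[[str, str, Optional[bytes]], bytes]


def http_transport(method: str, url: str, body: Optional[bytes]) -> bytes:
    """Default transport: a plain HTTP call using the standard library."""
    req = urlrequest.Request(
        url, data=body, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urlrequest.urlopen(req) as resp:
        return resp.read()


class PlatformClient:
    """Hypothetical wrapper: callers never build URLs or parse responses."""

    def __init__(self, base_url: str, transport: Transport = http_transport):
        self._base = base_url.rstrip("/")
        self._transport = transport

    def query(self, dataset: str, **filters: Any) -> List[dict]:
        # Assumed endpoint layout; the real routes are not in the article.
        url = f"{self._base}/datasets/{dataset}/query?{urlencode(filters)}"
        return json.loads(self._transport("GET", url, None))

    def load(self, dataset: str, records: List[dict]) -> dict:
        body = json.dumps(records).encode("utf-8")
        url = f"{self._base}/datasets/{dataset}/records"
        return json.loads(self._transport("POST", url, body))


# Usage with a fake transport, so the sketch runs without a server.
calls = []

def fake_transport(method: str, url: str, body: Optional[bytes]) -> bytes:
    calls.append((method, url, body))
    if method == "GET":
        return b'[{"id": 1, "level": "ERROR"}]'
    return b'{"accepted": 2}'

client = PlatformClient("http://platform.example/", transport=fake_transport)
errors = client.query("errors", level="ERROR")
receipt = client.load("errors", [{"id": 2}, {"id": 3}])
```

Injecting the transport keeps the wrapper testable and makes it easy to add retries or authentication in one place, which is part of the appeal of shipping a client library instead of documenting raw endpoints.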

In some cases, the team decided to hide the platform’s complexity from users, not to facilitate its use but to keep loose-cannon developers from doing something crazy that could take down the whole cluster. The team could show them all the controls and knobs in a NoSQL database, but “they tend to shoot each other,” Jacob said. “First they shoot themselves, then they shoot each other.”

Still, after all the work he put into building Disney’s big data platform, it’s not exactly a process Jacob is hoping to repeat as the platform evolves. The tools for managing big data are getting better, he said, so he still does a build-versus-buy analysis when it’s time to make a change. Building custom tools is fine when you don’t have a choice, but it’s not always wise when buying something could save untold man-hours and headaches.

Update: DataStax has informed me that the slides previously linked to here have been removed. If you want more technical details on Disney’s big data platform, a slide deck from Jacob’s recent presentation at the Cassandra Summit is available here.

Feature image courtesy of Shutterstock user Scott Cornell.


  1. Mal – Short Breaks Sunday, September 16, 2012

    This ain’t no mickey mouse company

  2. Just as I expected….

  3. It takes right management decision to decide which products, when developed indigenous can yield profit :) Good one Disney!

  4. I know Disney is in Development/Deployment of OpenStack using Ubuntu Servers. Not sure if that is related to what is described in this article though.

  5. A very enjoyable read.

  6. Maristela Guimaraes Monday, September 17, 2012

    Nice reading…it is always good to know how things really are…


    1. What you mean?

  7. Nice read

  8. I think it is funny to call this a start up budget. When you have like 50 engineers working on this, and another 200 people throughout the company also doing cloud initiatives that the “architecture” team can focus on, this is NOT a start up. Give me the resources of Disney, and I am quite sure I could get a system created in less than the three years it took them :). I do not mean to downplay the end results – it is clever, and I wish Disney nothing but success. But let’s not get carried away with the low cost start up talk, shall we?

    1. Actually, closer to <10% of what you suggest as resources. It was a skunk works program that fought for its life regularly and only grew organically… A rare success in a large company.

      1. Considerably more really – they had a dedicated Ops team to manage, so folks could focus on dev – you also had multiple, very large teams through Disney (Read: ESPN) doing a ton in the cloud, and informing them of results, and also contributing organically. I understand this was a cool success – all for it – but to think in start up terms for budget just does not ring true

  9. Great article… but the link to slides isn’t working?
