Hadoop: “It’s damn hard to use”

Session Name: Solving Big Data App Developers’ Biggest Pains


Todd Papaioannou


Good morning. So, it’s been quite dry talks so far this morning. I’m going to try and liven it up a little bit. Thanks for turning out so early after what was probably a big night for everybody. So, why is the developer having a problem? Let’s take a look at some of the pain points, and I’m going to give you some ideas of what I’ve learnt over the course of my career. Let’s get this thing working. So, who am I? Some of you have been coming here for a couple of years. You’ve heard me talk a little bit before. I’m a recovering Cloudaholic. I used to be the Chief Architect at Yahoo, and I learned a lot there. It’s a pretty darn big Cloud, right? 400,000 nodes in the Cloud. We’ve only got 45,000 Hadoop servers, and the team that I worked in was the Cloud Platform group. We were the central technology group that really provided the technology platform for the whole company.


There were like 1,200 people in that group, and 120 were working on Hadoop. That’s a lot of people working on Hadoop, right? Just on actually getting Hadoop working for the company. We made a strategic decision that we were going to do all of our data stuff on top of Hadoop. But it still meant 5,000 other developers out there who were trying to use Hadoop, trying to build applications on top of Hadoop, trying to deploy them. And so, through the course of that experience I learned a few lessons I want to share with you guys this morning, right? What are they? One, Hadoop is hard, right? Let’s make no bones about it. We’ve heard a lot about Hadoop over the last few years. Specifically here we’ve been hearing a lot more about it. It’s damn hard to use. It’s low-level infrastructure software.


Most people out there, they’re not used to using low-level infrastructure software. You don’t program against the language it’s coded in any more. There’s a bunch of libraries that go around it, right? So, what does that mean for us? Well, what we wanted was a stable platform that we could have all of our application developers develop applications against. And what did we get? Actually, a bunch of redneck architecture. What happened was every team out there basically took some bits from here and there and cobbled them together, right? And that was actually a problem for us as a company, right?


What we wanted to do was deliver a slick user experience to all of our developers out there, the 5,000 people out there. But what happened was they really got the home-brew computing kind of experience, where they had to cobble something from here, take a library from there, something from there, and put it together. And that had a big impact on us as a company, right? Where did that actually lead us? All right, well, it took forever to launch an application. It was a whole process, three, four, five, six months to actually get something out from concept. Now, that’s a big problem, because what it meant was innovation stalled out for us. And in case you guys haven’t been paying attention, Yahoo may have had a few product issues from time to time, from an innovation standpoint, right? I think you’ve probably been reading [inaudible] just like we all have.


So, why did that happen? Well, there’s a key set of pain points that I’ve really thought about from that experience. One, just getting and setting up your big data infrastructure is hard. And then you go through all of the [inaudible] vendors, and they’re going to make it easier to install it. It’s not that simple; you still have to pay them through the nose for some consulting. Assuming that you can get through that point, then what? You’ve got to program against those low-level APIs: infrastructure APIs, not developer APIs. And so, that’s tough.


And let’s say you managed to build an application. You want to get it onto Hadoop, you want to actually run it, scale it, monitor it. There are really no good tools out there for that. And you don’t even have people who are able to do that either; there’s a huge lack of development talent out there. Everybody in the Valley and everybody around the country is scrambling around for those few very rare Hadoop experts, or quote-unquote experts who’ve taken the certification program for a day. That’s a big ol’ problem, right?


So, the last part: this is really where the idea for Continuuity came from. I’m one of the founders here at Continuuity. And it’s based on the idea of how do we actually make the life of the developer a lot simpler? How do we enable development of applications? So, our focus as a company is maniacally on developers. Without applications on the Hadoop ecosystem, the ecosystem’s going to die. It’s just going to become a storage engine that sits underneath Teradata or Oracle. We need apps to actually enable the whole thing to survive. So, what we’ve done is create the industry’s first application hosting fabric. You can think of that as the next-generation app server. It layers over the top of Hadoop and HBase, right?


We give you fantastic interfaces and fantastic tools for building apps. So, they say a picture’s worth a thousand words. Well, I’ve got a video, so that must be worth a million. So, we’re going to take you through a little life cycle here of what it’s like to be a developer with Continuuity. We offer a Cloud service; you can come to our website and sign up. You can go there now. All of you sitting on your laptops, go to it right now. [inaudible] And you can download the developer suite, which is a free-to-use software toolkit for building apps. And you can provision a sandbox in the Cloud. All right, this is a full app stack. This is running Hadoop, and HBase, and all of our application fabric on top of it, and you spin that up.


It takes a little while. So, let’s take a look at what it’ll be like as a developer. I’m actually running up the local version now, the single-node version of the app fabric. And I can run that locally, so I can run it on my laptop. We’ve built an emulation there of the Hadoop and HBase that sit underneath our API. So it’s super, super lightweight; you run the app. Then what I can do is connect to the user interface locally. We have a very, very slick user interface that really focuses on how you actually understand what’s going on with your application. So, here I’m going to create an app. Like all good cooking shows, I’ve got a couple I created earlier. This is actually one of the samples: simple drag-and-drop deployment of your application. No messing around with getting stuff onto Hadoop, just drag and drop, right? And I’ve actually deployed my application now onto the local node. And I can actually look at the application and see what’s going on with it. This is a very simple example. It just came out of the SDK.


So, assume I’m actually working on something real, right? I’m a developer. I’m sitting in my IDE, and I want to use my IDE to actually write code, actually understand what’s going on with it, and actually debug my application. Think about how different this is from the normal way of debugging an application on Hadoop, which is: deploy it to Hadoop, let it run, get a bunch of log files, bring them back, and hopefully get through the log files and see what happened. Developers don’t work like that. They weren’t working like that in the J2EE world; what they were doing was actually connecting the debugger and stepping through it. That’s actually how you want to be able to debug your applications. So, that’s what we’re going to do now. I’ve got a different application and deployed it out there. And then I actually start sending in some sample data, so you can see the data start to flow through the application.


The user interface that we’ve developed allows you to introspect inside your app, to understand what’s going on with it, to understand the performance characteristics of it, and drill into it. And what I want to do is debug. So, I’m just going to get into my Eclipse IDE, like you do all day, every day, and set a debug point. And at some point we’re going to hit that debug point. And I can go introspect what’s going on with my application and prove that it’s working correctly. There’s no sending it off to Hadoop, there’s no hoping that it’s going to work. Now, when my application’s done, what do I want to do with it? I actually want to push it into the Cloud. I don’t want it to run on my laptop, I want to run it up in the Cloud. We have a very, very simple mechanism for doing that. I can hit the push-to-Cloud button and it’s actually going to launch that up into the Cloud. Think about how different that is from application life cycle management. Traditionally you want to go from development to test to production. And that’s a big, big pain with Hadoop right now.


There’s a whole bunch of stuff that you need to actually do. What we’ve created is a very, very simple mechanism that just allows you to hit a button and push to Cloud. And we shoot your entire application up and promote it through the life cycle. So, I’m running this in my sandbox. But look, the sandbox is not powerful, right? And so, there’s actually a bunch of issues going on with my app. So, what I want to do is put the application into production. Come to our website. Decide how big a production cluster I want here. And then we go for the medium one; that should be big enough for my application. So, you can come in, swipe your credit card, which you’ve already preloaded there, and deploy a full production cluster, all in the Cloud, self-service.


It takes a little while for that, so that’s why we had the grey-out. Now, the same application that I was just working on in my developer sandbox, I want to be able to promote that into my production Cloud. Again, think about how different this is from moving your applications around in your environment, how you do that now. So, now I’ve pushed my application up to my production Cloud. And you can see that it’s actually running there. What I’ve shown you so far in this video is how simple it is to actually build an application, to debug it locally, and to be able to move it up and down through production tiers.


Now, in the production environment I can also go in and say, you know what? This application’s running a little bit slow, I want to scale up resources. And what I did there was click on a little push button to increase processing power. And it’s elastic scalability at run time. I never took the application down, didn’t have to do anything to the app. It’s actually running in real time. Think about how different that is as a developer experience. And so, what we focus on as a company is how do we actually make it super, super simple for people to actually build applications. Before and after, right? You’re not spending months of time, with dollars and opex, to set up Hadoop; it’s actually self-service. And you can be up and running in minutes. Just click on the user interface. Instead of worrying about low-level APIs, we’re actually giving you high-level APIs and an SDK.


We pride ourselves on allowing people to build applications in an afternoon or a day, deploy them to the Cloud, and see some value out of them. We support drag-and-drop deployment. It’s a very, very different experience. We support online [inaudible] debugging in a debugger. Pretty much at this point our goal is to make any Java developer a big data application developer. And so, if you’re a Java developer and, say, that Ruby on Rails train passed you by, don’t worry. We’ll turn you into a big data developer, and your career is resurrected. So, what do we offer? You can go to our website and sign up now for the developer suite and the developer sandbox. Please go and hit the site. And then coming in Q2, and we’re in private availability right now, is the private Cloud version, where we run and host and operate everything for you in the Cloud, or the on-premise version of the fabric, which we can deploy behind the firewall. And you can scale it out across your existing Hadoop and HBase cluster. So, you can actually make all of your existing Java programmers, Java developers, much more productive in how you actually monetize your big data applications. Any questions? You, sir, wake up. I know it was a late night last night, but… [chuckles]. All right, good. Well, I’m over time. Thank you very much for your time this morning.