How Netflix built its OpenConnect cache to speed up your video streams

Transcription details:
Date:
21-Jun-2013
Input sound file:
1006.Day 2 Batch 4

Transcription results:
Session Name: Hacking the Cloud: Rethink Hardware, Rethink APP Delivery, Turn off your instances and save money/Closing Remarks

Joe Weinman
Derrick Harris
Gleb Budman
Adrian Cockcroft
Stacey Higginbotham
Om Malik

Joe Weinman 00:02
This is good. I think this means a sooner cocktail hour perpetually. We’re going to wrap up the day with still a killer presentation, because we have two major thought leaders in the cloud: Adrian Cockcroft and Gleb Budman. For those of you who don’t know Adrian in particular, he’s a modest guy, but if it weren’t for AWS I’m not sure the cloud would be where it is today, and if it weren’t for Netflix I’m not sure that AWS would be where it is. I’d like to credit Adrian with a substantial portion of that. I think this is going to be fascinating; maybe nobody has put more thought into that. We’ve got a great panel. Gleb has some great insights. Derrick is always full of tough questions for them. We’ll finish on a high note. Let’s bring out the panel. Derrick.

[applause]
Derrick Harris 00:58
So I think if you look at the title, “Hacking the Cloud,” you guys are kind of two of the preeminent cloud hackers – that sounds criminal in some way, if you will. I don’t know. But for everyone who doesn’t know you guys, I just wanted you to give your takes on this, because you both come at it from seemingly different angles. One of you looked at Amazon and said, “Hell no.” One of you said, “All in.” Gleb, we’re going to start with you: what is Backblaze doing in your cloud hack, if you will?
Gleb Budman 01:33
We started an online backup company about five or six years ago. For five bucks a month we do unlimited data. Our plan was to use S3 – we did the math, it wasn’t going to work. So we built our own server, cloud, the whole bit, and we figured we’d saved about $100 million over the five-year period on cloud storage over using S3.
Derrick Harris 01:54
Just out of curiosity, how much– what’s the Backblaze capacity at this point?
Gleb Budman 02:00
Today we just crossed 60 petabytes of data stored on the cloud. We add about three petabytes every month.
Derrick Harris 02:09
Adrian, Netflix?
Adrian Cockcroft 02:11
We have a similar story on the CDN side, and we were actually inspired by Backblaze. We actually outgrew the commercial CDN vendors for the number of terabits we could deliver. We don’t have as many petabytes, but we have more terabits than anybody else. We’ve actually built our own hardware and shipped that out. We just outgrew the ability to buy the public capacity. The 60 petabytes is really more than you want to put on S3 anyway. You’re big enough that you need to do it yourself. There are certainly cases where you outgrow the ability to go buy stuff and you just have to DIY on it.
Adrian Cockcroft 02:54
The infrastructure for that, for us, is that we make these boxes and ship them out to ISPs, and we have a few of them – there aren’t that many of them. We don’t have to go build $100 million data centers to put them in; it’s really a smallish number of boxes that can deliver the terabits of bandwidth that we need.
Derrick Harris 03:12
But in your case you say that you outgrew the public services. We see other largish companies – the Facebooks of the world, right, Twitter – they’re not using Amazon, right? Are you at risk of outgrowing that?
Adrian Cockcroft
I think if you’re at the 100,000-plus machine instances level, then yeah, you have enough capacity that you need to run your own thing. Some of the big banks, Facebook, Amazon itself, Google: it all makes sense when you’re at that scale. We’re in the tens-of-thousands size thing, so we’re in sort of a grey area where we could run stuff ourselves, but we decided not to. It really comes down to this: if I have $100 million, I don’t want to spend it on a data center, which might make my life a bit cheaper in six to ten months’ or a year’s time. I want to spend $100 million finding another House of Cards to fund, or opening in a new country. We just announced yesterday we’re launching in Holland this fall. It costs quite a bit to go into a new country. If I have $100 million, I’ll go take over more of the world; that’s a much better return to the business than making our computers slightly cheaper.
Derrick Harris 04:25
IT as a savior, as an innovation center; that’s unheard of. Gleb, you guys have been doing this open storage since 2009, well before I think we ever heard of Open Compute or some of the stuff Facebook is doing on– what was the thinking, aside from the cost, at that point? Why open source? Why not just go, “We do storage really cheap,” and keep that the trade secret?
Gleb Budman 04:53
Initially that was the plan. When we started in 2007 we built it for ourselves. We kept it very secret. We had a bunch of red boxes in the data center, and no one knew that except for the people who worked at Backblaze. A couple of years later we started talking about whether we should share anything about that infrastructure and these huge 45-hard-drive boxes that we designed. Initially we thought we might just tease it out a little, but we kind of had two goals. One was that we didn’t intend to be in the box-building business. We really intended to buy boxes from Dell or [inaudible] Hitachi or EMC or somebody.
Gleb Budman 05:34
And all of those boxes were somewhere in the $1,000 to $3,000 per terabyte range. A drive was 100 bucks a terabyte, and no one would sell us just an inexpensive chassis to take a hard drive and plug an Ethernet jack into the drive. We ended up building that, but we were hoping that the industry would evolve over time, and other vendors would do that. And at some point we would potentially get out of the business of designing and building and evolving this box at all. That was part of the idea of open sourcing: hoping that a bunch of people would go, “Hey, this is kind of interesting, maybe we’ll build it ourselves.” It was hard to believe that that would ever happen.
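Editor’s sketch: the per-terabyte figures quoted in this exchange imply a large markup over the bare drive price. The short Python calculation below only restates that arithmetic; the constant and function names are illustrative, and the dollar figures are the round numbers from the panel, not audited costs.

```python
# Illustrative arithmetic only: the dollar figures are the round numbers
# quoted in the panel, not audited vendor or Backblaze costs.
DRIVE_COST_PER_TB = 100            # "100 bucks a terabyte" for a bare drive
VENDOR_COST_PER_TB = (1000, 3000)  # quoted vendor range, dollars per terabyte


def markup(vendor_cost_per_tb: float,
           drive_cost_per_tb: float = DRIVE_COST_PER_TB) -> float:
    """Return how many times more a vendor terabyte costs than a bare-drive terabyte."""
    return vendor_cost_per_tb / drive_cost_per_tb


low, high = (markup(cost) for cost in VENDOR_COST_PER_TB)
print(f"Vendor storage: {low:.0f}x to {high:.0f}x the bare drive price per terabyte")
```

The gap between those two prices is what a cheap chassis design has to fit inside.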
Gleb Budman 06:22
We thought there aren’t that many people who care about big red boxes, but a million people read that blog post, and now there are probably half a dozen companies that build these boxes for various people.
Derrick Harris 06:38
Are you going to buy it from them?
Gleb Budman 06:39
We have a contract manufacturer that now builds the boxes. Originally we literally built these boxes on a table in the office, one by one. That doesn’t scale fantastically well. So we got a contract manufacturer and said, “Here’s a design, build it for us.” A number of people – a number of vendors – do, and we buy them from one of the contract manufacturers. Partially we were hoping the community would pick it up and start running with it; partially it was a way to give back a little bit. We use Linux, we use OpenSSL, we use ext4, we use a number of open source projects. We don’t contribute our code back, that’s proprietary IP, but we felt the hardware we could contribute back to the community.
Derrick Harris 07:27
And Adrian, you guys are open source?
Adrian Cockcroft
Yeah. We were directly influenced. The team that built this– this actually illustrates something interesting, which is that you can go out and buy a generic box from a major manufacturer, and they’ve built a box that they have to sell to lots of people, so they have to generalize it, they have to test it lots of different ways. We wanted a box which would do one thing. It had one workload, so it was much quicker and easier to build that ourselves. So we’re actually getting the building blocks for it: contract manufacturing is available, the motherboards are available.
We ended up with a team of about four or five people that, over a period of a few months, went from “Yeah, we’re going to start doing it” to having a finished product that’s running in production. I used to work at Sun Microsystems, and you can’t do anything in that time. You’re still discussing the product review – the PRM, whatever it is – you’re still trying to work through the waterfall. We were done with it. Then we ended up hiring one of the– this is a networking box, so it was inspired by the Backblaze box but a little different. It’s got much more bandwidth, and it runs BSD because a lot of networking gear runs BSD. We ended up hiring one of the BSD committers to debug the device driver stuff that you tend to run into when you’ve got SSDs and disks and 10-gig networks everywhere. But it’s 100 terabytes of disk and a terabyte of flash and a couple of 10-gig ports in a box. If you go to netflix/openconnect it’s right there, it’s open source. People can go build their own CDNs.
Derrick Harris 08:59
I think it’s incredible to think about Sun in this regard, these companies that just made money beyond belief, like in the dot-com era. And now it’s like you guys kind of have gone, “If we’re going to buy hardware we’re going to build it, but by and large we’re not going to buy anything, we’re just going to go to Amazon.” Do you guys see an era? Is there a way for hardware– is there a way for servers to make money at this point?
Adrian Cockcroft
There’s no reason for us to have any enterprise software or hardware in our environment. I think we have a support license for Java, maybe – occasionally. We call up Oracle and say, “There’s a Java bug we’d like fixed,” or something, but they take forever to do that, probably. Basically we fix it ourselves, build it ourselves, and you can do that now. Earlier this week we announced [inaudible] Ice, which is a cost-monitoring tool. It basically takes all of the detailed Amazon billing data and gives you a very fine-grained analysis of where your costs are going. We’ve been using it internally for over a year.
One developer – she built that in two months I think, maybe three months. We had just hired her. We said, “Can you go build this?” She copied some code we had for something else, a Groovy/Grails project. A year or so later we’ve open sourced it, and the investment in that– there are startups and there are bigger companies doing these things. We don’t mind doing that kind of lifting: in this case it was differentiated by the fact that it solves exactly the problem we had in exactly the way we wanted to solve it. It’s light lifting, it’s one engineer for a few months – that’s not a big deal.
We don’t want to do the undifferentiated heavy lifting of owning large buildings full of computers and hardware and having to hire people that know how to do air conditioning and things.
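Editor’s sketch: the idea described here, taking detailed billing records and breaking costs down at a fine grain, can be illustrated with a small grouping routine. This is not how Ice is implemented and does not use the real AWS billing format; the record fields (`service`, `team`, `cost`) and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical billing line items. Real AWS billing exports have many more
# fields and different names; these are made up for illustration.
line_items = [
    {"service": "EC2", "team": "streaming", "cost": 120.5},
    {"service": "EC2", "team": "streaming", "cost": 79.5},
    {"service": "EC2", "team": "bigdata", "cost": 40.0},
    {"service": "S3", "team": "bigdata", "cost": 15.25},
]


def cost_breakdown(items, keys=("service", "team")):
    """Sum the cost of each group of line items sharing the same values for `keys`."""
    totals = defaultdict(float)
    for item in items:
        group = tuple(item[key] for key in keys)
        totals[group] += item["cost"]
    return dict(totals)


# Print the groups from most to least expensive.
for group, total in sorted(cost_breakdown(line_items).items(),
                           key=lambda pair: pair[1], reverse=True):
    print(group, f"${total:.2f}")
```

Passing a different `keys` tuple, for example `("service",)`, gives a coarser view of the same data.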
Derrick Harris 10:56
Adrian what do you think? I’m loving the storage.
Gleb Budman 10:58
I think one of the things we talked about is that we shouldn’t have had to build it. Ten, 15, 20 years ago there were a lot of people making their own desktop computers. You’d go and you’d look in the newspaper, and you’d find the various parts–
Derrick Harris 11:14
I was just in Fry’s the other day.
Gleb Budman 11:16
And I guess some people still do, but it’s rare. Right now if you’re buying a laptop or a desktop you don’t make it yourself. And part of the reason for that is that the manufacturers have driven down the prices of all the components and the assembly of it to where the margins are very thin, and everybody buys Macs and PCs and they’re just done.
Gleb Budman 11:36
The same thing isn’t true in servers. The server margins are still 50, 60, 70%, and the price relative to what you can build for yourself, if you assemble it, is incredibly high. So we always talk about whether there’s a data center tax, or a server tax, or an enterprise tax that the vendors impose. At this point, as a company, a lot of times you can just cut that out and build it yourself. At some point someone will potentially come in and say, “I will do this mass market, thin margins, and ship it.”
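Editor’s sketch: to see what margins of 50 to 70% mean for the buyer, the calculation below converts a margin into a price multiple over build cost. It assumes “margin” means gross margin as a fraction of the selling price, which is an interpretation, not something stated in the panel; the function name is illustrative.

```python
def price_multiple(gross_margin: float) -> float:
    """Return selling price divided by build cost for a given gross margin.

    Assumes margin = (price - cost) / price, so price = cost / (1 - margin).
    """
    if not 0 <= gross_margin < 1:
        raise ValueError("gross_margin must be in the range [0, 1)")
    return 1 / (1 - gross_margin)


for margin in (0.5, 0.6, 0.7):
    print(f"{margin:.0%} margin -> price is {price_multiple(margin):.2f}x build cost")
```

Under that assumption, the quoted margins correspond to paying roughly two to three and a third times what the hardware costs to assemble.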
Derrick Harris 12:13
I think I just saw a company – Sage cloud or something, I forget the name exactly. But that seemed to be doing something very similar; it was kind of comparing itself to Amazon Glacier, although it was like, “This is cheap disk for backup and it’s super scale.” It started in the petabyte range, I think. So maybe that is the new model, right? If you’re just willing to do it at that scale and take that low margin, I guess you could do it. You have a lot of people now building Backblaze systems – Netflix is kind of inspired by it. But you have actual companies and organizations, right?
Gleb Budman 12:49
Yes. Harvard puts a lot of their medical imagery on it. University of Alaska puts all their geolocation data on it. Red Bull puts a lot of their video footage on it. Crispin Porter, which is one of the largest ad agencies: they used to put all of their footage onto tape, and designers would request it back at some point and go, “There was that one video I wanted to work on.” And at some point it would get pulled off of tape when they needed it, and they said, “You know, for the cost of one of these boxes with spinning drives it’s a no-brainer to just leave this stuff online all the time.” So there are hundreds of these use cases all over the world.
Adrian Cockcroft
We have a few petabytes on S3; that’s our archives. So we use Cassandra as a primary storage system, and continuously archive that into S3; we make more copies of it. We do a cross-country copy to have an archive for emergencies and things like that. The ability to just write stuff and not worry about whether you have enough disk space has been very useful. Our big data systems are all stored in S3, and we originally got into– the first thing we put into the cloud was when we ran out of disk space in the data center and it was taking too long to get more, so we just pointed all of our logging into S3 buckets and just started collecting data there.
And then we started using EMR, Hadoop, to process it, and that was in 2008, 2009, where we were basically going from scratch. That was the first thing we tried, and that seemed to work, and we kept going from there.
Derrick Harris 14:23
The summary of it seems like– as diametrically opposed as building your own stuff versus moving to the cloud are, the cloud kind of lets you hack together your own thing. When you decide you want to do something you’re not constrained by–
Adrian Cockcroft
Exactly. We’ve been doing infrastructure experiments. Let’s try, let’s see what happens if we have 50 machines in Brazil – kind of okay, but it’s really not paying for itself; all right, let’s turn this off again. Anyone who’s tried to ship hardware to Brazil knows that that’s next to impossible. If you go to your CIO and say, “Hey, I need 50 machines in Brazil for a few weeks to try something out,” it’s “Get out of my office” kind of… [chuckles] “Don’t be crazy.” We just did it. We just did it. Tried it, shut it down, and learned some things that were useful. We’re doing East, West things. We did a benchmark recently where basically, on Cassandra, we used 96 of the solid-state disk instances for a week or so to benchmark. That’s 200 terabytes of SSD across the whole Cassandra cluster. Started it up that afternoon, moving 18 terabytes of data at 9 gigabits across the country.
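Editor’s sketch: the benchmark numbers quoted here (96 instances, about 200 terabytes of SSD, 18 terabytes moved at 9 gigabits per second) can be turned into per-instance capacity and a transfer time. The calculation assumes decimal units (1 TB = 10^12 bytes, 1 Gbit = 10^9 bits) and a sustained rate with no protocol overhead; the function name is illustrative.

```python
def transfer_seconds(terabytes: float, gigabits_per_second: float) -> float:
    """Time to move `terabytes` of data at a sustained `gigabits_per_second`.

    Assumes decimal units and ignores protocol overhead and ramp-up.
    """
    bits = terabytes * 1e12 * 8
    return bits / (gigabits_per_second * 1e9)


seconds = transfer_seconds(18, 9)
print(f"18 TB at 9 Gbit/s: {seconds:.0f} s, about {seconds / 3600:.1f} hours")

# Roughly how much SSD each of the 96 instances contributes to the ~200 TB total.
print(f"SSD per instance: about {200 / 96:.2f} TB")
```

On those assumptions the cross-country copy is an afternoon’s work, which fits the way the experiment is described.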
Derrick Harris 15:30
That’s like the old days.
Adrian Cockcroft
Yeah. But you learn something so quickly by being able to just go and play with the infrastructure. This is why public cloud– we’ve got to the point now where even thinking about touching real physical hardware is just annoying. It’s so slow and awkward. I want to highlight our host for this event; hopefully all of you have got the Cloudonomics book. You should study the formulas on page 218, which basically prove that public cloud’s the only way to go.

[laughter]
The statistical multiplexing of loads: everything averages out. You want to be a small fish in a big pond. Because then I can say, “Hey, I need 100 SSD machines for a week,” and if you’re in your own infrastructure it’s, “No, actually, we don’t really have that many. I know we’re a cloud, but the cloud’s got these limits,” and you keep bumping into those limits. We don’t want to have to keep bumping into those limits; it slows us down, and everything is about speed for us.
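Editor’s sketch: the “everything averages out” point can be illustrated with a standard result. If n workloads have independent demand with the same mean and spread, the relative variability (coefficient of variation) of their combined demand shrinks by a factor of the square root of n. The code below is not taken from the book mentioned above; the uniform demand model, function names, and parameters are assumptions chosen for illustration.

```python
import math
import random
import statistics


def aggregate_cv(single_cv: float, n: int) -> float:
    """Coefficient of variation of the sum of n independent, identically
    distributed demands, each with coefficient of variation `single_cv`."""
    return single_cv / math.sqrt(n)


def simulated_cv(n: int, steps: int = 2000, seed: int = 0) -> float:
    """Estimate the same quantity by simulating n workloads whose demand at
    each time step is drawn uniformly from 0 to 100 (an assumed model)."""
    rng = random.Random(seed)
    totals = [sum(rng.uniform(0, 100) for _ in range(n)) for _ in range(steps)]
    return statistics.pstdev(totals) / statistics.mean(totals)


# Coefficient of variation of a single uniform(0, 100) demand.
uniform_cv = (100 / math.sqrt(12)) / 50

for n in (1, 10, 100):
    print(n, round(simulated_cv(n), 3), round(aggregate_cv(uniform_cv, n), 3))
```

The smoother the combined demand, the less spare capacity a provider needs per customer, which is the small-fish-in-a-big-pond argument in numerical form. Correlated demand would weaken the effect.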
Derrick Harris 16:27
I’m just thinking on the hardware side alone, you’re talking about– Backblaze does the storage pods. Netflix does Open Connect, based on the Backblaze theory of [chuckles] right. Facebook and Open Compute are building all sorts of things. It seems like a lot of work, but it kind of goes back to the last talk with Gary Grider, about HPC communities doing these things and the web communities kind of doing similar things. It seems there are some parallel efforts in building open source hardware, let’s say. Is there any communication among the organizations or companies trying to do this?
Gleb Budman 17:03
There certainly has been. Facebook has it, Open Connect has it, Open Advisory – community stuff – our groups have chatted at various points. But at some level we focus on our own needs. So for us, the storage appliance that we need for Backblaze is very, very dense, very, very cheap, very power efficient – not very fast. What Adrian needs for Netflix is a little different than that. What Facebook needs is a little different than that. So we go and build our own things.
Gleb Budman 17:39
I think that out in the community there are lots and lots of people who have needs similar to ours, so they go and use our design. Lots of people have needs similar to Adrian’s, and they’re probably using Adrian’s design. Various people have needs similar to Facebook’s, and they’re using… So I think if you look at the history of Linux it’s somewhat similar. There are multiple threads and different people work on those different threads: there might be a stable thread, there might be an innovative thread, there might be a performance thread, there might be a big data thread. But it will evolve, and it’s not going to be one single track.
Adrian Cockcroft
We’ve been open sourcing our software platform very aggressively over the last year. We’re doing approximately one completely new project every week. Last week was a traffic routing management layer. This week was the Ice software I mentioned. And next week we’re open sourcing some things related to Hadoop Summit – our big data platform. We’re just working through. We’ve got over 30 software projects that are fairly substantial out there now.
So that’s us sharing. We’re building a community around Netflix OSS; I’d be happy to chat to anybody about what this thing is and how they can use it. But it’s basically an on-ramp for people who want to build cloud native applications.
Derrick Harris 18:56
Last question here, Adrian. I was curious, you said– I think you said, “Being a little fish in a big pond is a good idea.” How much sway? With all of this stuff you do on Amazon, how much sway do you actually have over how these features get rolled out? Are you a big fish in that pond too?
Adrian Cockcroft
We have a great relationship with AWS where we’ll ask for something and they’ll go– they’ll think about it a bit. Then they’ll go talk to a few other people and say, “Okay, maybe that is a good idea,” and eventually, after much cajoling and twisting of arms, they come out with something. So there are some things, like the solid-state disks, some of the IPv6 support, and the hardware security modules, that are good examples where we probably led the initial conversation, but once they started talking to other customers those customers said, “Oh yeah, we need that too.” So they won’t do anything really substantial just for us, but getting them to think about new ways of using the cloud has been a good driver for them.
Derrick Harris 19:50
All right and the hook is here.
Gleb Budman 19:50
It will explain the details of page 218.

[applause]
Joe Weinman 19:56
Guys thanks.

[applause]
Joe Weinman 19:59
Thank you guys for hanging in there. It’s been a pretty good two days, with a lot of great presentations and panels. What I did is I wrote down the 14 major points from the two days. So what I would like to do is review them. But what I’d like to do even more is welcome back Om Malik and Stacey Higginbotham to the stage, and they’ll do some closing remarks. So thank you, it’s been great.

[applause]

[music]
Stacey Higginbotham 20:31
Hey, we picked up this guy backstage, so we thought we’d bring him on too. Give a warm welcome to Derrick as well.

[applause]
Derrick Harris 20:36
Thank you.
Om Malik 20:39
Wow. You guys are really brave to stay here when you know there is a cocktail hour about to kick off. So thank you for making Structure so awesome this year.
Stacey Higginbotham 20:51
This is probably the most people we’ve had. So thank you to Derrick and your awesome panel right before.
Derrick Harris 20:56
I would thank Adrian; he puts in the name recognition.
Stacey Higginbotham 21:02
I was just not going to stand in the way, but thank you guys for coming. Thank the entire team for putting this on and thank our sponsors. Sponsors. More sponsors.
Om Malik 21:14