In this episode, Enrico Signoretti talks with Tom Lyon about infrastructure composability, growth, and network speed, and how DriveScale works to provide scalable solutions.
Guest
Tom Lyon is a computing systems architect, a serial entrepreneur, and a kernel hacker. Prior to founding DriveScale, Tom was founder and Chief Scientist of Nuova Systems, a start-up that led a new architectural approach to systems and networking. Nuova was acquired in 2008 by Cisco, whose highly successful UCS servers and Nexus switches are based on Nuova’s technology. He was also founder and CTO of two other technology companies. Netillion, Inc. was an early promoter of memory-over-network technology. At Ipsilon Networks, Tom invented IP Switching. Ipsilon was acquired by Nokia and provided the IP routing technology for many mobile network backbones.
As employee #8 at Sun Microsystems, Tom was there from the beginning, where he contributed to the UNIX kernel, created the SunLink product family, and was one of the NFS and SPARC architects. He started his Silicon Valley career at Amdahl Corp., where he was a software architect responsible for creating Amdahl’s UNIX for mainframes technology.
Tom holds numerous U.S. patents in system interconnects, memory systems, and storage. He received a B.S. in Electrical Engineering and Computer Science from Princeton University.
Transcript
Enrico Signoretti: Welcome everybody. This is Voices in Data Storage brought to you by GigaOm. I’m your host Enrico Signoretti and today we will talk about infrastructure composability. It is a paradigm that is not really new, but at the same time, it suffered a lot in the past because the technology was not ready and the marketing didn't help either. Things are changing though, and pretty quickly I would add. New players are now entering the market, technology is maturing and end users have been asking for infrastructure that is easier to provision and manage with better efficiency.
My guest for this episode is Tom Lyon, Chief Scientist and co-founder of DriveScale. Tom's career in the IT industry is incredibly long and full of successes. He started at Amdahl and then joined Sun Microsystems, where he contributed to the UNIX kernel and, among other things, to NFS and SPARC development. Later he had many other pioneering experiences creating technology startups that had successful exits, and he was awarded several patents. Now, with DriveScale, Tom and his team want to disrupt traditional data center architectures and improve them with their composable infrastructure platform. The goal of this episode is to go through several aspects of composability: what it is, what it does, the benefits and so on. So let me introduce my guest. Hi Tom, how are you?
Tom Lyon: I'm great Enrico. How are you today?
I'm fantastic and thank you for joining me.
Glad to be here.
So did I miss something in the introduction?
Oh, I don't know. I've been around so long I can't remember my own bio, let alone anything else.
No actually, I read it on your website so I'm pretty sure it's accurate, but maybe you want to add something about your job at DriveScale and about the company?
Tom Lyon: Yeah. So I'm a founder and chief scientist at DriveScale. We've been around almost five years, doing composable infrastructure. And this is really a reaction to what's going on in the server business. My co-founder and I were both in the Cisco UCS server group; we were both founding members of Nuova Systems, which Cisco bought to create that server line. We were looking at what's going on in the future of servers, and how servers were, on one end, getting more and more complex with more and more features, but on the other end getting totally commoditized and being used in enormous clusters as very, very cheap computing. And so trying to reconcile all these things going on in the server business—that is what got us to start DriveScale.
Okay, so you studied the problem a few years back, and there are these companies that need to somehow improve the way they think about their infrastructure. You mentioned that in your last experience you had these large clusters that needed to be configured, and in fact that brought you to design the architecture that then became UCS, but then you wanted to go a step further with DriveScale. What are the major pain points for big infrastructure today?
Well, the big thing we see is that servers are too rigidly configured. When you buy them you're stuck with whatever you bought. And if you look, you could go on the Dell website or the HP website and choose from hundreds of different servers and then apply 100 different configuration options to each server that you buy. But once you've bought it, you're stuck with it.
And of course you buy it based on what you think your workload is, but the workload changes all the time and things usually grow. And so you usually massively over-provision things in order to accommodate growth, and the end result is that you end up with this huge zoo of different server configurations that each represent a separate silo for a separate application. And it's horribly inefficient because you can't move resources around anymore.
In the kind of mainstream VMware world, this has been resolved with virtual machines and you can buy a pretty homogenous infrastructure to run your virtual machines, but in the scale-out world where you're using tens or hundreds or thousands of nodes for one application, you don't have the benefits of VMware and you can't afford the cost or the performance limitations of SANs. And so people use servers with direct attached storage, but they come with a lot of limitations.
Right. When you mention VMware, usually we mean rigid, enterprise environments from that perspective, so traditional workloads and ways of doing things. But on the other end of the spectrum we have the hyperscalers. They have full control of the stack: hardware, software, how they build their data centers—everything. So do you mean that composability is more a thing for web scalers as well as large enterprises and ISPs, maybe in the range between one and two thousand servers per year?
Yes, I think so. And if you look at HCI, that's done wonders for people with small data centers, right? You know, having eight nodes in a hyper-converged thing is a pretty large cluster, and it makes your life very easy to buy and configure. I think of HCI as kind of the Happy Meal of the computer industry. It's fast, it's easy, ‘wham, bam’ you're done. But once you're dealing with hundreds and thousands and tens of thousands of machines, it's like having a lot of mouths to feed, and you don't feed a large family at McDonald's every day. You learn how to cook your own things.
Right.
And that's really what composability is about.
Yeah, and of course these guys want the best efficiency, even if they don't have real control over the hardware manufacturers, right?
Right. And our approach is to say you don't have to have a whole bunch of new types of hardware, but there's existing hardware that's simple enough and cheap enough to be composed into more complex configurations.
Very good, and what are the key technologies that enable composability?
It's all about network bandwidth. So we got started observing that if you had a 10-gigabit network infrastructure, that would be plenty to have disaggregated hard drives, because hard drives are pretty slow. It's really hard to sustain more than about half a gigabit per second continuously from a hard drive. So you can put a lot of hard drives on a server with a 10-gig pipe. With flash that changes dramatically, but the networks have changed dramatically too. It's very common now to have 100-gigabit Ethernet, and you can aggregate a lot of flash drives with 100 gigabits. It's really the network bandwidth, and the ease of obtaining this incredible bandwidth, that has enabled composable infrastructure.
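A quick back-of-the-envelope sketch of that bandwidth argument, using only the rough per-drive figures mentioned here (the numbers are assumptions for illustration, not measurements):

```python
# Rough check of how many disaggregated drives one network link can feed,
# using the approximate figures from the conversation (assumed, not measured):
#   - a hard drive sustains roughly 0.5 Gbit/s
#   - an NVMe flash drive can sustain on the order of 20 Gbit/s (~2.5 GB/s)

def drives_per_link(link_gbps: float, drive_gbps: float) -> float:
    """How many drives a single network link can feed at full throughput."""
    return link_gbps / drive_gbps

HDD_GBPS = 0.5     # assumed sustained throughput of one hard drive
NVME_GBPS = 20.0   # assumed sustained throughput of one NVMe SSD

print(f"HDDs behind a 10 GbE link:       ~{drives_per_link(10, HDD_GBPS):.0f}")
print(f"NVMe SSDs behind a 100 GbE link: ~{drives_per_link(100, NVME_GBPS):.0f}")
```

With those assumptions, a 10 GbE pipe comfortably feeds a couple of dozen spinning disks, while flash needs the jump to 100 GbE to aggregate even a handful of drives, which is the point being made.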
And also new protocols like NVMe over fabric for example, right?
NVMe over fabrics has done wonders to raise the visibility of the possibilities of disaggregation and composition. But we do a lot of stuff with iSCSI as well, and it's perfectly adequate for many, many purposes. So we support both NVMe over fabrics and iSCSI.
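To make the two data paths concrete, here is a minimal sketch of what attaching a remote drive looks like from a Linux host using the standard tools for each protocol (nvme-cli for NVMe over TCP, open-iscsi for iSCSI). The addresses, NQN, and IQN are placeholders, and this generic flow is not DriveScale's actual provisioning mechanism:

```python
import subprocess

def attach_nvme_of(target_ip: str, nqn: str) -> None:
    """Attach a remote NVMe-over-TCP namespace using the standard nvme-cli tool."""
    subprocess.run(
        ["nvme", "connect", "-t", "tcp", "-a", target_ip, "-s", "4420", "-n", nqn],
        check=True,
    )

def attach_iscsi(target_ip: str, iqn: str) -> None:
    """Discover and log in to a remote iSCSI target with open-iscsi's iscsiadm."""
    subprocess.run(
        ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", target_ip],
        check=True,
    )
    subprocess.run(
        ["iscsiadm", "-m", "node", "-T", iqn, "-p", target_ip, "--login"],
        check=True,
    )

if __name__ == "__main__":
    # Placeholder addresses and names, purely for illustration.
    attach_nvme_of("10.0.0.20", "nqn.2019-01.example:flash-shelf-1")
    attach_iscsi("10.0.0.21", "iqn.2019-01.com.example:jbod-1")
```

Either way, the drive shows up to the operating system as a local block device, which is what lets a composed server behave like one with direct-attached storage.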
And what are the most common applications that run on top of composable infrastructure today?
Well, it does depend on your exact market focus, so we're very much focused on data-intensive applications in large clusters. So things like Hadoop and NoSQL, things like object storage where you know you're dealing with a lot of data on a lot of nodes and therefore have a big problem when it comes to efficiency. And so we're really helping you tune the efficiency of your data center as well as the flexibility so you can respond to changing workloads.
Do you see demand more for one-time provisioning, where I provision things the first time and my infrastructure more or less stays that way for a long time, or do you see your customers asking more for cloud-like provisioning, where I want one thing today and the next day I want to change things because I have another application that needs some CPUs and some storage, and then I change my infrastructure again?
Right. I think it's more of a provisioning-level thing, and you still need some cloud technologies. Certainly Kubernetes is a very interesting complement to composability: Kubernetes handles the things that happen at a very rapid timescale, and then composability lets you tune things at slower timescales to allocate resources. An example is that there's a lot of talk going around about how Kubernetes clusters are very hard to manage above a small scale, so a lot of people have multiple Kubernetes clusters. Now how do you move resources around between those clusters? That's the kind of thing that DriveScale can do very easily.
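As an illustration of that slower-timescale rebalancing, here is a hypothetical sketch of moving a disaggregated drive from a node in one Kubernetes cluster to a node in another through a composability API. The host name, endpoints, and field names are invented for this example and are not DriveScale's real interface:

```python
import requests

# Hypothetical composability API, invented for this sketch.
COMPOSER = "https://composer.example.local/api/v1"

def move_drive(drive_id: str, from_node: str, to_node: str) -> None:
    """Detach a disaggregated drive from one node and attach it to another."""
    requests.post(
        f"{COMPOSER}/nodes/{from_node}/detach",
        json={"drive": drive_id}, timeout=30,
    ).raise_for_status()
    requests.post(
        f"{COMPOSER}/nodes/{to_node}/attach",
        json={"drive": drive_id}, timeout=30,
    ).raise_for_status()

# e.g. shrink storage on a worker in cluster A and grow a worker in cluster B
move_drive("nvme-shelf1-slot7", from_node="k8s-a-worker-03", to_node="k8s-b-worker-11")
```

Kubernetes keeps scheduling pods at its own fast pace; the composer simply changes what hardware each node has underneath it, on a timescale of minutes rather than milliseconds.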
I see, and what is the role of automation then in these composable infrastructures?
Well, I think part of the definition of composable infrastructure is that it has to be programmable. There has to be an API. And that's important, especially for our market, because when we go into a customer with a thousand machines, they already have a very mature way to deploy servers, and it's scripted, and maybe they're using Chef or Ansible or something. And so we have to be able to plug into that infrastructure. We can't say, “Here's a whole new way to do that.”
Yes, but do you mean that you provide a hardware orchestration layer, and on top of it you can integrate it with Ansible, or what else?
I mean from a marketing point of view it's better to think of us as underneath that layer. But in reality, we're kind of side by side, because we have a software-based solution. You have to get the software on the server to start doing composability. And that's done with Ansible, but then you add resources to the server dynamically with DriveScale.
As far as I know, most of the work today in composability is done at the storage level. You mentioned iSCSI, you mentioned NVMe. Do you expect new server technologies that will allow us to physically partition a server anytime soon?
Well, backing up a little: the reason it's done with storage today is that that was the low-hanging fruit, and storage is one of the things that really complicates server choices. So we think that's the right place to start. But there are lots of other possibilities. We have a competitor called Liqid that does composable infrastructure on a PCI Express fabric, and they can compose GPUs as well as SSDs. So that's a very different approach to what we're doing, but a very similar philosophy in terms of composing servers. You asked about chopping up servers, and that's possible if you start with really expensive, complicated servers. There are these 32-socket servers in the world; a few companies have those, and they're typically able to partition them down into smaller numbers of sockets.
Yeah, but then we can't call them commodity servers anymore.
That's right.
So for people used to this kind of technology, most of the optimization of the code is done for one- or two-socket servers. It does get complicated.
And of course in terms of partitioning, you have virtualization, which is a good enough solution for a lot of things if you just want to partition a server. More interesting is: how do you combine commodity servers to get bigger systems? And there are a couple of very interesting companies there. ScaleMP has been doing that for a long time, and a newer company called TidalScale has a different approach, but they're both able to construct large SMPs from smaller commodity servers. You can think of that as composability, but they're also heavily relying on virtualization hypervisors, so you don't really get bare-metal performance.
This is interesting because we are seeing history repeat itself again and again: for many years we tried to move to these smaller servers and work on scale-out architectures, and now somehow we are talking about SMP again.
Right, and it turns out my co-founder and I did a startup that started about the same time as ScaleMP, doing that same thing. For business reasons we didn't go anywhere with it, but we're very familiar with that technology as well. But the thing that's changed in a huge way, partly because of the cloud, is that everyone is writing cloud-native applications that are stateless microservices—all these types of things that just don't rely on SMPs anymore and are much more friendly to scale-out. And a lot more is now known about how to construct reliable scale-out applications. That's what we're optimizing for with our platform.
Also with more and more workloads, I see that the limitation becomes RAM more than CPU.
That's actually probably always been the case. It's funny if you look at the history of computing, it's always been main memory that has been an obstacle to progress. And building fast enough CPUs has always been easy.
Yeah. But now we have RDMA over Converged Ethernet and things like that. Do you think that this will change, so we will be able to borrow resources from the rest of the cloud when needed, or do you think that the software will be written to work around this?
Yeah. There are a couple of very interesting things going on there. People have tried doing this, you know, network paging, memory sharing; ever since the ‘80s there have been research projects, but there are some interesting products now. Both Intel and Western Digital have products that let you treat an attached SSD as extended memory. And this is not the Optane DIMM, but the Intel memory extension technology using Optane SSDs, and with NVMe over fabrics you could extend that over the network and essentially have—it's glorified paging in some ways, right? You're reinventing technology to bring back effective virtual memory so that you don't have to have as much expensive main memory...
And also you don't have a difference in latency that would kill you every time you go out...
Right. Right. And the latency thing—it's like any of the storage hierarchy things—if it fits effectively in cache, it works for you. If you blow through the cache, then it doesn't work. And that's the big problem with these technologies: they don't work with enough applications to be considered commodity infrastructure, because you'd really have to worry about whether your apps are going to work.
All right. And what can we expect to see in 2019 both from a DriveScale and industry perspective regarding composability?
Well, the big thing going on in the industry is that incredibly cheap SSDs are on the way. If you're paying more than 20 cents a gigabyte now, you're paying too much, and that's probably going to continue to fall. And then NVMe over fabrics is getting mature, with NVMe over TCP as one of the standard options, which will make it much more accessible to general data centers. And then there's a whole crop of chip vendors building specific chips for NVMe-over-fabrics targets. So we have Broadcom with Stingray, Mellanox with BlueField, Marvell has a family of controllers, and Kazan Networks is a startup doing this. So the whole target side of NVMe over fabrics is also going to get very fast and very cheap.
So the market there is going to expand very, very quickly in the next year or two you mean?
Yeah.
And what about DriveScale—where are you going to take the product and the platform?
Well, today we're still at… we started out handling disaggregation and composition just of hard drives, but as soon as the NVMe-over-fabrics standard was done, we had it for NVMe. The bottleneck to doing composability is often whether or not there's a standard data path, and so if you look at composing GPUs, you can do it on PCI Express, or there are some software techniques for doing it as well. And we're looking at partnering with some of those companies. But it's hard to see what the universal solution will be, and then there are lots of different accelerators coming along for deep learning and other things, and they'll all represent things that ultimately you will want to disaggregate from the server as well.
Yeah. So are you saying that 2019 will be a year of transition for composability?
Yeah, I think we'll see it in a lot more places. Unfortunately, we're also seeing a lot of people just using the term ‘composable infrastructure’ when it doesn't really mean anything different from what they were doing previously. So that's not a good thing.
I'm doing research for GigaOm these days around this topic, and in fact, if you start to dig a little bit, you find a lot of marketing messages about the same product that was already available two years ago and is now just ‘composable.’ So it's frustrating, I would say. It doesn't help the market, I think.
Yeah. So that's something we have to fight against. And who knows? If the market changes, we may have to rename what we're doing or something, but the challenge is getting people to understand that there's a new way of doing things.
So at the beginning of the show, you told me that many of your customers have large installations of servers and they usually work in big data analytics and that kind of thing. Maybe you can go deeper on this and tell us about the workloads and about the types of customers that buy DriveScale.
Usually, the customers that are ‘feeling the pain’ are those who have large clusters and yet they're seeing their workloads changing to where now their type of server is not optimal anymore. So they're looking at how much they've spent and how much they're going to have to spend and are very frustrated by the inflexibility of the infrastructure. And an example there is our flagship customer AppNexus, which actually was bought by AT&T a while ago.
They had a large Hadoop infrastructure and were frustrated because they also had a lot of other servers, and the two types of servers were incompatible. So we solved that problem for them for Hadoop, and now they're looking at everything else in their infrastructure and saying, “Hey, why don't we run Kafka on DriveScale?” Okay, they did that, and then, “Why don't we run Vertica on DriveScale? Why don't we run our Aerospike on DriveScale?” So they're going down the path with all these types of things. And you sort of have to get in the door with a specific cluster problem, but then once people understand the product, they start applying it everywhere.
Just out of curiosity, because I didn't really understand: so the customer can buy servers from different vendors, and different types of servers at the end of the day, but what about the storage? Do they buy the storage and put the SSDs in the servers, or do they need different servers with the disks?
Well, two things: first, we never sell storage. We're almost always software, but we do have a couple of connectivity appliances. And secondly, the customer still wants to buy from as few vendors as possible. So Dell is a huge partner for us; we've recently come onto their Tier 1 reseller list. So our customers can buy everything they need from Dell, which includes—instead of buying the storage in the servers, they buy storage in JBODs or in flash boxes that get attached through DriveScale to the servers. And typically those SSDs or drives are priced exactly the same whether they're in the server or outside of the server.
Right. So they can buy the servers without the disks and then JBODs that they can attach over the network. Very good. Fantastic.
Tom, thank you again for the time you spent with me today. And maybe you can give us a couple of links about DriveScale. So the website—where we can find your company and maybe you on Twitter just to continue the conversation if somebody is interested?
The main DriveScale website is an obvious place: www.DriveScale.com. I actually have my own podcast series there, which is pretty fun to listen to. And then on Twitter we're @drivescale_inc for the company, and I am @aka_pugs, ‘pugs’ being a nickname.
Great! Tom, thank you again. And bye-bye.
OK. Thank you Enrico.