In this episode Enrico Signoretti speaks with Liran Zvibel about parallel and scale out file systems as well as HPC and big data workloads.
As Co-Founder and CEO, Mr. Liran Zvibel guides longer-range technical strategies at WekaIO. Prior to creating the opportunity at WekaIO, he ran engineering at social startup and Fortune 100 organizations including Fusic, where he managed product definition, design and development for a portfolio of rich social media applications. Liran also held principal architectural responsibilities for the hardware platform, clustering infrastructure and overall systems integration for XIV Storage System, acquired by IBM in 2007. Mr. Zvibel holds a BSc in Mathematics and Computer Science from Tel Aviv University.
Enrico Signoretti: Welcome everybody, this is Voices in Data Storage brought to you by Gigaom and I'm your host Enrico Signoretti. In this episode, we will talk about file systems but not all file systems. We will focus on parallel and scale out file systems, HPC and big data workloads, as well as all the other file systems enabling users to lift and shift their workloads to the cloud.
My guest for this episode is Liran Zvibel, CEO and co-founder of WekaIO. WekaIO is a new entrant in the file system arena, but they claim better performance than other file systems, plus the scalability and flexibility necessary to run both on-premises and in the cloud. All of this while keeping POSIX compliance may be too good to be true, I don't know. We will talk with Liran, we'll see what his opinion is on the cloud, and we will also get some information about the architecture of the product and about the company. Hi Liran, how are you today?
Liran Zvibel: Good, thank you very much, and thanks for having me here.
So did I miss something in the introduction, do you want to give us a little bit more of a background about you and Weka?
I think you actually did a great introduction, and you were correct: WekaIO is the world's fastest and also the most scalable file system. So we solve two big issues. We make sure that a shared file system ends up being faster than what people are used to getting from a local file system, but we also solve data scalability, so we have directories with billions of files and file systems with trillions of files, which new workloads actually require. And you were certainly correct that we have the ability to run natively on-premises and in the cloud, and also to lift and shift some workloads to the cloud and back, or between the customer’s different on-premises data centers.
Well, great. I want to cover all of it. But I want to start from the cloud perspective. It looks like file systems are back again in the cloud, meaning that most of the providers, like Amazon AWS, started with block or object storage. At the beginning, it was EBS and S3, somehow leaving the file system behind. But then, with more and more organizations moving data and workloads to the cloud, they had to face reality and brought back the file system.
So if I take Amazon AWS as an example, they introduced EFS in 2016, if I'm not wrong, 10 years after S3. And lately FSx, a series of managed services for Windows SMB and Lustre. In the same timeframe, if you look back at 2015-2017, we saw a growing number of startups in this space too, and I think this is not a coincidence, because the uptick in cloud adoption, in the US especially, was in 2014-2015.
I don't know if you agree with my analysis here, but do you see a demand for a cloud-based file system, or are cloud deployments just a consequence of a broader strategy for your customers?
So we definitely see a requirement for cloud-based file systems, and I think you were spot-on. The reason customers now require a file system in the cloud is that they are looking to move more and more of their traditional workloads to the cloud. New kinds of workloads were designed to work with object storage, or, from the other perspective, things that could easily be ported to object storage were the first to move to the cloud.
But I think most organizations, and even the cloud providers, have reached the point where, to keep pushing things to the cloud, they actually need POSIX semantics, because this is what applications use. And if they want to move bigger workloads they need shared POSIX semantics, which is a shared file system. So now that the cloud is becoming less and less of a toy and more and more of an actual working tool, this is what the real enterprise organizations require.
So looking not only at your customer, but more in general at organizations that are adopting file systems in the cloud, what are the most common use cases that you see?
I think that basically everyone's looking at the cloud. People require the cloud more when they have workloads that are elastic, meaning that they are more project-based rather than having the same amount of work throughout the year. A very strong example of this is media and entertainment. They are very strongly project-based, and moving to the cloud makes sense because different projects have different resource requirements.
But quite a lot of the traditional HPC workloads fit as well (we have oil and gas, we have geospatial calculation), because these kinds of organizations wait for data to be collected; after it's collected, they have an uptick of processing; and when it's done, they wait for the next batch of data to come in. So they don't run their infrastructure 24/7. Many chip design EDA customers are the same: they have their tape-out dates in only a fraction of the year, and the rest of the time they need significantly fewer computational resources.
So basically any organization that doesn't need high-performance compute running 24/7 throughout the year will benefit from only leveraging a large amount of resources when they need it, and this is what the cloud is perfect for.
What is the driver for these users? I mean, your solution is fast and optimized, but cloud providers have the ecosystem, an end-to-end solution at the end of the day. Is it all about savings and getting the job done quicker, or is it more that you can provide a hybrid component to this picture?
So again, you're actually spot-on. Users turn to us when they have to have a hybrid IT, so they need to have their on premises users access the data most of the time and then leveraging the cloud for burst. And if you're using the cloud native solution, you have to keep moving a lot of data each time you're doing it, and it makes the whole process prohibitive.
But then looking at how their work is actually being done, we're enabling these customers to get significantly higher efficiency of their infrastructure both on-premises and in the cloud. Customers are now turning into beefier and beefier servers. Intel adds more cores, obviously NVIDIA has GPU, so single servers can do a lot more, and the traditional protocols such as NFS just cannot catch up. So people use us for getting more efficiency out of their infrastructures, and we're actually showing them that we give them 10 to 100 times faster time to result, so actually calendar time.
You get the result faster, but at the end of the day it also translates to a financial advantage. What we in the past called TCO is more difficult to define in the cloud because you don't own anything, but you actually spend less money on the cloud to run your jobs.
And on another perspective, so last month I published a new report for Gigaom about file storage options for the cloud. We defined three categories in the report: traditional systems that are available also on the cloud, file system designed with cloud in mind, and file systems as a service. In the first two cases, the end user keeps control over the stack, meaning they have their own file system and they can do whatever they want. While in the latter, it's the ease of use that wins.
I put Weka in the second category, so file systems designed with cloud in mind. In your opinion, what are the characteristics that differentiate next-generation parallel file systems from traditional ones, meaning Lustre and all the others?
Actually there are quite a lot of characteristics, and I want to give another note: we're fully integrated with AWS, we're on the marketplace. Launching a file system requires automatically generating a CloudFormation template and just launching it. And AWS even acknowledges it by calling us a ‘competency partner.’ They've identified a small number of their partners on the marketplace that they think are competency partners and we're one of them.
So even AWS recognizes the fact that we're not only designed for the cloud, we're close enough to be considered cloud native. Now, there are a few main differences that we have that the traditional solutions don't. One big thing is dynamic elasticity. So the strong point of the cloud is the ability to dynamically add resources when you need them and then take them away, and this is something that we provide and the other solutions don't. Actually, even some of the AWS native solutions don't have the elasticity that we provide.
So you can take a snapshot, push a cluster to S3 and then turn all the instances off. At that point in time, you don't pay anything, whereas if you're using the AWS native solutions, you pay as long as they keep your data. That's one thing. But even if you are running the system 24/7, you can choose to spin up more resources and get higher performance; and when you don't need that kind of performance anymore, you can shrink it back down, so full elasticity.
Another big differentiation is multi-AZ support. I think if you just take an on-premises technology and bolt it onto the cloud, it will only have support for a single AZ. A cloud native solution must have the ability to run in a multi-AZ way, so if an AZ goes down, your service isn't impacted. That's a different thing.
Another thing that customers are now looking for is multi-protocol support, and this is another thing you're getting from the WekaIO solution that you're not getting from any other solution. With WekaIO you're obviously getting our own POSIX interface that, as you mentioned, is faster than the local file system. But we also support NFS and SMB. So if you have some legacy instances running, let's say BSD or obviously Windows, you can access the same data using SMB for Windows and NFS, and currently no other cloud solution, whether EFS, FSx for Lustre or FSx for Windows, can actually support different kinds of clients. And then obviously there is leveraging object storage for economical storage of vast amounts of data, and transparent cloud provisioning and monitoring.
Do you support object storage as a second tier also or only for snapshots and backups?
We support object storage as a second tier also. We actually have a lot of customers that have a lot of data, in the dozens of petabytes, but they don't need to access that data at very high speed all the time. So they can store only 10 to 20%, or in some extreme cases, 5 to 20% of their data on NVMe, and this is for extremely fast access, and the rest of the data is actually stored on S3.
And do you support on-premises object stores, or are you forced to go to Amazon S3?
We support all object stores that are sufficiently compliant with S3. On-premises we have successful installations with WDC ActiveScale, IBM's Cleversafe, Scality and Cloudian.
At Storage Field Day you claimed that your scale-out parallel file system is faster than local storage, and you are repeating it during this episode too. It's easy to see if I think about throughput, because this is a parallel file system, but what about IOPS and latency? How do you do that?
So our ability to be faster than the local file system is basically down to the fact that we have lower latency and we can offer higher IOPS. And the reason that is the case is that we are parallelizing at the 4k granularity, where a local file system is actually serial. So the basic reason we are so much faster is that if you look at NVMe devices, the amount of time it takes to write a 4k I/O is about 30 microseconds. The amount of time it takes to read a 4k I/O is about 100 microseconds, and moving a 4k I/O with our advanced networking stack takes about a microsecond. So pushing 4k through the network has an insignificant effect on how long the I/O takes.
But now, if you're looking at NVMe devices, you know the deeper the queue is, the longer the response time. So when I said 30 micros and 100 micros, I was actually talking about queue depth 1, when the local NVMe device is idling and just gives you one answer. But what really happens in a system that wants to get a lot of work done, with a lot of low latency operations, is that you have a deeper queue than 1. What we're doing is parallelizing your I/Os across dozens or hundreds of NVMe devices through the network, and we already agreed that the network doesn't add an overhead, so on average your I/Os see a much lower queue depth than they would with the local file system.
So that is the reason that even for what traditionally have been the hardest I/Os for a file system, the 4k I/Os, we can spread them around the network. We consistently show customers that a local file system is limited to between 100,000 and 200,000 4k I/Os per second, and this is down to what you can get from a single NVMe device. With our file system, users can get millions of 4k I/Os per second on a single server.
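As a rough illustration of the queueing argument above, here is a minimal Python sketch. It is a toy model, not WekaIO's implementation: it assumes each NVMe device services 4k reads strictly one after another, that outstanding I/Os are spread evenly across devices, and that a remote I/O pays two network hops (request and response). The 100-microsecond read time and the roughly 1-microsecond network transfer are the figures quoted in the interview; the function name and the count of 64 outstanding I/Os are hypothetical.

```python
import math

# Figures quoted in the interview (queue depth 1 on an NVMe device).
READ_4K_US = 100.0
NETWORK_4K_US = 1.0


def mean_read_latency_us(outstanding_ios, devices, network_hops=0):
    """Mean 4k read latency under a toy serial-queue model (an assumption)."""
    # Outstanding I/Os are spread evenly; the busiest device sets the depth.
    depth = math.ceil(outstanding_ios / devices)
    # The k-th I/O in a serial queue finishes after k service times, so the
    # mean over positions 1..depth is (depth + 1) / 2 service times.
    queueing_us = (depth + 1) / 2 * READ_4K_US
    return queueing_us + network_hops * NETWORK_4K_US


# 64 outstanding reads against one local device vs. spread over 64 devices.
local_us = mean_read_latency_us(64, devices=1)
spread_us = mean_read_latency_us(64, devices=64, network_hops=2)
print(f"local: {local_us:.0f} us, spread: {spread_us:.0f} us")
```

Real NVMe devices have internal parallelism, so actual numbers will differ; the sketch only shows why a shallower per-device queue can outweigh a microsecond or two of network time.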
Okay, that's quite different than what usually happens.
Right. We're actually the only file system to be re-architected and redesigned from the ground up for NVMe and fast networking. If you go to a bookstore today, if you can still find one, pick up a computer science book and turn to where they talk about the data structures for a file system, you're going to read about things that make sense from the perspective of a single controller and hard drives. You're going to get trees, you're going to get descriptions of directories that are quite centralized in their management.
We have redesigned the file system for an infinite number of controllers, extremely low latency networking and NVMe, and this goes through our data structures, our algorithms, our control environments. So it's redesigned to take the best advantage of the hardware people now have on-premises or in the public cloud.
If I look at an on-premises installation, you have total control over the hardware, so the connections, the switches, the nodes and so on, and you can be pretty sure that the whole cluster is very homogeneous. But then I go to the cloud, like Amazon (and I keep mentioning Amazon because they are just the most common choice in the market, but actually it applies to everybody).
There you can have instances that somehow don't perform the same way or, in other cases, the network doesn't give you the same performance from node to node and so on. You have these little inconsistencies. Do you see issues from your customers on this, or do you just keep adding nodes or whatever to make it work anyway?
This is actually an excellent question, and this also distinguishes traditional on-premises solutions that run on the cloud from solutions that were designed with the cloud in mind. With the traditional on-premises solutions, even in our previous companies, when you created storage you had the assumption that the hardware works. If the hardware doesn't work, you emit events, and at some point some service person will come and replace it with hardware that works 100% of the time.
We have designed WekaIO with the notion that hardware almost works. On-premises, it means that even if you have glitches (and I think at Storage Field Day you were able to see that Shimon took down two servers and the client still got the same kind of performance), you're not going to have a service disruption. But it's even stronger on the cloud, where what you get is what people call the ‘noisy neighbors.’ It's not that AWS provisioned you a weaker instance. You get slightly less work done because other people are also utilizing your infrastructure, and different people may be utilizing the infrastructure around you in a different way.
What we have created at WekaIO is very fine granularity of load balancing. So if you look at traditional storage, usually the performance you're getting out of your storage system is the slowest server multiplied by how many servers you have. Or if that vendor has done a tremendous job being a previous generation storage solution, you get the average performance of the servers multiplied by how many servers you have. But still if you have one really lousy one, it drives the whole average down.
We have re-architected our solutions so the performance you get out of the cluster is the aggregate of all the resources that you get, and we actually prove it to customers when they're doubling their cluster size on-premises or in the cloud, they get twice the performance; you double again, you get twice the performance. And as you've mentioned, on the cloud, we cannot do it by asking AWS nicely, “hey, for this experiment, can you please give us the good instances?” We get it by perfectly load balancing at a very, very granular level, so each server provides all the performance it can provide to the cluster. And on average, if you have twice the servers, you'll get twice the performance.
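To make the scaling argument concrete, here is a minimal Python sketch contrasting a cluster that is held to the pace of its slowest server with one that delivers the aggregate of all servers. The per-server figures, their unit (GB/s) and the function names are hypothetical, chosen only to illustrate the arithmetic; they are not measurements from any product.

```python
def lockstep_throughput(per_server_gbps):
    """Every server is held to the pace of the slowest one."""
    return min(per_server_gbps) * len(per_server_gbps)


def aggregate_throughput(per_server_gbps):
    """Each server contributes whatever it can actually deliver."""
    return sum(per_server_gbps)


# Hypothetical cluster: three healthy instances, one with a noisy neighbor.
cluster = [10.0, 10.0, 10.0, 4.0]
# Doubling the cluster with the same mix of instances.
doubled = cluster * 2

print(lockstep_throughput(cluster))    # limited by the 4.0 GB/s server
print(aggregate_throughput(cluster))   # sum of all four servers
print(aggregate_throughput(doubled))   # twice the servers, twice the total
```

Under these assumptions the slow instance costs the lockstep cluster more than half of its potential, while the aggregate model only loses the shortfall of that one instance and doubles when the cluster doubles.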
I have another question about the architecture of the product. Historically speaking, most scale-out systems are quite complex. In general terms, they have centralized metadata management and a local file system on each node, and these nodes store data chunks. They are usually good only for a limited set of workloads, and sometimes they are designed for one specific workload, like HDFS for example, or even Lustre. You can have blocks that are no smaller than megabytes. They are not good for IoT and, in other cases, they are not good for storing small files.
But you talked about 4k blocks, so how do you then manage workloads that need very large files, for example?
So again, this is a great point. One big difference is that we have a completely vertically integrated solution, whereas the other solutions have a block layer that needs to be resilient, a local file system, and then a virtual file system placed on top of them. That's part of the reason they can't get low latency: you're running an I/O through your parallel file system, which is virtualized, then it goes down to a local file system, then it goes down to a block device.
We have vertically integrated the file system and the protection, so we don't have a concept of a block device, that's on the one hand; and then on the other hand, we don't have centralized anything, because centralized something means that at some point you will stop scaling. So our metadata services are completely scaled out and distributed like the I/O services, and if you add more servers, you'll get more metadata power as well.
And another important thing that we have is very intelligent quality of service mechanisms like you would expect from high-end storage. Our quality of service mechanism basically allows you to run mixed workloads in a way that makes sense for everyone, because as you probably know, even if you're buying block storage solutions, and the expensive ones, it's very difficult to get both low latency 4k I/Os and high throughput, let's say one meg I/Os, out of block solutions that are easier to manage. With WekaIO, if you run a job with 4k I/Os, these I/Os are going to come back with low latency. And when you run another job with one meg I/Os or 20 meg I/Os, you're going to get very high throughput.
So we have extremely clever internal quality of service mechanisms in our queues to make sure that the things you would expect to get low latency get low latency, and the things you would expect to get high throughput get high throughput. And our customers actually see that they can run mixed workloads and multiple workloads on the same cluster, and it works well; whereas, as you've mentioned, for the traditional parallel file systems you have to optimize the block sizes and other stuff, so they are truly optimized for one kind of workload. What happens in many cases with this kind of file system is that you're buying a project. You have the vendor's expert come in and optimize it for the customer for the first two or three months, then they go away, and it works okay for the customer if they're only running one specific form of workload. But then something in the workload changes and the performance drops significantly. With our kind of system these kinds of things are not going to happen, because we adapt to the different kinds of I/Os that you run, so there are no tunables for you to actually tune.
This is good. I'm not good at tuning this kind of stuff. When I tried, well, it's a tough job. I mean, not that I'm that bad, but every time I'm in my own lab and try to tune these kinds of things, I don't always get exceptional results, and there is a lot of work around this.
Right, and even for the competition, I think there are a handful of people worldwide that can do it well, and we have observed this phenomenon and made sure it doesn't happen with us.
Very good. At the beginning, we mentioned a bunch of workloads, for example, EDA, media and entertainment, but actually you didn't mention AI. Everybody mentions AI, and you didn't. But somehow it looks like this file system is good for this kind of work, especially because you can leverage a very large dataset. Because of the scalability, and with the object store as a backend, you have a second tier to store huge amounts of data and so on. Do you have any customers looking at your solution for this kind of work?
Definitely. I think we talk so much about AI that we are by now taking it for granted. We have a lot of the autonomous projects actually running on us. Some of them you can actually drive on the roads today with commercial products. We have some autonomous claims that will run on us soon. So we have quite a lot of the high-end companies that are doing the cutting edge jobs in machine learning leveraging us.
By the way, machine learning is not just autonomous systems; it goes across commerce, retail, risk analysis and financials. So all kinds of companies are moving to require machine learning, and they realize that the kind of benefits they expected big data to give them a decade ago (and in many senses big data didn't deliver on all of its promises) machine learning actually does provide, because you have a much stronger analysis tool and much stronger mathematical tools behind you. And you're right, the more data you have, the better your ability to generate fine-grained results.
So on the one hand, we have customers that collect a lot of data and also want to keep all of it, so that when they create a slightly different model they can come back and see how the new model works with the previous data. On the other hand, they have these huge GPU-filled servers that gobble data at incredible speeds. And what we are showing these customers is that they must use us, because previously they have been leveraging a local file system, and the local file system is not even fast enough to make sure all the GPUs are utilized.
But the big problem with the local file system is that you're wasting time caching. You're wasting time copying your data into the local file system, and you're wasting time copying the results out. We're showing that we're about four or five times faster than the local file system, but customers are getting much bigger savings, because it would only have been four times faster if the GPUs had been able to run 100% of the time with the local file system. What they actually see is that for long stretches they are just doing I/O while these machines are idling, and these machines are actually very expensive.
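The staging cost described above can be put into a small Python sketch. The hours below are hypothetical and the function name is an illustration only; the sketch models just the copy-in and copy-out overhead of a local file system and deliberately ignores the four-to-five-times I/O speedup mentioned in the interview.

```python
def gpu_utilization(copy_in_h, compute_h, copy_out_h):
    """Fraction of wall-clock time the GPUs spend computing."""
    total_h = copy_in_h + compute_h + copy_out_h
    return compute_h / total_h


# Hypothetical job: 2 h staging data in, 4 h of GPU work, 1 h copying out.
staged = gpu_utilization(copy_in_h=2.0, compute_h=4.0, copy_out_h=1.0)
# Same job reading directly from shared storage, with no staging copies.
shared = gpu_utilization(copy_in_h=0.0, compute_h=4.0, copy_out_h=0.0)

print(f"staged: {staged:.0%}, shared: {shared:.0%}")
```

With these assumed numbers the GPUs sit idle for three of every seven hours in the staged case, which is the kind of hidden cost that does not show up in a raw file system benchmark.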
Yeah, indeed. So before wrapping up, I have a couple of very simple questions. One is about the licensing and how the licensing is going for you. And the other one is about other providers. We talk all the time about Amazon, but I guess you also support Azure and Google Cloud, right?
So from the marketplace perspective, we're currently only on Amazon. If you don't want to talk with us and just want to provision a cluster today, you can only do it on Amazon: you go to the marketplace, you subscribe to our service, it will auto-generate a CloudFormation template for you, and you could be up and running totally on your own. Today, if you want to run WekaIO on the other clouds, and obviously the most prominent ones are Microsoft Azure and Google GCP, you will have to talk with us and we can help you do it.
And what about the licensing?
So with the licensing model, we are transitioning storage from being an appliance that you buy, and have to keep re-buying every three or four years, into a subscription model. The reason behind it is that we are geared towards NVMe devices that don't fail like hard drives every three or four years, so you don't have to replace the appliance on the one hand; and on the other hand, we scale to very, very large clusters of many thousands of servers, so you don't buy a pre-configured box and then, when that one's full, buy another one, while that original box will fail at some point.
So we are transitioning into customers just having a single large WekaIO cluster for all their needs, and they would buy an annual subscription over it. We are charging differently for the NVMe tier and for the object tier, and then as you increase it, you're just paying the small increment that you're increasing. So we allow you to transform your on-premises infrastructure to be a lot more public cloud-like.
Very good. I think it's time to wrap up, and before leaving, I'd like to ask you where we can find more information about WekaIO: websites, Twitter handles, whatever can help the audience stay in touch with you?
So that's a great question. We have our website at www.weka.io. Then if you'd like to read the documentation, you can go to docs.weka.io. And if you'd like to start an AWS install, you can go to start.weka.io.
Very good. Liran, thank you very much for your time, and bye-bye.
Bye-bye, thank you very much. I really appreciate you having me and I had great fun recording this session with you.