Voices in Data Storage – Episode 27: A Conversation with Michael Ferranti of Portworx

Enrico speaks to Michael Ferranti of Portworx about data storage and containers.

Guest

Michael Ferranti is currently the VP of Product Marketing at container storage company Portworx. He specializes in enterprise software, SaaS, cloud, and containers, and previously served as VP of Product Marketing at ClusterHQ, where he led global efforts to create, develop, and grow the container industry's first data management solution, Flocker.

Transcript

Enrico Signoretti: Hey everybody, Enrico Signoretti here for a new episode of Voices in Data Storage, brought to you by GigaOm. Today, I want to talk to you about data storage and containers. Honestly, when I think about the first time I saw containers, this Linux thing that lets you keep the operating system but package up just the application, at the very beginning it was seen as simply a different way of distributing applications. It has now evolved into something more complex and more mature. In the beginning, I remember having this conversation with some people who were already working in this kind of environment. They told me, “What? We don’t need storage. Everything is stateless. The world will go stateless in a few years, and nobody cares about it.”

I was really skeptical. In fact, what happened? Enterprises started to adopt containers, and storage is a thing. Persistent storage, enterprise storage means availability and those kinds of features that we are used to. We lived through a couple of years of confusion, I mean, several initiatives and projects that went sideways, and everything was a mess. Now we have CSI, the storage interface standard for Kubernetes, so maybe things are getting better. Still, there are a lot of questions that enterprises have about storage and Kubernetes.

To talk about this topic today, I invited Michael Ferranti from Portworx. He’s the VP of product marketing there. Hi, Michael, how are you?

Michael Ferranti: Hi. I’m doing great. Thanks for having me. Looking forward to chatting today.

Great, so thank you for the time you took to record this podcast today. I usually ask my guests to give me a short introduction about themselves and their company.

Yeah, happy to do it. I mean, hearing your intro really gave me a number of flashbacks. What you just described in terms of the very, very early days of the containerization movement was one in which we said, “You know what? Containers are for stateless apps. If you have a stateful app, first of all, I’m sorry for you. You’re doing it wrong, but if you’re going to do it anyway, then you need to run that outside of containers.”

My containerization journey started at that point. I was working at a company called Rackspace with a team that was building and running a SaaS application, and the engineers on that team were using containers. In fact, one of my colleagues at the time was on Docker’s homepage as a customer testimonial talking about just how powerful a technology Docker was. At that time, I was starting to think about, ‘okay, I really like my job, but I want to do a startup. I want to build something from scratch; not just be a part of an existing company or even a very successful one.’ I said “I want to do something around containers. If developers are this excited about it, then there’s got to be something there.”

Then I said, “well, what about containers?” I just started looking at the enterprise IT stack, and I said, “okay, well, I could do something around containerization itself. That’s a compute problem. I could do something around networking. I could do something around security. I could do something around storage.” In the process of this, I met an entrepreneur who had actually built a storage and data management solution for an alternative version of containers, BSD jails, and really convinced me that in order for containers to be useful, you had to have storage and data management as part of it.

As soon as I told my friends and my colleagues that I was going to go and join this startup that was going to do storage and data management for containers, I got the reaction that you talked about at the beginning of the call, which is “You’re crazy. You obviously don’t understand what containers are for. You don’t understand how modern applications are stateless,” and I knew that based on that feedback, my decision to quit my comfortable job and join a startup looking at this space was either a giant mistake or potentially a giant success.

That was in June of 2014, the month that Kubernetes, coincidentally, launched, about a year after Docker itself had come on the scene. In the five intervening years, I think, as you pointed out, containerization itself has come into its own. Enterprises are now saying, “How do I use this for my entire IT stack?” which means that we can’t ignore data and all of the attendant data concerns around security and performance and reliability and disaster recovery. I’m more convinced than ever that if these problems can’t be solved, then containers are just going to be a blip in IT history.

It’s the same way with VMware: if vSphere could never have been used for anything other than test/dev environments, then not only would VMware not be what it is today, but we wouldn’t have Amazon [Web Services]. We would have Google, but they would still just be doing search and YouTube. We wouldn’t have GCP. We wouldn’t have Azure, because all of that required virtualization technology that could run enterprise apps. I think we’re at the same point now with containers: unless we can bring on those true enterprise-class applications, they’re not going to live up to their potential.

Yes, indeed. A lot of time has passed, and now, finally, we are seeing enterprises adopting Kubernetes, mostly. Kubernetes is the orchestrator on which you base your infrastructure, and then on top of which you run your applications. To some extent, all these microservices based on Kubernetes and containers could be stateless. I mean, you can rely on external [databases] on that basis. You can rely on external object stores.

Theoretically, you can do it without having storage managed inside the Kubernetes cluster, but actually, there are a few issues. Okay, the first [one] that comes to my mind is, for example, the portability of the application. I mean, if you’re leveraging external services, how can you be sure that in the next environment where you are going to move your application you can replicate the same identical setup? I don’t know if you agree with this, but maybe you have something more to add on that.

Absolutely, portability is a huge concern. If you’re running on Amazon and you’re using RDS, RDS is a great service, and I’m not going to discount it entirely. But if you’re using RDS behind your Kubernetes platform, then you need to know that you are binding your containerized applications to Amazon. It’s going to be very, very challenging to move those applications to Azure. It’s going to be very difficult to move them to Google. It’s going to be very difficult to move them on prem, and that includes from a DR perspective. Most of the customers that we work with are picking Kubernetes because it allows them to escape their own datacenter but also to avoid vendor lock-in at the cloud level.

The customers that want to avoid vendor lock-in, these are not people who don’t like their cloud provider. They absolutely love Amazon. They love Google. They love Azure, but they’re realistic. They saw what happened in their own datacenter with VMware, which they also love from a technology perspective but which, ultimately, was limiting. They don’t want to repeat those same mistakes, and so that’s one of the big reasons we are seeing customers put their data services directly on Kubernetes: they can have that portability. That’s very, very important to them from a risk perspective and from a reliability perspective.

Portability is an important aspect, but actually, with the introduction of CSI – and the acronym means container storage...

Interface.

Actually, the acronym means ‘container storage interface.’ Now all the vendors, the primary storage vendors, everybody is developing a CSI plug-in. So why should I think of another storage platform in my datacenter? I mean, I already have everything. Can I avoid adding something [new] and stay with my traditional array? Why not?

That’s a great question, and I’ll explain why I answer it in a particular way. Obviously, everything depends on the details, and so I can’t provide an architectural recommendation without understanding the specifics. I’m going to make some assumptions, and I’ll just lay them out.

The first assumption is that you’re adopting Kubernetes because you have some amount of scale within your IT environment that requires automation. Kubernetes was invented at Google – before it was called Kubernetes – because of YouTube, because of Google Search. These are large-scale systems that you could not manage even with an army of SREs. You had to leverage automation, and so if you’re picking up Kubernetes and using it, that means you yourself have some level of scale that you can’t just solve with human operators. If that’s the case, then you need to look at whether or not the technologies in your stack are able to handle that type of scale.

Just to get to the storage aspect: can I use my existing storage array with a CSI plug-in? I was speaking to another analyst recently, and they said that they had done an inquiry with an array vendor who was saying, “You know what? On one of our typical arrays, on a daily basis, we’re seeing dozens of operations performed by a storage administrator. When this array goes into a Kubernetes environment, or rather is automated and managed by a Kubernetes environment, that number goes up by an order of magnitude.” So the number of operations on your array is going to increase dramatically.

Some people think, oh, well, I’m not running tens of thousands of pods. Why would I have tens of thousands of operations, right? I might only have 100 pods, and I’m very easily able to have a couple hundred volumes on my existing array.

The difference is that, when you put Kubernetes in charge of deployment and management, you never know how many times a container is going to be deployed and redeployed and moved across your environment, because Kubernetes is constantly rebalancing your cluster to maintain the desired state that you define in your application configuration. You’re not in the situation of a typical VM workload, where you deploy an application once and it lives in that location pretty much for its entire existence.

With Kubernetes, you’re going to deploy an application once. Then it’s going to be moved. Then it’s going to be moved again. Then it’s going to be moved again, and that puts stress on any system that was not designed with a high level of parallelized operations in mind. That pretty much describes most SAN or array-based storage systems.

The other thing is that just because I have a CSI plug-in does not mean that my storage array can be managed via Kubernetes using all of the Kubernetes primitives. CSI does not define any particular behavior. It is simply an interface. For instance, you can take a snapshot using your storage array, because that’s just basic table-stakes functionality. That does not mean, as an example, that you can take a backup with snapshots at a namespace level via Kubernetes.

The storage array itself has to be able to understand what a namespace is, what resources are associated with that namespace, and how to control them as a group. One of the things that a lot of our customers end up realizing late in the game is that the primitives they expect to be able to use as part of their Kubernetes deployment, things like namespaces, don’t have an equivalent within their existing array. In addition to the scale problem that we talked about, you can end up in a situation where you’re not able to efficiently manage your storage resources because they don’t speak that Kubernetes-native language.
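To make that gap concrete, here is a minimal sketch of the snapshot primitive that CSI actually standardizes in Kubernetes. The driver and class names are hypothetical; the point is that a VolumeSnapshot targets exactly one PVC, so snapshotting an entire namespace as a consistent group is tooling that has to live above this layer:

```yaml
# Hypothetical snapshot class for an array's CSI driver.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: array-snapclass            # hypothetical name
driver: csi.example-array.com      # hypothetical CSI driver name
deletionPolicy: Delete
---
# A VolumeSnapshot references a single PVC in a single namespace.
# CSI itself defines nothing like "snapshot everything in namespace my-app."
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-data-snap
  namespace: my-app
spec:
  volumeSnapshotClassName: array-snapclass
  source:
    persistentVolumeClaimName: postgres-data
```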

Let me try to recap a little bit what we said. First of all, there is the problem of the number of operations that Kubernetes can generate. We totally agree on that. I mean, if the application has to scale and you need to spin up, I don’t know, 100 containers all of a sudden, the risk is that your array is not able to perform enough operations to provision the resources quickly, or quickly enough.

Yeah, exactly.

The second part is also about scale. Humans and machines working together can be very, very complicated, and letting your machines do things autonomously is not for everybody. The other thing is the level of complexity. The concepts of a Kubernetes cluster cannot really be mapped one-to-one to a traditional array. You can find operations that somehow don’t have an equivalent in the physical platform, and you don’t really know what is going to happen with some operations, especially complex, orchestrated operations where you have to take, for example, a snapshot of hundreds of containers to make a backup copy, or replications, or things like that. That should be the first reason why you should look at a different platform.

Is there any other reason? I’m sure that over time, in the next year or two, most of the vendors will come up with solutions to this. There will be new versions of the firmware. There will be new versions of the management interfaces, and maybe they will come up with solutions. Is there any other reason why somebody should think about a platform designed for Kubernetes?

Yeah, I think there are a number of them. Really, what is important to a particular enterprise is going to differ. I mean, clearly, there’s a trend toward software-defined everything. If your datacenter strategy includes buying specialized hardware from specialized vendors, that is more and more an anti-pattern. You have the ability to use software on top of commodity hardware in order to run really large-scale enterprise applications.

One reason I think that VMware, as a little bit of an aside, rightly embraced Kubernetes as part of its VMworld announcements this year is that they’re seeing pressure within the enterprise datacenter to remove vSphere and to use Kubernetes natively on bare-metal servers. They want to continue to serve their enterprise customers with a whole host of capabilities and not just say, “okay, well, Kubernetes has given you the ability to get rid of vSphere.” They don’t want that narrative in the market. They’re saying, “you know what? You can use vSphere to manage your Kubernetes.” All of this is to underscore the idea that many parts of the stack are being architected out, and we see that a lot with these specialized hardware storage systems, in favor of software-defined systems that can run in multiple environments, including the cloud, not just on prem. So that’s one.

Another reason I would say is that most storage systems think about the world from a storage perspective. I know that is an obvious statement. I’ll follow it up by saying Kubernetes is not an infrastructure-centric view of the world. It is an application-centric view of the world. I’ll challenge you a little bit, Enrico – I have the utmost respect for you, so please understand that I agree the enterprise vendors will try to solve the problems I spoke about. I’m skeptical that they will be able to, because I think there’s a fundamental mismatch between an infrastructure view of the world and an application view of the world.

For instance, most storage arrays, because they were built for VMs, assume that a single application runs on a single VM. I can manage an application with machine-based capabilities, so, for instance, I can back up an application running in a VM by taking a snapshot of the machine. That doesn’t work in a model where you have a multi-container application, a distributed system running across a whole host of machines, where, if I’m going to take a backup of it, not only do I need to be able to back up individual container volumes across the cluster, but I need to be able to influence the application itself to quiesce its database in a way that makes sense for Cassandra, or for Kafka, or for Elasticsearch, such that I can take that distributed snapshot. That requires software at the app layer.

One of the things – and not to make this a pitch for Portworx, but this is, I think, a good illustration of the point I’m making – for our customers who need to take application-consistent snapshots and backups of distributed databases like Elasticsearch and PostgreSQL – excuse me, and Kafka and Cassandra – we built a tool with pre- and post-hooks that understands how Kafka needs to be quiesced versus the way Cassandra needs to be quiesced versus the way Elasticsearch needs to be quiesced, so that the snapshot can be truly application consistent and resilient against data corruption in the case of a recovery.
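As a rough illustration of the pre- and post-hook idea, here is a sketch based on Portworx’s open-source Stork project. The resource kinds and field names follow Stork’s API as best I can recall, so treat the exact names as assumptions to verify against current documentation; the flush command is the standard Cassandra quiesce step:

```yaml
# A rule that runs inside the matching pods before a snapshot is taken (sketch).
apiVersion: stork.libopenstorage.org/v1alpha1
kind: Rule
metadata:
  name: cassandra-presnap-rule
rules:
  - podSelector:
      app: cassandra            # pods this rule applies to
    actions:
      - type: command
        value: nodetool flush   # flush memtables to disk so the snapshot is consistent
---
# A group snapshot that runs the quiesce hook, then snapshots all matching PVCs together.
apiVersion: stork.libopenstorage.org/v1alpha1
kind: GroupVolumeSnapshot
metadata:
  name: cassandra-group-snap
  namespace: my-app
spec:
  preExecRule: cassandra-presnap-rule   # run the quiesce hook first
  pvcSelector:
    matchLabels:
      app: cassandra
```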

We heard from our storage advisors that we were thinking about the problem wrong: “You’re a storage provider. You should not be thinking about things at the app layer. That’s one of the things you don’t have to worry about. You have to worry about data loss. You have to worry about data corruption.”

One of the things, though, that we could do is say, “you know what? That’s an application-level concern.” But from our perspective, our customer is the application owner or the application architect. We want to give them the ability to use storage for the benefit of their application, not for the benefit of the infrastructure. I think some of the array vendors are going to struggle to develop that application know-how, and the ability to execute that application know-how within their software stack, which takes a very infrastructure-focused view.

I’m sure, like any prediction, it’s not going to be 100% accurate. I’ve seen enough times where a statement is made in a keynote, but it turns out that years of architectural decisions make it much more difficult to implement. That’s the beauty of technology evolution: we wouldn’t have seen things like Kubernetes itself if we could have just incrementally changed the way in which we manage VMs, and I think that’s one of the exciting things, both from a vendor perspective and from a customer perspective. You get to take advantage of those changes.

So we are talking more about data management than storage management proper here. You talk about the application, but it is actually the application that manages the data. In the end, you are thinking at the application level to get the data consistent in the right way, something that is much more complicated, and maybe impossible in some cases, if you approach it from the storage perspective.

It’s totally understandable that if you have something that sits in the middle, something that understands Kubernetes and understands the applications that are running on it, it can take, as you said, a snapshot, making the right calls for replication and so on.

Also, another thing that we mentioned at the very beginning is portability. This data and storage layer being portable software means that you can install it everywhere. Then I have a question for you. Yes, it’s portable, and at this point, I have to ask: how does Portworx work? How can I make it live on my own premises, where I already made an investment in my traditional storage infrastructure, and, at the same time, have it running on a cloud, any cloud possibly?

Yeah, it’s a good question. I’ll just explain a little bit about how Portworx works. We’re a 100% software solution, so Portworx itself runs as a container. You would use your existing Kubernetes processes, basically, to deploy Portworx: we have an operator, you can deploy us as a DaemonSet, or you can just run the Portworx pod on any host in your cluster.
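As a sketch of what that install step can look like via the operator path, here is a minimal StorageCluster resource. The kind and fields follow the Portworx operator’s API as I understand it; the image tag is a placeholder, and the storage setting is the knob that tells Portworx to pool the unused block devices it discovers on each node:

```yaml
# Minimal Portworx StorageCluster (operator-managed install) - a sketch;
# verify the fields and the current image tag against Portworx docs.
apiVersion: core.libopenstorage.org/v1
kind: StorageCluster
metadata:
  name: px-cluster
  namespace: kube-system
spec:
  image: portworx/oci-monitor:2.13.0   # placeholder version tag
  storage:
    useAll: true   # pool every unused block device found on each worker node
```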

Once you’ve done that, once you’ve installed Portworx on each of your Kubernetes worker nodes, we are going to take the storage that’s available on each individual server, basically the block devices that would be visible if you were to run lsblk on the command line on any of those hosts. It could be a single block device. It could be multiple block devices. You might have some SSDs, some HDDs. You might have some NVMe. If you’re in the cloud, you might bootstrap each of your VMs with, say, an EBS volume, or in your vSphere datacenter, each of those volumes might be a NetApp volume or an EMC volume. Really, from our perspective, it doesn’t matter. There is some storage that’s available on the host.

Portworx will take that storage, combine it with all of the other storage available on each of the other nodes, and turn it into a single cluster-wide storage fabric. From there, you can deploy your Kubernetes applications by defining your deployment spec, which would include, if you’re deploying PostgreSQL: do I need a volume? It needs to have this size. Maybe it needs to have a particular performance profile and a guaranteed level of IOPS. Maybe I need to apply an encryption policy or a backup policy.

You define all of that in your storage class or your PVC, and then Portworx makes it true. You deploy the application. We look at the underlying storage resources, and we basically match them to the desired state that you defined through configuration. From there, let’s say the customer said, “I always want to have three copies of my data.” We will create replicas of that data somewhere else in the cluster. We understand network topologies, so say you’re running in Amazon and your Kubernetes cluster is spread out across two different availability zones: we’ll place the replicas across those boundaries so that you get the maximum amount of high availability just as a function of your network topology.
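Here is a minimal sketch of that declarative flow: a StorageClass that encodes the policy and a PVC that requests it. The provisioner name and parameter keys follow Portworx’s documented conventions as I recall them (repl for the replication factor, secure for encryption at rest), so double-check them against current docs:

```yaml
# Policy lives in the StorageClass...
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-postgres-sc
provisioner: pxd.portworx.com   # Portworx CSI provisioner (assumed name)
parameters:
  repl: "3"          # keep three copies of the data across the cluster
  io_profile: "db"   # database-style I/O tuning (assumed parameter)
  secure: "true"     # encrypt the volume at rest
---
# ...and the application just claims what it needs.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: my-app
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: px-postgres-sc
  resources:
    requests:
      storage: 100Gi
```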

Then you talked about portability. Okay, so now I have my application deployed within Amazon. I’m running across multiple availability zones. I have my performance profile that says some apps are running on NVMe; others are running on block storage with dedicated IOPS. Portworx has done all of that for me automatically. Now let’s say I want to take a backup, or – let’s make it even more complex – I want to have a DR process for this application. What Portworx will then do to solve the portability problem is not just back up and move the data. Clearly, in order to have disaster recovery for a stateful service, you need to back up the data, but that’s not enough to quickly recover your application.

You also need the application configuration itself, so Portworx will move both the data and the application configuration between environments: say, from Amazon East to Amazon West, or from Amazon to Azure, or from Amazon to your datacenter. It doesn’t matter what the particular targets are. We’ll package up the data and the application configuration, which means that when you need to recover that application, it’s simply a matter of redeploying the pods. Your app configuration is already there, and it’s already been remapped. For instance, when you move those volumes, the ID of each volume is going to change. We automatically rewrite your configuration, the YAML files, using those new volume IDs, making recovering the application very, very fast.

This is just another example of one of the differences in Portworx’s perspective of the world: we are a storage solution, but we think in terms of applications, not just in terms of infrastructure. I don’t know of any other storage solution that will back up application configuration and handle the rewriting of app config for the new location of your infrastructure elements, like your volumes, in order to speed recovery. We have DR capabilities that can do zero RPO by spanning a single Portworx cluster across environments, when the latencies are not too great, or we can do snapshot-based DR. In both instances, we can guarantee a low RTO, recovery time objective, because the app config is moved along with the data. That’s how we think about portability.
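A sketch of what moving the data and the application configuration together can look like, using the Migration resource from Portworx’s Stork project. The field names follow the Stork API as best I can recall, and remote-dr-cluster is a hypothetical ClusterPair naming the target environment:

```yaml
# Migrate a namespace's volumes and Kubernetes objects to a paired cluster (sketch).
apiVersion: stork.libopenstorage.org/v1alpha1
kind: Migration
metadata:
  name: postgres-dr-migration
  namespace: my-app
spec:
  clusterPair: remote-dr-cluster   # hypothetical pairing with the DR site
  namespaces:
    - my-app
  includeVolumes: true      # move the data
  includeResources: true    # move deployments, services, PVCs, config
  startApplications: false  # keep apps scaled down at the DR site until failover
```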

At the same time, it looks like, because you have this storage layer that sits inside Kubernetes and can be on the cloud as well as on premises, especially on premises you can theoretically take advantage of existing storage resources and storage capacity instead of using only the storage that is local to the node, enabling the reuse of existing resources. This is probably why some of your investors are big primary storage vendors.

Yeah, so in our latest round of funding, NetApp participated, HPE participated, as did Cisco. Yes, absolutely, there’s a desire to make those existing investments on the customer side work for Kubernetes deployments. You’re exactly right. A customer doesn’t have to say, “you know what? I made this investment in a really great storage array. I know that it doesn’t quite meet the needs of my Kubernetes deployments, but man, I would really like to be able to use it and not have to buy new hardware.” Portworx makes it possible to do that. You can use your existing storage with the Portworx software layer on top and truly have a cloud-native solution.

This is different from, say, using CSI directly. If you use CSI directly, again, CSI does not define any particular behavior. CSI doesn’t mean that your storage array is suddenly going to migrate application configuration between environments in order to ensure low RTOs. Arrays are going to handle RPOs, how much data you’re going to lose, but application recovery is not their level of concern. With Portworx, to borrow a phrase, you have the best of both worlds.

Yeah, fantastic. I think this conversation was great. I mean, we only scratched the surface here. To wrap up the episode, it would be nice to share a few links for Portworx, where we can find your company on Twitter as well as on the web, and maybe, if you want to share your Twitter handle with us, somebody can contact you directly to continue this chat.

Yes, absolutely, I’d love to continue the conversation. On Twitter, I’m @ferrantim. That’s Ferranti and [the letter M for] my first name, Michael. Then you can find Portworx on Twitter at @portwx, P-O-R-T-W-X. That’s Portworx without a couple of letters. As you know, it’s hard to get a good Twitter handle these days, so it’s P-O-R-T-W-X, or you can just mention Portworx, and we’ll find it.

Great, thank you again for a nice conversation. Bye-bye.

Yeah, thank you very much for having me. This was very enlightening. Thank you.
