I was fortunate to sit down with Matt Butcher, CEO of Fermyon, to discuss all things application infrastructure: cloud native architectures, serverless, containers and more.
Jon: Okay Matt, good to speak to you today. I’ve been fascinated by the WebAssembly phenomenon and how it seems to be still on the periphery even as it looks like a pretty core way of delivering applications. We can dig into that dichotomy, but first, let’s learn a bit more about you – what’s the Matt Butcher origin story, as far as technology is concerned?
Matt: It started when I got involved in cloud computing at HP, back when the cloud unit formed in the early 2010s. Once I understood what was going on, I saw that it fundamentally changed the assumptions about how we build and operate data centers. I fell for it hook, line and sinker: “This is what I want to do for the rest of my career!”
I finagled my way into the OpenStack development side of the organization and ran a couple of projects there, including building a PaaS on top of OpenStack – that got everyone enthusiastic. However, it started becoming evident that HP was not going to make it into the top three public clouds. I got discouraged and moved out to Boulder to join an IoT startup, Revolv.
After a year, we were acquired and rolled into the Nest division inside Google. Eventually, I missed startup life, so I joined a company called Deis, which was also building a PaaS – and where some of my former HP colleagues had landed. Finally, I thought, I would get a shot at finishing the PaaS I had started at HP!
We were going to build a PaaS on Docker containers, which were clearly on the ascent at that point but hadn’t come anywhere near their pinnacle. Six months in, Google released Kubernetes 1.0, and I thought, “Oh, I know how this thing works; we need to look at building the PaaS on top of Kubernetes.” So we re-platformed onto Kubernetes.
Around the same time, Brendan Burns (who co-created Kubernetes) left Google and went to Microsoft to build a world-class Kubernetes team – and he acquired Deis, all of us. Half of Deis went on to build AKS, Microsoft’s hosted Kubernetes offering.
For my team, Brendan said, “Go talk to customers, to internal teams. Find out what things you can build, and build them.” It felt like the best job at Microsoft. Part of that job was traveling out to customers – big stores, real estate companies, small businesses and so on. Another part was talking to Microsoft teams – HoloLens, .NET, Azure compute – to collect information about what they wanted, and build things to match.
Along the way, we started to collect a list of things that we couldn’t figure out how to solve with virtual machines or containers. One of the most profound was the whole “scale to zero” problem. This is where you’re running a ton of replicas of these services for two reasons: to handle peak load when it comes in, and to handle outages when they happen.
We are always over-provisioning, planning for maximum capacity. That is hard on the customer, who is paying for processor resources that are essentially sitting idle. It’s also hard on the compute team, which is continually racking more servers, largely to sit idle in the data center. It’s frustrating for the compute team to say: we’re at 50% utilization on servers, but we still have to rack new ones as fast as we can.
Jon: Okay, this gets us to the problem statement – “scale to zero” – is this the nub of the matter? And you’ve pretty much nailed a TCO analysis of why current models aren’t working so well: 50% utilization means double the infrastructure cost, and a significant increase in ops costs as well, even if it’s cloud-based.
Matt: Yeah, that was a major challenge we took on. We tried to solve it with containers, but we couldn’t figure out how to scale down and back up fast enough. Scaling down is easy with containers, right? The traffic’s dropped and the system looks fine; let’s scale down. But scaling back up takes a dozen or so seconds, and you end up with lag that bubbles all the way up to the user.
So we tried it with VMs, with the same kind of result. We tried microkernels, even unikernels, but we were not solving the problem. We realized that as serverless platforms continue to evolve, the fundamental compute layer can’t support them. We’re doing a lot of contortions to make virtual machines and containers work for serverless.
For example, the lag time on Lambda is about 200ms for smaller functions, and up to a second and a half for larger functions. Meanwhile, the architecture behind Azure Functions is to prewarm a VM, which sits there waiting; at the last second it drops the workload on, executes it, tears down the VM, and pops another one onto the end of the queue. That’s why functions are expensive.
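To illustrate the prewarming pattern Matt is describing – a pool of warm workers paid for in advance, each discarded after one use – here is a deliberately simplified sketch in Rust. It is a generic illustration of the pattern, not Azure’s actual implementation, and all the names are invented for the example.

```rust
use std::collections::VecDeque;

/// Stand-in for a prewarmed VM (invented for this illustration).
struct WarmVm {
    id: u32,
}

impl WarmVm {
    fn boot(id: u32) -> Self {
        // In reality, booting and prewarming a VM takes seconds,
        // which is why the pool must be kept warm ahead of demand.
        WarmVm { id }
    }

    fn run(&self, workload: &str) {
        println!("vm-{} executing {workload}", self.id);
    }
}

fn main() {
    // The prewarmed pool is paid for whether or not work arrives.
    let mut pool: VecDeque<WarmVm> = (0..3u32).map(WarmVm::boot).collect();
    let mut next_id = 3;

    for workload in ["fn-a", "fn-b"] {
        // Drop the workload onto a warm VM at the last second...
        let vm = pool.pop_front().expect("pool exhausted");
        vm.run(workload);
        // ...then the VM is torn down (it goes out of scope here)
        // and a fresh one is queued to replace it.
        pool.push_back(WarmVm::boot(next_id));
        next_id += 1;
    }
}
```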
We concluded that if VMs are the heavyweight workhorse of the cloud and containers are the middleweight cloud engine, nobody had considered a third kind of cloud computing, designed to be very fast to start up and shut down, and to scale up and back. So we thought, let’s research that – and let’s throw out the assumption that it must do the same stuff as containers or VMs. We set our internal goal at 100ms; according to research, that’s how long a user will wait.
Jon: Lambda was designed more for when you don’t know when you want to run something, but it’s going to be pretty big when you do. It’s for that big, bulky, sporadic use case. But if you take away the lag time, you open up a whole other set of use cases. In the IoT space, for example, you can work closer and closer to the edge, responding to an alert rather than to a stream.
Matt: Absolutely, and this is when we turned to WebAssembly. For most of the top 20 languages, you can compile to it. We figured out how to ship the WebAssembly code directly into a service and have it function like a Lambda function, except for the startup time: getting from zero to the execution of the first user instruction takes under a millisecond. That means it’s instant from the perspective of the user.
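For readers who want to see what that looks like in practice, here is a minimal sketch: a plain Rust program built for a WASI target so that a server-side WebAssembly runtime can execute it. The function and build commands below are illustrative assumptions, not Fermyon’s actual code.

```rust
// Minimal sketch of a workload that compiles to WebAssembly.
// Build for a WASI target, for example:
//   rustup target add wasm32-wasi
//   cargo build --release --target wasm32-wasi

/// Pure business logic – no server loop, no process management.
fn greet(name: &str) -> String {
    format!("Hello, {name}!")
}

fn main() {
    // In a serverless Wasm runtime, this entry point runs once per
    // invocation and the instance is discarded afterwards.
    let name = std::env::args().nth(1).unwrap_or_else(|| "world".into());
    println!("{}", greet(&name));
}
```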
On top of that, the architecture we built is designed with that model in mind. You can run WebAssembly in a multi-tenant mode, just like you could run virtual machines on a hypervisor or containers on Kubernetes. It’s actually a little more secure than the container ecosystem.
We realized if you take a typical extra large node in AWS, you can execute about 30 containers, maybe 40 if you’re tuning carefully. With WebAssembly, we’ve been able to push that up. For our first release, we could do 900. We’re at about 1000 now, and we’ve figured out how to run about 10,000 applications on a single node.
The density is just orders of magnitude higher because we don’t have to keep anything running! We can run a giant WebAssembly sandbox that can start and stop things in a millisecond, run them to completion, clean up the memory and start another one up. Consequently, instead of having to over-provision for peak load, we can create a relatively small cluster – eight nodes instead of a couple of hundred – and manage tens of thousands of WebAssembly applications inside it.
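As a rough sketch of that run-to-completion model, here is how a host might load a module, run it in a fresh sandbox and throw the instance away, using the open-source wasmtime and anyhow crates. The module path and exported function name are placeholders, and this is not Fermyon’s actual runtime code.

```rust
use wasmtime::{Engine, Instance, Module, Store};

fn main() -> anyhow::Result<()> {
    // One Engine is shared across all tenants; a module is compiled once.
    let engine = Engine::default();
    let module = Module::from_file(&engine, "app.wasm")?; // placeholder path

    // Each invocation gets its own Store – a fresh, isolated sandbox –
    // which is cheap enough to create per request.
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;

    // Run the exported entry point to completion...
    let run = instance.get_typed_func::<(), ()>(&mut store, "run")?;
    run.call(&mut store, ())?;

    // ...then drop the Store, reclaiming its memory for the next tenant.
    drop(store);
    Ok(())
}
```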
When we amortize applications efficiently across virtual machines like that, it drives the cost of operation down. So speed ends up being a nice selling point, too.
Jon: So, is this where Fermyon comes in? From a programming perspective, ultimately, all of that is just the stuff we stand on top of. I’ll lump you in with the serverless world – the whole standing-on-the-shoulders-of-giants model versus the Kubernetes model. If you’re delving into the weeds, then you are doing something wrong; you should never be building something that already exists.
Matt: Yes, indeed – we’ve built a hosted service, Fermyon Cloud, a massively multi-tenant, essentially serverless FaaS.
Last year, we were kind of waiting for the world to blink. Cost control wasn’t the driver then, but it has since shifted to being the most important thing in the world.
The way the macroeconomic environment was, cost wasn’t the most compelling factor for an enterprise choosing a solution, so we focused on speed and the amount of work you can get done. We think we can drive the cost way down because of the higher density, and that’s becoming a real selling point. But you still have to remember that speed and the amount of work you can achieve will play a major role: if you can’t solve those, then low cost is not going to do anything.
Jon: So the problem isn’t the cost per se; the problem is, where are we spending money? This is why companies like Harness have done so well, as a CD platform with cost management built in. And that’s where suddenly FinOps is massive – anyone with a spreadsheet is now a FinOps provider. It’s absolutely exploding because cloud cost management is a massive thing. It’s less about everyone trying to save money; right now, it’s about people suddenly realizing that they cannot save money. And that’s scary.
Matt: Yeah, everybody is on the back foot. It’s a reactive view of, “How did the cloud bill get this big? Is there anything we can do about it?”
Jon: I’m wary of asking this question in the wrong way… because you’re a generic platform provider, people could build anything on top of it. When I’ve asked, “What are you aiming at?”, people have said, “Oh, everything!” and I’m like, oh, that’s going to take a while! So are you aiming at any specific industries or use cases?
Matt: The serverless FaaS market is about 4.2 million developers, so we thought: that’s a big bucket – how do we refine it? Who do we want to go after first? We know we are on the early end of the adoption curve for WebAssembly, so we’ve approached it like the Geoffrey Moore model, asking who the first people are who will become “tyre kicker” users, the pre-early adopters.
We hear all the time (since our Microsoft days) that developers love the WebAssembly programming model, because they don’t have to worry about infrastructure or process management. They can dive into the business logic and start solving the problem at hand.
So we said, who are the developers that really want to push the envelope? They tend to be web backend developers and microservice developers. Right now, that group happens to be champing at the bit for something other than Kubernetes to run these kinds of workloads. Kubernetes has done a ton for platform engineers and for DevOps, but it has not simplified the developer experience.
So, this has been our target. We built out some open-source tools, including a developer-oriented client that helps people build applications like this – we refer to it as the “Docker command line”, but for WebAssembly. We also built a reference platform that shows how to run a fairly modest-sized WebAssembly runtime – not the one I described to you, but a basic version of it – inside your own tenancy.
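The client isn’t named in the conversation, but Fermyon’s open-source Spin project matches this description. As a hedged sketch of what the programming model looks like, here is an HTTP handler in the style of Spin’s Rust SDK, assuming the spin-sdk, http and anyhow crates as in Spin’s project templates; exact macro and type names vary between SDK versions.

```rust
use anyhow::Result;
use spin_sdk::{
    http::{Request, Response},
    http_component,
};

/// An HTTP handler in the style of the Spin SDK: no server setup and
/// no infrastructure code – just the request-handling logic.
#[http_component]
fn handle(_req: Request) -> Result<Response> {
    Ok(http::Response::builder()
        .status(200)
        .body(Some("Hello from WebAssembly!".into()))?)
}
```

In that workflow, commands along the lines of `spin build` and `spin up` compile the component to WebAssembly and serve it locally (again, assuming Spin).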
We launched a free beta tier in October 2022, and it will solidify into production grade in the second quarter of 2023. In the third quarter of 2023, we’ll launch the first of our paid services: a team tier oriented around collaboration.
That will be the beginning of the enterprise offerings, and then we’ll have an on-prem offering like the OpenShift model, where we can install it into your tenancy and charge per instance-hour. But that won’t be until 2024, so the 2023 focus will all be on this SaaS-style model targeting individuals and mid-size developer teams.
Jon: So what do you think about PaaS platforms now? They had a heyday six or seven years ago, and then Kubernetes rose rapidly enough that none of the PaaS offerings seemed applicable. Do you think we’ll see a resurgence of PaaS?
Matt: I see where you are going there, and actually, I think that’s got to be right. We can’t go back to the simple definition of PaaS that was offered five years ago because, as you’ve said before, we’re three – maybe even five – years behind where a developer really wants to be today.
Jon: The joy of software – that everything is possible – is also its nemesis. We have to restrict the possibilities, but restrict them to “the right ones for now.” I’m not saying everyone has to go back to Algol 68 or Fortran! But in this world of multiple languages, how do we keep on top?
Matt: I like the fan-out, fan-in thing. When you think about it, most of the major shifts in our industry have followed that kind of pattern. I talked about Java before: Java was a good example, where it exploded out into hundreds of companies and hundreds of different ways of writing things, and then it solidified and moved back toward best practices. I saw the same with web development and web applications. It’s fascinating how that works.
One of my favorite pieces of research from back in my academic career was by a psychologist using a jelly stand, testing what people do if you offer them 30 different kinds of jams and jellies versus seven. When the shoppers returned, she surveyed them on how satisfied they were with the purchases they had made. Those given fewer options to choose from reported higher levels of satisfaction than those given 20 or 30.
She reflected that a certain kind of tyranny comes with having too many ways of doing something. You’re constantly fixated on: could I have done it better? Was there a different route to achieve something more desirable?
Jon: Development model-wise, what you’re saying resonates with me – you end up architecting yourself into uncertainty, where you’re going, well, I tried all these different things, and this one is sort of working. It ends up causing more stress for developers and operations teams because you’re trying everything, but you’re never quite satisfied.
In this hyper-distributed environment, one area of interest to me is configuration management: just being able to push a button and say, let’s go back to last Thursday at 3.15pm – all the software, the data, the infrastructure as code – because everything was working then. We can’t do that very easily right now, which is an issue.
Matt: I built the system inside Helm that did rollbacks in Kubernetes, and it was a fascinating exercise, because you realize how limited you really are in rolling back to a previous state in certain environments – too many things in the periphery have changed in the meantime. If you rolled back to last Thursday and somebody else had released a different version of the certificate manager, then you might roll back to a known good software state with completely invalid certificates.
It’s almost like you need to architect the system from the beginning to be able to roll back. We spent a lot of time doing that with Fermyon Cloud because we wanted to make sure that each chunk is sort of isolated enough that you could meaningfully roll back the application to the place where the code is known to be good and the environment is still in the right configuration for today. Things like SSL certificates do not roll back with the deployment of the application.
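As a toy sketch of that design point – application state that participates in rollback kept separate from environment state, such as certificates, that must not roll back with it – consider the following. The types and fields are invented for illustration; this is not Fermyon Cloud’s implementation.

```rust
/// Application state that participates in rollback.
#[derive(Clone)]
struct AppRelease {
    version: u32,
    image: String,
}

/// Environment state that must NOT roll back with the app,
/// e.g. certificates renewed since the target release shipped.
struct Environment {
    tls_cert_serial: String,
}

struct Deployment {
    history: Vec<AppRelease>, // releases, oldest first
    env: Environment,
}

impl Deployment {
    /// Roll the application back to an earlier release while keeping
    /// today's environment (certificates and the like) intact.
    fn rollback_to(&mut self, version: u32) -> Option<AppRelease> {
        let idx = self.history.iter().position(|r| r.version == version)?;
        self.history.truncate(idx + 1);
        // self.env is deliberately left untouched.
        self.history.last().cloned()
    }
}

fn main() {
    let mut d = Deployment {
        history: vec![
            AppRelease { version: 1, image: "app:v1".into() },
            AppRelease { version: 2, image: "app:v2".into() },
        ],
        env: Environment { tls_cert_serial: "cert-2023-05".into() },
    };
    if let Some(r) = d.rollback_to(1) {
        println!(
            "rolled back to v{} ({}); cert stays {}",
            r.version, r.image, d.env.tls_cert_serial
        );
    }
}
```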
There are all these little nuances: what the developer needs, what the ops team and platform engineers need. We’ve realized over the past couple of years that we had to build somewhat haphazard chunks of the solution, and now it’s time to fan back in and say: we’re just going to solve this really well, in a particular way. Yes, you won’t have as many options but, trust us, that will be better for you.
Jon: The more things change, the more they stay the same! We are limiting ourselves to more powerful options, which is great. I see a bright future for WebAssembly-based approaches in general, particularly in how they unlock innovation at scale, breaking the bottleneck between platforms and infrastructure. Thank you, Matt – all the best of luck, and let’s see how far this rabbit hole goes!