In this episode, Jon Collins speaks with Weaveworks COO Steve George about Kubernetes, microservices and DevOps.
Steve joined Weaveworks in February 2017 as COO. In a career spanning 20 years, Steve has worked in a range of roles in the technology sector, most recently leading Canonical’s operations and corporate development. His interest in and support for FOSS go back to 1997, when he got hold of his first copy of Slackware on floppy disk.
Jon Collins: Hello and welcome to this Voices in DevOps podcast. This week I'm here to speak to Steve George, who's COO (Chief Operating Officer) at Weaveworks, a company that's built its business around Kubernetes. So it's kind of ‘right place at the right time,’ if that's the right thing to say, which always looks like a really safe bet. But back in the day when you were founded, I don't know...
Steve George: Yeah, I don't think it was a totally safe bet. Ironically, we were actually founded at exactly the same time Google was launching Kubernetes, so it does feel a bit like fate had something in store for us there, and Kubernetes has become the center of what we're doing. But in the early days, it was really all about containers and Docker. I guess the guiding goal there was to enable developers and operators to have the same experience, so that's really the DevOps thing at the heart of it.
Over time, Kubernetes has become the orchestration technology at the center of that. But there are certainly a number of other cloud native technologies now coming together under the whole CNCF banner.
Jon Collins: Before I pick you up on anything, I'm going to pick myself up. I think I'm pronouncing it wrong. It's ‘Kubern-a-tes,’ is it, not ‘Kubernetes’?
Steve George: Yeah, I guess it's a bit like ‘Lin-ux’ and ‘Len-ex,’ right?
Jon Collins: Which I'm still not sure I get right. My claim to fame with Linux is that I have a footnote in the French Linux manual, written in 1991 or something like that. So there you go: ‘Go me.’ I was a Slackware user back then; it was version 0.99 or something or other. These days I just try not to break things, so I just write about stuff. But enough about me. We're here to talk about you.
We're here to talk about DevOps, and about that whole microservices and DevOps kind of thing. But first, just tell me a bit about yourself. How did you arrive at this place? What's your path to COO at Weaveworks been?
Steve George: Yeah, well, funny you bring up Linux there, because I basically spent 10 years at Canonical, the company behind the Ubuntu operating system. I was responsible there for developing what became the Cloud and Enterprise division. In the early days, that was really all about servers, then OpenStack, and then our strategy for taking Ubuntu into the public cloud, as we saw that developers were really focusing on public cloud as the place where they were going to build the next Google, the next ‘whatever,’ the next wave of innovation. It was all public-cloud-centric.
Before that, I'm of the age that was part of the Internet revolution: bringing the Internet into companies and businesses and so forth. For me, the long-running strand is how we bring innovation and technology together and how we bring those benefits to a wider audience. What I really care about is taking a technology that's cutting edge and making it available to a much wider group of users so that they can take advantage of it.
Jon Collins: So you're a ‘webennial’ then?
Steve George: Yeah, exactly, I'm a webennial, you’re right. I'm going to use that.
Jon Collins: You heard it here first. I always wished I'd been working in the ’70s, because that was a really happening time for technology, but then I started my career in the late ’80s, in open systems, and that was also a really exciting time. And now is a really happening time, so I don't think there's a wrong time.
Steve George: Every decade we seem to get a new big moment, and I suppose that is what really got me into Weaveworks. When I was looking for the next thing, it was quite clear that it wasn't just about Docker or individual technologies. What we're really talking about is a generational change in how we drive development, deployment and the whole of the software lifecycle, in part because everything is now web-centric, and we're bringing that to the fore. So a lot of the things I started out with in the client-server space are gone, and now application development and delivery are completely centered on this set of web technologies.
Jon Collins: That's what we kind of call microservices, right? We'll get into that, but you used a term earlier, ‘cloud native,’ and I was thinking that might ring a few ‘what the heck’ bells, as in ‘what the heck is it?’ So maybe we could start with what you mean by ‘cloud native,’ and how do microservices fit into the cloud native definition?
Steve George: Right. I think cloud native is really to do with the Cloud Native Computing Foundation (CNCF), which is part of the Linux Foundation and is bringing together a bunch of technologies that help you develop in a ‘cloud first’ way. Kubernetes is, I suppose, the touchstone technology there: developed by Google and then donated to the CNCF. But there are a number of other really important technologies, such as network overlays for Docker containers, SDA and so forth: really core technologies that are needed to develop in the cloud first way.
I think you're right, though: ‘What does that mean, really?’ There's a set of practices around containers, both in development and in production; around orchestration, meaning being able to operate those containers at scale in a production environment; and around application architecture, meaning microservices, going beyond twelve-factor applications to enable a more dynamic way of developing and deploying new changes. And then there's a whole set of practices around that as well: DevOps, automation, observability and site reliability engineering. So it's a completely new way, both from an application technology perspective and from an operations perspective.
Jon Collins: It's interesting. I wouldn't claim to do any of this myself; the closest I get is a bit of PHP these days (which is kind of like 10 years ago as far as anyone else is concerned). But I'm trying to get my head around a lot of this stuff, and as far as I can see, there are two things that really, really matter in the cloud native world. We've been speaking about one of them, and you touched on the other. The first is application architecture.
When you're thinking from a microservices perspective, you're thinking about your application chunks and how they relate to each other, etc. That's fine. The other one that's become really, really important is the networking, and you mentioned SDA. Essentially, we're building massively distributed applications in which the bits need to be able to talk to each other however and wherever they are, and therefore getting the network right has become really, really important. It was always important, but now it's one of the two really important things, as opposed to all the other stuff.
Steve George: Yeah. And coming from a slightly earlier starting point, the way I now think of it is this: when I was building applications, the distributed system was within a single operating system. I was thinking about processes, how they talked to each other, inter-process communication and so forth.
Now, with Kubernetes, each container is an individual process, so we're thinking about communication between containers, which involves the network, and Kubernetes is really the operating system for running all of those processes. So if you look at it as ‘Kubernetes is your kernel,’ and your individual applications, your microservices, are processes running under that operating system, then it feels a bit more straightforward, or at least it does to me.
Jon Collins: That makes sense. And as operating systems go, it seems to be the one people are tending more and more towards. There's Docker Swarm, Amazon's got its own stuff on AWS, etc., and then there are the cloud services themselves. So, as you say, it's a massively distributed operating system with multiple applications, and that all makes sense. I want some of that.
Let me throw this in, because you say it's kind of a new way of doing things. It's an idea I've been playing with (which I'll say very quickly and then we can move on): this feeds right back into very old 1970s structured design techniques around modularity, cohesion and coupling, and reducing dependency. Ed Yourdon's stuff, Larry Constantine. But the difference is that now we can do it globally. What's net new is that you can run one container in South Africa and another container in Russia and have one application, whereas they were thinking, as you say, very much within an individual computer.
Steve George: Yeah, I think that kind of linkage question is a continuum, and it's a base principle that all of computer science operates on. So in some senses things stay the same, but it's interesting that we're now seeing new ways of thinking about it. That global nature of how people want or need to develop and deploy their applications brings new levels of complexity.
Jon Collins: Yeah. Don't get me wrong: the reason I say it is that I'm playing with the idea that it's a really useful hook when we start to talk to the enterprise. When we say to them, “It's all new. You've got to learn a whole bunch of new things,” there are some old hooks we can use: “Remember that stuff about getting applications right? Let's take that stuff and apply it to this new world.” It gives us a starting point.
Steve George: Yeah, gravity hasn't reversed on the fundamental laws of approach.
Jon Collins: I'll be using that one. So you're working with, let's call them, the cloud native organizations. You're also working with enterprise organizations, and lots of the conversations I'm having at the moment are around how enterprises can start to adopt this smashingly clever stuff, and with reason. Say you're Volkswagen (it just popped into my head as an example): if you want something clever to run in the car and something clever to run in the cloud, containers are the logical way to build out an application architecture where the two are communicating. So there are lots of good reasons to want to do this.
What do you think enterprises are struggling with, even if they think, ‘Yeah, I'd really like to do some of that stuff’? What are you coming up against when you talk to organizations?
Steve George: So, there are many enterprises, and the surprise for me at Weaveworks has been the significant traction with very large enterprises. I think I'd be correct to say that there's not a single large investment bank we haven't worked with in the last year. And that's not, I would say, normally the way these types of technologies are adopted. But there are so many positive drivers.
You alluded to one challenge for people: the sensation that it's all new, a break from the past so significant that people ask how to apply what they know to this way of working or this new set of technologies, and what to throw away and what to keep. The second thing for me is that it's technology, but it's also work practice. You know, it's that old saw: people, process, technology.
And I think the other thing is that there's a lot happening. It can be quite overwhelming: exciting and overwhelming. There are many new threads coming together right now, starting with basic questions like the ones we just referred to: ‘If I want to use the public cloud, which public cloud should I use for this way of working? What's the right way to do my CI/CD process? Should I or should I not use SDA?’
There are a lot of these decisions, all the way up at the architecture and strategy level and right down to how we maintain control over specific technologies and uses. So it's kind of a three-part problem.
Jon Collins: And do you have to get everything right in order for anything to work? Or, it's a stupid example, but can you build a microservices-based application using Waterfall?
Steve George: Yeah, absolutely. From my perspective, you probably wouldn't want to advise a client to do that, but I certainly think you can mix and match. There are ways to go from a monolith to breaking things into containers. You can have macro containers: just take everything and put it in a container, a similar thing to what happened in the virtualization phase, and then steadily break things out from an architecture perspective.
You can choose some of the more straightforward ways of using Kubernetes to begin with. We often advise clients to start with stateless applications rather than going fully stateful, because it keeps the complexity down. So there are ways of doing that progression: we spend a lot of time working through the set of applications the client is operating and then chunking them into sections, which makes it easier for them to take on board.
Then there are organizational things, such as bringing together a ‘tiger team’ that's going to work on this. You tend to get some people within an organization who are very excited about it. Get them really enabled, and then perhaps disperse them into other teams, so that you've got strong advocates within the organization for this new way of working, rather than it feeling like something that's just been handed down.
Jon Collins: Seeding the goodness.
I remember it started with ‘Forts’ or ‘Fortress’ or something like that, and then I did a bit of googling: Sun Microsystems bought a company called Forte, all those years ago. I remember doing something that was more than a review, less than an audit, of a software project at a financial sector company. Essentially, they bought Forte because they wanted to move away from the mainframe-based application they had built, and they went and built the same mainframe application within Forte, which was a wonder to behold.
So it had all the same issues around lack of scalability, slowness, difficulty to maintain, etc. They got nowhere, except that now they'd done it in Forte. Bless. But what strikes me about Weaveworks, and what this reminds me of, is that you're quite a process-y kind of company. You advocate for... I think you might even have invented GitOps. So how did that happen? Was it the fact that you're those kinds of people, or did you realize the need to cover the process side as well?
Steve George: Right, right. I think, in part, our DNA is quite ‘enterprise-centric.’ We are an open source company, so people often see us as open source innovators in a garage or something. But actually our founders, Alexis Richardson and Matthias Radestock, were involved with RabbitMQ, and after that with Spring and some of the Pivotal work. So we as a team are very aware that, particularly for larger-scale development and operations teams, you need an operating model that isn't just about jamming a new technology into a team, where it's exciting for six months, then they move on and the next new thing comes along.
So it's really about embedding a working model for the whole of the technology group, and making sure it works for them so they can achieve the benefits they want, whether that's agility, speed, greater effectiveness, efficiency, whatever the central thing is that they want to get done. We came up with GitOps in part because we had focused on developer tooling and the developer side of the equation, and what we were discovering, as part of our SaaS, was that we were delivering and operating Kubernetes, as well as some of the earlier orchestration options in Docker. What teams kept stopping us to ask was, ‘Well, how do you operate Kubernetes? How do you do updates? How do you do upgrades? How do you release your applications? How does your CI/CD process work?’ All of these things are really about the whole software lifecycle.
So we coined the term GitOps to describe an operating model for building and operating in a cloud native way. As you can hear in the name, the Git part is all about storing your configuration in Git, and the Ops part is really about observability and automation. At the heart of it, Kubernetes operates on a model-driven approach: you tell Kubernetes, "I want you to run five or six instances of this application, configured in this way," and then Kubernetes goes away and does that for you, which can be a bit complex and opaque.
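To make the model-driven idea concrete, here is a minimal sketch of the compare-and-act step at the core of that approach: you declare the state you want, something observes what is actually running, and the difference becomes a list of actions. This is not Kubernetes' API or code; the `DesiredState` type, the `reconcile` function and the image names are hypothetical, and a real controller would repeat this comparison continuously rather than run it once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesiredState:
    """What you declare: which app, how many instances, which image (hypothetical)."""
    app: str
    replicas: int
    image: str


def reconcile(desired: DesiredState, running: list[str]) -> list[str]:
    """Compare the declared state with what is running and return corrective actions.

    `running` holds the image of each instance currently running for the app.
    Instances on the wrong image are stopped; then the count is topped up or trimmed.
    """
    actions = []
    # Stop anything that does not match the declared image.
    for image in running:
        if image != desired.image:
            actions.append(f"stop {desired.app} ({image})")
    # Scale the matching instances up or down to the declared replica count.
    matching = sum(1 for image in running if image == desired.image)
    diff = desired.replicas - matching
    if diff > 0:
        actions += [f"start {desired.app} ({desired.image})"] * diff
    elif diff < 0:
        actions += [f"stop {desired.app} ({desired.image})"] * -diff
    return actions


want = DesiredState(app="web", replicas=3, image="web:1.2")
print(reconcile(want, ["web:1.1", "web:1.2"]))
print(reconcile(want, ["web:1.2", "web:1.2", "web:1.2"]))  # already converged: []
```

Asking for three `web:1.2` instances while one `web:1.1` and one `web:1.2` are running yields a stop for the old instance and two starts; once the running set matches, `reconcile` returns an empty list, which is the point where a controller has nothing left to do.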
So what we're doing is bringing together that configuration management, so you know exactly what you asked the system to do, when you asked, and who asked it to do that, and then the observability, in terms of monitoring and understanding why the system is operating in that way.
Going back to our earlier analogy about an operating system: on the one hand, you can see it as configuration, understanding who launched what application and when; on the other, you can think of it as all of that process tooling, like ps and those kinds of tools, that enables people to understand what an application is doing. But now we have that in the form of metrics, monitoring and a range of observability tools.
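The ‘who asked for what, and when’ side of GitOps falls out of keeping configuration in Git: every change is a commit with an author, a timestamp and a message. In practice you would read that history with `git log` on the configuration files; the sketch below instead models commits in memory so it runs on its own. The `Commit` type, the `changes_to` helper and the sample history are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Commit:
    """A hypothetical commit to a config repo: who, when, why, and the resulting config."""
    author: str
    when: datetime
    message: str
    config: dict


def changes_to(commits: list[Commit], key: str) -> list[tuple]:
    """Return (author, when, old, new) for every commit that changed `key`.

    Assumes `commits` is in chronological order, oldest first.
    """
    changes = []
    previous = None
    for commit in commits:
        value = commit.config.get(key)
        if value != previous:
            changes.append((commit.author, commit.when, previous, value))
        previous = value
    return changes


history = [
    Commit("alice", datetime(2019, 1, 7, 9, 0), "initial web deploy", {"replicas": 3, "image": "web:1.1"}),
    Commit("bob", datetime(2019, 1, 8, 14, 30), "bump image", {"replicas": 3, "image": "web:1.2"}),
    Commit("carol", datetime(2019, 1, 9, 11, 0), "scale up", {"replicas": 5, "image": "web:1.2"}),
]

for author, when, old, new in changes_to(history, "replicas"):
    print(f"{when:%Y-%m-%d %H:%M} {author}: replicas {old} -> {new}")
```

Here the loop prints two lines: alice introducing three replicas and carol raising that to five. Bob's commit changed only the image, so it does not appear in the replicas history.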
Jon Collins: And I'm going to say ‘forgive my ignorance,’ but I think that's going to be my catchphrase. When you use the term ‘observability,’ is that an industry-standard term, or is that something you're applying there?
Steve George: Yeah, no, it's an industry-standard term, and it's coming to mean a set of things: basically logging, metrics and monitoring, and the ability to treat all of the processes and all of the containers across a Kubernetes cluster and dig into them in the same way you would dig into processes on an operating system. So it's really about bringing that kind of visibility into what's happening. It's much more than simply dashboards for your application; it's really being able to query and understand exactly what's happening within all the containers across the cluster.
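One way to picture ‘digging into containers the way you would dig into processes’ is a `ps`-style view that spans every node at once. Real observability stacks do this with a metrics store and a query language (Prometheus, a CNCF project, is a common choice); the sketch below stands in for that with a list of in-memory samples, and its `Sample` fields and `cluster_ps` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """A hypothetical point-in-time reading for one container in the cluster."""
    node: str
    pod: str
    container: str
    cpu_millicores: int
    restarts: int


def cluster_ps(samples, *, min_restarts=0, sort_by="cpu_millicores", limit=None):
    """A `ps`-like view across every node: filter, sort highest first, truncate."""
    rows = [s for s in samples if s.restarts >= min_restarts]
    rows.sort(key=lambda s: getattr(s, sort_by), reverse=True)
    return rows[:limit]  # a limit of None keeps every row


samples = [
    Sample("node-a", "web-1", "web", 250, 0),
    Sample("node-b", "web-2", "web", 900, 4),
    Sample("node-b", "db-0", "postgres", 400, 1),
]

# "Which containers that have restarted are using the most CPU, anywhere in the cluster?"
for s in cluster_ps(samples, min_restarts=1):
    print(s.node, s.pod, s.container, s.cpu_millicores, s.restarts)
```

The query never asks which node a container lives on, which is the point of the analogy: the cluster is treated as one machine, the way `ps` treats one operating system.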
Jon Collins: I was going to say, it's a bit more than transparency and visibility. Essentially, when something changes, it's having some insight into why it happened. I noticed when I was browsing around (I was stalking your websites, stalking Weaveworks) that there was a Slack thing which said something like "So-and-so has changed; this could be because..." So you've actually created some kind of Slack event, which then triggered a set of potential reasons and possible actions. It's about the actionability of observation, not just ‘Well, you’re buggered, mate.’
Steve George: Yeah, exactly. We spend a lot of time with clients now working through runbooks and playbooks, and we do exactly that. Over time, as you operate your application, Kubernetes will trigger certain events, and then we can connect those to a runbook and to observability, and say, “When you see this happening, this is what the query looks like, this is what the system looks like, and here are the things you should do to right that problem or resolve that issue.”
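A minimal sketch of wiring cluster events to runbooks, in the spirit of the Slack message Jon describes: look up the event's reason, then turn it into a message saying where to look and what to do next. `CrashLoopBackOff` is a real Kubernetes container state; `HighLatency` is made up for illustration, and the `Runbook` type, the `RUNBOOKS` table and `handle_event` are hypothetical rather than Weaveworks' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Runbook:
    """A hypothetical runbook entry: where to look first and what to do about it."""
    query: str                    # what to inspect (a metrics or log query in practice)
    steps: tuple[str, ...]        # remediation steps, in order
    auto_remediate: bool = False  # whether the first step may run without a human


RUNBOOKS = {
    "CrashLoopBackOff": Runbook(
        query="restart count and last logs for the pod",
        steps=("check recent config commits", "roll back if a bad release"),
    ),
    "HighLatency": Runbook(  # hypothetical, application-defined event
        query="p99 latency per container over the last hour",
        steps=("restart the container", "open an incident if it recurs"),
        auto_remediate=True,
    ),
}


def handle_event(reason: str, pod: str) -> str:
    """Turn a cluster event into an actionable message, e.g. for a chat channel."""
    book = RUNBOOKS.get(reason)
    if book is None:
        return f"{pod}: {reason} (no runbook yet, investigate manually)"
    mode = "auto" if book.auto_remediate else "manual"
    return f"{pod}: {reason}. Look at: {book.query}. Next ({mode}): {book.steps[0]}"


print(handle_event("CrashLoopBackOff", "web-2"))
```

Keeping the table as data rather than code means a team can grow its runbooks over time, and an unknown event still produces a message instead of silence.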
And that could be whatever is appropriate in that circumstance. It could be that we've put in an automated check that will just restart that container, or it could be, ‘OK, we're seeing this problem repeatedly on a Friday at three o'clock.’ I actually did have a client who had this. They were like, ‘Every Friday at three o'clock our whole system slows down, and we're not sure why.’ So we were able to gather data from the system over time, compare previous weeks, do the analysis and find the trend by digging into the analytics, and then solve that problem.
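The Friday-afternoon slowdown is a good example of analysis that only works once you keep data across weeks: bucket measurements by weekday and hour, and a recurring slot stands out even if no single day looks alarming. A minimal sketch, assuming latency samples arrive as `(timestamp, milliseconds)` pairs; the function name, the data shape and the sample values are hypothetical, and a real investigation would also compare each slot against its usual baseline.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def slowest_weekly_slot(samples: list[tuple[datetime, float]]):
    """Return ((weekday, hour), mean_latency) for the slowest recurring weekly slot."""
    buckets = defaultdict(list)
    for ts, latency_ms in samples:
        # The same weekday and hour in different weeks land in the same bucket.
        buckets[(DAYS[ts.weekday()], ts.hour)].append(latency_ms)
    slot = max(buckets, key=lambda k: mean(buckets[k]))
    return slot, mean(buckets[slot])


# Made-up readings over three weeks: Fridays at 14:00, 15:00 and 16:00, plus Mondays at 15:00.
fridays = [datetime(2019, 3, day) for day in (1, 8, 15)]
mondays = [datetime(2019, 2, 25), datetime(2019, 3, 4), datetime(2019, 3, 11)]
samples = [(f.replace(hour=h), 400 if h == 15 else 100) for f in fridays for h in (14, 15, 16)]
samples += [(m.replace(hour=15), 120) for m in mondays]

print(slowest_weekly_slot(samples))
```

With this made-up data the Friday 15:00 bucket averages 400 ms against 100 to 120 ms everywhere else, so it is the slot returned.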
Jon Collins: That reminds me of, not the most complex bug I ever found when I was a programmer, but my favorite. I realized that the timestamps of all the files that were wrong were the same, and that it was the guy's birthday. What I reckon is that he came back from the pub having worked out the answers to all the problems he'd been trying to solve for weeks, and he'd solved them, mostly, and created a whole bunch of other problems. Once I knew that was the situation, it was a matter of putting stuff back together again.
Steve George: You had the social background information as well: you knew it was his birthday.
Jon Collins: Exactly. That's excellent. So, we're coming up against time. I think what's really come across here is the ‘enterprisey-ness’ of Weaveworks. That's not blowing smoke; that's just where you're coming from. So, from that, if you could summarize for any organization, a larger organization that looks at this stuff with horror and goes, ‘Look, we'd love to, but I can't see this ever working,’ what first steps would you advise to get that foot in the door? Should they start at the top? You mentioned tiger teams. How would you give people that starting block?
Steve George: Well, obviously call me, but aside from that, the thing I really like is the idea of tiger teams: small groups you bring together to work on a constrained problem, to understand exactly how it's going to work within your organization. I do think there's a bit of a tendency in the industry where you get messiahs telling you at events, “This is how you should do it; this is how you do it the (whatever large internet company I won't mention) way, and if you do it this way it will work for you.” But many enterprises have a number of constraints, both technological and around audit and so forth.
So I really like the tiger team approach, because it allows you to take the best of the new way of working but apply it within your unique environment, learn what's going to work for you, and then use that as the basis for taking it to the wider organization. So no big bangs: a progressive way of doing that development. And as I think I've said, it's both the technology and the working model, and that's where we think GitOps is a great way of making that model innovative while also keeping control over exactly what's happening.
Jon Collins: So just do it, but don't just do it all. And also do it within a governance framework, within a process framework, in a structured fashion.
Steve George: I don't think I'm going to win any awards there for my revolutionary speech, but I think that really is the right way to approach it.
Jon Collins: Ironically, I think it's a kind of anti-award, isn't it? Because, as you say, it's not evangelism. It's ‘This stuff works! But don't bite off more than you can chew, and do it in a structured fashion,’ which is quite refreshing to hear. Well, with that, thank you very much, Steve. It's been great to talk to you, and I hope we can do this again sometime.
Steve George: Thanks, Jon. Cheers.