Today's leading minds talk DevOps with host Jon Collins
Liz Rice is the Technology Evangelist with container security specialists Aqua Security, where she also works on container-related open source projects including kube-hunter and kube-bench. She is chair of the CNCF's Technical Oversight Committee, and was Co-Chair of the KubeCon + CloudNativeCon 2018 events in Copenhagen, Shanghai and Seattle.
She has a wealth of software development, team, and product management experience from working on network protocols and distributed systems, and in digital technology sectors such as VOD, music, and VoIP. When not writing code, or talking about it, Liz loves riding bikes in places with better weather than her native London, and competing in virtual races on Zwift.
Jon Collins: Hello, welcome to this edition of Voices in DevOps, where I’m delighted to welcome Liz Rice, who I’ve known for a few years. Not from a security perspective, interestingly, so I’m fascinated to know more about that. And I understand also now things to do with the CNCF, and I haven’t even said who you are yet. You’re a technology evangelist with Aqua Security, Liz.
Liz Rice: Hello. Hi. Thanks for having me on the show, yes.
Why are we talking here with you, Liz? What brings you to the DevOps party? What’s your background, and how did you arrive at this point?
So, my background is development, software engineering, and a few years ago, through a startup that I was working on, I got very interested in containers and involved with containers. That startup kind of bled to death, as is the way of startups, but I was fascinated by containers, and I wanted to keep in that field. I was fortunate enough to be introduced to Aqua Security, who do security for containers and cloud native software. We hit it off really, really well, and I’ve been there now for just over two years, learning a lot about security. I’m coming at it from a developer’s point of view but learning a lot more about the ops side and the security side as I’ve been going, so hopefully I can take that learning experience and share it with other people.
Also, interestingly, you’re coming to it from the point of view of a startup, where security wouldn’t necessarily have been the first thing you did, whereas a lot of enterprise companies are already doing security, and they’re having to do DevOps.
That’s true, yes. Yes. Yes.
That must vary things. And how did you get fascinated by containers? Glass of wine, one thing led to another?
Something like that, yeah.
Was it a logical choice? How did that all happen?
I remember the first time anybody mentioned the word Docker. He was pretty enthusiastic, and he was like, this is going to change the way we ship software. It’s super exciting. And at the time, I thought, okay, yeah, fine. I learned more about it, and then I think I got to the point where I realized how containers offer many different positive characteristics to different groups of people. They make it easy to ship software repeatedly. They make it easy for developers to run software on their laptops. They make it easy to run that same software in the cloud.
And then from a sort of an architectural point of view, they kind of help us start thinking about breaking large applications into microservices. A container is a natural fit for those kinds of subcomponents, and then you can start thinking about how you secure not just the whole application but the individual containers, or even individual pods. And there’s all sorts of interesting aspects to containers. Plus, I really, really like how containers are actually made out of Linux kernel constructs. It’s fascinating how that really – I can geek out about that all day.
I mean, interesting though, I was writing about what I coined the Goldilocks principle. I actually mentioned it in the last podcast, this idea that a module of software should be neither too big nor too small, but just right. And that’s really hard – what’s just right? I don’t know. I mean, how many services does it contain, or what does the API look like? But actually, if you’ve been around the block a few times, you just kind of go, yeah, any bigger than that, it would just be messy. Any smaller than that, it would just be messy. And you’re adding complexity at both ends of the scale.
Then I think the nature of security is – where was I going with this? Yes, bear with me. So, we’ve been trying to get that right for years. I mean, in the 60s and 70s and probably even before then, we’ve been trying to get that just-rightness. The thing that changed in the 80s was distributed systems, and then in the 90s, obviously, the internet. I think what we’re seeing with microservices is that finally we’re getting to a point where we can both do the just-right thing and build them in a massively scalable, distributed way. So, it’s easy for people like me to sit on the outside as observers and go, oh, well, it’s just what we had before, but actually, I think we are at a different point.
Yeah, I think one of the things that we’ve added with this containerization of microservices is the idea of scaling different bits of function independently. I think that’s one of the real strengths of this kind of cloud native approach. You don’t have to take your giant monolith and scale the whole thing. You can just scale up the parts of the software that are actually busy at any given moment. I mean, I think that’s a huge strength of this kind of architecture.
So, I’m going to try and say I understood that. Heads up, everyone – before I did this podcast, I said I’d occasionally play the incompetent, and I think this is one of those moments, so bear with me. So, what we get with containers is not just the fact that you can have separate units of code that can be separately deployed, but also that we can duplicate them and scale them out, if you like – that old model of distribution by having multiple versions of the same thing. Am I on the right track?
Oh, thank God. I have never done this. I mean, I’ve done a bit of PHP recently, but this is beyond my ken. So, carry on, please.
So, I guess you can think about – oh, I don’t know. Let’s say you’re running an e-commerce website, and you have maybe some back end processing that you do around stock – I don’t know, maybe updating levels of stock – and most of the time that back end processing isn’t really doing very much, so you don’t need many resources dedicated to it. But if you suddenly get, I don’t know, a new delivery of stock – I’m not quite sure where this metaphor will go – you suddenly need to process all that stock. You can spin up the containers that run the code that deals with updating your stock levels, just temporarily, while you need to do that work. Meanwhile, your order processing may be doing whatever. When you’re busy on Black Friday, you’d have a whole load of containers scaling up around payments and taking orders. So being able to scale different parts of your system independently according to demand is a real benefit of this architecture.
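The scale-each-part-on-its-own-demand idea can be sketched in a few lines. This is a toy illustration, not how Kubernetes autoscaling actually works (the real Horizontal Pod Autoscaler uses metrics like CPU utilization); the function name, thresholds, and workload numbers are all hypothetical.

```python
def desired_replicas(pending_work: int, work_per_replica: int,
                     min_replicas: int = 1, max_replicas: int = 50) -> int:
    """Return how many container replicas a service needs for its current backlog."""
    needed = -(-pending_work // work_per_replica)  # ceiling division
    # Clamp between a floor (always keep one running) and a cost ceiling.
    return max(min_replicas, min(max_replicas, needed))

# Quiet day for the stock-update service: one replica is plenty.
print(desired_replicas(pending_work=3, work_per_replica=100))       # 1
# Black Friday for the payments service: scale out, capped at the ceiling.
print(desired_replicas(pending_work=12_000, work_per_replica=100))  # 50
```

The key point is that each service runs this decision independently, so a stock-delivery spike scales only the stock containers while payments stays small, and vice versa.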
It’s kind of what we had with virtual machines, but that was doing it at the whole stack level, which was obviously going to have overheads, whereas with containers we’re trying to just have neat little units that then sit on the stack. So as you say, it all runs on top of Linux, ultimately, doesn’t it? You can run Kubernetes on top of AWS, and Azure can do it as well.
Who cares what’s underneath?
There are such things as Window containers as well as Linux ones, although I pretty much – at the point where we say that Windows is an operating system, that’s pretty much everything I know about it. I’m much more knowledgeable about Linux.
Before I get my wrist slapped by the people over at AWS, it doesn’t all have to be Docker either. There are other models for Kubernetes, so there you go.
Yeah, that is one of the interesting things about why the landscape is evolving, and there are lots of different tools and platforms out there that might suit different enterprise needs differently. Docker have, in the last, what, year, embraced Kubernetes, so that’s been quite a big shift. It used to be the case that the runtime, the kind of low-level part that actually runs your container, was always essentially a Docker component, but we’re seeing increasingly things like CRI-O from Red Hat that are a separate and different runtime, which is a level of detail that’s pretty down in the weeds. It’s not the first thing to think about, but it’s interesting that there are all these different components that can be swapped in and out. And that’s one of the interesting things when we think about it from an Aqua perspective: being able to secure your containers whatever your runtime is and whatever the platform is. I mean, as you mentioned Amazon, you might be using something like Fargate, or Azure’s containers as a service, Azure Container Instances. There are different ways of running containers, and you want to be able to apply the same security principles wherever you’re running them. You can take the same container image and run it anywhere, but being able to hook those containers together, sort of connect them together and do things like security around them, is a bit more of a challenge.
Let’s get on to that. I mean, certainly, what I’ve seen from an observatorial standpoint – can I use that word? Does that word even exist? I don’t care. So anyway, what I have seen is that over the past two years, people have gone, oh, why not just use Kubernetes? That’ll do as a kind of default. And suddenly, everyone’s talking about that, and there may be a lot of other choices – it didn’t suddenly appear in a vacuum with no other choices – but generally, that’s the kind of default statement. And I think it’s borne out in the research that’s doing the rounds. I think while only a small number of organizations overall are fully embracing Kubernetes as their kind of first thing, most organizations are talking about it as what they would use, so that’s the mindset.
Then, having chosen the way that you’re going to do it, whatever you’re using, these higher level issues start to arise. So sure, we want to build a massively distributed system. Yeah, but is it secure? How do I know? So, I’m not going to second guess what kinds of challenges you’ve hit yourself. Maybe just run us through what kinds of challenges are faced by people trying to build these distributed systems from a security perspective.
So, there are lots of things that you can do to improve security throughout the lifecycle of a container. But I think one of the key challenges that we sometimes see is recognizing that traditional security tools don’t always give you everything you need. The key difference is that now your code is sitting inside these container images, and then, at the point where you’re running it, it’s sitting inside containers. And you need tools that understand how to get inside those images and inside those containers. And the reason why you want to get inside is to check for vulnerabilities. The biggest problem in security, whether it’s traditional or container architectures, is people exploiting known vulnerabilities – if you’re familiar with things like Heartbleed or Shellshock, those kinds of famous vulnerabilities. There are thousands of other vulnerabilities out there. And when people ask me, what’s the one thing that I should do to improve my security – if I was only going to do one thing, what should I do? – my answer is: scan your container images for known vulnerabilities. In a DevOps environment, you want to automate that process; you can build it into your CI/CD pipeline. Depending on the scanner you’re using, you may have different levels of granularity around the rules you can apply, but the fundamental thing you’re trying to do is not run a container image that has a high-severity vulnerability.
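The "don't ship an image with a high-severity finding" rule is the kind of thing a CI/CD step enforces. Here's a minimal sketch of such a gate; the findings format and severity names are made up for illustration – real scanners like Aqua, Trivy, or Clair each have their own output schemas and exit-code conventions.

```python
# Severity levels in increasing order of badness (hypothetical naming).
SEVERITY_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "CRITICAL": 3}

def gate(findings, threshold="HIGH"):
    """Return (passed, offending) for a list of {'id', 'severity'} findings.

    The build passes only if nothing meets or exceeds the threshold.
    """
    limit = SEVERITY_RANK[threshold]
    offending = [f for f in findings if SEVERITY_RANK[f["severity"]] >= limit]
    return (len(offending) == 0, offending)

findings = [
    {"id": "CVE-2017-5638", "severity": "CRITICAL"},  # e.g. the Struts RCE
    {"id": "CVE-2020-0001", "severity": "LOW"},
]
passed, offending = gate(findings)
print(passed)  # False: the CRITICAL finding blocks the build
```

In a pipeline, a failing gate would typically return a non-zero exit code so the CI tool marks the build red before the image ever reaches the registry.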
Call me old fashioned but yeah.
Yes, and then that leads you to the next question of, well, how do you know which containers you’re running? Are your containers approved? Have you just pulled this container from the random internet, or is it something that you have actually scanned and checked, and you know it came from the source that you expected it to come from? So, tools like Aqua will do the scanning for you, but they will also validate, at the point where you deploy a container, that it was scanned and that it meets your criteria. And then you can get into this last part, which I find the most fascinating part about container security, which is runtime. So you think about your microservice that is just doing one job. It may only need one executable to do that one job, or maybe it’s got a few initialization processes, and then it runs. And if you can police that container and make sure that you at least spot if it runs something unusual – or even prevent it from running something unusual – that’s hugely powerful, and it’s something you couldn’t really do in traditional deployments, because there’d be so many different processes running on your virtual machine that it was pretty hard to spot something anomalous. But inside a container, there’s not so much going on, so it’s much easier to detect those anomalous behaviors. You can see the real power of that for detecting ongoing attacks. It’s very, very cool.
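The validate-at-deploy-time step Liz describes is usually called admission control. A toy version of the check might look like this – the registry name, image names, and the scan-ledger structure are invented for the example, not Aqua's or Kubernetes' actual API:

```python
# Only images from registries we trust, with a recorded passing scan, may run.
APPROVED_REGISTRIES = {"registry.example.com"}
SCAN_RESULTS = {"registry.example.com/shop/orders:1.4.2": "passed"}

def admit(image: str) -> bool:
    """Decide whether a container image may be deployed."""
    registry = image.split("/")[0]
    if registry not in APPROVED_REGISTRIES:
        return False  # pulled from "the random internet"
    # Even an approved source must have a passing scan on record.
    return SCAN_RESULTS.get(image) == "passed"

print(admit("registry.example.com/shop/orders:1.4.2"))  # True
print(admit("docker.io/random/image:latest"))           # False
```

In Kubernetes this kind of policy is typically enforced with an admission webhook, so the check happens automatically on every deployment rather than relying on anyone remembering to run it.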
I can see the power. I can also see the huge challenge from the point of view of, in the old days, you’d just say, is your application secure? And you can run a code scanner, you can scan the library, then you’ve got one executable, and you go, yeah, I’m pretty confident on that one. Whereas now I don’t know how many containers you might end up with at the runtime level, and I don’t know where they would be. So it’s both the power but also the challenge. You’ve basically not got such an easy notion of an application anymore.
For sure. I mean, we typically see 100, 200 microservices in a deployment. That’s not uncommon. I mean, some people are just containerizing one giant application – that also happens. But when people are breaking things into a microservices architecture, hundreds is pretty common, and then you might have potentially dozens or hundreds of instances of each container image. So, you cannot do it manually. You have to have automation to do all the scanning and to validate your containers as they’re being deployed. Anything that you tried to do manually would quickly turn into a nightmare.
And so, each one of those, essentially, you can, I don’t know, flag as green. But yeah, I’ve got 200 things running out there, and we scanned all of them. And as far as known vulnerabilities are concerned – I need to put my teeth in when I say that one. As far as known problems are concerned, we’ve scanned everything in our entire application for those things.
That’s right, yes. We didn’t deploy anything that didn’t meet our criteria. But new vulnerabilities can get discovered, and if we subsequently rescan – it’s a good idea to rescan your images every so often – we might find, oh, a new vulnerability has been discovered in this dependency, and it’s in this image and this image. So we need to rebuild those images with the updated version of the dependency so that it doesn’t have the vulnerability – just like patching in the old days, but potentially quicker. And we need to identify the running instances of the container with the old version and replace them with new versions. Again, it’s something that kind of needs to be done automatically, because by hand it’s hard.
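That rescan workflow – a new advisory lands, find every image that bundles the vulnerable version, rebuild and redeploy – is essentially a lookup over image manifests. A small sketch, with entirely hypothetical image names and dependency data:

```python
# What each image bundles, as recorded at scan time (illustrative data).
IMAGE_MANIFESTS = {
    "shop/orders:1.4.2":   {"openssl": "1.0.1f", "libxml2": "2.9.4"},
    "shop/stock:2.0.0":    {"openssl": "1.1.1k"},
    "shop/payments:3.1.0": {"openssl": "1.0.1f"},
}

def images_affected(package: str, vulnerable_versions: set) -> list:
    """List every image whose copy of `package` is in the vulnerable set."""
    return sorted(
        image for image, deps in IMAGE_MANIFESTS.items()
        if deps.get(package) in vulnerable_versions
    )

# A new advisory says openssl 1.0.1f is vulnerable (as with Heartbleed):
print(images_affected("openssl", {"1.0.1f"}))
# ['shop/orders:1.4.2', 'shop/payments:3.1.0']
```

The output is the rebuild list; the same mapping, joined against what the orchestrator is actually running, gives you the instances to replace.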
Yeah, I was going to say it brings me to a question, which is, sure, I get that it’s kind of turning everything from don’t know to relatively green. I say relatively because there could be things you’ve got to take a decision on because they were a bit dodgy – it could be a risk, but we’ll run with it anyway, maybe put numbers on it. But then how do you – the phrase shift left keeps popping into my head. Get out! Get out! I’ll let you say it. How do you make sure that you’re not introducing new ones? I can see you, for example, turning up with your box of CDs or whatever anyone does these days.
We ship using containers these days, Jon.
I know. I still remember how many floppies there were. It’s 44. Remember that? As a one shot, I get it. As a how-to-integrate-it into the software development lifecycle and into the CI/CD pipeline and so on, it’s harder, isn’t it? Or is it just a case of, well, we just make sure it always goes through our filter and drops out a good one, or is there a process it needs to take place at? How do you get over that hump?
Yes, it’s basically a case of adding it into the CI/CD pipeline. You mentioned shift left, and that’s all about trying to move things as far left in that pipeline as you can. So, doing things like scanning your containers at the point where you’re building them, so that if developers have pulled in some dependency that has a problem, they can see that really early on in the lifecycle. And having things like scanning as a plugin for common tools like Jenkins – you just drop the plugin in, and it will automatically scan your images as they’re built and before they get pushed into the container registry. That’s a very common model, and I think the key to getting past this hump is adding that automation in early in your adoption of a DevOps process – having security, thinking about how you can plug that in. It’s like all these things. It’s not rocket science, but you just have to do the work to get the plugin into your pipeline. And then that gives you useful information.
I was talking about security by design, shift left, that whole – it’s a model where you’re just building things securely in the first place, which is a really good thing, right?
And the pushback I got on it was, not at the expense of innovation – we want to try 20 different things, 20 different ways of doing something, and at that point, we don’t know which one’s going to be the best. My view would be that you could be thinking about security across all 20. They’re saying, we just can’t – that would mean we could only try out 15 different ways of doing things, because we don’t have unlimited time. So, does that reflect the kind of conversations you’re having with organizations, or do they just suck it up and get on with it? It came from a CIO, so I’m inclined to take it as reliable.
Yeah, it seems to me that if you want to take your security seriously – and to be fair, as a security company, we tend to be having conversations with people who are concerned about security and want to follow best practices – once you’ve dropped those tools into your pipeline, it really doesn’t make any difference; you may as well run them on everything. And does it take time to run a vulnerability scanner over an image? Sure, it takes a bit of time. But it’s a price worth paying to make sure you’re not running with something like the Struts vulnerability. I think it would be a – what do you call it – a false economy to skip it. Scan things before you put them into a production environment, at the very least.
I mean, to be fair, what you’re talking about is vulnerability scanning as opposed to people doing static scanning.
Yeah, I mean, things like the static analysis stuff are valuable, but it’s always going to be perfectly possible to write insecure code that gets past a static analysis tool – which is not supposed to be a criticism of static analysis. It’s just a very difficult problem. In general, to say my code is definitely secure, that’s a really hard thing to say. But why not just run the tools? If you have them available, just run them. You don’t have to have them block your deployment process. If you’re in a real hurry, you could just run them overnight, after the event, if you really, really, really wanted to push things quickly.
Your stuff deals with – it’s essentially an automation overhead as opposed to a process overhead, whereas the static analysis, after you’ve coded it in a secure way, is more about process overhead. Then, I mean, you have peer review, and that kind of thing is possibly unavoidable. It’s still going to be one of the best ways – yeah, just another human who’s looked at it, who knows that kind of problem space.
For sure, yes. What were you thinking?
You need to realize that. What were you thinking, yeah?
Yeah, no, absolutely. However much we talk about automation, humans and human expertise are always going to be a core requirement for writing and shipping software. So I think sometimes people worry that their jobs are going to be automated away, but human eyes are going to be needed on code for some time to come.
I mean, interestingly, you said – with Aqua and more broadly, we seem to go back 20 years. We were at a state 20 years ago where antivirus software, for example, desktop vulnerability scanning, was the exception. And then we passed a point where, oh, yeah, of course you’ve got to do it – you would never not have an antivirus scanner – and then it became part of the platform. With tools like yours, it seems to me that you’re still kind of – and feel free to contradict me on this – but you’re still seen as the exception. And you’re going, look, mate, you need to have this stuff. Will it be that, in 18 months’ time, in 5 years’ time, whenever, people just kind of go, yeah, of course, and it just becomes part of the platform? Where do you think we are?
I would hope so. Yeah, I mean, it’s difficult to tell scientifically, but anecdotally, I would certainly say that when I was first getting involved in this security side of containers, not very many people knew about vulnerability scanning at all. And now, there are quite a few tools out there. There are commercial tools like Aqua; we also have a free one called MicroScanner, and there’s an open-source scanner called Clair. There are lots of options out there that people are using pretty broadly, I’d say. And I don’t know what the percentage of adoption is, but I’m sure it’s only going up. I think normally these days, when I give a talk about vulnerability scanning, not everybody’s going to say they’re doing it, but a larger portion of hands go up.
Do you get people – I mean, does anyone ever say, yeah, I can’t be bothered with that, or is it all – it’s a fair cop?
Actually, not so much I can’t be bothered with that. What I have heard is that false positives are a nightmare. I was actually speaking to – this was probably 18 months ago now – someone at a large German retailer who had been using an open-source scanner. He’d given up on it because he just felt that they were spending so much time tracking down false positives that the benefit was getting lost in the overhead. So that is a concern, and it’s one of the things that differentiates one scanner from another.
You nearly pitched there. That was good.
Well I try not to.
Yeah, this is a podcast, not a webinar.
Yeah, exactly. But the reason why you get these false positives is because mostly vulnerability data or the sort of core source of vulnerability data is this thing called the National Vulnerability Database. It’s American, so therefore, we say it’s National rather than International but anyway.
Yes, like National baseball championship.
Yes, the World Series. They have a great database there telling you which version of a particular library – or which versions, rather – are susceptible to any given vulnerability. But what they don’t have in the NVD is any information about patches, and if you have a Linux distribution, it may or may not have a number of these vulnerabilities patched. And so you could be using a version of the library that is a base version with the vulnerability, but plus a patch, so you don’t actually have a problem. And the more sources of information that scanners take – like information from vendors about their distributions, or information from vendors about particular components of their software – generally speaking, the less likely you are to have false positives. But if anybody ever tells you that they have no false positives from their vulnerability scanner, they are lying.
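The backported-patch problem can be made concrete with a tiny sketch: the NVD-style feed flags a base version, but a vendor feed records that the distro's build of that version carries the fix. Version strings, CVE mapping, and both "feeds" here are illustrative, not real NVD or vendor schemas.

```python
# Upstream feed: base version -> known CVE (simplified to one CVE per entry).
NVD_VULNERABLE = {("openssl", "1.0.1f"): "CVE-2014-0160"}   # Heartbleed

# Vendor feed: this exact distro build backported fixes for these CVEs.
DISTRO_PATCHED = {("openssl", "1.0.1f-1ubuntu2.27"): {"CVE-2014-0160"}}

def is_vulnerable(package: str, installed: str) -> bool:
    """Report a finding only if the vendor feed doesn't clear it."""
    base = installed.split("-")[0]          # e.g. "1.0.1f-1ubuntu2.27" -> "1.0.1f"
    cve = NVD_VULNERABLE.get((package, base))
    if cve is None:
        return False
    # Without the vendor feed, both versions below would be flagged:
    # that second flag is exactly the false positive Liz describes.
    return cve not in DISTRO_PATCHED.get((package, installed), set())

print(is_vulnerable("openssl", "1.0.1f"))              # True: unpatched base
print(is_vulnerable("openssl", "1.0.1f-1ubuntu2.27"))  # False: distro patched
```

Each extra data source the scanner consults prunes another class of false flags, which is why scanners differ so much on noise.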
So if you could wave a magic wand – other than use our stuff – to get organizations just better at DevOps security, what would be the main thing you would advise, the main thing that you would change, the main thing that you would want to see being done differently right now?
Well, I think if there’s just one thing, it’s get that vulnerability scanning built into your pipeline. Just have it automatically failing builds if you’re hitting high-severity vulnerabilities. That would be a huge step for any organization that’s not currently doing anything. If they are doing scanning, I would say the next step is to make sure that you’ve got a process – some kind of gating, some sort of admission control – set up to make sure that what you run in your deployment has been scanned and has been given the thumbs up. Because if you don’t have any checks there, there’s a huge potential for deploying the wrong images, ones that have not been through your beautifully set up pipeline. I’m sorry. I cheated there. I said two things.
No, that’s fine. You essentially answered the question exactly as I put it, which is do the scanning, but then have a framework around it – I hesitate before I use the word framework – but have enough structure around it to make sure that you take the scanning into account the correct way, so you’re not just deploying rubbish, all your insecure stuff, anyway.
Cool. So I think we’re running up against time, so I think we’ll leave that as the last word. And I’m actually not going to leave that as the last word. No, I’m going to say it does kind of surprise me that vulnerability scanning is still so new with containers. Where do you think things will be in two years’ time, and then we’ll wrap up?
I think the reason why vulnerability scanning is new is because this whole adoption of DevOps is still new for a lot of organizations, so there’s a huge number of enterprises still finding their way around this. And I think we’re going to see in two years’ time – I mean, just looking at the growth of end users who come into KubeCon, for example, and talk about their experiences, it’s gone from a few outliers to a lot of enterprises. Almost everybody is adopting containers, and I think, if they’re doing it in any kind of serious way, in two years’ time the conversation will have moved on from vulnerability scanning. That will be a known, obvious thing, and we’ll be talking much more about the runtime features and the kinds of things we can do to protect against zero days.
Yeah, I guess it goes through that way. So yes, sure, let’s do this. Oh, it’s a forming, storming, norming, performing thing, isn’t it? So everyone right now is forming and saying, let’s do this. Then comes the storming of, oh, I didn’t realize it would be so hard. Then we’ll get into the norming, which is where vulnerability scanning and a lot of other things will become more default than exception, and then into proper performing. Let’s see if it takes two years or however long it’s going to take.
Maybe even faster. It’s moving so fast.
It is moving very fast. With that in mind, we’d best get on with it, hadn’t we? I need to say thank you so much, Liz, for joining me on this podcast. I certainly learned a lot, and I hope you out there in the audience did too. If you’ve got any questions, we’ll be tweeting your Twitter handle, Liz, so if you’ve got any questions for Liz, please do respond on our Twitter, and we’ll endeavor to respond. Thank you, Liz.
Sounds great. Thank you, my pleasure.