Today's leading minds talk DevOps with host Jon Collins
David is the Director of DevOps at Lucidchart with 20+ years in DevOps (many of those years were before DevOps was a thing). He has led infrastructure, security, network, ops, and DevOps teams at several organizations including Fidelity Information Services, FamilySearch, and Lucid Software. In 1999, he was one of the top-rated Quake 2 players in the world, and he loves pizza.
Jon Collins: Hello, and welcome to this episode of Voices in DevOps where I'm here to speak to David Torgerson. I hope I’ve pronounced your name right there, David?
David Torgerson: You absolutely did. Thank you.
Excellent. Good. The first hump I've got over there. I’m here to speak to David Torgerson, who's director of DevOps at Lucid Software, all about the challenges and opportunities and what we can do about it with DevOps. I know that, David, you've got a much broader background than just the current company you're working for. Maybe you could give us a bit of an intro to who you are and what brought you to this point that we're here talking about DevOps.
Absolutely. Thank you for having me. This is going to be a lot of fun. I actually started my career in security. I was working for a Fortune 500 company. It was very slow changes. I spent a lot of time just building my career and deciding what I wanted to do. From there, I switched companies and went to another Fortune 500 company or similar sized in the nonprofit world, and spent half of my career in security working on developing policy, pen tests, really everything related to security, whether it was logical or physical.
You just said pen test, though. I'm going to interrupt because that's the most boring thing ever, isn't it? It's completeness and coverage and dull.
It sounds really amazing until you actually do it. Then it's 90% paperwork and 10% fun. I actually decided that I did not like paperwork, nor was I good at it. I decided to switch and move more to the infrastructure side of things. I had seen how especially in enterprises, the infrastructure side of things and the application side of things tend to move really slowly and there’s a lack of communication, and I actually really like communicating with people. It seemed like a great fit, and I had that technical background. I decided to switch more to the infrastructure side of things to help push along and marry security, application development and the infrastructure changes that were happening.
‘Opsy Ops’ then, IT change management, configuration management stuff.
Absolutely. The organization I was working at, which was –
That was really interesting compared to security, but you're still thinking: “There must be more than this.”
There absolutely was. The organization that I was working with was called FamilySearch. They had never really had anybody that had tried to bridge the gap of those three worlds: security, infrastructure, and application development. What that resulted in was a lot of animosity between the teams just because everybody wanted to hold onto their own kingdom. They didn't want outside influence. There was no incentive to collaborate. As a result, things simply moved slower.
I received a reputation of being the guy who could get things done. All I did was talk to people. I didn't actually do any work. I just interrupted work – is what I thought, by talking about the latest TV shows or things like that. Because of that communication gap that had been bridged, I received that reputation, which was great for me, and it boosted my career. It wasn't nearly as complicated to get things done as people made it seem to sound.
Did you turn into... I'm going to say ‘that guy’ but I don't mean that guy. I mean that guy who said “oh, well, we can't make this work. Hey, let's talk to David. David will sort it out.” Did you turn into the ‘sort it out guy’?
I did. Just a little brag on myself, when I was leaving the organization, the CEO saw me in the hall, and he came up and said, “If you ever want to come back, let me know. We'll have a position, whatever you want to do.” Two months later, I actually saw him at a conference, a very well [attended] large technical conference. He stopped me in the hall and he said, “Listen, for real, if you want to come back today, let me know your salary. You are my most trusted engineer,” things like that, so glowing praise.
Name your price; take my money.
Brilliant. Of course, you said no, because you do more interesting things.
Yeah, Lucid is just a way awesome place. It's the best place I've ever worked. It would be incredibly hard to leave just because it's mine.
Okay, fair enough. I mean, so with all that exposure, I'm really interested because essentially, I'll just say things as I perceived how DevOps can evolve in a lot of organizations. I spoke to Andi Mann from Splunk about this stuff as well. A lot of DevOps is seen as, Dev, Ops. When they talk about the Wall of Confusion, it's developers saying, “Well, Ops don't understand us, but we can communicate with them better.” It's still an ‘us’ thing and a ‘them’ thing.
The communication is that one way, whereas what you're talking about is coming at it from the operational, infrastructure side and actually communicating with them/developers over there. I mean, is that how you saw it?
Yeah, it's really interesting and that's a great question. In fact, I've been interviewing candidates recently. This is a conversation that we have during every one of the interviews because being able to bridge that communication gap is something that I think is incredibly important. Even though the goal is to have DevOps and have no animosity between the teams, just collaboration, the truth of the matter is that the responsibilities are slightly different, actually significantly different. Infrastructure requires a skill set that is not common among developers. Infrastructure management is lacking a skill set that is common among developers, which is coding. There absolutely are and will continue to be specialties in different fields simply because there are different responsibilities. The important thing is that the two groups can understand and communicate with each other.
When I interview for an SRE or a DevOps infrastructure position that's focused more on [the] infrastructure side, we actually put them through a coding experience and we don't have an expectation that they're going to be able to write algorithms or that they're going to be able to design object-oriented architecture. What's important is that they can ask enough questions of the developers that are in the room to come up with a working solution. We don't necessarily have to – everybody have the exact same skill set. In fact, the lack of diversity would be detrimental. What we do expect is that there's a common respect, and that the ability to communicate and understand difficult problems, even if you're not the domain expert, is one of the fundamental skills.
Funnily enough, as a complete aside, I was just talking to my daughter at lunchtime, and she said there’s this guy who's now the CTO of the company that she's working for and she said, “but he had a music degree.” Actually, the fact he had no technical background whatsoever and that he just learned to code stood him in really good [stead] and then he liked the coding, but he didn't come to it with this siloed, blinkered approach to what programming was about. He had a very different aspect.
Also, I’m just thinking that there are different – I mean, we talk about infrastructure people, but you and I know – I guess everyone knows that there are infrastructure people that spend their lives scripting. The UNIX background is very much about piping stuff through in different shell scripts, and some of them... PowerShell in the Windows environment arrived. It’s traditionally less of a scripting environment or drag-and-drop environment. Then you've got other environments which are the big heavy lifting environments which had nothing to do with any of that. They're about letting the systems do what the systems do well. We've got to get all of these different types of groups engaging with the notion of what development is, presumably.
I mean, you've already talked about communication, you've already talked about the need for those different skill sets, etc. With all that in mind – and the answer could be communication, but it could also be other aspects of what we talked about or something else. How would you frame your experience when you look at applying DevOps principles and actually making them work? What would you see as the big things that are getting in the way of that now in the organizations that you know and have worked with?
Honestly, it is the pipe dream. You talked about DevOps being this magical land where everybody sleeps at night all the time and there's no weekend work, no after-hours [work]. The reality is it's a constant struggle. The biggest thing that I see negatively impacting DevOps is the desire to get to that magical land instead of recognizing the reality that things are going to go wrong. Unconscious bias is an actual issue that people have to consciously overcome.
For example, one of the things that I see [as] a common mistake, especially around SaaS organizations, is to not have any sort of specialty positions. Let's just hire developers and have developers fill all positions. Developers typically have gone to a school. They've typically done quite well in the school, and they're incredibly smart. They're capable of learning how to be a database administrator or to scale back-end code.
The challenge in that model is that developers went to school to become a developer, so while they may become capable and in fact one of the leading database administrators, eventually they're going to have the desire to get back to their roots, to get back to development. They will transition from focusing on database and database design back into a full front-end developer, which while that's incredibly valuable to the organization to have that skill set moved back to the general development pool, what it leaves is a gap in the knowledge for leading one of the core components of the actual application.
There has to be a balance between those who have specialty knowledge and those who are generalists. The generalists absolutely can and should move around the organization, but there have to be those staple positions to help bring consistency through the organization. The biggest challenge that I see with DevOps today is that we've swung the pendulum too far to where we are just saying everybody should be a generalist and again, that just leads to gaps in core knowledge bases.
That's brilliant. There's a factor within that which I'm going to hypothesize about, so bear with me. What it leads to is this whole ‘everything is code’ notion, so there's something that we have a cognitive ability, so if you’re a policeman, if you're a cop, you see everything through the eyes of a police [officer] – they were walking down the street at three miles an hour and then etc. Similarly if you're a developer, you see everything – everything can be programmed, which is fine and that's absolutely true. Equally then the language changes the philosophy of how you approach things and structure things. Years ago I had a debate with a good friend, an analyst, Neil Ward-Dutton, about the difference between DBAs and developers.
Where we ended up was that both are the same but it's a bit like ‘Is it a wave or a particle?’ Developers see the world in terms of waves and DBAs see the world in terms of particles and they're the same thing. You do need to have different ways of looking at it, if you like. Long and the short is if we end up with a whole bunch of generalists, we're also ending up with a whole bunch of people that see the world from a process perspective, from a programming perspective. Actually, we also need people that see the world from different perspectives because those perspectives help us solve problems in different ways. There you go; that’s my hypothesis. What do you think?
Absolutely, I completely agree. I also think it's important that, getting back to the communication, that each generalist or specialist can represent the world from a different view. Even if they're doing it poorly, it's important that they're capable of at least understanding the hypothesis of the opposing environment.
If that's the case, so we've got a two-phase thing. The first is enable people to get very good at the specifics of what they do. The second thing is then enable those groups to communicate with each other. I think the first thing is also if you apply it too much, you end up with the old siloed environments that we've traditionally seen. Security people over in their lobbies doing security and hating everyone because they keep breaking things, which is wrong as well. What you're saying is the pendulum has swung too far the other way. What's the answer? Is the answer literally as you say just to start talking about game shows and have shared packed lunches, or is it a staged approach? What would you apply as a solution to that?
Unfortunately, this is where the answer gets complicated. There's not one solution that you can apply. People are different. They respond to different things. I have worked with people that absolutely love going to lunch every day. I have worked with other people that absolutely hate talking to people just because they are incredibly uncomfortable doing so.
We are in that industry, aren't we?
We absolutely are. It's important to find a balance but not cater to any one type of person. Again, the diversity in the environment is what really drives the innovation. If you can get different minds in the same room communicating, it's incredible what can happen. Just a really quick example: we went through an exercise where we wanted to move to continuous deployment a few years ago. We had gathered the information and several of us went into a larger group to present the plan, saying we want continuous deployment. Here are the reasons why; here are the benefits; here are the risks. Within five minutes, the entire room was yelling at each other, and that's not an exaggeration. There were 20 people in the room all having 10 different arguments.
After 15 minutes, we decided to stop and we wanted to answer two questions. Define continuous deployment, define continuous integration, just come up with a one-sentence definition. Everybody come back. We're going to adopt two statements. We came back and scheduled an hour-long meeting to literally just adopt two sentences: What does continuous deployment mean? What does continuous delivery mean? Once we had those base definitions, we were able to have a productive meeting.
Going back to people see the world through different views, ‘continuous deployment’ was a term that meant something different to everybody. Even though we all thought we were talking about the same thing, we absolutely weren't. One of the things that we have learned from that experience is the importance of having definitions. Now, when we're proposing application changes, or infrastructure changes, or even changes to the organization, we always come up with a proposal, a document proposal in writing that everybody has the opportunity to review. If they disagree with the definition, they're able to adjust that. Again going back to the communication, I know I keep harping on this but for me, that's what I see [as] the true success in tearing down those communication gaps and tearing down the animosity is simply having a common ground of terms that everybody accepts, because that's a building block. Even though we're speaking different languages, we have a building block that we all generally or genuinely and generally understand.
Oh man, there's so much in that. I said this podcast should be 20, 30 minutes; we could literally spend five hours now. I'm sorry; you’ve broken it. It takes me back to a conversation I was having with an oil company. They said they can't even define – wherever you get oil from, whatever that term is, no one could agree what that term was. It was an enterprise architect and he said, “If we can't even define what it is, where it is we are getting oil from, what chance do we stand with any other word because that's what we do as a business?”
I think that the thing that I'm unpacking in my head as you're saying it is we all use the term DevOps. We all use terms like CI/CD and in some ways, those are the most boring ones. We're already moving on to value stream management, to something as code or whatever, containers, Kubernetes, as though we've already agreed [to] those basics. What you're saying is just forget assuming that we've agreed [on] any of those basics. Just go right back to the room. What do we mean by that? People might go, “Oh man, yeah, we don't have to talk about this stuff yet.” Okay, you give me your definition and then we'll see how we get [on] – and if everyone does agree, it's 10 minutes.
That's right. That's absolutely right.
Taking that forward then, I'm thinking that that was the most profound thing. We're kind of done. Nice to speak to you then. That was great, thanks. [Laughter] Where do you end up with that? I mean, we don't want to end up with this. The whole point of Agile and all that stuff is you don't end up with these huge Zachman Framework style complexity of data dictionaries and so on and so forth. How do you make sure that you don't get locked into the old school way, that you keep the new without making it all look awfully like the old, I guess is the question.
Yeah, absolutely. When I joined with Lucid – sorry, after leaving FamilySearch, I joined Lucid. That was going back to your original question and finishing that story. I went from FamilySearch to Lucid. When I joined Lucid, I was hired on as the first DevOps guy at Lucid. In fact, I kept that as my official job title as long as People Ops let me. It wasn't until recently that they wanted me to actually have a real title. I like my first title but when I joined, there were 23 people in the office. Now there are over 500. When I joined, we had a couple of million users worldwide. Now it's over 20 million. The growth that Lucid has experienced in the time that I've been here has just been remarkable.
One of the awesome things and one of the reasons I really like Lucid is I've been able to help build it from the ground up. When I walked in, I was the DevOps. I was security and I was IT. If it wasn't actually writing code that users used or selling to our customers, that was the scope that I had. I had worked in environments that simply did not work well together. There was tons of animosity. Animosity led to things just being incredibly slow. Tasks that should take less than an hour would take weeks or months. In one case, over six months just to get a server turned on because of all of these arbitrary roadblocks.
When I came to Lucid, I did not want to create that environment, especially since we had a blank slate. One of the first things that we did was try to think outside of the box. How can we make sure that there's never going to be a silo where somebody can run off on their own and be isolated, and put up a barrier to simply have email in/email out or ticket in/ticket out? One of the first things that we did was make sure that the actual development organization participates in our on-call rotation. Now, some developers really like that and others really, really don't. If a developer's adamant that they do not want to participate, that is okay. There's a place for them. It's just not in the DevOps place. That doesn't mean that we let them go. That just means that we find something that's more catered to their...
There can't be a magic lamp.
That's right; there can't be a magic lamp. Our operations team or the people that are getting the first line notifications are actually composed of our developers. Now, one of the benefits to that is because we have so many developers that participate in that rotation, each on-call rotation is about every four months. Nobody gets burned out from having to do something once every four months. Not only do we have now a very large pool of people to be on-call, the amount of ownership that those individuals feel about the production system in other environments is really significant. They're the ones that are actually fixing the system. They're the ones that are taking those notifications.
Another thing that we did is we set a limit to the maximum number of alerts that we get in a given week and if we ever go above that, we have authorization from the executive team down to drop everything, and work on addressing those issues to make sure that our operations and our infrastructure and database and all of these other specialty positions have a really happy work/life balance. Nobody likes getting woken up in the middle of the night, and nobody likes being interrupted even during the middle of the day.
If we address those issues that everybody, again, going back to a common ground, that's something that everybody can agree on is: getting an alert sucks, and we've defined that, getting an alert sucks. It builds camaraderie. There's a sense of we are all working on this problem together. What we found is that when we do go into meetings to talk about infrastructure changes or application changes to support the infrastructure, there's that common language that is already established simply because of how the on-call rotation works. Now, whether or not that scales to 10,000 users or 10,000 employees, we'll see, but it has worked incredibly well from start to where we are now.
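Editor's note: the weekly alert limit David describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Lucid's actual tooling: the budget value, the function names, and the sample dates are all hypothetical, since the interview does not give the real threshold or how alerts are counted.

```python
from collections import Counter
from datetime import date

# Hypothetical weekly limit; the interview does not state the real number.
WEEKLY_ALERT_BUDGET = 20


def alerts_per_week(alert_dates):
    """Count alerts per ISO (year, week) pair."""
    counts = Counter()
    for d in alert_dates:
        iso_year, iso_week, _ = d.isocalendar()
        counts[(iso_year, iso_week)] += 1
    return counts


def weeks_over_budget(alert_dates, budget=WEEKLY_ALERT_BUDGET):
    """Return the sorted (year, week) pairs whose alert count exceeds the budget."""
    counts = alerts_per_week(alert_dates)
    return sorted(week for week, n in counts.items() if n > budget)


if __name__ == "__main__":
    # Made-up sample data: three alerts in one week, one in the following week.
    sample = [date(2020, 3, 2), date(2020, 3, 3), date(2020, 3, 4), date(2020, 3, 10)]
    for week in weeks_over_budget(sample, budget=2):
        print(f"Week {week}: over budget, pause feature work and fix the alert sources")
```

The point of keeping the rule this simple is that everyone can agree on it in advance: the number is either over the line or it is not, so there is no argument about when to drop everything.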
There's a question of would you need 10,000 users? A lot of big teams are big because they couldn't do it with a small team. Does that make sense? All that ‘man months’ stuff and I'm fascinated by that because, again, but going back to the Wall of Confusion and developers feeling ‘we are now communicating with ops; our work is done.’ What you're actually doing is taking developers to the other side of the wall and saying – and you're not rubbing their noses in it, but you kind of are.
It's not ‘Let's see how you get on.’ It's ‘This is how it actually happens’ and then suddenly when you do that – and as you know and as I know, we've both worked on that other side of the divide – it really changes how you think about prioritization. To your point about stopping everything in order to get something fixed, because you know how important it is to get that fixed before you carry on, isn't it? Otherwise you're just building up more problems to be solved. There's no point. You're not being more efficient.
That's absolutely right. On the infrastructure side, one of the primary goals or one of the primary things that their performance review is based around is how successful are they [at] making the developers productive, so not how well is the infrastructure working, how productive are the developers able to be because of the work that they're doing?
On the developer side, it's how stable is the production system? Even though, in a traditional environment, you would think that operations owns the production system and developers own the code. Well, the developers write the code and the infrastructure team actually pays for and manages the infrastructure system. What they're rated on is how successful are the developers at their job. The developers are rated on how stable is the production system and how happy are the end-users, which again, encourages that cross-support between the groups.
Excellent. A question for you, which feel free to talk more about that stuff, but wrapping around that is the whole notion of automation. Obviously, we talk about automation helping make the developers more efficient. We also talk about automation in a way, or I've certainly heard it talked about in that ‘No ops, you don't need operations anymore. You can automate everything and it all just dyna-magically works.’ How do you perceive the realities of automation across the two sides and actually enabling the two sides?
The thing that I have seen [as] the biggest detriment to that is fear that that is going to compromise anybody's job. I've talked with so many people that are scared that automation is going to get rid of their job, and I think that is incredibly foolish. If those tasks are automated, that means that you can spend more time doing things that you actually enjoy, like research or improvement.
At Lucid, we try to automate everything. That's one place that the developers typically significantly outshine people with a more traditional infrastructure background. Our infrastructure team has actually been tasked with enabling developers to automate the infrastructure, so provide the tools, provide the mechanisms for the application to automatically scale. Why are we doing this manually? There are triggers. There's things that we can monitor that can automatically do that for us. Let's do it. Automation is absolutely key. It's absolutely important. That's a place where you can actually get to a magical land and it is glorious.
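Editor's note: as a rough illustration of a monitored trigger that scales an application automatically, here is a minimal Python sketch of a proportional scaling rule. It is a generic example, not a description of Lucid's setup: the function name, the CPU metric, the 60% target, and the instance limits are all assumptions chosen for the example.

```python
import math


def desired_instances(current, cpu_percent, target_percent=60.0,
                      min_instances=2, max_instances=20):
    """Proportional scaling: size the fleet so average CPU lands near the target.

    All thresholds here are hypothetical defaults for illustration.
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    # Scale the instance count by how far the metric is from its target.
    wanted = math.ceil(current * cpu_percent / target_percent)
    # Clamp so the fleet never shrinks below a floor or grows without bound.
    return max(min_instances, min(max_instances, wanted))


if __name__ == "__main__":
    for cpu in (30.0, 60.0, 90.0):
        print(f"4 instances at {cpu}% CPU -> {desired_instances(4, cpu)} instances")
```

The clamping step is the design choice worth noting: an automated trigger should have explicit lower and upper bounds so that a bad metric reading cannot scale the system to zero or run up an unbounded bill.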
Interestingly, to link back to where we started on this was that you're moving through a series of – well, I added the level of boredom, so was feeding you a line there. Automation, from that point of view, does remove the worry because it removes the boring stuff first, obviously, because the harder it is to automate, the more interesting it is anyway and the more it needs that manual intervention. I don't know why anyone should be scared of removing the stuff that's really, really dull from their lives.
Absolutely. The reality is if it's been automated, it's reproducible. If something is incredibly difficult, if it's going to take weeks to do once manually, that should probably be a good indication that it needs to be automated because if there ever is a failure, it's going to be weeks before you can replace it.
Oh boy, yeah, absolutely right. Okay, good, I think we should wrap up there. We talked about the overall complexity and seeing it from the operational point of view, rather than seeing it from the development point of view. We talked about the whole importance of having definitions in place and then taking developers – what I've really taken out of this is how to ‘ops-ify’ the whole DevOps scenario, and then you can start to apply automation to it without taking away from the fact that we're always going to need people. My goodness, aren't we always going to need people? Are there any last thoughts that you have that you just want to leave people with, don't do that, do this stuff?
Honestly, DevOps is not a technical problem. That's the takeaway. It is a communication problem.
It all starts, middles and ends with communication. Well, thank you very much, David, for your time. I, once again, have learned a whole new perspective on this, so I thank you from the bottom of my heart. I hope our listeners have enjoyed that as well. Everyone out there, thank you for listening and do tune in next time, but David, thank you very much.
Thank you. This was fun.