How Twitter is using ThousandEyes to monitor operations





Name: Performance management past the perimeter

Mohit Lad
Rafal Waligora

Announcer 00:02

I went through the whole list before of who has been here and who will be speaking, Facebook and Google and so forth, and I was like, “Wait, what about Twitter?” We’ve fixed that, so there’s somebody here from Twitter too. Before I introduce him, I’m actually going to bring out our next guest, Mohit Lad from ThousandEyes. Come out on stage, Mohit. Let’s welcome him, and let’s sort of pre-welcome his guest, who will be on in a few more minutes.

Mohit Lad 00:32

ThousandEyes is launching today and I want to clear the air before we start. We have nothing to do with the NSA so far. And I know I see a lot of excited faces in the crowd, and all of those excited faces are wearing glasses: we’re not going to fix your vision either. So neither of them is true. We’re launching today and a lot of people asked me, “Why are we launching at Structure?” When my PR team came to me with all kinds of reasons why Structure is a great venue (great press, high quality, high caliber, excellent coverage), there was only one thing that really stuck with me, which was that if you get a speaking slot at Structure, you don’t have to pay the registration fee, and that’s the coolest bit about launching at Structure. Let’s give a good hand to Structure for letting us come here and announce our great company here.

Mohit Lad 01:29

You guys actually have a sense of humor, that’s great. Before we go into ThousandEyes, I want to tell a small story, and this is actually a true story. I swear it is a true story, and if I’m lying you can see my eye start to twitch, so you would see that I’m actually telling the truth. When we moved into our first office in San Francisco’s Financial District, the lights would go off at 7 p.m. to save energy, and any startup would find that really annoying. What you would need to do was pick up the phone, dial a number and enter a sequence of digits to turn the lights back on again. For the first two days, I did that. I actually manually picked up the phone, called these numbers and turned the lights back on. Ricardo, who is our co-founder and who generally lacks patience, decided that this was too manual. He went and wrote a cron job that used Twilio to make the calls and automate this whole process. For the next few weeks, the lights stayed on and we could continue working, until one particular day the lights went off again. Any normal person from this audience would have picked up the phone and turned them back on, but we were sitting there debugging why our script failed. What Ricardo found out was that the scripts were okay, but Twilio was running on Amazon, which was having an outage. This is interesting because an Amazon outage is actually turning off our lights, and that’s the kind of dependency we are bringing into the environment today. We fixed that, and then a few weeks later the lights go off again and we look at each other and say, “Amazon must be having an outage again,” and it was actually true. It just scared us how much dependency we’re bringing into the environment, where we don’t even know where these things are coming from and how many things lie in between. Today it’s the lights that these things control; tomorrow it could be our toilets that cannot be flushed because Amazon is down.

Mohit Lad 03:30

In order to make my point, I had the designer make this really interesting graph which says, “An enterprise from 10 years ago,” but this is actually one of the enterprises today, and it’s one of our customers that I obviously cannot name. This is an actual graph of four of their locations accessing an on-prem app, and you can see it has a branch office, corporate backbone and data center. Compare this to what the enterprise looks like today with cloud applications; these are the same four branch office locations. The segment you’re seeing is actually the only part of the network that they completely control. Then there comes the Internet, and then there come the cloud applications. This is just for four cloud applications that this enterprise uses. The typical enterprise uses at least 100 different cloud applications, some of which they don’t even know of. I wanted to show all of these applications in this visualization, but Structure would not give me a screen starting from this point here all the way up to there. In fact, I was trying to convince them to move to the IMAX in the Metreon, which would have been a much bigger screen. This basically represents a huge growth in how you look at your network, and if I abstract that picture into this small three-picture cartoon, you basically see there are three different segments: there’s the enterprise, there’s the Internet and there’s the SaaS provider, and all these network segments operate really very differently. The big problem here is that nobody actually sees the entire picture. Not only is the Internet a big part of what you use to reach a cloud application, or to deliver a cloud application if you are on the other side, but it’s also constantly changing.

Mohit Lad 05:15

These are three things that we see which are really problematic in this particular situation. One is, you obviously don’t see the entire picture. The second is, there is a very big, strong disconnect between application and network performance. Then there’s the last part, which is more of a people aspect, where troubleshooting is now not just walking up to your IT guy and telling him, “Fix this.” You actually have to call, open a support ticket and work with multiple people from different organizations, and we all know how that is, right? We call tech support and they tell us to restart our computer, and that’s painful. So those are the three problems that exist today. What we want to do with ThousandEyes is really open these boundaries and create an environment where we can look at all three of these network segments as one big network.

Mohit Lad 05:57

There are three core components of ThousandEyes that I would like to highlight, and I’m not going to go into details here because whatever I say here you’re probably not going to believe me, so I will save the details for one of our customers who’s going to join us in a few minutes. X-Layer is the ability to correlate application performance with underlying infrastructure like networks, BGP, DNS and so on. This gives you the ability to really understand what’s going on at the application layer and the reasons why it’s happening. Finally, interactive sharing is another key component of what we do, where you can now share live data with other parties so they can see the same picture that you see. It’s not just sending screenshots, it’s not “he said, I said”; you actually see the same picture. With that, I would like to say that ThousandEyes is really meant for different parties: it’s meant for the enterprise, and it’s meant for the cloud application providers. You can use it in different ways. You can use the public agents as a cloud app. You can use the private agents as an enterprise and deploy them within your environment. In order to highlight how ThousandEyes can be useful, what we’re going to do is call Rafal Waligora from Twitter to join us and tell us about how he uses ThousandEyes. Thanks Raf.

Rafal Waligora 07:23

Thanks Mohit. Again, that’s my full name over there, but I go by Raf, just the first three letters of my name. I’m @FreakOverload on Twitter; many people like that handle. I’ve been at Twitter for about two and a half years and, just to set a little context, we have been working with Mohit and ThousandEyes for close to two years, so it’s been a long journey. We’ve been working together on the product and just making sure that it actually does what we wanted it to do. When they started at Twitter about two years ago, we were just moving from the cloud in-house and there were little challenges associated with that. First of all, Twitter is a software engineering company. We have a tremendous number of engineers coding and writing a lot of telemetry to collect performance data using the browser Navigation Timing API, or even just using Boomerang. That gets you data that helps you build great applications. That data, however, needs to be aggregated, and it’s not readily available because it needs to be MapReduced and presented in different formats. What we needed was some internal tools, and more than anything we needed a tool that could look at the Twitter infrastructure externally and tell us if there were problems. I have to be honest with you, when I first met Mohit and heard about the product, I was a little skeptical, because I was thinking, “Well, how could it be different than anything else?” There was little product out there that helped you monitor infrastructure externally. I guess I was wrong; there are a lot of special things about ThousandEyes. Number one, it’s visual: it lets you troubleshoot really fast and it highlights the problem areas. It not only helps you visualize the performance of your app externally, but it also highlights the performance of the infrastructure. I’m going to walk through some slides.

Rafal Waligora 09:23

First and foremost, I said visual. Over there at the very top you see a timeline, and that’s the baseline of the performance for the specific metric. Within the timeline you’ll see deviation from the norm, either a dip or a peak. At that point, you can click on the dip or peak and you’ll see the status of your application at that given time. Now, a lot of people would say that overlaying the agents on top of the world map is kind of tacky, but I don’t think it is. It’s very significant, especially if you’re running a global company and you’re running an application that’s used globally. The thing that really stands out for me, and the next slide shows this, is that ThousandEyes helps categorize the problem right there and then on the dashboard. For instance, from this point I can see that this is not a DNS problem, it’s not a server problem, it’s very likely our problem because my connect and SSL are failing. From this view, once I’ve determined that it’s our problem, I can dive into the narrowed view, and from the narrowed view what we will see is a different visual representation of the infrastructure. What this actually is, is the complete layout of all the infrastructure systems leading to my application, which is I think groundbreaking. There aren’t too many products like that on the market right now.
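
The dashboard categorization Raf describes (DNS versus connect versus SSL) can be approximated by hand. The following is only a minimal sketch of that idea in Python, not ThousandEyes' implementation: the names `probe`, `first_failure` and `PHASES` are hypothetical, and it covers just the three phases he mentions, run in order from a single vantage point.

```python
import socket
import ssl
import time

# Phases are checked in the order a client goes through them.
PHASES = ("dns", "connect", "ssl")


def probe(host, port=443, timeout=5.0):
    """Run DNS, TCP connect and TLS handshake in order.

    Returns a dict mapping phase -> ("ok", seconds) or ("fail", error text).
    Phases after the first failure are not attempted and are absent.
    """
    results = {}

    start = time.monotonic()
    try:
        info = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        address = info[0][4][:2]  # (ip, port) for both IPv4 and IPv6
        results["dns"] = ("ok", time.monotonic() - start)
    except OSError as exc:
        results["dns"] = ("fail", str(exc))
        return results

    start = time.monotonic()
    try:
        sock = socket.create_connection(address, timeout=timeout)
        results["connect"] = ("ok", time.monotonic() - start)
    except OSError as exc:
        results["connect"] = ("fail", str(exc))
        return results

    start = time.monotonic()
    try:
        context = ssl.create_default_context()
        with context.wrap_socket(sock, server_hostname=host):
            results["ssl"] = ("ok", time.monotonic() - start)
    except (ssl.SSLError, OSError) as exc:
        results["ssl"] = ("fail", str(exc))
        sock.close()
    return results


def first_failure(results):
    """Return the earliest failing phase, or None if nothing failed."""
    for phase in PHASES:
        status = results.get(phase)
        if status is not None and status[0] == "fail":
            return phase
    return None
```

Calling `first_failure(probe("example.com"))` from several locations and comparing the answers gives a rough version of "it is not DNS, it is connect and SSL"; a real monitoring agent would also time the HTTP response and repeat the test on a schedule.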

Rafal Waligora 10:56

When you want to troubleshoot, you just look at the areas of focus and then you can determine, “Hey, there are two routers with 25% packet loss affecting some of the agents internationally.” At any given time, as you’re troubleshooting, you can share the entire set of visual aids that you have with your partner: you just click Share, specify the time associated with the test you want to share, and email it to your friends.

Audience 11:26

Or Tweet it [laughter].

Rafal Waligora 11:32

The entire infrastructure layout is built using TCP trace, just like traceroute. Inherently, it’s pretty complicated, because routers and switches have multiple interfaces, and if you were to visualize multiple traceroutes in a single plane, it’s just a lot of data. So what we can do in the product is aggregate multiple IPs and multiple nodes into one that’s significant to the operator. For instance, in this case we aggregated a whole bunch of IPs, really just the interface IPs from a router, and right there you see a visualization of what your New York router, for instance, looks like in the big picture. Last but not least, you may need to go a little bit deeper. For instance, it’s a pretty common problem for a network engineer to come in to work on Monday and have someone say, “Hey, the website was slow.” Then the typical answer is, “Well, okay. The Internet might have been slow, right?” What you can do is drop down into the BGP visualization. It shows the asset they were specifically tracking and how BGP routing globally might have affected it. In this specific case, we looked at the timeline and there was a deviation. We looked at the specific data and we see that the BGP path has changed, and it’s very possible that an ISP was in maintenance or there was an issue on the Internet. With that, I’m going to pass it back to Mohit.
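
The aggregation step Raf describes, collapsing a router's many interface IPs into one node before drawing the paths, can be illustrated with a short Python sketch. This is an assumption-laden illustration, not the product's algorithm: the function names, the `nyc-router` label and the operator-supplied `ip_to_node` mapping are all hypothetical, and the addresses come from the documentation-only ranges.

```python
from itertools import groupby


def aggregate_path(hops, ip_to_node):
    """Relabel each hop IP with its node name and merge consecutive repeats."""
    labelled = [ip_to_node.get(ip, ip) for ip in hops]
    return [node for node, _ in groupby(labelled)]


def merge_paths(paths, ip_to_node):
    """Combine several traceroute paths into one set of directed edges."""
    edges = set()
    for hops in paths:
        nodes = aggregate_path(hops, ip_to_node)
        edges.update(zip(nodes, nodes[1:]))
    return edges


# Two traceroutes from different agents that cross the same router
# through different interfaces.
paths = [
    ["192.0.2.1", "198.51.100.1", "203.0.113.9"],
    ["192.0.2.2", "198.51.100.2", "203.0.113.9"],
]

# Operator-supplied grouping of interface IPs (hypothetical label).
ip_to_node = {
    "198.51.100.1": "nyc-router",
    "198.51.100.2": "nyc-router",
}

raw_edges = merge_paths(paths, {})
grouped_edges = merge_paths(paths, ip_to_node)
```

Without the mapping the two traces share only the destination; with it, both pass through a single `nyc-router` node, which is what makes a combined picture of many traceroutes readable.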

Mohit Lad 13:06

Thanks Raf.

Mohit Lad 13:10

That was great, by the way. I’m going to record this and use it in every sales conversation we have going forward. These are some of our other customers who have worked with us over the last few months, and sometimes more than a year, to help us develop the product. We have several customers outside of this set, which are either new or their PR departments just need a whole lot of dating before they can get comfortable releasing their logos. We have several customers in the Fortune 500, and we have seven of the top ten SaaS companies as customers. I was actually starting to count, and I realized when I looked at the Structure speaker list that more than ten speakers at Structure are representing companies that are using ThousandEyes, which is pretty cool. Here’s one thing I would like to offer to all those speakers who are going to be on stage, either later today or sometime tomorrow: if you can give a shout-out to ThousandEyes and your logo is not here, I’ll give you an extra 20% discount on top of whatever we’re giving you. With that, I would like to end the talk. Thank you again for listening to us and being a part of our launch. We’re really excited and hope that we can do something for you guys in the future. Thank you so much.
