Why Guavus analyzes lots of telecommunications data before storing it all

Anukool Lakhina Guavus Structure Data 2013

Session Name: Think Different, Think Signal Processing: Approaches to Real-Time Network Data.

Speakers: Announcer, Anukool Lakhina


… he’s the founder and CEO of Guavus, and he’s going to be talking about ‘Think Different, Think Signal Processing: Approaches to Real-Time Network Data.’ Please welcome Anukool to the stage.


Thank you. Ten years ago, before Big Data was a phrase, Sprint Labs – the advanced technology labs – had a really wacky, crazy, audacious idea. The idea was this: what if we could collect every single packet from our network? What if we could instrument the networks so that we could collect every single piece of data from the network? Imagine what we could do. If we knew what subscribers were doing in our business, if we knew what applications they were using, if we knew what devices they were using, if we knew how they were interacting with the business, we’d be invincible. We’d know everything about our business. That was that simple idea that really began a journey ten years ago.


What Sprint did was go around buying expensive probes – deep packet inspection probes – and start installing them in their network. Next to each of these probes we put one of these big bad babies – massive Dell-powered servers, filled with hard drives. In those beautiful romantic days, when storage was still expensive and the phrase 'Big Data' hadn't been established, we would put in these probes, put the servers right next to them, turn a probe on for an hour, and after an hour we'd have to shut it off, because we would fill up all those hard drives with data. That's all we could collect – one site, one hour, one terabyte, a lot of storage.


Then, Sprint turned to the most advanced network, which is the FedEx network, to actually carry this data around, because FedEx was cheaper and faster than the Sprint backhaul network for backhauling all this data. We used to joke about it – we used to call this 'package-switched networking,' as opposed to 'packet-switched networking.' These were the fun days – this was obviously before data science became sexy. In those days I was a lowly researcher; now I'd be called a data scientist. I wish I were doing that now, so I'd have more sex appeal. Anyway, my best day was when the FedEx truck would pull in: we'd take out the hard drives, install the storage array, and I'd go and run a whole bunch of math – regression models, the data analysis – and after all was said and done I'd come back to Sprint and say, 'Hey! Guess what? I found something! Bank of America is being attacked!' And the questions they would ask were: 'When? How? Where?' And I'd say, 'No, no, no, that was yesterday. That's not now, that's yesterday.' They would humor me and say, 'That's really great – come back when you can actually help us with something actionable. You're just researchers. Come back when you actually have something interesting to show us in a timely fashion.'


Since then we've done a lot. Guavus was founded on that idea – founded in 2006, about 400 people, globally distributed – and we've really taken on that problem of automating what we call the FedEx model of moving data around for these large carriers, so that we can extract timely insights that people can act on. Our key realization was that, in order to really solve that problem, you had to go back to the primitives, go back to the axioms, and think about redesigning the entire data fabric from the ground up.


What does this data fabric need to do? Fundamentally, it has to solve this problem: it has to be able to take data from the network and distribute it at massive scale – quite honestly, this is why data became big. Data did not become big because we started blogging more. Data became big because networks and sensors got instrumented and connected, and they just started generating all this valuable data. So what this data fabric has to do is take all this network-generated data and then mash it up with the business-context information that already exists in information systems today. It's this marriage of low-level, high-volume, high-velocity data from the network with IT information that already exists about billing plans, subscriber demographics, and cost models. If you can marry these two things together, suddenly, for the first time, you know the now. You know exactly what is happening right now in this environment.
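[Editor's note: the 'mash-up' described here is, at its core, a join of high-volume network records against business-context lookup tables. A minimal Python sketch follows; the field names and values are illustrative assumptions, not Guavus's actual schema.]

```python
# Hypothetical sketch: enriching raw network usage records with
# business context (plan, segment) held in existing IT systems.
# All names here are illustrative, not a real Guavus data model.

# Business-context data that already lives in IT systems.
subscribers = {
    "sub-001": {"plan": "5GB", "segment": "consumer"},
    "sub-002": {"plan": "m2m", "segment": "enterprise"},
}

# High-velocity usage records arriving from the network.
network_records = [
    {"subscriber": "sub-001", "app": "video", "bytes": 40_000_000},
    {"subscriber": "sub-002", "app": "telemetry", "bytes": 1_200},
    {"subscriber": "sub-001", "app": "web", "bytes": 5_000_000},
]

def enrich(records, context):
    """Marry low-level network data with business context."""
    for rec in records:
        ctx = context.get(rec["subscriber"], {})
        yield {**rec, **ctx}  # enriched record: usage + plan + segment

enriched = list(enrich(network_records, subscribers))
```

Once records carry both the network-side facts and the business context, a single dashboard query can answer questions like "which plan tiers drive video traffic right now" without a round-trip between two departments' systems.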


That was really the problem that we set out to solve. Of course, this has a lot of challenges. You've got to do it at scale – typically petabyte-scale data sets being generated every day. You've got to do it at very high cardinality. We talk about Big Data, but one of the things I don't hear a lot about is that this data has a lot of cardinality. What that really means is you're reasoning about trillions of objects. Look at a network like Verizon: that's 100 million subscribers, and you have to actually profile each of those subscribers – you've got to reason about 100 million dimensional objects. This is where you've got to throw machine learning at the problem, throw distributed systems at the problem, and move compute towards the edge, so that you can process the data as it's coming in. This data fabric requires a departure from the traditional model, which is centralized store-first analytics – bring the data to a central place, store it, and once it's stored, ask questions about it. That model works really well when you have small amounts of data, you don't have to worry about the FedEx problem I talked about, and you want to answer questions on a quarterly basis. But when you've got massive amounts of data coming in from across a continent, you have to depart from that model. Centralized store-first analytics doesn't work. You have to take the approach of distributed compute-first analytics. What I mean by that is: move compute to the edge, analyze the data as it's coming in, and once you can do that, you can start pushing out these timely decisioning applications.
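[Editor's note: the compute-first idea can be sketched in a few lines. Each edge site folds its raw feed into a small partial aggregate as records arrive, and only those summaries travel to the center, where they are merged – rather than shipping raw terabytes around. This is a minimal illustration of the pattern, not Guavus's implementation.]

```python
from collections import Counter

def edge_summarize(records):
    """Runs at the edge: fold raw records into per-subscriber byte
    counts as they stream in, instead of storing them first."""
    summary = Counter()
    for rec in records:
        summary[rec["subscriber"]] += rec["bytes"]
    return summary  # tiny compared to the raw feed

def merge(summaries):
    """Runs centrally: combine partial aggregates from many sites.
    Counters merge associatively, so sites can report in any order."""
    total = Counter()
    for s in summaries:
        total.update(s)
    return total

# Two edge sites reduce their own traffic locally...
site_a = edge_summarize([{"subscriber": "s1", "bytes": 100},
                         {"subscriber": "s2", "bytes": 50}])
site_b = edge_summarize([{"subscriber": "s1", "bytes": 25}])

# ...and only the summaries are combined centrally.
combined = merge([site_a, site_b])
```

The key property is that the aggregation is associative and commutative, so the central system never needs the raw records – which is exactly what makes the FedEx-truck step unnecessary.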


That, fundamentally, is where the value is. The value is in providing these applications to decision-makers inside these Telco environments – whether it's for network engineering or security, customer care, marketing, or monetization. What you want to do is democratize the data that is locked away in the network, mash it up with information that already exists in the IT systems, and provide timely decisioning dashboards that Telcos can then use to not only optimize their costs but also delight their customers and, obviously, make money by discovering new revenue opportunities.


What does this dashboard look like? It's what you would expect from a consumer application: self-service, very dynamic, the right information at the right time to the right user – and actionable. Let me give you some examples of how these dashboards are used. One of our customers used one of these dashboards and discovered that 1% of their capacity in New York was being consumed by a single application – an M2M application that taxi-cab companies had. It was actually a credit card transaction application. You'd get into one of these New York taxis, they'd swipe your credit card – that's a connection the taxi company has bought from a Sprint or a Verizon or what have you. What this operator did not know was that the application was also being used to carry live news coverage. This taxi-cab company had been very ingenious and had rejiggered a credit card transaction application to carry live video streams, especially at 9 am and 5 pm – exactly when the network gets congested.


Armed with these analytics, the network guys went to the pricing guys and the product guys and said, 'Hey, you've got a problem over here. Now that we can finally talk to each other, because we've got a shared view, you might want to go back and change how you're pricing this application, because it's killing the network.' Just to give you some numbers here: this is a brain-dead ROI. The system paid for itself before it went live, because 1% of capacity savings in Chicago on the radio-access network for a network like Verizon means hundreds of millions of dollars. A simple anecdote, and just a simple application, but the ROIs are very compelling.


Let me give you another example. Another one of our customers is using our customer care analytics application to solve the following problem. Most of you know this, but what's happening in North America and globally is that data plans have gone from unlimited to limited. I don't know if you've seen your phone bill recently, but it'll have all your voice calls dissected by daytime, nighttime, weekday, weekend. The funny thing is, voice is unlimited – data isn't. And for data, your phone bill will just show that your usage was 55 gigs, or 55 megs. You've got no visibility into why you are being priced what you're being priced. So one of the hidden costs of rolling out smartphones is the calls that go to customer care. Again, the numbers here are staggering. The majority of the calls that go to customer care organizations in these Telcos are no longer about 'I can't reach voicemail.' They're about 'Hey, I can't reach iHeartRadio,' or 'Hey, why the hell do I have this overcharge on my phone?' Because I don't know about you, but I have no idea what the weather app does on my phone, I have no idea what a byte is on my phone, and most people don't either – certainly my mother doesn't. This is a challenge. How do you solve this problem?


To us, this is a data problem. If you could see all this data, then customer care agents could proactively respond to angry subscribers who call in: 'Hey, the reason you have this overcharge is because you went to YouTube, and you spent this much time on YouTube.' So now what we're seeing carriers do is take that customer care data, take that network data, mash them up together, and provide very dynamic self-care portals for end users. But also, internally, they're getting intelligent about how they want to resolve these calls coming in.
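[Editor's note: the agent's answer above boils down to attributing a subscriber's total usage per application and surfacing the biggest contributor when it exceeds the plan cap. A hedged sketch follows; the plan cap, app names, and byte counts are invented for illustration.]

```python
# Hypothetical sketch of the care use case: rank a subscriber's
# per-app usage so an agent (or self-care portal) can explain an
# overage. All figures below are illustrative assumptions.

PLAN_CAP_BYTES = 2 * 1024**3  # assume a 2 GB plan for the example

usage = [
    {"app": "YouTube", "bytes": 1_800_000_000},
    {"app": "weather", "bytes": 3_000_000},
    {"app": "email",   "bytes": 450_000_000},
]

def explain_overage(records, cap):
    """Return a human-readable explanation of an overage, or None
    if the subscriber stayed within the plan cap."""
    total = sum(r["bytes"] for r in records)
    if total <= cap:
        return None  # nothing to explain
    # Rank apps by consumption so the biggest contributor leads.
    top = max(records, key=lambda r: r["bytes"])
    return (f"Over cap by {total - cap} bytes; "
            f"largest contributor: {top['app']} ({top['bytes']} bytes)")

msg = explain_overage(usage, PLAN_CAP_BYTES)
```

The same per-app attribution that answers one angry call can also feed a self-care portal, so the subscriber sees the breakdown before they ever dial in.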


This is again a very clear ROI – a very simple, clear bottom-line improvement – and honestly, these are just two simple examples. There are countless more. While you do need a lot of rocket science, you don't need a lot of unnecessarily creative [inaudible] to uncover these new use cases, which can have very, very big ROIs – $50 million problems, $100 million problems – and you don't need to do a lot of hunting around to discover them. That's what makes this Big Data have a big impact on the bottom line for these Telcos – because these Telcos really do need to move from being model-based, aggregate, average companies to being much more precise and dynamic, acting at the right time for the right user.


What we’re particularly excited about is where we see this going. I told you about ten years ago Sprint had this project. I think ten years from now we’ll be having a very different discussion. We’ll be having a discussion about how you can take all this data, how you can extract insights from it, but fundamentally, how you can transform your business processes. How you can make your business processes data-driven. How you can get towards right-time, data-driven business processes. Because what we’re seeing our customers do today, as we’ve deployed these massive analytics platforms, and as they become comfortable with using these analytical applications, what our customers are now pulling us to do is, hey, I’ve got this really great dashboard, I can do all these things, but I want to be able to take things from this dashboard and trigger that business process. I want to automate this whole decisioning cycle.


I believe we're in the early stages here – I honestly believe we're at the tip of the iceberg – but this is where we believe the world is headed: from continuously measuring and continuously sensing, to knowing the right information at the right time, to finally triggering the right business process. I think that's the way the world will move. That is where you will see the big impact on businesses from Big Data. Thank you.


