Session Name: Big Data Is Broken Without Integration
Speakers: S1: Announcer S2: Chris Albrecht S3: Stacey Higginbotham S4: Gaurav Dhillon
Please welcome your MC Chris Albrecht back to the stage.
CHRIS ALBRECHT 00:08
Everybody feeling good, feeling networked? As people start filling in for this second half of the morning session I just want to remind you of a few things. Welcome back! Due to fire marshal regulations please keep the aisles clear. Remind you to mute your cell phones please. If you want to ask a question there are microphones in the middle of the aisle here, please use those when asking a question. The WiFi is GigaOM and there is no password. Please remember there have been a lot of great comments and a lot of great capturing snippets of the show so far. You can follow us on Twitter at GigaOM and make sure you tag your tweets with #dataconf.
CHRIS ALBRECHT 00:56
Remember to download the mobile app, you can scan the QR code on the back of your badge for a program guide or log in with an email address and use the password sd2013. Exchange contact information with other people, message with them and just in general have a fuller and richer event experience. And that is about it for all the housekeeping kind of things.
CHRIS ALBRECHT 01:19
Right now I want to welcome back to the stage Stacey Higginbotham. She’s going to be talking with Gaurav Dhillon, the chairman and CEO of SnapLogic, about Big Data is Broken without Integration. Please welcome Stacey and Gaurav.
STACEY H 01:44
Alright, big data is broken without integration – oh my. We have heard so far this morning about how data science is broken, how machine learning is more of an art and maybe the current way we train people is broken so we’re going to continue on with this theme of things that can be fixed with big data. Gaurav here has a really awesome story, so we’re going to get to that. But first let me let him introduce himself – Gaurav.
GAURAV D 02:18
Stacey, thank you, pleasure to be here. Brief introduction: we’re SnapLogic, we integrate information for large companies, either it’s big data or it’s SaaS applications that they’re adding into the enterprise. I’ve been in the integration business for two decades, started Informatica in my garage, was founder and chief executive for about 12 years, took it public and so on. Took some time off and like a lot of 1.0 people I’m doing a 2.0 thing, so here we are doing integration and the cloud as it were.
STACEY H 02:48
Now you started in your garage, which is a very old school thing to do. Where did you start this company?
GAURAV D 02:53
It was a sublease from some industrial design company-
STACEY H 02:58
GAURAV D 02:59
It was a little better, it had air conditioning and of course the internet, which we didn’t have in 1992.
STACEY H 03:05
Just imagine that guys – wow. To get a sense of what you’re talking about when you’re talking about data integration you have this great case study involving Splunk and a large Fortune 10 company. Can you tell us how they used you so we can start getting a sense of the parameters of what you’re talking about?
GAURAV D 03:27
That’s a fascinating example of what we believe we’ll be seeing more and more and that particular case study is about this Fortune 10 company which is a large industrial conglomerate, they have businesses in different verticals. They need to gather a bunch of IT events together to understand what is happening in terms of their risk position. So they use Splunk – they feed a bunch of the information at the log level into Splunk – and as you can imagine across a company of that size that’s rather big. But they also needed to bring in events from a SaaS layer – like most enterprises they’re going to the cloud, they’re adding SaaS applications.
GAURAV D 04:09
For example, they’re adding ServiceNow to manage their help desk, so they needed to bring in all sorts of SaaS events: what has somebody created as a ticket; who’s logged in; who’s logged out; and they wanted me to couple that together. So to bring in that SaaS layer of data they use SnapLogic to enrich the typical machine data with business level data to give them additional insights.
STACEY H 04:30
And they put that together and what do they get?
GAURAV D 04:33
What they get is a level of information about their security position that they would not get just from the machine level information. Because the machine level information doesn’t have the context of what you have at the SaaS layer at the business layer. That layer of information typically is in your CRM system, in your help desk system, in your marketing systems and so on. So when you add that in you get an insight into your IT risk profile in a way that you simply cannot get from just log level data. Log level data says IP address something-something-something went out to IP address something-something-something, but what was that IP address? Is that Stacey, is that me, is that Om, who is that?
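The enrichment described here can be sketched as a simple lookup join. This is a minimal illustration only, not SnapLogic's or Splunk's actual mechanism: the `LogEvent` type, the `enrich` function, the IP addresses and the user names are all hypothetical, and a real SaaS layer would supply login records through its own interface.

```python
from dataclasses import dataclass


@dataclass
class LogEvent:
    """One machine-level log record: only IP addresses, no business context."""
    src_ip: str
    dst_ip: str


def enrich(events, ip_to_user):
    """Attach the user behind each source IP, or "unknown" if no record matches."""
    return [
        {
            "src_ip": e.src_ip,
            "dst_ip": e.dst_ip,
            "user": ip_to_user.get(e.src_ip, "unknown"),
        }
        for e in events
    ]


# Hypothetical SaaS-layer data: who was logged in from which address.
ip_to_user = {"10.0.0.5": "stacey", "10.0.0.9": "gaurav"}

# Hypothetical log-level data: one known source, one unrecognized source.
events = [
    LogEvent("10.0.0.5", "203.0.113.7"),
    LogEvent("10.0.0.77", "203.0.113.7"),
]

enriched = enrich(events, ip_to_user)
for row in enriched:
    print(row)
```

The second event has no matching login record, so it stays labeled "unknown", which is the kind of entry a risk review would probably want to look at first.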
STACEY H 05:13
And risk is basically leaking information, that kind of-
GAURAV D 05:16
STACEY H 05:17
Awesome. You also have something about Activision?
GAURAV D 05:25
That’s right, we can talk about that. This is again fascinating because what we’re seeing again, whether you’re a large company – or in this case a large gaming company – Activision is one of the largest users of Salesforce.com. They have one of the biggest customer masters in Salesforce.com and in particular the use case with SnapLogic is to couple the in-game data. I’m not sure how many Call of Duty gamers we have in the room, but if you’ve ever played Call of Duty, that information in Call of Duty is essentially stored in a large big data Hadoop cluster and that information is used for all kinds of product management decisions. Pricing decisions, the various items – guns, grenades – whatever teenage males want to use to have fun with, it’s all done there. Now-
STACEY H 06:17
In the virtual world?
GAURAV D 06:19
In the virtual world. But to make economic sense they need to combine that with the financial information that says: what should be the pricing of this thing? They need to run some analytics on fraud detection. Shock horror, you’ve got young teenage males with a lot of time on their hands, of course they “game the game” – you get in a chat room and shoot each other and try to get points and so on. So SnapLogic brings in the in-game data, ties that with their customer and economic data and helps them produce economic models for what the pricing should be, what if any fraud detection comes out. Again and again you’ll see that – there’s machine level data or massive amounts of information that needs to be coupled with business layers to provide something like economics, to provide something like security profiles – to essentially create business value.
STACEY H 07:08
OK, so this is like if I want a flamethrower, this is going to set the pricing for it based on what other people are doing and that kind of thing. Awesome-
GAURAV D 07:15
Or how many flamethrowers people reached for when it was free.
STACEY H 07:18
That’s fascinating and to me that’s the holistic promise of big data and earlier Sean Gourley was actually talking about data intelligence and taking not just this prediction but this [inaudible] view. To do that you have to be able to grab legacy data, machine data, all this data. You guys are basically normalizing it and making it so everything talks to one another coherently. But when you start talking about doing that in practice, what are your big challenges? I can think of big data – where do you put it, do you have two copies of it? I don’t know. Then things like real time data – how do you do it quickly or does it matter? I don’t know.
GAURAV D 08:07
It does. It turns out people want things yesterday in the business world – sometimes they can’t get them in that time frame. I think the challenges are, of course – size. You have large amounts of information so you need a scale-out architecture of some type. However that’s done, whether somebody uses our product or something, that challenge remains. The challenges that we get called in to address have to do with a diversity of systems that produce that information. I think this is one strength of SnapLogic is we have these things called snaps which are essentially smart connectors and we have a lot of them, like 150. So when you need to bring in Salesforce information or some business level information you don’t need to start from first principles to hunt and gather that data. The hunting gathering I think is essentially where we get brought in because whether it’s custom applications or business applications that need to be brought in and that needs to happen in business real time. It’s not stock trading or flying a 777 by wire type real time, but it’s business real time – you need it in a minute SLA or two minute SLA, something like that.
STACEY H 09:16
So why wouldn’t I just pull an API from these guys?
GAURAV D 09:20
Because there’s so many of them-
STACEY H 09:21
GAURAV D 09:22
That’s the issue. People who’ve tried to do it for the first time realize that for simply connecting one end point to the other that does make sense but then the thought that “Oh we’ll just write some Perl script for this, and we’ll write some Perl script for that, we’ll write some Ruby for that” it sort of silts up. I think that’s where, in a sense, in a business context as opposed to scientific where you may have lots of free labor, slave labor – also called graduate students. But if you don’t have access to a lot of graduate students chances are out of the ten person team on big data, two to three are doing hunting and gathering of information to light up the big data. So at SnapLogic we feel that someone can free up two or three headcount in a good sized project because we can automate that – boom – rather than have to have a third of your team hunting and gathering for information.
STACEY H 10:13
Sure, OK so it’s just another virtual layer on top of that to make it easy, because everyone loves easy.
GAURAV D 10:21
STACEY H 10:22
So let’s talk about enterprises of your customers. We had talked about some of the marketing and retail, like how this level of sharing of data and integration can actually change the way people buy and sell. Do you want to talk about how you’re going to save retail from Amazon?
GAURAV D 10:42
We’re trying with Best Buy – we’ll see how that goes. What’s very clear is that there is no just retail-retail, there is retail with multiple channels. Much as Apple has stores as taxation catches up who knows, Amazon will have new shipments and other kinds of things, so this is all going to blend in an interesting kind of way. With retail though we find that there is an acute shortage of good recommendations online. In fact, even before we do smart stuff like recommendations just keeping store inventory in relatively real time update at a company the size and scale of Best Buy is an onerous task – just that transparency. We had the sale on iPads – you get the first one for $50 off – if we want to switch that off how long does it take? Well it would take you quite a long time – 12 hours or so in a network of that type with all these applications in the back that need to talk to each other to pull that from, in a sense, the promotional line-up.
GAURAV D 11:43
So with companies like that I think the low-hanging fruit is transparency, inventory management in a multi-channel selling, and that interconnection is vital. Stuff that will happen in the future is recommendations which I think Amazon does very well, Netflix does very well, but most retailers don’t have and that’s very much a big data type of application in my mind. I think as the future progresses you’ll see location data – when you’re in a store if you’re the customer, you have an app – location data I think is going to come into play in a very very good way, and we’re excited about that.
STACEY H 12:17
Geographic location, like I’m in Texas and I want this, or like I’m in this part of the store and I want this?
GAURAV D 12:21
I’m in that part of the store. There’s even an interesting app that’ll produce little sounds that can be sensed by the microphone if you have the app on that can tell you what aisle you’re on. They’re running a pilot that knows that if you’re browsing for something and what aisle you’re on and if you’re troubled and just walking backwards and forwards in a certain pattern it sends somebody over to say “Hey can we help you, you look like you’re looking for maybe this kind of hammer or tool?” Because when you’re like “Hmm superstore – lots of choices – where’s that thing I came in for?” That can be quite helpful, I think.
STACEY H 12:54
It could, although I enjoy walking around hardware stores and playing with things, so I get a lot of help anyway because they’re just like “What are you doing here?” Let’s talk about – because this is one of my big passions – is the idea of the internet of things. GE has the industrial internet, and there’s IBM and their smarter inventory that has trucks that know what’s on it and they get stuck in the desert and that’s bad because it’s refrigerated items – anyway. You all saw that ad campaign, right? When I think about things like that, I think about how our entire economy is going to be these rivers of data and information flowing back and forth. I usually think of it in more pure form, like an API call because those are the types of people I talk to. But when we get to this kind of world view, is it enough just to integrate this data, or how do you teach people what to do with it, how do you find the data sources that you want to pull in to bring this stuff together – are you guys thinking about that and helping your customers there?
GAURAV D 14:00
We think that’s going to come in a big way. There’s a Sensors Expo in Chicago every year, about 800 people, most of whom are electrical engineers who manufacture sensors – they know how to transmit an HTTP stream, but not much else. Can’t spell geek without double-e – I’m one – so you’ve got lots of them in there. But the question is what do you do with that stuff?
GAURAV D 14:23
One example that we hope to put into production with this partnership in the months to come is there’s a very, very amazing sensor that looks at a boiler in a power plant – they have these big steam boilers – because with nuclear energy going away it’s going to be either green, hopefully, or a lot of fossil fuels burning to produce steam that produce electricity. Well these boilers cost hundreds of millions of dollars and this sensor looks at the temperature of that boiler in the infrared spectrum and today all the sensor is doing is providing a cable feed – literally a television feed – so somebody in a baseball cap chewing gum is looking at it to make sure it’s not going to blow! OK, this is fun. It’s 2013, couldn’t we put that into Hadoop? Couldn’t we have analytics around “This is going to blow in a week because it’s had these hotspots”? I think those sorts of things that are examples that we see in the savings on energy to take down a boiler, to put it back online, to manage them are also true of aircraft engines, are also true of automobiles, are also true of smart meters that are nothing but low-wattage wifi devices streaming out a signal to a telephone pole somewhere in your neighborhood.
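The kind of hotspot analytics imagined here could, in its simplest form, be a threshold count over recent readings. The sketch below is an assumption-laden illustration, not the sensor vendor's or SnapLogic's method: the function name `hotspot_alert`, the temperatures, the 600 degree threshold and the window sizes are all invented for the example, and a real system would work on the full infrared image rather than one peak value per reading.

```python
def hotspot_alert(peak_temps_c, threshold_c, window, min_hits):
    """Return True if enough of the most recent readings exceed the threshold.

    peak_temps_c: peak boiler surface temperature per reading, oldest first.
    threshold_c:  temperature above which a reading counts as a hotspot.
    window:       how many of the latest readings to inspect.
    min_hits:     how many hotspot readings within the window trigger an alert.
    """
    recent = peak_temps_c[-window:]
    hits = sum(1 for t in recent if t > threshold_c)
    return hits >= min_hits


# Hypothetical readings (degrees Celsius), oldest first.
worrying = [500, 610, 620, 590, 630]
healthy = [500, 510, 610, 520]

worrying_alert = hotspot_alert(worrying, threshold_c=600, window=4, min_hits=3)
healthy_alert = hotspot_alert(healthy, threshold_c=600, window=4, min_hits=3)
print(worrying_alert, healthy_alert)
```

A count over a sliding window is deliberately crude; it only shows that repeated hotspots, rather than a single spike, are what would drive an early warning.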
GAURAV D 15:37
We think that the internet of things is an amazing thing to come. It is very tangible and the ROI is very real. On some of the big data projects some of the data science works – hooray Nate Silver predicted the presidential election but he got the Super Bowl wrong – for those of us from San Francisco were like “What were you thinking?” So I think as big data looks for business value there is enormous business value in the internet of things. It may not be at the micro-level where your washing machine is smart about when it turns on because the power is cheaper – certainly not in the United States where power is cheap and we tend to consume a lot of it – but in countries where people watch their power consumption – it may be more expensive or fossil fuels aren’t available – those sort of things will happen. But very certainly you see early indications like what Nest is doing with thermostats – that’s just the beginning of what we think is going to be an enormous wave of consumption and production in analytics.
STACEY H 16:32
So on the backside of all this sensor data, some of it is open – they have an API – some of it not so and people are reverse engineering things to have things talk to each other. What would your role then be in something like that? Where do you see – and maybe not SnapLogic’s role – but what is the role of data integration in that world – how does that end up working?
GAURAV D 16:59
I think it’s moving past the roots of – if I may use the warehouse word for a moment – the roots of what is today a $30-50 billion business intelligence data warehousing industry, the roots of that came from bar-code scans. You went to a store, you bought something. You got the bar-code scan – well the bar-code scan was initially designed to replenish inventory. But it turns out the analytics behind that – “Are we selling more Bud or Coors?” Is there market basket analysis of beers and diapers – which still keeps coming up by the way – every business use case on big data has the same thing we used to talk about in 1995 which is “Oh yeah, people who buy diapers buy beer.” The point is that today that industry, by and large, is focused in on retail-level scan data. Imagine if the in-flows of information are actually machines. Now that is when some of the numbers around the big data market start to ring true. Because today – imagine that is a very small amount of information coming in, it’s just bar-code scans – but if everything was producing a river of information now the opportunity to get those values that you got just in retail just blow the mind.
STACEY H 18:17
Wow – and with my mind blown we’re going to have to call it the end of this chat, but thank you so much.
GAURAV D 18:26
A pleasure. [applause]
CHRIS ALBRECHT 18:32
Thank you Stacey and Gaurav. Up next we have Improving your Product with Big Data Insights.