Session Name: Where Is The Big Data Industry Going?
S2 Sean Gourley
All right, up next we have "Where Is the Big Data Industry Going?", something I'm sure all of you want to know. And that's going to be the speaker Sean Gourley; he is the CEO of Quid. Please welcome Mr. Gourley to the stage.
SEAN GOURLEY 00:28
So big data is actually quite big, and indeed today it cuts across many different industries and wraps itself around the world. Indeed, big data is so ubiquitous that it has decoupled itself from the very technology that once drove it. The databases and the machine learning algorithms are a lot less important today than the philosophical outlook. When we talk about big data now, we're really talking about a philosophical outlook that embraces empiricism, that has quantitative measurement at its heart, and perhaps suggests that if something can't be measured, it may not be as valuable as something that can be.
SEAN GOURLEY 01:06
Yet we as a society have embraced this philosophy as we grasp for meaning in the complexity of the world we live in. Every day we're outsourcing more and more of our decisions to algorithms that choose the music we listen to, the friends we spend time with and even, if you're an Uber driver, your employment contract. So we as a society have moved to a world where we need to start to question what the limits of big data are: what kinds of problems it can solve, and what kinds of problems it should solve. This is exactly what David Brooks did last month, for those of you who read his op-ed piece in the New York Times. He looked at the limitations of big data and identified four big attributes that big data struggles with.
SEAN GOURLEY 01:58
The first of these is understanding the nuances and dynamics of human relationships. The second is that it tends to lean on a crutch of correlation at the expense of causation. The third is that big data holds up its objectivity as a badge, while trying to disguise some of the value judgments that are implicit in both the measurement and the model. But perhaps most damning of all was the idea that big data can't solve big problems. That we are all sitting here today with one of the biggest technologies that we've invented, and we're using it to solve trivialities. We use it to solve advertising problems: a hypothetical data scientist pulling down information about a children's breakfast cereal, measuring and constructing data to build models to determine the exact color of the packaging, the position on the shelf, and how much money we should charge for it. We have brilliant models of this, and yet data is strangely absent in the bigger questions of childhood obesity, diabetes and indeed nutrition in general.
SEAN GOURLEY 03:04
The naive empiricism of data science is reaching for low-hanging fruit. It measures what can easily be measured, it changes features that can easily be changed, and as a result it ignores the bigger problems that we as a society face. Data scientists are presented with a set of parameters to optimize over, yet they don't take the time to step back and say, "Should I even be optimizing this at all?" This is something that we actually have to deal with, and I think we're in this world because, if we rewind the clock back about five years, rewind back to the time of Facebook in 2008, it hits 100 million users, and we have to appreciate this in its historical context. This is one of the biggest movements of quantification of a society that we'll perhaps ever see. Billions of people are now going online, and taking themselves and their relationships and reducing them down, within a controlled system designed by a Harvard undergraduate, to a set of a few variables, colored blue because Mark is color blind. We have conformed ourselves to his view of the world, and we have turned ourselves into a set of vectors that we can then advertise against. This is the trend that has defined us more than anything else, and it's also the trend that has walked in lockstep with the rise of data science.
SEAN GOURLEY 04:30
Data science has social networks at its very heart. Take Jeff Hammerbacher and DJ Patil: these are the two people who coined the phrase data science, and they were both leading the analytics teams of the two biggest social networks that live in our world today, Facebook and LinkedIn. The philosophy of data science has the hallmarks of their experience; of course it does. Millions of data points, a clean structured environment, you could run experiments and A/B tests. Their job was to optimize the user experience, user engagement, and ultimately extract money from users who were on the system. They got so good at doing this that now you can take the like information that we publicly share, run data science on top of it, and extract your gender, your religion, your sexual identity and your intelligence. What data science gives us is that curly fries are highly correlated with intelligent people; this is the great hope of data science.
SEAN GOURLEY 05:35
You see, we sort of laugh at this, because it says nothing about intelligence, the bigger, deeper problem. But in the world of social networks and advertising we don't care, because if I want to advertise to smart people I just look for a correlation with a variable; I don't care about what the variable actually means. So this is the world that we're confronted with. Data science, I believe, needs to be re-imagined, because data is incredibly powerful. We need to step back from the scientific notations and start thinking of it as data intelligence. Data intelligence has a slightly different philosophy, one that embraces some of the messy and unstructured nature of the world that we live in.
SEAN GOURLEY 06:25
So while Jeff and DJ were running their analytics teams back in 2007, I was in a slightly more hostile place. Sitting in Iraq trying to understand the dynamics of human conflict, it was a messier situation; information was harder to come by. It was complex, but data most certainly had a place at the table when strategic decisions were being made. We had to work a little harder to get the information because of course the insurgents were not structuring the data about their attacks for us. We took open source news reports from hundreds of different sources, blended them together, and trained algorithms to extract the event details: when the attacks happened, how many people were being killed, and where they occurred. With this data set, we were able to run analytics on top, and determine striking mathematical signatures that really characterized and defined the way that people were killing each other. Shown here is one of those signatures for Iraq, but this signature, showing the distribution of frequency versus the size of attacks, repeated itself around the world, independent of geographical or political differences. When we published this work, the first thing that everyone asked was, "Can you predict when the next attack is going to happen? Can you predict how many people are going to die? Can you predict where it's going to occur?" On some levels, of course, you can make a deductive prediction using this model, but why would you sit down and predict that three people are going to die tomorrow, and not try to do anything about it? This is the sort of world that we are in, where we are obsessed with prediction.
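[Editor's illustration] The talk does not give the functional form of the frequency-versus-size signature. As a minimal sketch, assuming it behaves like a power law (an assumption, not something stated here), the Python below draws synthetic "attack sizes" and recovers the exponent with a standard maximum-likelihood estimator. The function names, the sample size, and the exponent 2.5 are all hypothetical choices made for illustration; none of this is the speaker's data or method.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Draw n values from a continuous power law p(x) ~ x^(-alpha), x >= x_min,
    using inverse-transform sampling."""
    return [
        x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
        for _ in range(n)
    ]


def estimate_alpha(sizes, x_min):
    """Maximum-likelihood estimate of the exponent of a continuous power law,
    using only observations at or above x_min."""
    tail = [x for x in sizes if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum


# Synthetic data only: the exponent 2.5 is an arbitrary illustrative value.
rng = random.Random(0)
sizes = sample_power_law(20000, alpha=2.5, x_min=1.0, rng=rng)
alpha_hat = estimate_alpha(sizes, x_min=1.0)
print(f"estimated exponent: {alpha_hat:.2f}")
```

A single fitted exponent of this kind is a compact summary of how often small events occur relative to large ones, which is the sort of quantity that could be compared across conflicts.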
SEAN GOURLEY 07:59
I guess the mea culpa here is that I sort of pander to this, because this is the easy story to tell: an equation that predicts conflict. The deeper thing is that we need to understand the ecosystem of the insurgents, how they make decisions, how they allocate resources, how many groups there are, and what their half-life is. When we do that, we can actually use this model to understand some deeper truths about insurgency. When you go to the Pentagon, they sit down, and they say, "Should I send more troops to Iraq?" The mathematical equation won't tell you that, but the model can actually give you some insight. The model can tell you how much longer the war might last if you send troops on the ground. What's interesting here is that if you send more troops, counter to what you think, the length of the conflict actually increases, on the green line, until it comes to a critical point. If you get above that, you get a sharp decrease in the length of the conflict. The critical question then is, did I send enough troops? Because if I haven't, the conflict is going to go on and on.
SEAN GOURLEY 09:01
This is not a precise science; it's much more a set of quantitative inputs that you'll use for a very complex set of political decisions. But you wouldn't want to do this without the data in your hands. If we think here about what data science is and compare it with what data intelligence is, the two occupy different ends of the data spectrum. Data science is concerned with improvements of around 10%, small incremental gains; data intelligence is looking for 10x improvements. The goal of data science is to predict and optimize; the goal of data intelligence is to create, change and shape the world we're in. The decisions of data science are invariably made by algorithms, such as PageRank showing information about the right search and the pages they're in; with data intelligence, the decisions are most definitely made by humans. These are some of the most complex decisions in the world, and humans are making them. The data for data science is big, it's clean, it tends to be well structured; the data for data intelligence is small and messy. The communication we use in data science is equations, whereas in data intelligence it's stories. The problems we use data science to solve are tactical in nature, whereas with data intelligence they tend to be more strategic. It's these strategic problems that of course people have been trying to solve, whole classes in governments and countries around the world. They have been leveraging the strategic toolkit, the basic toolkit that we have: Google, a pivot table in Excel, and PowerPoint to present it. My experience going through this made me realize that we actually need to build tools for these people. We need to do better than a consumer-focused search engine and an Excel spreadsheet that's meant more for budgets than strategic decision making.
SEAN GOURLEY 10:56
Three and a half years ago we started the company Quid, and Quid is building this software. We serve, at the moment, some of the biggest banks in the world, some of the biggest corporations, foundations and of course government agencies. They use our software to answer some big questions, questions like: what is the most effective way to allocate capital to spur growth in K-12 education technology? What are the dominant narratives about climate change in India? What are my competitors doing with advanced flexible displays, and should I partner with them or compete? What are the structures of insurgent groups in Syria, and what is the likely impact of peacekeepers? These are most definitely big questions, and they are questions that are actually beyond the reach of the human brain to solve. We need data to solve them, but data science and data alone won't give us the clean, slick solutions that we're looking for.
SEAN GOURLEY 11:53
So in working with this over the past decade, I've come up with heuristics that I really think guide our use of data as we try to solve these big problems. I want to share them with you today, so you can take some of them away as you use data to solve big problems. The first of these is that data needs to be designed for human interaction. Humans are making the biggest decisions in the world, and we need to design data for them to consume. The needs of humans are very different from the needs of machines. Humans have a need to get in, feel and have a tactile response to this information, and we should design for that. We need to design human-centered interfaces, so they can explore information, they can follow hunches, and they can use their intuition as they weave together multiple story lines. If we do this, we will start to make better decisions from it. Of course as humans use this, we need to understand the limits of human processing. We need to know that the human brain is a massive computational engine, but it has limits. It can store about 150 objects in its functioning memory, but it can only read about 200 to 400 words per minute if it consumes information through reading, and it can make decisions no faster than 650 milliseconds. These things open up opportunities for algorithms to start moving in and taking up the slack where the human brain doesn't go. We need to be conscious of that and design systems that let algorithms do what algorithms do best.
SEAN GOURLEY 13:24
Data is messy, it's incomplete, and it's biased. If you're sitting down trying to optimize where the next store location for Starbucks in New York City is going to be, you probably have a SQL database you can pull down, run it through your model and you'll get an output. When you're looking to track down insurgent dynamics, you might have handwritten notes in Arabic that don't really lend themselves to optical character recognition. You've got to deal with this, because if you're solving a problem that hasn't been solved, chances are that no one's created a custom-made database to solve it for you. We get around this, but we also need to do what is uniquely human, which is to spend time with that information, get our hands in it. This is highlighted by the likes of mortgage-backed securities: AAA ratings that, if you had spent time with them and dug into the people who were writing the signatures on the bottom line of those mortgages, you would have seen that the incomes didn't necessarily justify the kind of mortgages they were getting.
SEAN GOURLEY 14:26
Data needs a theory. Where I come from, in the world of physics, you don't truly understand something until you've got a theory to describe it. In the literature, there's been a lot of debate around whether big data negates the need for theory: that we simply have so much information that it doesn't really matter what the theory is. That's fine if you're trying to predict a future that looks much like the past, but the clients that we work with make decisions that shape the very world that we live in. When you decide to send 30,000 troops to Iraq, you change the situation on the ground there, and the data you've collected and the predictive models that you had before may very well no longer apply. If you're going to change the world, you need to first understand it, and to understand that world, you need to have a model. When you have those models, you can manipulate variables and start to imagine what the world would be like based on the decisions that you are going to make. Theory is very important.
SEAN GOURLEY 15:32
And finally, data needs stories, but stories also need data. Data, when it's put up in front of you as a number, gets stripped of the context of where the data came from, the biases inherent in it, and the assumptions of the models that created it. We put that number up in front as data imperialism, and we say just accept it. The stories that we tell actually carry the nuances of the information that are incredibly important. I can show you an equation here covering insurgent dynamics, and we can take a deep breath and rest with the mathematical certainty that that confers. I can also tell you the story of Hercules and the Hydra, slicing off a head only for two more to grow, and this equally well describes an aspect of insurgency. The beauty of this is that it's incredibly transmissible, which is important when you're making political decisions. Indeed, this story has been around for 3,000 years. What we need to do is combine these kinds of stories with data, and if we do that we have a chance of making better decisions.
SEAN GOURLEY 16:41
I believe that big data can indeed solve big problems; indeed it must. When we do that, it looks less like data as a science and leans more toward data as intelligence: a world that starts to embrace the messiness of the information around us, that has humans most definitely inside the loop, that looks for underlying theories and understanding, and that doesn't just accept correlation as an end goal. We need to design the interface between man and machine. We need to design a way where humans can do what they do best, and machines can do what they do best. If we do that, we have a chance.
SEAN GOURLEY 17:20
I want to leave you with a final myth, the myth of the Centaur, the half man, half beast. In that myth these are the wise creatures with skills and prophecy and wisdom. I think when we start to tackle these problems, we need to ask ourselves, "Is the problem I'm solving truly a big problem, and is it worthy of the technology we've all been granted access to? Am I using the elements of human ability properly with the elements of machine ability?" We need to think about how to build better Centaurs. Thank you.
Thank you, Sean. I like waffle fries, I don't know what that means, but curly fries are delicious. And we are going to enter into our first break of the day. I hope you've enjoyed the morning sessions. We've got a one-hour break sponsored by Qubole, thank you so much. During this break you can network, please meet each other, and use the app to interact with each other if you want to exchange contact information. Visit the exhibit areas; the workshops are located right behind here. There are three, so get there early because they fill up. Microsoft is sponsoring a workshop in the Oceanic Room; SnapLogic is sponsoring a workshop in the Aquatania East room; and stop by the GigaOM research table. Find out more about our research service, get your sector roadmap report and get some refreshments. General session will resume at 11:30. Thanks, everybody.