
Is your big data worth the effort?

Transcription details:
Date: 20-Jun-2013
Input sound file: 1005.Day 2 Batch 3

Transcription results:
Session Name: The Value of Information

Joe Weinman
Dave McCrory

Joe Weinman 00:00
Warner Music Group, let’s welcome Dave out to the stage.

[applause]
Dave McCrory 00:09
Hi, so you would have seen that my presentation was described as the Value of Information, and really what I’m going to describe is info economics. And I’m going to describe what that is right now. So information is really communicating with a goal and trying to convey something that will reduce uncertainty or increase the probability of how an outcome is going to turn out. So this is kind of a simplification but for the purpose of this talk that’s what I’m going to go with.
Dave McCrory 00:51
Economics is something that we’ve all heard of, has been talked about many times. Analyzing the production, distribution, and consumption of goods and services. So this is interesting in manufacturing and all other lines of business, and it’s a study that’s been around for hundreds of years. When you combine the two, it’s analyzing the production, distribution, and consumption of information – info economics.
Dave McCrory 01:20
There’s a micro-economic theory that’s called info economics that talks about game theory and a bunch of other things. I’m kind of reusing the term primarily because I believe that we don’t think enough about information in the technology industry. We have IT in the enterprise, and that’s something that we all take for granted. It’s supposed to be information technology, yet if you look at all of the talks and other things, they’re all centered around data. I don’t see data and information as interchangeable words or things. I see them as very different. They’re still very related, but they’re different.
Dave McCrory 02:06
Data is not information. So, it’s bits and bytes, it’s a number, it’s a name, it’s very specific, it’s a small thing. So if you look over on the right-hand side you see the number 1033. That doesn’t mean a whole lot just by itself. It’s a bit of data, it can be bits and bytes, it could be binary. We could have 1 0 1 0. That might mean something to someone who knows computer science, but it’s not going to mean a whole lot to someone who isn’t familiar with that.
Dave McCrory 02:42
I also see data as a raw resource. And if you don’t know what that odd-looking rock is over there, it’s called tight oil, which most of us think of if you think of shale oil. It’s actually something that contains oil inside of it, but it’s actually fairly difficult to mine and expensive in the energy industry. The reason is because the oil is actually trapped in this rock, and so there are only a few ways to actually get the oil out. And I see information as something similar, it’s a raw resource that you have to apply a process to actually mine it and get it out.
Dave McCrory 03:24
So to turn this into something useful, you have to apply something else. If you look at these numbers, you might think of the top one as something like, say, credit card information. And the second number from the top is maybe a phone number or something like that. The third could be a US social security number, for example. It’s really all based on that same number that we talked about, but we have some knowledge of what this could be, and with the addition of the context that I was describing, it actually takes on a greater meaning. It’s becoming information.
Dave McCrory 04:02
But information really isn’t the end game. Information isn’t the end game because you have to do something with the information. It’s fuel, you can power something, but just having the information and doing nothing with it really isn’t all that powerful. You’re not giving value for the information you have. There’s a value chain to be applied to this. Just like there are value chains in production and economics.
Dave McCrory 04:29
You might think of it as that 1033 that I spoke about before, could be an address, you know Page Mill Road. Then you know it must be a road, and you know it’s Palo Alto, and we now know it’s an empty lot. That gives me additional data around that bit of data that I didn’t have before.
Dave McCrory 04:50
So there’s an information value chain. I described this one before, this chain is how you move from having a bit of data into being able to take an action. So you move from data to information, information applied to some base of knowledge, where you add it to your base of knowledge, and then you’re able to take an action if appropriate. The value’s in the action. So there’s a cost to obtain that information, and there’s a value returned if you take an action on that information. That’s really what you’ve got to be focused on in IT, in business. If you’re not gaining value from the information, and therefore with the data and what you’re doing with it, then you’re wasting time and money, and you should be focused on something very different.
Dave McCrory 05:45
So what does it cost to produce information? That’s something that, oddly, I don’t hear a lot of people talking about. I hear a lot about all these different projects, data and such, but what are you really getting out of it and how are you getting that? So if you look at the process of moving from that shale oil all the way over to something that becomes fuel to put in your car, the real action you care about is obtaining that fuel and actually being able to use it to drive somewhere. It’s not all that handy to just sit and have a bunch of rocks over in the corner. We seem to lose sight of that very often. We get excited about the technology, but not the actual implications it’s supposed to have on the business.
Dave McCrory 06:30
So, applying that same thing to this chain, we have the chain, but how much data do you need? How much information do you need? And what is the knowledge applied? And what is the value of that action? It really boils down to how you compare the cost to obtain versus the value returned. And it’s interesting because the energy industry does the same calculation when they’re mining raw materials to turn them into an energy source. So there’s energy returned based on energy invested. So how much effort did it take? How much money did you spend, how much energy did you expend, to actually get this new energy source back out? And if you expended more on mining than the energy you’ve gotten out, it’s a losing proposition. And people don’t seem to look at IT and data systems, all of these things, as something that could be a poor returning proposition. They think that magically, because they built a system that collects data, somehow that’s going to provide them with this incredibly large amount of value.
Dave McCrory 07:43
It’s not that hard to follow a model like this and figure out are you really getting any value, are you getting some value, or is the value so tremendous that it does make sense to continue to make these investments. We don’t seem to be doing that or talking about that, at least not enough.
Dave McCrory 08:00
So we have our example: the data is the 1033; the information is Page Mill Road applied to the 1033, so now I have some context, so now it’s an address, we know that. We happen to say in our place, no, it’s a meeting place. So we have the knowledge that there’s supposed to be a meeting place and we have the address, we know it’s an address and we have the information. And that gives me the ability to drive to the meeting, so that’s the action I want to take. So I can take that action only because I had all those different bits and I was able to apply them and put them together.
Dave McCrory 08:36
Big data, which we talk about, usually turns out to be small information. You have huge swaths of information, giant clusters, all these other things, and you get a tiny little bit of information out of all that. So the question is, is it worth it? And in some cases it very well can be, you might gain some very valuable business insight. In other cases, I’ve come across several people that are just collecting data to collect it, because at some point in the future they might be able to get some valuable information out of it. The question is, is it worth doing that? Is it worth keeping that information? And if so, for how long? Who’s doing the analysis on how long they should keep the data? You know you can’t just store all of your data forever, it’s not free to do. And it’s a compounding problem. The more of it you have, the more of it you have to maintain, the more of it you have to store. And ultimately the more you would have to comb through to actually get that information that you like. And again, in some cases it would be worth doing that, and in other cases it won’t be. It will be something that was actually a losing proposition.
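The compounding effect of keeping everything can be sketched in a few lines of Python. This is a minimal illustration, not something shown in the talk: the function name, the linear growth model, and every number below (terabytes held, monthly growth, price per terabyte per month) are assumptions chosen only to make the arithmetic visible.

```python
def cumulative_storage_cost(initial_tb, monthly_growth_tb, cost_per_tb_month, months):
    """Total storage spend when nothing is ever deleted.

    All parameters are hypothetical inputs for illustration.
    """
    total = 0.0
    stored_tb = initial_tb
    for _ in range(months):
        total += stored_tb * cost_per_tb_month  # pay for everything held this month
        stored_tb += monthly_growth_tb          # next month there is more to keep
    return total


# Made-up figures: 100 TB today, 10 TB of new data a month, 20.0 per TB per month.
one_year = cumulative_storage_cost(100, 10, 20.0, 12)
three_years = cumulative_storage_cost(100, 10, 20.0, 36)
print(one_year, three_years, three_years / one_year)
```

Under these assumed figures, tripling the retention period more than triples the total spend, because each month's bill covers all the data kept so far. Maintenance and the effort of combing through the data are left out of the sketch and would push the cost higher still.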
Dave McCrory 09:45
Something else to think about is that ubiquitous information overall has a low value. If everyone knows that bit of information, then it’s not going to provide you with that super special high-powered fuel that you’re looking for that’s going to give you the capabilities that you really need to have an advantage in your business.
Dave McCrory 10:04
However, scarce information can have a very high value. This is something that if you look at trading, and if you look at secret algorithms, or even the formula for say Coca-Cola, that’s top secret that is scarce information. So it has a very high value, because it can produce something very valuable. If it became more and more ubiquitous the value overall would be lowered.
Dave McCrory 10:31
There are other effects to this by the way, that I would mention. One of the effects could be that your goal is to have some type of information ubiquity, because that’s simply fueling a different outcome. The action might be that I’m trying to send out a marketing message, and my information is I have a new blockbuster movie coming out, and I want that information to be ubiquitous because it’s driving a different action. And I’m making the money off of the action of each person going out and buying a ticket sale. But the information that the movie is coming out, once it’s become ubiquitous, that core information itself has very little value.
Dave McCrory 11:06
If I was the first person to know that movie was coming out, no one else knew about it, I would have the ability to gain a lot of value directly out of that bit of information. I could go sell that to a news source or something else, and they would pay me. And therefore I would get a higher level of value out of it.
Dave McCrory 11:24
So getting the most value out of information follows that EROEI ratio, energy returned on energy invested, that I was talking about. In fact, with shale oil or tight oil, there’s actually a ratio. And in some cases it’s a 0.6:1 return, so you actually lose money if you try to mine that shale oil. In other cases, depending on how it’s formulated and where you actually tap it, you can get all the way up to a 20:1 return. So if you make the right choices, you can get a tremendous return out of mining it. In other cases you can get a very poor, losing return on that exact same investment. And I don’t think we’re careful enough about that. So the idea is to, of course, find the lowest-cost way to get the highest level of value. It’s also trying to get the greatest amount of accuracy. So think about the accuracy, or getting the highest level of probability out of that bit of information. So if you have highly accurate information it’s going to be much more valuable, and there’s a higher level of value to that than something that might have a 20% chance of being correct.
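The same ratio can be applied to an information project as a rough check. The sketch below is an assumption-laden illustration rather than a method given in the talk: the function names are hypothetical, the monetary figures are made up, and weighting the value by the probability that the information is correct is one simple way to account for accuracy, not the only one.

```python
def return_ratio(value_if_correct, probability_correct, cost_to_obtain):
    """Expected value returned per unit of cost spent obtaining the information."""
    if cost_to_obtain <= 0:
        raise ValueError("cost_to_obtain must be positive")
    if not 0.0 <= probability_correct <= 1.0:
        raise ValueError("probability_correct must be between 0 and 1")
    return value_if_correct * probability_correct / cost_to_obtain


def is_worth_it(ratio):
    """A ratio at or below 1 means at least as much was spent as was returned."""
    return ratio > 1.0


# Hypothetical projects that each cost 100,000 to run.
losing = return_ratio(60_000, 1.0, 100_000)        # mirrors the 0.6:1 case
winning = return_ratio(2_000_000, 1.0, 100_000)    # mirrors the 20:1 case
uncertain = return_ratio(2_000_000, 0.2, 100_000)  # same payoff, 20% chance of being correct

for name, ratio in [("losing", losing), ("winning", winning), ("uncertain", uncertain)]:
    print(name, round(ratio, 2), is_worth_it(ratio))
```

The third case shows why accuracy matters in this model: the same payoff at a 20% chance of being correct is worth a fifth as much per unit of cost as it would be if it were certain.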
Dave McCrory 12:38
Fast access to the information matters too: if you have rapid access, you can be advantaged. That’s something that happens in stock trades and other things; that’s the whole fight between trading companies trying to be closest to the exchange sources. They can be algorithmically advantaged and have the fastest access to the information. Finally, there’s the argument that if you can have access to more scarce sources of information, then you can be advantaged over and over again against your competition. Overall, if you think about info-economics, it’s all about getting from the information to the action. In some cases the data to the action, but information is the energy that fuels actions. Ultimately you need that information, you should be focused on that, not data, and please don’t confuse the terms. It happens far more often than any of us realize, and I’m as guilty as anyone else of making that confusion in the past between the two. Thank you very much, and feel free to engage me on Twitter. You can also visit my site, datagravity.org, where I talk about all sorts of things around data, information, data gravity. Thank you very much.

[applause]

2 Responses to “Is your big data worth the effort?”

  1. Matt Fates

    Clearly an important question to ask when spending big dollars on IT projects…”Is it worth it?” But in this case, that leads to a number of other questions, such as “If you could take advantage of all available data related to a specific problem or opportunity, would you be able to make better decisions?” and “Is gathering all the data easy? hard? reasonable?”; “Do you already have a lot of the data, but it is silo-ed or dispersed?” I certainly agree that Big Data is all the rage at the moment, and some companies are likely deploying it in a way that is total overkill…but they are also learning from their projects and attempts…and this will be valuable. And the costs will come down. As it becomes cheaper and easier to leverage data, those who have started early will have an advantage as they have been investing resources in figuring it out ahead of the crowd.

  2. Ah, but Big Data and its mining is the vogue thing to do, right? Oh and by the way, why is it that many a (particularly Silicon Valley) company finds the need to brag about their use of big data and its mining? Are these companies really that insecure about themselves they find the need to show off (and in the process justify themselves to their VC investors)?