Predicting the Unpredictable


After graduating from college, I left the barren Arizona desert for Manhattan to take my first job. It didn’t take long for my new Manhattanite friends to inform me that it was time to upgrade to wine from beer, so I enrolled in a wine-tasting class. But while it was great fun, I don’t think that I was any better at assessing the quality of wine after I’d completed the class than I was going in, though I was much better at faking it.

It wasn’t until years later that I discovered the secret, and it came via a Princeton economist. Understanding the fact that wine is an agricultural product, and as such is dramatically affected by weather, Orley Ashenfelter used decades of weather data and auction prices to come up with this equation for Bordeaux wines:

“Wine quality = 12.145 + 0.00117 × winter rainfall + 0.0614 × average growing season temperature - 0.00386 × harvest rainfall”
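To make the equation concrete, here is a minimal Python sketch that simply evaluates it. The function name and the two sample vintages are made up for illustration, and the units (rainfall in millimetres, temperature in degrees Celsius) are an assumption, since the equation as quoted doesn't state them.

```python
def wine_quality(winter_rainfall, growing_season_temp, harvest_rainfall):
    """Evaluate the Bordeaux equation quoted above.

    Assumed units (not stated in the post): rainfall in millimetres,
    temperature in degrees Celsius.
    """
    return (12.145
            + 0.00117 * winter_rainfall
            + 0.0614 * growing_season_temp
            - 0.00386 * harvest_rainfall)


# Two hypothetical vintages that differ only in harvest rainfall.
# These inputs are invented for illustration; they are not real weather data.
wet_harvest = wine_quality(600, 17.0, 200)
dry_harvest = wine_quality(600, 17.0, 50)

print("wet harvest:", round(wet_harvest, 4))
print("dry harvest:", round(dry_harvest, 4))
```

With everything else held equal, the drier harvest scores higher, which is what the negative coefficient on harvest rainfall says.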

Assessing wine is considered an art rather than a science, but oftentimes creativity is about applying a little science to art — as Orley did by taking into account weather and auction data. In an effort to inspire entrepreneurs to also turn ill-practiced art into science, below I share a few other examples.

The Mathematics of War

Sean Gourley, a physicist by training, wanted to gain a deeper understanding of what was happening with the war in Iraq. So he worked with a cross-functional group to understand “the mathematics of war.” What he found was fascinating: guerilla wars around the world (in Iraq, Colombia, Peru, Indonesia and Afghanistan) could be reduced to this equation:

[Image: the “war function” equation]

Gourley tells the entire story in the video below, including how he arrived at the fact that alpha (the slope of the line) is 2.56. As he explains it, guerilla war has evolved to a state of equilibrium that can be defined by an equation — that there is an optimal organizational structure for fighting an organized military. Guerillas either discover this structure, and implicitly, this formula, or they get killed off.

Mathletics, Sabermetrics and “Moneyball”

The story of Sabermetrics and statistics in baseball has been told many times, so I won’t repeat it here. Suffice it to say that if you have any interest in baseball stats, “Moneyball” is a must-read.

A story not as widely told is that of Wayne Winston and the Dallas Mavericks. A few years ago, Winston, a decision sciences professor from Indiana University, consulted for the Mavericks on a new rating system aimed at measuring the impact a player has on the entire team. Points or assists don’t offer much information in and of themselves; what’s far more valuable for a team is answering the question: “When player x is on the court, does our lead grow or shrink?”

From a 2003 New York Times article about the system:

Ignoring every traditional statistic for players, Sagarin and Winston have designed a ranking that is modeled on hockey’s plus-minus system, in which players receive credit for being in the game when their team does well. Whether they actually score points or grab rebounds does not matter.

“Did you make the pass before the assist? Did you tip a ball to someone who made a shot? Did you set a pick? Did you take a charge?” said Winston, a fast-talking former “Jeopardy” champion who, like Sagarin, grew up outside New York City rooting for the Knicks of the late 1960’s and early 70’s.

“Nobody’s got a stat for these,” Winston said. “Ninety percent of basketball is made up of things there aren’t stats for.”
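The basic plus-minus idea is easy to sketch in Python. This is only the raw, hockey-style tally, not the Sagarin and Winston rating itself, whose details aren’t given here. The stint format, the function name and the player labels are all hypothetical.

```python
from collections import defaultdict


def plus_minus(stints):
    """Raw plus-minus: for each player, add up the change in the team's
    lead over every stretch of play that player was on the court.

    Each stint is (players_on_court, team_points, opponent_points).
    """
    totals = defaultdict(int)
    for players, team_points, opponent_points in stints:
        margin = team_points - opponent_points
        for player in players:
            totals[player] += margin
    return dict(totals)


# Made-up stints with placeholder player labels, for illustration only.
stints = [
    ({"A", "B", "C", "D", "E"}, 12, 8),   # lead grows by 4
    ({"A", "B", "C", "D", "F"}, 6, 10),   # lead shrinks by 4
    ({"B", "C", "D", "E", "F"}, 9, 4),    # lead grows by 5
]

ratings = plus_minus(stints)
for player in sorted(ratings, key=ratings.get, reverse=True):
    print(player, ratings[player])
```

Note that a player’s raw number depends heavily on who else is on the court with him, which is presumably why a real rating system has to do more than this simple tally.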

I just pre-ordered Winston’s new book, “Mathletics,” due out this fall.  I can’t wait to read it.

How to Develop a “Prediction Function”

Step 1:

Start with some insight about the relation between two things — like the fact that weather determines wine quality.

Step 2:

Identify “output” data to tune your prediction function — for example, historic auction prices as an approximation for wine quality.

Step 3:

Graph the data to examine the best way to extract the function. Does the data look like it fits a line? If so, do a simple linear regression (a very simple way to do this is the Regression function in Microsoft Excel’s Data Analysis package; unfortunately, it’s no longer available on Excel for the Mac, but you can do it elsewhere). Does the data look like an exponential curve? If so, you can do a logarithmic regression, that is, a linear regression on the logarithm of the values (here is an online tool for a simple regression). And you can use much more sophisticated statistics to find the right equation, if one exists.

Step 4:

Profit.
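For anyone without Excel handy, Steps 2 and 3 can be sketched in a few lines of plain Python. This is a minimal illustration, not a full statistics package: the function names are my own, the sample numbers are invented, and the exponential case is handled by fitting a line to the logarithm of the outputs. It also reports R², a common measure of how well the fitted line explains the data.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared). Assumes xs are not all equal
    and ys are not all equal.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


def exponential_fit(xs, ys):
    """Fit y = a * exp(b * x) by regressing ln(y) on x (requires y > 0).

    Returns (a, b, r_squared), where r_squared is measured on the log scale.
    """
    b, ln_a, r_squared = linear_fit(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), b, r_squared


# Invented sample data: an input (x) and a roughly linear output (y).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r_squared = linear_fit(xs, ys)
print("slope:", slope, "intercept:", intercept, "R^2:", r_squared)
```

An R² near 1 means the line tracks the data closely; a low one is a sign that the equation, however precise its coefficients look, isn’t telling you much.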

There are so many areas in which having the ability to make dramatically better predictions would enhance our lives — jobs, dating and health, to name a few.  I can’t wait to see what the future has in store.

Mike Speiser is a Managing Director at Sutter Hill Ventures. His thoughts on technology, economics and entrepreneurship will appear at this time every week.

31 Comments

Dev

Without a sense of how well the wine data fit that regression equation, it’s pretty useless. You can multivariate regress almost any data and get a precise-looking equation (to three sig figs, as above, or more) that still has an error to, um, one sig fig. You can regress randomly generated data, and compute the best _linear_ fit–but you still need to know how good that fit is (which may not be that good–try to linearly regress a higher-order polynomial, or exponential, or anything else, really). That’s one problem.

Another is stationarity–the system may or may not have stationary behavior in terms of the specific parameters here. Maybe the equation holds up over time, or maybe it doesn’t (as it didn’t with, for example, mortgage foreclosure rates up to last year, or on bond spreads up to the LTCM debacle, etc). Even decades of stable data don’t ‘prove’ stationarity, as we all found out.

And finally, we don’t know which input data are necessarily significant, and which are noisy (and will actually make the model *worse* as a result of their inclusion, given spurious correlations in finite samples). The three input variables for the wine equation above may be the most meaningful, or they might not be (there may be other, more significant variables at play)–but the equation alone won’t warn you against that. See factor analysis or PCA.

Given all that, the equation’s significance really has to be qualified in a few specific ways–it’s not the best quantitative model of wine quality—it’s merely the best _linear_, _historical_sample_based_, _limited_input_ model of wine quality. You can do polynomial (or fourier, or wavelet, etc) regression instead of linear; you can try to model the internal generators of wine quality rather than only using the historical top-line numbers; and you can try to look at a broader set of candidate inputs.

Oh, and another good book in the ‘Fooled By Randomness’ tradition (and with a foreword by Nassim Taleb)–Pablo Triana’s recent “Lecturing Birds on Flying”.

Marcus Greenwood

Don’t get too carried away with your prediction functions. Remember that’s what happened with our beloved bankers and economists over the last couple of years and look where it got us! ;-)

I’d recommend reading The Black Swan for a bit of a contrarian view on all this:
http://bit.ly/OLodI

Mike Speiser

Love The Black Swan and Fooled by Randomness and agree that one must be careful not to confuse correlation and causation. However, in the vast majority of human pursuits the distinction is not that, but rather opinion vs. something potentially better. In many cases, statistics (based on a human insight rather than purely on correlation) can be of use.

R

Stats are nice and they make things easy for people, but I think as far as sports is concerned (in terms of picking players for a team) they work better in baseball, since it’s not really a team sport, at least when you’re up to bat. In team sports social factors can play a big role, like players getting along with one another. In football, where the average career is 4 seasons, a player’s previous injuries also detract from his value. In football, a lot of Heisman Trophy winners have great stats on paper, but when they get drafted to the NFL they completely go bust; it’s not that they changed physically in any way, but mentally, some people just can’t handle making it big. This is why Bill Walsh was able to make such great teams: not because of statistics, but because he knew how to pick good players based on their personality.

Mike Speiser

You clearly know your stuff. I’m not sure if your assertion is right or wrong, but I don’t think stats must meet the bar of “better than they are in baseball” in order to be put to good use. You’re likely right, but my understanding is that the Patriots have used stats to do a good job in the recent past, no?

R

I’m not too familiar with what the Pats have done; ever since the Niners’ dynasty collapsed (I’m a Niners fan) I mostly just watch the playoffs and the Super Bowl. But from what I understand, Bill Belichick, like many recent Super Bowl coaches, was also a student of Bill Walsh. I’m sure stats can be put to better use in other sports and people should certainly try; I’m just saying that if I were a betting man, I would say that statistics will not be as successful in other sports (as far as picking the best players for a team goes). But if your goal is to, say, pick the winner of this year’s Super Bowl in the eighth week of the regular season, then I think statistics can definitely be put to good use, as I have personally done. It’s just the particular case of picking players for a team where I’m not sure statistics will be as successful as they appear to be in baseball. Although, like Taylor says below, it’s possible we just haven’t found the right stats.

Taylor Davidson

Actually, the whole point of many of the new stats in baseball, football and basketball (check out 82games.com) is to capture the “social aspect” of how individual players contribute to the overall team’s play.

As a “team sport played by individuals”, baseball is an easier sport to isolate the impact of individual performance on team performance, but football and basketball are catching up by developing stats that meaningfully capture the impact of individuals on team performance. The inability of past stats to capture the impact of “personality” or “team character” is a signal that analysts are using the wrong stats, not that stats can’t be used to understand what’s going on.

R

I don’t think anyone would argue that the wrong stats are being used; you just have to look at the top ten draft picks over the years for many of these sports. It might sound strange, but if I were picking players for a football team, off the top of my head I would just look at two or three things: a) injury history, b) performance in pressure situations (4th downs, 4th quarters, overtime etc.). But I also think that something like the player’s relationship with his mother can tell you a lot about his character, and in some cases his character might be strong enough to outweigh some of your other measurements.

Anonymous

Winston also wrote an excellent book about how to use Excel to do all kinds of real-world business analysis. It’s the only book I’ve kept at my desk at work because his techniques constantly come in handy. It’s called Excel Data Analysis and Business Modeling.

R

The Mathematics of War equation should be cleared up for people who aren’t familiar with the notation: the whole left side of the equation just means “probability of people being killed,” although some people might think it means “probability of event times number killed” because of the brackets. In this case the brackets are not brackets for multiplication; they are the brackets of a probability function.

R

i wasn’t aware of this till you pointed it out. i guess it is just a coincidence.

R

Sorry, I didn’t notice you wrote this article. I just want to say, this is the first time I commented on a story in a long while, so I guess it tells you how interesting I found it. Thanks for putting it together.

Chris Kauza

Mike,

Thanks for the thoughtful and informative piece. I always enjoy the blending of different disciplines that yield new insights, much as you might find with a good Cabernet or Merlot.

It’s amazing how the more one understands the intricate complexities of the World, the more one can appreciate the broad subtlety of its beauty, isn’t it?

banu

Great post. Of late I was getting bored with postings on GigaOM talking about iPhone apps or similarly boring topics. This one is a really interesting post.

J S

You’ll get better with wine in picking out the things you like/don’t like (too hot/alcohol, stems left in, etc). Wine is often a personal review that experts cannot really predict. Sometimes I like to go to movies the traditional critics pan because I know they will often be surprisingly good, or avoid the ones they glow about as they are often surprisingly bad.

In terms of the weather equation: many wine experts (and the Bordeaux wines have a more concentrated panel of experts) know all about the terroir, including the soil and weather. They will advise buyers of good/bad weather patterns and what to expect in ‘quality’. The equation is just measuring how tightly price data follows industry expectations.

Most interesting data are the blind tests that show bottle price but nothing else (and then secretly swap high-low matches). The $100 placed on a Two-Buck-Chuck makes it really shine while $2 on the fancy Bordeaux gets a lot of unfavorable comments. Perception is a strong influencer.

Take your friends for a weekend tour of NY wineries. You’ll have some fun.

ala

There was a recent article about the +/- of basketball players. It focused on Shane Battier of the Houston Rockets, though. I don’t know if the same guy was keeping stats for the Rockets and the Mavericks or if it’s becoming a more common thing to use.

Mike Speiser

Will check it out, thank you. I love Michael Lewis and I’m not surprised that Mr. Moneyball is the one to have written such a piece…

Wilf

Excellent. Perhaps we should add this to the new “functional maths” course in the UK. How to make money from predictions..
