There’s nothing quite like settling into the couch on a Sunday afternoon (or morning on the West Coast), cracking open a beer and yelling at a football coach who gets paid millions of dollars a year to do his job. After all, the guy’s clearly an idiot. Who would run it up the middle on third-and-eight? And why does the team still punt the ball all the time? You never punt in Madden NFL, and you win all the time.
You probably think I’m being sarcastic, but I’m not. Statistically speaking, football teams should go for it on fourth down more often, they shouldn’t run on third and long, and they’re almost certainly better off going for two-point conversions. The guys behind Statwing laid it all out in a blog post on Monday. What’s more, they’ve uploaded an entire data set of NFL statistics to their service that users can play around with for free to analyze a huge number of occurrences and correlations.
It’s all about democratizing data
Statwing, you might recall, is one of the “data for dummies” tools I highlighted in a January post about advanced analytics tools so simple anyone can use them. Right now, it’s one of the simplest there is. Here’s how I described Statwing then, although it actually performs more types of analyses than that description lets on:
“You upload data, check the variables you’re concerned with, and it plots their relationship. (It also can describe the variables by highlighting the sample size, minimum, maximum, mean, median and standard deviation.) Graphs are accompanied by explanations as to how strong the correlation is based on various statistical metrics, as well as the results of a linear regression model.”
The ease of use is by design, says Statwing co-founder Greg Laughlin. “There’s a general zeitgeist that people should care about data now,” he told me during a recent call, but they don’t always know how to get started or even see how all the hype around data relates to them. Early in its existence, Statwing is trying to address both of those concerns by building an easy-to-use service that also happens to teach users about statistics, and by offering up some interesting data sets for people to play around with.
The latter part is easy, but valuable. Data sets like the NFL data or one about the Titanic’s passengers let other people into the data game and get them thinking statistically. They get people saying, “‘Oh, I grok that. I see how this is interesting, I see how this is useful,’” Laughlin explained.
Building a data-analysis service that’s actually usable by mere mortals is a bit tougher. At its core, Statwing relies on a rules engine that considers the type of data uploaded and the types of variables (a maximum of two right now) a user wants to relate to each other. It can currently handle 10 to 15 different analyses, depending on how one counts them, Laughlin said, but at any rate they’re the ones people use most often.
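Laughlin didn’t spell out the rules, so the following is only a hypothetical illustration in Python of how a type-driven rules engine could work. The `Kind` enum, the `infer_kind` heuristic and the `choose_analysis` mapping are my assumptions, pairing variable types with standard textbook tests; they are not Statwing’s actual logic.

```python
from enum import Enum


class Kind(Enum):
    NUMERIC = "numeric"
    CATEGORICAL = "categorical"


def infer_kind(values):
    """Hypothetical heuristic: a column is numeric if every value parses as a float."""
    try:
        for v in values:
            float(v)
    except (TypeError, ValueError):
        return Kind.CATEGORICAL
    return Kind.NUMERIC


def choose_analysis(*columns):
    """Pick a textbook analysis from the kinds of one or two columns (assumed rules)."""
    if not 1 <= len(columns) <= 2:
        raise ValueError("this sketch handles one or two variables")
    kinds = [infer_kind(c) for c in columns]

    # Single variable: describe it.
    if len(kinds) == 1:
        return "descriptive summary" if kinds[0] is Kind.NUMERIC else "frequency table"

    a, b = kinds
    if a is Kind.NUMERIC and b is Kind.NUMERIC:
        return "correlation and linear regression"
    if a is Kind.CATEGORICAL and b is Kind.CATEGORICAL:
        return "chi-square test of independence"

    # One categorical, one numeric: compare the numeric variable across groups.
    groups = columns[0] if a is Kind.CATEGORICAL else columns[1]
    n_groups = len(set(groups))
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    return "t-test" if n_groups == 2 else "one-way ANOVA"
```

For example, `choose_analysis(["run", "pass", "punt"], [1, 2, 3])` lands on a one-way ANOVA, while two numeric columns get a correlation. The user only picks columns; the rule table decides which statistics to run, which is the point of putting such an engine at the core. A heuristic this crude would misread numeric codes such as zip codes as numbers, one reason a production system needs richer type detection.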
He credits Cloudera co-founder and Chief Scientist Jeff Hammerbacher (with whom, along with Greylock’s DJ Patil, I’ll be doing a fireside chat at Structure: Data on Thursday) with helping Statwing decide to make the rules engine the service’s core.
That has been a wise decision because it lets lay users get what they need out of the service without worrying about the underlying functions. Statwing has users who never click the “advanced” tab that shows the statistical breakdown, Laughlin said. They just use the service, essentially, as a faster way of making charts than Microsoft Excel, and the headline stating whether or not there’s a statistically significant relationship is all the info they need.
“That’s really exciting for us,” Laughlin said. “… It’s giving them the power of stats without them having to think about it.”
Paying the bills with bigger users
Of course, a startup can’t survive on free and unsophisticated users alone, so Statwing is ramping up its money-making efforts. For example, it has “just turned on the paywall in a really light way” by “maybe” charging really heavy users, Laughlin said. In the future, though, Statwing wants to add support for more variables and larger data sets (there’s a 5MB limit right now), and perhaps build in some predictive analytics.
“That kind of analysis is really powerful, really extensible,” he noted.
As the service grows, Laughlin sees the ideal paying user being someone who currently has to use statistical-analysis software like SPSS or R, but who doesn’t really go beyond the basic functions. That type of user has a real business need for the software, he explained, but doesn’t need all the complexity and arcane statistical dressing that comes along with that type of product.
Some people don’t want advanced analytics democratized, Laughlin added, because they think people can’t ask the right questions. On the contrary, Statwing’s theory is that most people just struggle with the logistics of cleaning and formatting data, and then with knowing the terminology associated with the business questions they want to ask.
Back to football …
But forget business users: when will football coaches start caring about statistics?! Maybe not any time soon. Laughlin said a friend of his who works on the MIT Sloan Sports Analytics Conference sees a lot of interest in analytics from the upper levels of sports organizations, but anecdotal evidence suggests most coaches aren’t eager to let data weigh heavily on their decisions.
Think of a situation like fourth-and-goal from the two-yard line as akin to a CIO choosing between Oracle and some whizbang new database. Nobody ever got fired for buying Oracle, and nobody ever got fired for kicking a field goal.