Statwing is awarding $1,500 for the best insights from its massive social science dataset

Statistics startup Statwing has kicked off a competition to find the best insights from a 406-variable social science dataset. Entries will be voted on by the crowd, with the winner getting $1,000, second place getting $300 and third place getting $200. (Check out all the rules on the Statwing site.) Even if you don’t win, though, it’s a fun dataset to play with.

The data comes from the General Social Survey and dates back to 1972. It contains variables ranging from sex to feelings about education funding, from education level to whether respondents think homosexual men make good parents. I spent about an hour slicing and dicing variables within the Statwing service, and found some at least marginally interesting stuff. Contest entries can use whatever tools they want, and all 79 megabytes and 39,662 rows are downloadable from the contest page.

For example, men today seem to value the free speech rights of atheists slightly more than women do. Here is how women answered this question in 2012.

The same question to men.

Actually, men value the free speech rights of racists more than women do, too, and by a greater margin than for atheists. So maybe men are more into free speech, or maybe they’re just more racist. But everyone seems to think race is a more sensitive subject than religion. Of course, I don’t know; drawing conclusions from numbers is not always easy.


The same question to women.

Here’s a graph relating the highest grade level that people completed to their family income.

And one comparing education levels by geographic region. The stereotype holds true in that the south is the least educated, but not by a huge margin.


The way Statwing is set up and the way it has organized the data, users can filter and compare any variable by any other variable, which makes digging into any given question pretty fascinating and potentially time-consuming. I think some machine learning algorithms might be called for to find the latent but strong correlations in all the data.
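For anyone who would rather script that search than click through it, here is a minimal sketch of one way to start. It is a brute-force pairwise scan, not machine learning and not anything Statwing itself does: it scores every pair of categorical columns with Cramér's V (0 means no association, 1 means a perfect one) and ranks the pairs. The column names and the toy responses are invented for illustration and are not General Social Survey data.

```python
from collections import Counter
from itertools import combinations
import math


def cramers_v(xs, ys):
    """Cramér's V for two equal-length lists of categorical values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed count for each (x, y) cell
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    k = min(len(row_totals), len(col_totals))
    if n == 0 or k < 2:
        return 0.0  # a constant column cannot be associated with anything
    chi2 = 0.0
    for r, r_n in row_totals.items():
        for c, c_n in col_totals.items():
            expected = r_n * c_n / n  # count expected if independent
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


def rank_pairs(columns):
    """Score every pair of columns and return them strongest first."""
    scored = [
        (cramers_v(columns[a], columns[b]), a, b)
        for a, b in combinations(sorted(columns), 2)
    ]
    return sorted(scored, reverse=True)


# Made-up toy responses, NOT General Social Survey data.
toy = {
    "sex": ["m", "m", "f", "f", "m", "f", "m", "f"],
    "region": ["south", "west", "south", "west", "south", "west", "south", "west"],
    "degree": ["hs", "ba", "hs", "ba", "hs", "ba", "hs", "ba"],
}

for score, a, b in rank_pairs(toy):
    print(f"{a} x {b}: {score:.2f}")
```

With 406 variables there are 82,215 pairs to score, so some strong-looking results will turn up by chance alone. Treat anything near the top of such a ranking as a lead to check, not a finding.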

Prize money aside, this type of data is still immensely valuable, and the service Statwing did in cleaning it up even more so (it occasionally does this for other datasets, including, last year, NFL data). Although the media and tech conferences, including our own Structure Data conference in March, like to focus on the cutting edge of data analysis and the biggest data collectors around, there’s still a lot to be said for the importance of just putting the right data and the right tools in the hands of the general public.

Given the right data, a lot of people with a lot of collective time on their hands could find a lot of really interesting, if not valuable, things and maybe even build some really good products.