<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>GigaOM &#187; statistics</title>
	<atom:link href="http://gigaom.com/tag/statistics/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com</link>
	<description></description>
	<lastBuildDate>Sun, 19 May 2013 03:33:07 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gigaom.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0db8f6557d022075dbbf010c54d46d93?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>GigaOM &#187; statistics</title>
		<link>http://gigaom.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gigaom.com/osd.xml" title="GigaOM" />
	<atom:link rel='hub' href='http://gigaom.com/?pushpress=hub'/>
		<item>
		<title>Predicting Twitter popularity is all about probability</title>
		<link>http://gigaom.com/2013/04/26/predicting-twitter-popularity-is-all-about-probability/</link>
		<comments>http://gigaom.com/2013/04/26/predicting-twitter-popularity-is-all-about-probability/#comments</comments>
		<pubDate>Sat, 27 Apr 2013 01:56:38 +0000</pubDate>
		<dc:creator>Amanda Alvarez</dc:creator>
				<category><![CDATA[prediction]]></category>
		<category><![CDATA[Social graph]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=635097</guid>
		<description><![CDATA[How can you predict which tweets will get more views, and more viral retweets? A new study developed a statistical model to estimate the popularity of tweets.]]></description>
				<content:encoded><![CDATA[<p>Tweets have the power to <a href="http://gigaom.com/2013/04/23/aps-twitter-account-suspended-after-hacking-incident-roils-markets/">decimate markets</a>, but they also have users and companies <a href="http://paidcontent.org/2013/02/21/the-massive-advertising-shift-that-twitter-is-trying-to-capitalize-on-with-its-api/">seeing dollar signs</a>. With huge marketing, political, and social mobilization potential, how can you predict which tweets will get more views, and which retweets will go viral? <a href="http://arxiv.org/abs/1304.6777">A new study</a> developed a statistical model that attempts to estimate the popularity of tweets, and thus how memes spread.</p>
<p>Starting with 52 “root” tweets from users both famous and obscure, the researchers first analyzed the dynamics of retweeting, like the speed and spread of a tweet from a user to followers and then their followers. The researchers, from the University of Washington, MIT, and Penn, used the Twitter API to collect all the retweet information and found that most retweets occurred within one hour of the original tweet. Not surprisingly, they also found that root tweets are retweeted more than the retweets themselves.</p>
<p>They then plugged the important variables (number of followers, retweet speed, retweets of other tweets) into a Bayesian model, a statistical approach that uses prior evidence (the root tweets) to calculate how the retweet graph evolves. They experimented with feeding the model different amounts of prior evidence to see how much was needed to make an accurate prediction. Using only 10 percent of the retweets to guide the model, they were able to predict retweet time and volume with reasonable accuracy, and the error decreased as they included more retweet data. The average retweet time was only 4.4 minutes.</p>
<p><img  alt="tweet-prediction-kimkardashian" src="http://gigaom2.files.wordpress.com/2013/04/tweet-prediction-kimkardashian.png?w=300&#038;h=228" width="300" height="228" class="aligncenter size-medium wp-image-635098" /></p>
<p>Throwing more information into the prediction engine (like whether a particular follower has a large number of followers of his or her own) could improve the accuracy. Their model was thrown off, it seems, by a few anomalous tweets with a very rapid onset and termination of retweets that didn’t follow the same pattern as the other tweets. (Though they don’t identify who sent those tweets, my bet is on @KimKardashian, whose followers’ actual and predicted retweet timecourse is pictured above.) The researchers didn’t even consider the time of day a tweet was posted, nor its content; there is likely huge potential to mine in those domains for what, and when, leads to trending.</p>
<p>With the abundance of the Twitterverse open to developers via API, this study represents just the tip of the iceberg in predicting tweeting behavior, something that <a href="http://gigaom.com/2013/04/09/blab-predicts-what-people-will-tweet-blog-and-report-on/">startups like Blab</a> are busily pursuing. It also shows that robust methods like Bayesian statistics can predict if a tweet has any retweet life left, and thus whether it can gather more eyeballs and clicks, something that is sure to prove very lucrative.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/26/predicting-twitter-popularity-is-all-about-probability/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2010/04/twitter5.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2010/04/twitter5.jpg?w=150" medium="image">
			<media:title type="html">How Much Energy Per Tweet?</media:title>
		</media:content>

		<media:content url="http://2.gravatar.com/avatar/e37323b74d1f383817d82c9f906b7bcf?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">neuroamanda</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/tweet-prediction-kimkardashian.png?w=300" medium="image">
			<media:title type="html">tweet-prediction-kimkardashian</media:title>
		</media:content>
	</item>
		<item>
		<title>Liking curly fries might not mean you&#8217;re smart: When mere data isn&#8217;t enough</title>
		<link>http://gigaom.com/2013/03/25/liking-curly-fries-might-not-mean-youre-smart-when-correlation-isnt-enough/</link>
		<comments>http://gigaom.com/2013/03/25/liking-curly-fries-might-not-mean-youre-smart-when-correlation-isnt-enough/#comments</comments>
		<pubDate>Tue, 26 Mar 2013 01:10:35 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[online data]]></category>
		<category><![CDATA[privacy]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=622699</guid>
		<description><![CDATA[A recent study found strong correlations between people's Facebook likes and a number of personal characteristics such as sexual orientation and intelligence. But relying on correlations as proof of anything is a questionable practice.]]></description>
				<content:encoded><![CDATA[<p>You might have heard recently about <a href="http://www.pnas.org/content/early/2013/03/06/1218772110.full.pdf">a study finding that liking “curly fries” on Facebook correlates strongly with high intelligence</a>. Publications such as Wired <a href="http://www.wired.com/gadgetlab/2013/03/facebook-like-research/">have written about it</a>. Quid Founder and CEO Sean Gourley <a href="http://gigaom.com/2013/03/20/data-science-is-not-enough-we-need-data-intelligence-too/">cited it during a presentation at Structure: Data</a> last week. A faction of the European Union parliament even <a href="http://www.socialistsanddemocrats.eu/gpes/public/detail.htm?id=138038&amp;section=NER&amp;category=NEWS&amp;startpos=6&amp;topicid=-1&amp;request_locale=EN">pointed to the study</a> as yet another reason to prohibit data mining by web companies.</p>
<p>However, if you’re like me, hearing anybody repeat that curly fries data point as fact likely sends a shiver down your spine. It’s not that it’s not true (it very well might be) but that it’s nearly useless information without more background.</p>
<p>That’s right, the old correlation versus causation argument is front and center once again. In all the big data world, it’s probably the biggest fallacy there is, no matter how you look at it. No, getting value from big data doesn’t always require giving greater credence to correlation than causation. And, no, relying on correlation isn’t inherently some sort of an ethically or scientifically questionable practice.</p>
<p>Really, the choice between relying on correlation or striving to find causation probably depends on what you’re trying to do.</p>
<h2 id="when-theres-nothing-at-stake-c">When there’s nothing at stake, correlate away</h2>
<p>Let’s be honest: If all I’m concerned with doing is boosting clickthroughs, selling more products or <a href="http://gigaom.com/2012/06/14/netflix-analyzes-a-lot-of-data-about-your-viewing-habits/">predicting the movies you want to see</a>, correlations probably will work just fine. I don’t really care why, for example, <a href="http://gigaom.com/2011/11/22/big-data-reveals-mac-users-book-pricier-hotels/">Mac users book more-expensive rooms on Orbitz</a> — I just care that they do.</p>
<p>You visit my site, my system sees you’re using a Mac (or that you like curly fries, or any other attribute it can associate with you) and it shows you content that it thinks you’ll want to see. It’s not a perfect approach, but it’s probably a far cry better than the old method of just showing everybody the exact same content.</p>
<p>And when you’re collecting potentially petabytes of user data and trying to serve ads in near real time, strong correlations might be about the best things you can hope to find. It’s a volume-and-velocity business, and heavy examinations of why any two (or more) things are related to one another might not always provide a high return on investment.</p>
<p>A more extreme example of when correlations might suffice would be something like machine-to-machine systems that need to make decisions in real time in order to prevent disasters. The people charged with running these systems might not know why a certain series of events often precedes a particular outcome, but it’s better safe than sorry.</p>
<h2 id="you-cant-make-a-difference-or-">You can’t make a difference — or real decisions — with correlations</h2>
<p>But if you’re trying to use big data to make a meaningful difference in the world or to make decisions that can have significant real-world consequences, mere correlations probably won’t cut it. This is what Evgeny Morozov <a href="http://www.nytimes.com/2013/03/24/opinion/sunday/morozov-imprisoned-by-innovation.html">warns about in relation to crime in a recent <em>New York Times </em>column</a>. It’s what Gourley had in mind when talking about <em>data science </em>versus <em>data intelligence</em>. It’s why the current discussion around machine learning <a href="http://gigaom.com/2013/03/20/its-not-skynet-yet-in-machine-learning-theres-still-a-role-for-humans/">almost always includes a human aspect</a>, as well.</p>
<p>Many of the reasons for not acting on correlations alone are based on privacy and a whole collection of civil, constitutional and human rights. You simply can’t profile and then arrest, for example, people based on what their Likes suggest they might be. You probably shouldn’t make decisions about <a href="http://gigaom.com/2012/11/19/where-machine-learning-and-human-artistry-meet-your-wallet/">people’s financial</a>, health or general well-being based on mere correlations, either.</p>
<p>Heck, I wouldn’t even serve ads that delve into personal information such as health, sexual orientation or intelligence without a very strong reason to believe I was accurate (and express consent to serve those ads). And the Facebook-curly-fries study is <a href="http://www.pnas.org/content/suppl/2013/03/07/1218772110.DCSupplemental/st01.pdf">full of correlations that could be potential landmines</a>, a small portion of which are visible in the chart below.</p>
<div id="attachment_624177" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/03/curly-fries.jpg"><img alt='More correlations from the "curly fries" study. Source: Proceedings of the National Academy of Sciences' src="http://gigaom2.files.wordpress.com/2013/03/curly-fries.jpg?w=708&#038;h=600" width="708" height="600" class="size-large wp-image-624177"></a><p class="wp-caption-text">More correlations from the “curly fries” study. Source: Proceedings of the National Academy of Sciences</p></div>
<p>But these are all situations where the fear of incorrectly profiling someone occasionally — and being sued as a result — might overpower the desire to do good most of the time. The <a href="http://gigaom.com/2013/03/17/uber-data-darwinism-and-the-future-of-work/">data Darwinism that my colleague Om Malik wrote about recently</a> extends beyond just peer reviews and social-media ratings, and one shouldn’t take the role of playing God (or catalyst for evolutionary change, to continue the Darwin metaphor) lightly.</p>
<p>Sometimes, though, correlations aren’t enough because you really want to solve a problem or perhaps build a great product. As Gourley explained at Structure: Data, even using correlative data to predict insurgent attacks in a place like Iraq is relatively easy, but predicting the likelihood of events doesn’t stop them. Stopping them requires really understanding and addressing the root causes of the attacks.</p>
<p>The <a href="http://pro.gigaom.com/blog/why-the-next-front-in-big-data-might-be-psychological?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=622699+liking-curly-fries-might-not-mean-youre-smart-when-correlation-isnt-enough&amp;utm_content=dharrisstructure">same goes for stopping disease outbreaks, figuring out why programmers make more mistakes during certain seasons</a>, <a href="http://gigaom.com/2012/12/28/maybe-big-data-can-quell-gun-violence-but-not-in-the-way-you-think/">stopping gun violence</a>, or just capitalizing on that knowledge about curly fries or hotel-room bookers in order to build products that touch upon the deeper rationales for liking those things. You can fight the symptoms, so to speak, or you can cure the disease.</p>
<p>So feel free to try selling the next guy you see eating curly fries on a documentary about Dostoevsky, but don’t expect him to care. It might be that there’s some strong connection between curly fries and intelligence; of course, it might also be that intelligent people, entirely coincidentally, tend to live within walking distance of an Arby’s. But no one has asked about that.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-518107p1.html">Shutterstock user Tobias Arhelger</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/25/liking-curly-fries-might-not-mean-youre-smart-when-correlation-isnt-enough/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/shutterstock_125949212.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/shutterstock_125949212.jpg?w=150" medium="image">
			<media:title type="html">curly fries</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/curly-fries.jpg?w=708" medium="image">
			<media:title type="html">More correlations from the &#34;curly fries&#34; study. Source: Proceedings of the National Academy of Sciences</media:title>
		</media:content>
	</item>
		<item>
		<title>ESPN should just hire Nate Silver already</title>
		<link>http://gigaom.com/2013/03/25/espn-should-just-hire-nate-silver-already/</link>
		<comments>http://gigaom.com/2013/03/25/espn-should-just-hire-nate-silver-already/#comments</comments>
		<pubDate>Mon, 25 Mar 2013 20:30:28 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[nate silver]]></category>
		<category><![CDATA[NCAA tournament]]></category>
		<category><![CDATA[SAP]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=623772</guid>
		<description><![CDATA[The first week of the NCAA tournament is in, and the results would suggest that Nate Silver is yet again the man when it comes to predicting the things Americans care about.]]></description>
				<content:encoded><![CDATA[<p><em>This story was corrected at 4:17 p.m. because the author incorrectly stated that Michigan State, one of the Final Four teams selected by the SAP model, had been eliminated from the tournament.</em></p>
<p>OK, so your NCAA tournament bracket has officially been busted. Don&#8217;t feel so bad. ESPN college basketball analyst Jay Bilas, <a href="http://gigaom.com/2012/11/07/why-nate-silver-and-others-predicted-the-election-perfectly/">stat-geek superstar Nate Silver</a> and even SAP&#8217;s vaunted predictive analytics software all missed the upsets, too. So did President Obama.</p>
<p>Three of the four correctly picked 11 of the Sweet 16 teams, while Bilas correctly chose 10. But despite the similarity in results between men and models, I&#8217;d follow Silver&#8217;s model-based forecast every time. Not only is it accurate, but it stands to make people a lot of money.</p>
<p>Just to be clear, though, Silver doesn&#8217;t actually pick winners and losers (at least not publicly, as far as I can tell). Rather, he <a href="http://fivethirtyeight.blogs.nytimes.com/2011/03/14/how-we-made-our-n-c-a-a-picks/">uses a model that takes into account a number of variables</a> &#8212; including a handful of popular computer rankings &#8212; and produces the probability of each team advancing through each round of the tournament. That&#8217;s what makes his forecast so effective if you&#8217;re a betting man: It&#8217;s easy enough to pick the winner and most of the Final Four by just choosing the top seeds (I&#8217;m looking at you, POTUS), but the way to pull ahead of everyone else in points is to spot the Cinderellas.</p>
<p>If I were ESPN, I&#8217;d pay Silver a boatload of money to come on TV once a year and present his forecast to a March-Madness-obsessed nation. I&#8217;m fairly certain the network could extend the broadcast out to about three hours and charge Super-Bowl-like advertising rates. Here&#8217;s why.</p>
<h2 id="its-the-probabilities-stupid">It&#8217;s the probabilities, stupid</h2>
<p>As I was saying, anyone, including Silver, can spot the best teams in the tournament by watching enough basketball, settling on some important data points to analyze or just following the NCAA&#8217;s seeding. Here are the seeds my experts, data analysts and the leader of the free world chose for the Sweet 16:</p>
<ul>
<li><a href="http://insider.espn.go.com/mens-college-basketball/tournament/2013/story/_/id/9065660/jay-bilas-bracket-depth-pick-pick-2013-ncaa-tournament-advice">Bilas</a>: 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5</li>
<li><a href="http://games.espn.go.com/tournament-challenge-bracket/en/entry?entryID=4267886">Obama</a>: 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5</li>
<li><a href="http://scn.sap.com/community/visual-intelligence/blog/2013/03/25/analysis-of-ncaa-march-madness-round-of-64-and-32">SAP</a>: 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 5, 5, 6, 7, 8</li>
<li><a href="http://fivethirtyeight.blogs.nytimes.com/2013/03/18/parity-in-n-c-a-a-means-no-commanding-favorite/">Silver</a>: 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5</li>
</ul>
<p>Here are the actual seeds that advanced to the Sweet 16: 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 6, 9, 12, 13, 15.</p>
<div id="attachment_624020" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/03/sapdatageekbracket2013.jpg"><img  alt="SAP's mid-seed-heavy bracket" src="http://gigaom2.files.wordpress.com/2013/03/sapdatageekbracket2013.jpg?w=300&#038;h=210" width="300" height="210" class="size-medium wp-image-624020" /></a><p class="wp-caption-text">SAP&#8217;s mid-seed-heavy bracket</p></div>
<p>The smart money is always on the higher seeds from a pure probability standpoint (although I have no idea how SAP built its model to get so many 5-8 seeds in the Sweet 16). But strange things can, and often do, happen in the NCAA tournament. This year, those strange things are called Wichita State (9-seed), Oregon (12-seed), La Salle (13-seed) and Florida Gulf Coast University (15-seed).</p>
<p>So why am I so high on Silver if his Sweet 16 probabilities were just as off-base as the two non-model-based human brackets and SAP&#8217;s model-based picks? Because if I were looking for a few upsets, he might have helped me spot them. Here are some of his notable projections for lower-seeded teams most likely to advance:</p>
<ul>
<li>Arizona (6-seed): 38.1 percent chance of reaching the Sweet 16 &#8212; they made it (SAP picked this correctly)</li>
<li>Florida Gulf Coast (15-seed): 3.3 percent chance of making the Sweet 16 &#8212; they made it</li>
<li>Oregon (12-seed): 17.5 percent chance of making the Sweet 16 &#8212; they made it</li>
<li>Minnesota (11-seed): 61.9 percent chance of winning its first game &#8212; it won (Bilas, SAP and Obama picked this, too)</li>
<li>California (12-seed): 32.8 percent chance of winning its first game &#8212; it won (SAP picked this correctly)</li>
</ul>
<p><a href="http://gigaom2.files.wordpress.com/2013/03/538bracket-3-blog480.png"><img  alt="538bracket-3-blog480" src="http://gigaom2.files.wordpress.com/2013/03/538bracket-3-blog480.png?w=708"   class="aligncenter size-full wp-image-624008" /></a></p>
<p>And in my neck of the woods &#8212; Las Vegas &#8212; being smarter than the sportsbooks means big money. No. 12 Oregon and No. 13 La Salle didn&#8217;t really sneak up on the oddsmakers (60-1 and 100-1 odds to make the Final Four, respectively), but No. 15 Florida Gulf Coast <a href="http://www.lasvegassun.com/blogs/talking-points/2013/mar/19/ncaa-tournament-odds-how-sports-books-see-south-re/">is paying out 1,000-1 should it reach the Final Four</a>.</p>
<p>I wouldn&#8217;t count on that happening, though. Silver <a href="http://www.nytimes.com/interactive/2013/03/18/sports/ncaabasketball/nate-bracket.html?_r=0">now gives those teams a 1 percent, 5.1 percent and 0.8 percent chance, respectively</a>, of making the Final Four. Louisville, Florida and Indiana look like locks to make it, and one of them should win the tournament.</p>
<h2 id="men-vs-models-lets-call-it-a-d">Men vs. models: Let&#8217;s call it a draw</h2>
<p>If you&#8217;re looking at these selections as some sort of man-versus-machine competition, I don&#8217;t think you&#8217;ll find a clear winner. Although Silver comes out looking the best of the four brackets I analyzed, his projections aren&#8217;t that much different than Bilas&#8217;s picks. <del>And although SAP&#8217;s picks fall apart in the end &#8212; two of its Final Four selections (including its national champion pick) are out &#8212; it did correctly pick a couple upsets.</del> President Obama, well, he pretty much picked chalk. The jury is still out on SAP&#8217;s model, which has three Final Four picks alive but made what seems like a risky choice by choosing Michigan State over Louisville.</p>
<p>The better way to look at these results is probably as further evidence that man and machine need to work together more closely, something <a href="http://gigaom.com/2013/03/22/5-ways-big-data-is-going-to-blow-your-mind-and-change-your-world/">we highlighted heavily at our Structure: Data conference last week.</a> Men create models, but men probably don&#8217;t crunch the numbers. And when there&#8217;s pride or money on the line, knowing which No. 15 seed has the best chances of making a run is probably what matters most.</p>
<p>Your chances of picking every upset without a little help: not good at all.</p>
<span class='embed-youtube' style='text-align:center; display: block;'><iframe class='youtube-player' type='text/html' width='604' height='370' src='http://www.youtube.com/embed/O6Smkv11Mj4?version=3&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;wmode=transparent' frameborder='0'></iframe></span>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/25/espn-should-just-hire-nate-silver-already/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/screen-shot-2013-03-10-at-5-09-13-pm.png?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/screen-shot-2013-03-10-at-5-09-13-pm.png?w=150" medium="image">
			<media:title type="html">Nate Silver FiveThirtyEight SXSW speaking election data</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/sapdatageekbracket2013.jpg?w=300" medium="image">
			<media:title type="html">SAP&#039;s mid-seed-heavy bracket</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/538bracket-3-blog480.png" medium="image">
			<media:title type="html">538bracket-3-blog480</media:title>
		</media:content>
	</item>
		<item>
		<title>Statwing wants to make your data &#8212; and armchair quarterback &#8212; dreams come true</title>
		<link>http://gigaom.com/2013/03/17/statwing-wants-to-make-your-data-and-armchair-quarterback-dreams-come-true/</link>
		<comments>http://gigaom.com/2013/03/17/statwing-wants-to-make-your-data-and-armchair-quarterback-dreams-come-true/#comments</comments>
		<pubDate>Sun, 17 Mar 2013 17:00:06 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[cloud services]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[NFL]]></category>
		<category><![CDATA[sports]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[Statwing]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=621137</guid>
		<description><![CDATA[Many of us yell at the TV while watching our favorite sports teams. Many of us also want to get better at working with data. Statwing thinks it can help with both.]]></description>
				<content:encoded><![CDATA[<p>There’s nothing quite like settling into the couch on Sunday afternoon (or morning on the West Coast), cracking open a beer and yelling at a football coach who gets paid millions of dollars a year to do his job. After all, the guy’s clearly an idiot. Who would run it up the middle on third down and eight? And why does the team still punt the ball all the time? You never punt in <em>Madden NFL</em>, and you win all the time.</p>
<p>You probably think I’m being sarcastic, but I’m not. Statistically speaking, football teams should go for it more often, they shouldn’t run on third and long and they’re almost certainly better off going for two-point conversions. The guys behind Statwing <a href="http://blog.statwing.com/nfl-play-by-play-data-analyzed-visualized-and-quizzified/">laid it all out in a blog post on Monday</a>. What’s more, they’ve uploaded an entire data set of NFL statistics to their service that users can play around with for free to analyze a huge number of occurrences and correlations.</p>
<div id="attachment_621280" class="wp-caption aligncenter" style="width: 582px"><a href="http://gigaom2.files.wordpress.com/2013/03/likelihood-of-getting-1st-down-by-play-type2.png"><img alt="From the blog. One of countless analyses available with the data set." src="http://gigaom2.files.wordpress.com/2013/03/likelihood-of-getting-1st-down-by-play-type2.png?w=708"   class="size-full wp-image-621280"></a><p class="wp-caption-text">From the blog. One of countless analyses available with the data set.</p></div>
<h2 id="its-all-about-democratizing-da">It’s all about democratizing data</h2>
<p><a href="https://www.statwing.com/">Statwing</a>, you might recall, is <a href="http://gigaom.com/2013/01/31/data-for-dummies-5-data-analysis-tools-anyone-can-use/">one of the “data for dummies” tools</a> I highlighted in a January post about advanced analytics tools so simple anyone can use them. Right now, it’s one of the simplest there is. Here’s how I described Statwing then — although it actually performs more types of analyses:</p>
<blockquote id="quote-you-upload-data-chec"><p>“You upload data, check the variables you’re concerned with, and it plots their relationship. (It also can describe the variables by highlighting the sample size, minimum, maximum, mean, median and standard deviation.) Graphs are accompanied by explanations as to how strong the correlation is based on various statistical metrics, as well as the results of a linear regression model.”</p></blockquote>
<div id="attachment_621285" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/03/laughlin-copy.jpg"><img alt="Greg Laughlin (courtesy of his Twitter profile)." src="http://gigaom2.files.wordpress.com/2013/03/laughlin-copy.jpg?w=300&#038;h=201" width="300" height="201" class="size-medium wp-image-621285"></a><p class="wp-caption-text">Greg Laughlin (courtesy of his Twitter profile).</p></div>
<p>The ease of use is by design, says Statwing co-founder Greg Laughlin. “There’s a general zeitgeist that people should care about data now,” he told me during a recent call, but they don’t always know how to get started or really even see how all the hype around data relates to them. Early in its existence, Statwing is trying to answer both of those concerns by building an easy-to-use service that also happens to teach users about statistics, and by offering up some interesting data sets for people to play around with.</p>
<p>The latter part is easy, but valuable. Data sets like the NFL data or one about the Titanic’s passengers let other people into the data game and get them thinking statistically. They get people saying, “‘Oh, I grok that. I see how this is interesting, I see how this is useful,’” Laughlin explained.</p>
<p>Building a data-analysis service that’s actually usable by mere mortals is a bit tougher. At its core, Statwing relies on a rules engine that considers the type of data uploaded and the types of variables (a maximum of two right now) a user wants to relate to each other. It can handle between 10 and 15 different analyses right now depending on how one defines them, Laughlin said, but at any rate they’re the ones used most often.</p>
<p>He credits Cloudera co-founder and Chief Scientist Jeff Hammerbacher (with whom, along with Greylock’s DJ Patil, I’ll be doing a fireside chat at <a href="http://event.gigaom.com/structuredata/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=621137+statwing-wants-to-make-your-data-and-armchair-quarterback-dreams-come-true&amp;utm_content=dharrisstructure">Structure: Data</a> on Thursday) with helping Statwing decide to make the rules engine the service’s core.</p>
<p>That has been a wise decision because it lets lay users get what they need out of the service without worrying about the underlying functions. Statwing has users that never click the “advanced” tab that shows the statistical breakdown, Laughlin said. They just use the service, essentially, as a faster way of making charts than using Microsoft Excel, and the headline stating whether or not there’s a statistically significant relationship is all the info they need.</p>
<p>“That’s really exciting for us,” Laughlin said. “… It’s giving them the power of stats without them having to think about it.”</p>
<p> </p>
<div id="attachment_621283" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/03/2ptcon.jpg"><img alt="Just one view of how Statwing presents results." src="http://gigaom2.files.wordpress.com/2013/03/2ptcon.jpg?w=708&#038;h=126" width="708" height="126" class="size-large wp-image-621283"></a><p class="wp-caption-text">Just one view of how Statwing presents results.</p></div>
<h2 id="paying-the-bills-with-bigger-u">Paying the bills with bigger users</h2>
<p>Of course, a startup can’t survive on free and unsophisticated users alone, so Statwing is ramping up its money-making efforts. For example, it has “just turned on the paywall in a really light way” by “maybe” charging really heavy users, Laughlin said. In the future, though, Statwing wants to add support for more variables and larger data sets (there’s a 5MB limit right now), and perhaps build in some predictive analytics.</p>
<p>“That kind of analysis is really powerful, really extensible,” he noted.</p>
<p>As the service grows, Laughlin sees the ideal paying user being someone who currently has to use statistical-analysis software like SPSS or R, but who doesn’t really go beyond the basic functions. That type of user has a real business need for the software, he explained, but they don’t need all the complexity and arcane statistics dressing that comes along with that type of product.</p>
<p>Some people don’t want advanced analytics democratized, Laughlin added, because they think people can’t ask the right questions. On the contrary, Statwing’s theory is that most people just struggle with the logistics of cleaning and formatting data and then knowing the terminology associated with the business questions they want to ask.</p>
<h2 id="back-to-football">Back to football …</h2>
<p>But forget business users — when will football coaches start caring about statistics?! Maybe not any time soon. Laughlin said a friend of his who works on the MIT Sloan Sports Analytics Conference sees a lot of interest in analytics from the higher levels in sports organizations, but noted that anecdotal evidence suggests most coaches aren’t too interested in letting data influence their decisions too heavily.</p>
<p>Think of a situation like fourth down and goal on the two-yard-line as akin to a CIO choosing between Oracle and some new whizbang database. Nobody ever got fired for buying Oracle, and nobody ever got fired for kicking a field goal.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/17/statwing-wants-to-make-your-data-and-armchair-quarterback-dreams-come-true/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/laughlin-copy.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/laughlin-copy.jpg?w=150" medium="image">
			<media:title type="html">laughlin copy</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/likelihood-of-getting-1st-down-by-play-type2.png" medium="image">
			<media:title type="html">From the blog. One of countless analyses available with the data set.</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/laughlin-copy.jpg?w=300" medium="image">
			<media:title type="html">Greg Laughlin (courtesy of his Twitter profile).</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/2ptcon.jpg?w=708" medium="image">
			<media:title type="html">Just one view of how Statwing presents results.</media:title>
		</media:content>
	</item>
		<item>
		<title>For big data analytics, recall the tried and true old-school rules</title>
		<link>http://gigaom.com/2013/03/08/for-big-data-analytics-recall-the-tried-and-true-old-school-rules/</link>
		<comments>http://gigaom.com/2013/03/08/for-big-data-analytics-recall-the-tried-and-true-old-school-rules/#comments</comments>
		<pubDate>Fri, 08 Mar 2013 18:27:21 +0000</pubDate>
		<dc:creator>Jordan Novet</dc:creator>
				<category><![CDATA[big data analytics]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[Structure Data 2013]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=618509</guid>
		<description><![CDATA[As companies implement big data analytics strategies, they ought to consider some of the best practices in place before the rise of the term "big data."]]></description>
				<content:encoded><![CDATA[<p>Data analysis didn’t start with <a href="http://gigaom.com/2013/03/08/hadoop-through-the-years-a-gigaom-retrospective">Hadoop</a>. Companies have been working with data to get insights for decades. While technology has changed, some of the rules from the past still apply, or ought to, as data gets bigger and bigger.</p>
<p>Jack Rivkin, an occasional blogger with deep investment experience, recently <a href="http://blog.contracarbon.com/2013/02/18/what-is-the-big-deal-about-big-data/">shared some of the best practices</a> he was exposed to early in his career working on economic forecasts. He offered some sage suggestions for enterprises to bear in mind as they consider and implement big data strategies. Among his insights:</p>
<ul><li>Forecasting models can only be as good as the data inputs.</li>
<li>Be skeptical and hedge when sharing the models by noting factors that could lead to different results.</li>
<li>The less time it takes to process data, the more valuable it is.</li>
<li>Constantly improve models and inputs.</li>
</ul><p>Of course, big data isn’t wholly evolutionary — it does bring its own all-new opportunities and risks. Some of the world’s leading data scientists, IT executives and business users will address them at <a href="http://event.gigaom.com/structuredata/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=618509+for-big-data-analytics-recall-the-tried-and-true-old-school-rules&amp;utm_content=gigajordan">GigaOM’s Structure:Data conference</a> in New York on March 20-21.</p>
<p><em>Feature image courtesy of <a href="http://www.flickr.com/photos/75279887@N05/6914441342/">Flickr user luckey_sun</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/08/for-big-data-analytics-recall-the-tried-and-true-old-school-rules/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/luckey_sun-data.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/luckey_sun-data.jpg?w=150" medium="image">
			<media:title type="html">Luckey_sun data</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/c00ab753df107b639e76ed4c3ab07ba7?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigajordan</media:title>
		</media:content>
	</item>
		<item>
		<title>Google&#8217;s flu snafu and the reliability of web data</title>
		<link>http://gigaom.com/2013/02/14/googles-flu-snafu-and-the-reliability-of-web-data/</link>
		<comments>http://gigaom.com/2013/02/14/googles-flu-snafu-and-the-reliability-of-web-data/#comments</comments>
		<pubDate>Thu, 14 Feb 2013 21:54:55 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[flu]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[health care]]></category>
		<category><![CDATA[polling]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[social media]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[surveys]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=610607</guid>
		<description><![CDATA[Google Flu Trends significantly overestimated the number of Americans afflicted with flu-like symptoms during the season's peak a couple months ago, but assuring accuracy is a big part of the puzzle any time we're talking about web data.]]></description>
				<content:encoded><![CDATA[<p>The web is full of data &#8212; much of it meaningful &#8212; but there&#8217;s some question as to how much we should actually rely on it. The latest evidence comes at Google&#8217;s expense, with some researchers questioning the validity of <a href="http://www.google.org/flutrends/us/#US">Google&#8217;s Flu Trends</a> algorithm. They say the service, which estimates the number of flu cases around the world by analyzing trends on Google&#8217;s search engine, vastly overestimated this year&#8217;s season in the United States compared with more-traditional methods of measuring flu cases.</p>
<p>But this snafu is just a microcosm of a broader debate over how much stock we should put in web and social media data, and in what cases it&#8217;s most valid. It&#8217;s hard to figure out how much we should value speed and scale over quality of data. Millions of (presumably) younger people proactively searching or tweeting about a topic provides a huge and theoretically unbiased dataset, while traditional methods of phone calls or focus groups reach a smaller number of (presumably) older people who know they&#8217;re being observed, but who also are answering questions directly relevant to the research at hand.</p>
<h2 id="whos-more-accurate-google-twit">Who&#8217;s more accurate: Google, Twitter or your neighbors?</h2>
<p>The exact details of the discrepancy are <a href="http://www.nature.com/news/when-google-got-flu-wrong-1.12413">explained in a <em>Nature</em> article published on Wednesday,</a> but it appears to be a case of a lot of data that didn&#8217;t mean what Google thought it meant. Google&#8217;s search data covers almost the entirety of the web-surfing world and, in theory, can see outbreaks coming before they hit because it can watch the flu-related searches intensify in volume in real time. The Centers for Disease Control and Prevention says Google Flu Trends usually tracks very closely with its own data and can deliver results days faster, <em>Nature</em> writer Declan Butler reported.</p>
<p>Researchers think this year&#8217;s discrepancy might have something to do with hyped-up media reports leading to a volume of web searches for flu-related terms that was disproportionate &#8212; almost double, nationwide &#8212; to the actual number of cases. The CDC claims about 6 percent of the U.S. population was affected with flu-like symptoms during the peak period.</p>
<div id="attachment_610917" class="wp-caption aligncenter" style="width: 602px"><a href="http://gigaom2.files.wordpress.com/2013/02/flu-copy.jpg"><img  alt="flu copy" src="http://gigaom2.files.wordpress.com/2013/02/flu-copy.jpg?w=708"   class="size-full wp-image-610917" /></a><p class="wp-caption-text">Google estimated more than 10 percent of the U.S. population had flu-like symptoms.</p></div>
<p>On the other hand, one project called <a href="https://flunearyou.org/">Flu Near You</a>, which relies on volunteers to report cases of flu among their friends and family, estimated a number closer to (albeit lower than) the CDC&#8217;s official statistics, perhaps because the data is based on clinical definitions of &#8220;influenza&#8221; and relies on people expressly reporting known cases. However, Flu Near You claims fewer than 45,000 participants and, according to <em>Nature</em>, covers only 70,000 people.</p>
<div id="attachment_610922" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/02/flunearyou1.jpg"><img  alt="flunearyou1" src="http://gigaom2.files.wordpress.com/2013/02/flunearyou1.jpg?w=708&#038;h=299" width="708" height="299" class="size-large wp-image-610922" /></a><p class="wp-caption-text">Flu Near You&#8217;s peak estimate was about 4.5 percent of the population.</p></div>
<p>Responding to my inquiry about the discrepancy, a Google spokesperson sent the following statement:</p>
<blockquote id="quote-flu-trends-is-meant-"><p>&#8220;Flu Trends is meant to be a complementary tool to the surveillance systems used by the CDC. Since its initial launch in 2008 and through this flu season, Flu Trends has accurately predicted the start and peak time of flu season. However, this season our models estimated a higher influenza like illness rate than the Centers for Disease Control in some regions. As we do each year, we will be performing a model analysis and potential model update to improve the accuracy of the tool.&#8221;</p></blockquote>
<p>And while Google&#8217;s predictions might be prone to the undue influence of a fear-mongering media environment, CDC researcher Lyn Finelli told <em>Nature</em> she&#8217;s even more skeptical of efforts to track flu outbreaks using Twitter data. She cites a low signal-to-noise ratio and a population of largely young-adult users that doesn&#8217;t align with the country&#8217;s overall demographic makeup.</p>
<p>To the contrary, however, Johns Hopkins University computer scientist Michael Paul told <em>Nature</em> that he&#8217;s a big believer in Twitter data, especially because it generates a large dataset that&#8217;s less susceptible to sample errors than smaller-scale projects such as Flu Near You. He claims to have developed a model that can accurately track the flu using Twitter, something <a href="http://gigaom.com/2011/07/07/can-you-crowdsource-health-information-via-twitter/">a handful of other projects</a> are already working on.</p>
<h2 id="pollsters-struggle-with-the-we">Pollsters struggle with the web, too</h2>
<p>But flu statistics aside, questions over the validity of Twitter, Google and other web sites as data sources are nothing new. Last year, for example, I <a href="http://gigaom.com/2012/02/10/how-social-media-is-making-polling-obsolete/">profiled a company called the Dachis Group</a> that has devised a method for tracking companies&#8217; presences, buzz and sentiment on social media. It claims its algorithms for ranking the buzz around Super Bowl XLVI advertisers were far more accurate &#8212; or at least yielded drastically different results &#8212; than <em>USA Today</em>&#8217;s traditional AdMeter rankings of Super Bowl ads based on phone-based polling.</p>
<p>Although people appear generally willing to do away with phone surveys and other marketing-based polling efforts, there&#8217;s a lot more skepticism when it comes to using the web to predict political elections and gauge response to culturally popular events such as presidential debates or the Olympics. I <a href="http://gigaom.com/2012/10/02/why-the-trick-to-twitter-as-a-data-source-is-more-data/">covered both sides of the debate in October</a>, as pre-election fever was in full force and many people were atwitter about Twitter&#8217;s tweets-per-minute counts during the presidential debates. What side experts fall on seems to depend on how much they trust the demographics, the subjects themselves, the sample size and how well someone can actually analyze sentiment in text.</p>
<p>Even on Google, politics has proven that interest doesn&#8217;t necessarily signify intent. Leading up to the presidential election in November, Mitt Romney was trending quite a bit higher than Barack Obama in search volume. Election night, however, was a different story, with Obama winning in a landslide.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/02/obamrom.jpg"><img  alt="obamrom" src="http://gigaom2.files.wordpress.com/2013/02/obamrom.jpg?w=708&#038;h=285" width="708" height="285" class="aligncenter size-large wp-image-610916" /></a></p>
<p>Perhaps the best advice on how to deal with web data comes from Harvard epidemiologist John Brownstein, who told <em>Nature</em>, “You need to be constantly adapting these models, they don’t work in a vacuum. You need to recalibrate them every year.”</p>
<p>As web usage and users change along with the world around them, there&#8217;s really no guarantee that a single data point means the same thing or has the same effect from year to year. Even search is under attack by companies <a href="http://gigaom.com/2013/02/07/the-future-of-search-is-gravitational-content-will-come-to-you/">trying to proactively surface content</a> for consumers before they know to look for it.</p>
<p>When accuracy is paramount, no place &#8212; Twitter, Google, the telephone or the wisdom of crowds &#8212; is the holy grail; they&#8217;ll all have to play a role.</p>
<p><em>Feature image courtesy of the Centers for Disease Control and Prevention.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/02/14/googles-flu-snafu-and-the-reliability-of-web-data/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/02/b00526_h1n1_flu_med.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/02/b00526_h1n1_flu_med.jpg?w=150" medium="image">
			<media:title type="html">flu virus</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/flu-copy.jpg" medium="image">
			<media:title type="html">flu copy</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/flunearyou1.jpg?w=708" medium="image">
			<media:title type="html">flunearyou1</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/obamrom.jpg?w=708" medium="image">
			<media:title type="html">obamrom</media:title>
		</media:content>
	</item>
		<item>
		<title>Why big data matters and data-ism doesn&#8217;t</title>
		<link>http://gigaom.com/2013/02/06/why-big-data-matters-and-data-ism-doesnt/</link>
		<comments>http://gigaom.com/2013/02/06/why-big-data-matters-and-data-ism-doesnt/#comments</comments>
		<pubDate>Thu, 07 Feb 2013 03:00:40 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=608132</guid>
		<description><![CDATA[Not all data analysis is created equal, and understanding the difference is critical as our society places a greater value on listening to the data. Using big data to cure disease is one thing, using statistics to ruin my sports-watching is quite another.]]></description>
				<content:encoded><![CDATA[<p>There has been something of a data backlash happening lately, and I think I’ve figured out why: Data for the sake of data has a tendency to sanitize experiences we’d rather leave a little bit dirty. But there’s a big, meaningful difference that’s worth knowing between <em>big data </em>and <em>just plain data.</em></p>
<p>David Brooks’s recent column in the <em>New York Times</em> is a good example of this. He <a href="http://www.nytimes.com/2013/02/05/opinion/brooks-the-philosophy-of-data.html?_r=0">coined the term “data-ism”</a> (which is quite apt) to describe a newfound penchant for reducing everything in our worlds into a number or statistic. Skeptical of this data worship, he is — rightfully — inclined to rebel.</p>
<p>But everything Brooks mentions in his article is really just statistics, the stuff academicians and businesspeople have been doing for years. It doesn’t take any revolutionary technological advances to measure the effect of political spending on campaign results or the idiosyncrasies in how a president speaks. At best, these types of analyses are enlightening; at worst, they’re overkill.</p>
<h2 id="data-can-be-an-unwelcome-disin">Data can be an unwelcome disinfectant</h2>
<p>Take Brooks’s pointer to a study about whether there’s such a thing as a hot hand in basketball. Or a recent debate (that got an incredible amount of undeserved digital ink <a href="http://deadspin.com/5975490/h-y-and-z-as-concealed-weapons-we-apply-google+inspired-math-to-scrabbles-flawed-points-system">from Deadspin</a> and <a href="http://www.roughtype.com/?p=2499">Nick Carr</a>) about whether to adjust the points and frequency of Scrabble tiles based on which letters actually appear most in the English language. Right or wrong, who cares?</p>
<p>Unless you’re a professional gambler or in the sports business, sports are supposed to be fun; an escape from reality. Buying into things like hot hands, sweet spots, <a href="http://en.wikipedia.org/wiki/Curse_of_the_Billy_Goat">ancient curses</a> and <a href="http://books.google.com/books?id=uTu6L3c_YZQC&amp;pg=PA118&amp;lpg=PA118&amp;dq=dave+barry+%22concern+rays%22&amp;source=bl&amp;ots=_dLKWvUFix&amp;sig=fLxSgDYeJiZG2NlyCLXYeiJOvT8&amp;hl=en&amp;sa=X&amp;ei=wgkTUYbuA4i8iwLNzYCADQ&amp;ved=0CC0Q6AEwAA#v=onepage&amp;q=dave%20barry%20%22concern%20rays%22&amp;f=false">concern rays</a> (thanks, Dave Barry) is part of the rooting experience. If it weren’t for <a href="http://sports.espn.go.com/espn/page2/story?page=easterbrook%2F090922&amp;sportCat=nfl">coaches’ insistence on punting on fourth down</a>, I could watch an entire football game and not think about probabilities once.</p>
<p>As for Scrabble, well, it’s a game and it’s fun. People like it as it is. What’s next, lobbying to change the distribution of resource cards in Settlers of Catan to account for the relative value of each given recent drought conditions?</p>
<span class="embed-youtube" style="text-align:center; display: block;"><iframe class="youtube-player" type="text/html" width="604" height="370" src="http://www.youtube.com/embed/h8Kgjid4-u0?version=3&amp;rel=1&amp;fs=1&amp;showsearch=0&amp;showinfo=1&amp;iv_load_policy=1&amp;wmode=transparent" frameborder="0"></iframe></span>
<p>Likewise, while Max Levchin’s vision of the future recently <a href="http://www.roughtype.com/?p=2718">had Nick Carr concerned about big brother</a> (read my colleague <a href="http://gigaom.com/2013/02/01/the-increasingly-blurry-line-between-big-data-and-big-brother/">Mathew Ingram’s take on it here</a>), my takeaway from Carr’s blog post was more about the threat of a sterilized world. Human beings are not rational actors, and many of us don’t want to be — regardless of what the data says. We buy enormous sodas even though we don’t finish them, we demand all-you-can-eat data plans <a href="http://www.engadget.com/2012/09/28/npd-android-users-chew-average-870mb-of-cellular-data-per-month/">even though we don’t consume that much data</a> and, to directly address one of Levchin’s predictions, I bet many of us would willingly pay more for flat-rate auto insurance even if utility-style billing based on our real-time driving behavior would save us money.</p>
<p>Reducing the things we like — watching sports, eating, web surfing, driving — to data points ruins the experience of living carefree and exposes our optimistic anything-can-happen attitudes to a cold, surgical light. If I thought these were the pinnacle of data’s achievements, I’d rebel, too.</p>
<h2 id="datas-real-promise-is-innovati">Data’s real promise is innovation</h2>
<p>Thankfully, however, I’ve been lucky enough to spend my days speaking with some of the smartest data minds around and covering some truly revolutionary technologies. If there’s one thing I’ve learned, it’s that the real value of data isn’t just in uncovering statistical realities, but in finding methods for doing so where it was hitherto impossible and in creating entirely new products that change the way we interact with our world.</p>
<p>Big data is a technological revolution centered around <a href="http://gigaom.com/2012/02/06/what-it-really-means-when-someone-says-hadoop/">collecting, storing and processing</a> more data <a href="http://gigaom.com/2012/08/13/geospatial-big-data-startup-spacecurve-nets-another-3-5m/">of more types</a> than ever before. It’s also about doing all this stuff faster than ever before <a href="http://gigaom.com/2012/12/06/how-telcos-are-using-big-data-to-set-prices-and-maybe-make-bills-better/">as data streams in</a> from sensors, servers, Twitter, web surfing and however else we’re generating data. Data scientists are <a href="http://gigaom.com/2012/09/17/5-ideas-to-help-everyone-make-the-most-of-big-data/">thinking up clever ways to stitch this data together</a>, apply statistical techniques and do all sorts of things. They’re <a href="http://gigaom.com/2013/01/29/you-might-also-like-to-know-how-online-recommendations-work/">optimizing commerce</a>, <a href="http://gigaom.com/2012/07/20/hey-los-angeles-xerox-thinks-it-can-clear-traffic-on-i-10/">clearing traffic</a>, <a href="http://gigaom.com/2012/05/02/how-climate-corp-is-pitting-big-data-against-mother-nature/">insuring against inclement weather</a> and even <a href="http://gigaom.com/2013/01/16/has-ayasdi-turned-machine-learning-into-a-magic-bullet/">detecting genetic markers</a> that might lead to a cure for cancer.</p>
<div id="attachment_608305" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/02/climate_ch3-1_v2-1.jpg"><img alt="Climate Corporation's policies are based on some incredible data science." src="http://gigaom2.files.wordpress.com/2013/02/climate_ch3-1_v2-1.jpg?w=708&#038;h=298" width="708" height="298" class="size-large wp-image-608305"></a><p class="wp-caption-text">Climate Corporation’s policies are based on some incredible data science.</p></div>
<p>If you want to hear a lot more about what’s possible, come to our <a href="http://event.gigaom.com/structuredata/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=608132+why-big-data-matters-and-data-ism-doesnt&amp;utm_content=dharrisstructure">Structure: Data conference</a> March 20-21 in New York.</p>
<p>Yes, there’s some value to what David Brooks calls data-ism — there’s a lot to be learned simply from monitoring new data sources, and a renewed focus on visualization means interesting data is now presented in ways that <a href="http://gigaom.com/2013/01/31/data-for-dummies-5-data-analysis-tools-anyone-can-use/">anyone can and might actually want to digest</a>. But the real reason people are, or should be, excited about data is the promise of <a href="http://gigaom.com/2013/01/02/why-big-data-might-be-more-about-automation-than-insights/">doing important things faster and better</a> than previously possible (where those things were even possible before).</p>
<p>Talk to me when you’re able to predict a flu outbreak in real time based on <a href="http://gigaom.com/2012/11/18/why-better-traffic-data-means-more-than-just-a-faster-commute/">automobile traffic patterns</a>, <a href="http://gigaom.com/2012/01/02/12-smart-grid-startups-to-watch-in-2012/">smart grid data</a> on heater usage and an <a href="http://gigaom.com/2011/07/07/can-you-crowdsource-health-information-via-twitter/">uptick in illness references on Twitter</a>. If you just wanna tell me that, statistically speaking, chicken soup doesn’t actually appear to affect the longevity of the common cold, well, I think I’ll pass. Chicken soup makes me feel better.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-212179p1.html">Shutterstock user Jirsak</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/02/06/why-big-data-matters-and-data-ism-doesnt/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/02/shutterstock_73419799.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/02/shutterstock_73419799.jpg?w=150" medium="image">
			<media:title type="html">sanitizing</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/climate_ch3-1_v2-1.jpg?w=708" medium="image">
			<media:title type="html">Climate Corporation&#039;s policies are based on some incredible data science.</media:title>
		</media:content>
	</item>
		<item>
		<title>Data for dummies: 6 data-analysis tools anyone can use</title>
		<link>http://gigaom.com/2013/01/31/data-for-dummies-5-data-analysis-tools-anyone-can-use/</link>
		<comments>http://gigaom.com/2013/01/31/data-for-dummies-5-data-analysis-tools-anyone-can-use/#comments</comments>
		<pubDate>Thu, 31 Jan 2013 17:00:54 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[BigML]]></category>
		<category><![CDATA[cloud services]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Infogram]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[Statwing]]></category>
		<category><![CDATA[tableau]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=599476</guid>
		<description><![CDATA[Not everyone is drowning in big data or has the know-how to deal with it if they were. Here are six free web services that help mere mortals analyze and visualize their own data.]]></description>
				<content:encoded><![CDATA[<p>If you care only about the cutting edge of machine learning and how to manage petabytes of big data, you might want to quit reading now and just come to our <a href="http://event.gigaom.com/structuredata/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=599476+data-for-dummies-5-data-analysis-tools-anyone-can-use&amp;utm_content=dharrisstructure">Structure:Data conference</a> in March. But if you’re a normal person dealing with mere normal data, you’ll probably want to stick around. Although your data might not be that big or complex, that doesn’t mean it isn’t worth looking at in a new light.</p>
<p>With that in mind, here are six of the best free tools I’ve come across for helping us mere mortals analyze our data without having to know too much about, well, anything (I’d keep an eye on <a href="http://gigaom.com/2012/05/31/data-hero-aims-to-turn-us-all-into-analytics-stars/">the still-under-wraps Datahero</a>, too). I’ve gathered some personal data and tracked down some interesting public data sets to help demonstrate what a novice can do with them. Someone with more skills can certainly do a lot more, and larger datasets will provide greater statistical significance.</p>
<h2 id="bigml">BigML</h2>
<p><a href="https://bigml.com/dashboard/sources">BigML</a> is to machine learning what <a href="http://www.bluemoonbrewingcompany.com/">Blue Moon</a> is to Belgian ales: a simple approach to something generally more complex — but also rather accessible and good enough to do the job in a pinch. I explained the service more thoroughly in a recent post about it being <a href="http://gigaom.com/2013/01/25/how-to-succeed-on-kickstarter-find-35-people-and-ask-for-less-than-9000/">used to generate predictions of Kickstarter success</a>, but here’s how it works, in a nutshell: Users upload and format data (which is actually pretty easy), BigML discovers the myriad relationships between the variables and creates a predictive model, and users enter hypothetical data and receive a prediction.</p>
<p>I’m pretty bad when it comes to entering my data into Fitbit <em>(see disclosure)</em>, but I was <em>relatively</em> good for a month this summer as I prepped for the <a href="http://www.warriordash.com/">Warrior Dash</a>, and that’s the data I used to demonstrate BigML. This prediction of how many calories I can expect to burn in a day would work a lot better if I had a bigger sample size and hadn’t occasionally forgotten to log calories and hours slept, but you get the point. The first image is the model the service generated; the second is the prediction interface.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/01/cals-bigml.jpg"><img alt="cals bigml" src="http://gigaom2.files.wordpress.com/2013/01/cals-bigml.jpg?w=708&#038;h=470" width="708" height="470" class="aligncenter size-large wp-image-605870"></a><a href="http://gigaom2.files.wordpress.com/2013/01/predict.jpg"><img alt="predict" src="http://gigaom2.files.wordpress.com/2013/01/predict.jpg?w=708&#038;h=553" width="708" height="553" class="aligncenter size-large wp-image-605874"></a></p>
<h2 id="google-fusion-tables">Google Fusion Tables</h2>
<p>The user interface for <a href="http://www.google.com/drive/start/apps.html#fusiontables">Google Fusion Tables</a>  isn’t what I’d call pretty (“sparse” is probably a better description), but the still-in-experimental-mode visualization tool sure is easy if your data is nicely formatted. I created this interactive map simply by uploading <a href="http://www.guardian.co.uk/news/datablog/2012/jul/22/gun-homicides-ownership-world-list#data">a publicly available dataset about gun violence</a> and clicking the button to create a map:</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/01/fusion.jpg"><img alt="fusion" src="http://gigaom2.files.wordpress.com/2013/01/fusion.jpg?w=708&#038;h=302" width="708" height="302" class="aligncenter size-large wp-image-605627"></a></p>
<p>For this simple comparison of gun ownership and gun homicide rates, I just checked the countries by which I wanted to filter the chart. Easy:</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/01/gunscomp.jpg"><img alt="gunscomp" src="http://gigaom2.files.wordpress.com/2013/01/gunscomp.jpg?w=708&#038;h=319" width="708" height="319" class="aligncenter size-large wp-image-605629"></a></p>
<h2 id="infogram">Infogram</h2>
<p>If you have really simple data — like a few columns and a handful of rows — <a href="http://infogr.am/beta/">Infogram</a> might be the easiest to use of the bunch. The company <a href="http://gigaom.com/2012/05/23/infogram-wants-to-help-you-make-beautiful-infographics/">launched last year with a variety of infographic templates</a>, but it has since expanded to include a large number of charts and graphs, too (including line, pie, pictorial, treemap and bubble). Furthermore, it gives sample data, which you can use as an example to enter your own or format the table you want to upload, and the interactive charts embed nicely into web pages (ours, at least).</p>
<p>Here are the top 10 things I ate during the time I was logging food via Fitbit, excluding copious amounts of beer, water, coffee and Diet Pepsi that I didn’t record.</p>
<iframe style="border: none;" src="http://infogr.am/What-I-ate-473875" height="829" width="550" frameborder="0" scrolling="no"></iframe>
<div style="width:550px;border-top:1px solid #acacac;padding-top:3px;font-family:Arial;font-size:10px;text-align:center;"><a style="color:#acacac;text-decoration:none;" href="http://infogr.am/What-I-ate-473875" target="_blank">What I ate</a> | <a style="color:#acacac;text-decoration:none;" href="http://infogr.am" target="_blank">Create infographics</a></div>
<p>In July, I made this chart with Infogram <a href="http://gigaom.com/2012/07/27/chart-apple-facebook-spending-a-lot-on-infrastructure/">comparing infrastructure spending trends</a> among internet companies.</p>
<iframe style="border: none;" src="http://infogr.am/Who-spent-what-on-CAPEX" height="736" width="604" frameborder="0" scrolling="no"></iframe>
<div style="width:604px;border-top:1px solid #acacac;padding-top:3px;font-family:Arial;font-size:10px;text-align:center;"><a style="color:#acacac;text-decoration:none;" href="http://infogr.am/Who-spent-what-on-CAPEX" target="_blank">Who spent what on infrastructure</a> | <a style="color:#acacac;text-decoration:none;" href="http://infogr.am" target="_blank">Create infographics</a></div>
<p>And here’s a sample of the simplest chart in the world.</p>
<iframe style="border: none;" src="http://infogr.am/I-am-this-far-through-my-to-do-list" height="593" width="550" frameborder="0" scrolling="no"></iframe>
<div style="width:550px;border-top:1px solid #acacac;padding-top:3px;font-family:Arial;font-size:10px;text-align:center;"><a style="color:#acacac;text-decoration:none;" href="http://infogr.am/I-am-this-far-through-my-to-do-list" target="_blank">I am this far through my to-do list</a> | <a style="color:#acacac;text-decoration:none;" href="http://infogr.am" target="_blank">Create infographics</a></div>
<h2 id="many-eyes">Many Eyes</h2>
<p><a href="https://www-958.ibm.com/software/analytics/manyeyes/login">Many Eyes</a> is a free web service run by IBM that includes a wide variety of visualizations ranging from maps to pie charts to scatter plots. But what makes it stand apart from the others is the suite of text-analysis tools it offers — not only are they fairly novel, but all they require users to do is paste a page of plain text into the web interface and press a button to visualize it. I used it to analyze the last 15 posts I’ve written for GigaOM.</p>
<p>What did I find? For starters, I use the words “data,” “Facebook” and “users” a lot.</p>
<p style="text-align:center;"><a href="http://gigaom2.files.wordpress.com/2013/01/words-1.jpg"><img alt="words 1" src="http://gigaom2.files.wordpress.com/2013/01/words-1.jpg?w=708&#038;h=330" width="708" height="330" class="wp-image-605619 aligncenter"></a></p>
<p>When it comes to two-word combinations, “big data,” “data centers” and “hard drives” are among the biggies.</p>
<p style="text-align:center;"><a href="http://gigaom2.files.wordpress.com/2013/01/words-2.jpg"><img alt="words 2" src="http://gigaom2.files.wordpress.com/2013/01/words-2.jpg?w=708&#038;h=320" width="708" height="320" class="wp-image-605620 aligncenter"></a></p>
<p>This one is particularly interesting, showing how I tend to form phrases around certain words with common conjunctions, or just a space, in between.</p>
<p style="text-align:center;"><a href="http://gigaom2.files.wordpress.com/2013/01/data.jpg"><img alt="data" src="http://gigaom2.files.wordpress.com/2013/01/data.jpg?w=708&#038;h=354" width="708" height="354" class="wp-image-605621 aligncenter"></a></p>
<p>Apparently, out of 10,013 words, I only used “cloud” 20 times. I usually followed it up with “provider,” “servers,” “computing,” “-based” and “providers.”</p>
<p style="text-align:center;"><a href="http://gigaom2.files.wordpress.com/2013/01/cloud2.jpg"><img alt="cloud2" src="http://gigaom2.files.wordpress.com/2013/01/cloud2.jpg?w=708&#038;h=331" width="708" height="331" class="wp-image-605622 aligncenter"></a></p>
<p style="text-align:left;">For fun, I also made a word cloud based on a couple of months’ worth of Fitbit food logs. It turns out, you can take the boy out of Wisconsin, but …</p>
<p style="text-align:left;"><a href="http://gigaom2.files.wordpress.com/2013/01/wordcloud.jpg"><img alt="wordcloud" src="http://gigaom2.files.wordpress.com/2013/01/wordcloud.jpg?w=708&#038;h=315" width="708" height="315" class="aligncenter size-large wp-image-604743"></a></p>
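<p>The word and phrase counts above can be approximated in a few lines of Python. This is a minimal sketch and not how Many Eyes works internally, which the post does not describe; the function names and the sample sentence are hypothetical and exist only for illustration.</p>

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def top_words(text, n=3):
    """Return the n most frequent single words with their counts."""
    return Counter(tokenize(text)).most_common(n)


def top_bigrams(text, n=3):
    """Return the n most frequent two-word combinations.

    Pairs are built from adjacent tokens, so they can span sentence breaks.
    """
    words = tokenize(text)
    return Counter(zip(words, words[1:])).most_common(n)


def following(text, target):
    """Count which words come directly after the target word."""
    words = tokenize(text)
    return Counter(b for a, b in zip(words, words[1:]) if a == target)


# Hypothetical sample text, not taken from the posts analyzed above.
sample = "Big data is big. Cloud computing and cloud providers store big data."

print(top_words(sample, 1))
print(top_bigrams(sample, 1))
print(following(sample, "cloud"))
```

<p>Pasting a longer block of plain text into <code>sample</code> gives the same three views: most-used words, most-used pairs, and the words that tend to follow a chosen term.</p>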
<h2 id="statwing">Statwing</h2>
<p><a href="https://www.statwing.com/">Statwing</a> might be my favorite of the bunch, if only because it’s so simple yet actually tries to teach users about statistics. You upload data, check the variables you’re concerned with, and it plots their relationship. (It also can describe the variables by highlighting the sample size, minimum, maximum, mean, median and standard deviation.) Graphs are accompanied by explanations as to how strong the correlation is based on various statistical metrics, as well as the results of a linear regression model.</p>
<p>To demonstrate Statwing, I went back to the Fitbit data. Of the variables that Fitbit tracks, some correlations are easy to predict (e.g., steps and calories burned), but I was kind of surprised to see that the 86 minutes a day I spent being fairly active really weren’t that good of an expenditure of my time.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/01/statwing.jpg"><img alt="statwing" src="http://gigaom2.files.wordpress.com/2013/01/statwing.jpg?w=708&#038;h=310" width="708" height="310" class="aligncenter size-large wp-image-605833"></a></p>
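<p>For readers who want to see what those summary figures and the linear regression boil down to, here is a minimal Python sketch using only the standard library. It is not Statwing’s implementation; the function names are hypothetical, and the step and calorie numbers are made up for illustration rather than taken from the Fitbit data above.</p>

```python
import math
import statistics


def describe(values):
    """Summarize one variable: sample size, min, max, mean, median, std dev."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up illustrative values, not real Fitbit measurements.
steps = [4000, 6500, 8000, 10000, 12000]
calories = [2100, 2300, 2450, 2600, 2800]

print(describe(steps))
print(pearson_r(steps, calories))
print(fit_line(steps, calories))
```

<p>A correlation near 1 or -1 means the two variables move together closely, while the slope estimates how much the second variable changes per unit of the first.</p>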
<h2 id="tableau-public">Tableau Public</h2>
<p><a href="http://www.tableausoftware.com/public/">Tableau Public</a>, the only free version of the <a href="http://gigaom.com/2012/02/23/thanks-to-consumerization-its-ipo-season-in-analytics/">popular business-intelligence software</a>, was clearly designed with business users in mind. It expects a lot of structure in the data, and although you can edit almost every aspect of it within the application to get it into usable shape, the service doesn’t offer much guidance if you don’t speak the language of BI (it also requires Windows). But the software is very good at deciphering the characteristics of different variables, the drag-and-drop operation makes it <em>kind of</em> easy to experiment and the wide array of visualizations look really nice.</p>
<p>Using my Fitbit data (and here’s where you see how lax I am at data entry), I created a line graph comparing the calories I ate each day with the calories I burned. Assuming I didn’t go crazy eating on the days I forgot to make entries, the good news is I never ate more calories than I burned. (Note: Although these are static images, Tableau Public actually lets you embed interactive charts, which I’ve used in the past on several occasions, but they don’t always fit well within our pages.)</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/01/cal-tab.jpg"><img alt="cal tab" src="http://gigaom2.files.wordpress.com/2013/01/cal-tab.jpg?w=708&#038;h=297" width="708" height="297" class="aligncenter size-large wp-image-605860"></a>Here’s one I played around with a while back charting <a href="http://gigaom.com/2011/10/26/dont-look-now-but-aws-might-be-a-billion-dollar-biz/">Amazon’s “Other” revenue</a> against the number of objects stored in Amazon S3.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/01/aws-objrev.jpg"><img alt="aws objrev" src="http://gigaom2.files.wordpress.com/2013/01/aws-objrev.jpg?w=708&#038;h=338" width="708" height="338" class="aligncenter size-large wp-image-605861"></a>Finally, here is my first-ever (I think) Tableau chart, which uses the raw data on government takedown requests that Google provided along with its Transparency Report in October 2011. You can <a href="http://gigaom.com/2011/10/25/google-shows-the-limits-of-a-free-web/">read that post and play with the interactive version here</a>.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/01/goog-trans.jpg"><img alt="goog trans" src="http://gigaom2.files.wordpress.com/2013/01/goog-trans.jpg?w=708&#038;h=360" width="708" height="360" class="aligncenter size-large wp-image-605863"></a></p>
<p><strong>There is, however, one disclaimer that applies to all of these tools:</strong> I didn’t get into cleaning and formatting data, which can be a somewhat arduous process. Many tools expect some sort of structure to the data — the X axis to be in columns and the Y axis in rows, measurements without units (e.g., grams), etc. — that just isn’t present if you’re downloading an Excel or CSV file rather than creating it yourself. Sometimes, with comprehensive datasets like your Fitbit Premium data, you’ll have to separate or combine the relevant data into new spreadsheet tables before uploading it to a service. But once you have the data ready to go, these tools can help you analyze it, visualize it and hopefully glean some insights from it.</p>
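<p>As one example of that cleanup, stripping units out of measurement columns can be scripted. The following is a minimal Python sketch under stated assumptions: the column names, the sample rows and the helper names are all hypothetical, and a real export would likely need more handling than this.</p>

```python
import csv
import io
import re

# Matches the first integer or decimal number in a cell.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def strip_units(cell):
    """Return the first number in a cell like '250 g' as a float, else None."""
    match = NUMBER.search(cell.replace(",", ""))  # drop thousands separators
    return float(match.group()) if match else None


def clean_rows(raw_csv, numeric_columns):
    """Parse CSV text and convert the named columns to unit-free numbers."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        for col in numeric_columns:
            row[col] = strip_units(row[col])
        cleaned.append(row)
    return cleaned


# Hypothetical export with units mixed into the measurement columns.
raw = (
    "food,protein,calories\n"
    "oatmeal,6 g,150 kcal\n"
    'bratwurst,12.5 g,"1,283 kcal"\n'
)

for row in clean_rows(raw, ["protein", "calories"]):
    print(row)
```

<p>Cells with no number in them come back as <code>None</code>, which makes forgotten entries easy to spot before uploading.</p>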
<p><em>Disclosure: Fitbit is backed by True Ventures, a venture capital firm that is an investor in the parent company of this blog, Giga Omni Media. Om Malik, founder of Giga Omni Media, is also a venture partner at True.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/01/31/data-for-dummies-5-data-analysis-tools-anyone-can-use/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/01/data1-e1359613608925.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/01/data1-e1359613608925.jpg?w=150" medium="image">
			<media:title type="html">data</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/cals-bigml.jpg?w=708" medium="image">
			<media:title type="html">cals bigml</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/predict.jpg?w=708" medium="image">
			<media:title type="html">predict</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/fusion.jpg?w=708" medium="image">
			<media:title type="html">fusion</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/gunscomp.jpg?w=708" medium="image">
			<media:title type="html">gunscomp</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/words-1.jpg?w=708" medium="image">
			<media:title type="html">words 1</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/words-2.jpg?w=708" medium="image">
			<media:title type="html">words 2</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/data.jpg?w=708" medium="image">
			<media:title type="html">data</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/cloud2.jpg?w=708" medium="image">
			<media:title type="html">cloud2</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/wordcloud.jpg?w=708" medium="image">
			<media:title type="html">wordcloud</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/statwing.jpg?w=708" medium="image">
			<media:title type="html">statwing</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/cal-tab.jpg?w=708" medium="image">
			<media:title type="html">cal tab</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/aws-objrev.jpg?w=708" medium="image">
			<media:title type="html">aws objrev</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/goog-trans.jpg?w=708" medium="image">
			<media:title type="html">goog trans</media:title>
		</media:content>
	</item>
		<item>
		<title>Maybe big data can quell gun violence &#8212; but not in the way you think</title>
		<link>http://gigaom.com/2012/12/28/maybe-big-data-can-quell-gun-violence-but-not-in-the-way-you-think/</link>
		<comments>http://gigaom.com/2012/12/28/maybe-big-data-can-quell-gun-violence-but-not-in-the-way-you-think/#comments</comments>
		<pubDate>Fri, 28 Dec 2012 19:00:32 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Crime]]></category>
		<category><![CDATA[gun control]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=597763</guid>
		<description><![CDATA[Big data might not be able to predict when a mass murderer is about to strike, but perhaps it can shed some light on why certain countries have such high murder rates. Are there factors not related to gun control that inspire a willingness to kill?<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=597763&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><strong>Updated: </strong>If the United States really wants to solve its problem with gun deaths, it might want to look at the data. But the process won&#8217;t be easy and almost certainly won&#8217;t provide a magic model by which to predict mass murders before they happen. The issue might be more about the American psyche than about guns themselves, so the remedy might require broad thinking and long-term solutions to fundamental problems far removed from gun control.</p>
<p>On Thursday, Barnes &amp; Noble VP Marc Parrish <a href="http://www.theatlantic.com/politics/archive/2012/12/how-big-data-can-solve-americas-gun-problem/266633/">wrote a provocative guest post for </a><em><a href="http://www.theatlantic.com/politics/archive/2012/12/how-big-data-can-solve-americas-gun-problem/266633/">The Atlantic</a> </em>explaining how big data technologies could help identify mass murderers such as James Holmes and Adam Lanza before they actually commit their heinous acts. As much as I like to prescribe big data as the solution to various problems &#8212; and as much as I wish Parrish&#8217;s solution were the right answer &#8212; his assessment is probably a bit fantastical.</p>
<p>There is a whole slew of reasons Parrish&#8217;s hypothesis might fall short, the most obvious of which was pointed out early and often by commenters on the post: There just aren&#8217;t enough incidents of gun-powered mass murder to draw strong conclusions about what types of behavior typically precede such an attack. Here&#8217;s an excerpt from the most popular comment, from JLR84:</p>
<blockquote><p>&#8220;The &#8216;patterns&#8217; that you think of indicative of a spree-killer in the making are far more common than you think, meaning that the whole thing would be rife with false positives. So many that the authorities would never be able to follow up on them, and the system would quickly be ignored. &#8230; What you think is a &#8216;large amount of ammunition&#8217; isn&#8217;t. &#8230; Spree shooters use relatively small quantities of ammunition compared to the average enthusiast, all things considered. Regular violent criminals, even less.&#8221;</p></blockquote>
<p>Another strong argument has to do with ownership &#8212; who actually owns and purchases the guns used in mass murders, or any other homicide, for that matter? If a shooter steals guns or uses his father&#8217;s gun, for example, the shooter&#8217;s name might never find its way into a government database. Without other evidence <a href="http://www.lasvegassun.com/news/2012/dec/27/fbi-warned-possible-plot-against-las-vegas-high-sc/">linking the possession of guns with intent to do harm</a>, trying to predict who&#8217;ll commit horrific crimes with guns might be a fruitless task.</p>
<p>But that doesn&#8217;t mean there isn&#8217;t a less glamorous way to use data as a means for curbing violence by guns. Perhaps &#8212; if someone were willing to undertake a massive data collection effort, carefully selecting, gathering and analyzing international data on topics such as poverty rates, mental health, gun laws, drug laws, violence in the media, known information about those who have committed murder, family composition, health care, etc. &#8212; we could actually identify commonalities or anomalies that shed some light on why certain countries have higher murder rates than others. It&#8217;s possible that Americans&#8217; easy access to guns only facilitates a willingness to kill that has been cultivated by other factors and extends far beyond the small fraction of deaths attributable to mass murder.</p>
<div id="attachment_597854" class="wp-caption aligncenter" style="width: 543px"><a href="http://gigaom2.files.wordpress.com/2012/12/homicide-rates.jpg"><img  alt="By way of comparison, Canada has a rate of 1.6; the United Kingdom is 1.2. (Source: Wikipedia/UNODC)" src="http://gigaom2.files.wordpress.com/2012/12/homicide-rates.jpg?w=708"   class="size-full wp-image-597854" /></a><p class="wp-caption-text">By way of comparison, Canada has a rate of 1.6; the United Kingdom is 1.2. (Source: Wikipedia/UNODC)</p></div>
<p>Guns certainly make it easier to kill, but they probably don&#8217;t, by their mere presence, inspire violent tendencies. At 4.2 murders per 100,000 people, <a href="http://en.wikipedia.org/wiki/List_of_countries_by_intentional_homicide_rate">according to the United Nations Office on Drugs and Crime</a>, we&#8217;re well above peers such as Canada, Australia and Western European countries &#8212; and even above many African, Middle Eastern, Asian and Eastern European countries. And although the percentage of homicides committed with guns is high in the United States (<a href="http://www.unodc.org/unodc/en/data-and-analysis/homicide.html">the UNODC says 68 percent</a>, or 9,960 murders, in 2010, while the <em>Guardian</em>&#8217;s Datablog <a href="http://www.guardian.co.uk/news/datablog/2012/jul/22/gun-homicides-ownership-world-list">uses data showing 60 percent</a>), there&#8217;s no guarantee many of those murders wouldn&#8217;t have happened or have been attempted by other means.</p>
<div id="attachment_598662" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2012/12/gun-homicides-vs-gun-ownership-large-gdp-1.jpg"><img  alt="Gun homicide rate per 100,000 people by country. Updated: the X axis now correctly reads &quot;Guns/100 people.&quot; (Source: KDnuggets)" src="http://gigaom2.files.wordpress.com/2012/12/gun-homicides-vs-gun-ownership-large-gdp-1.jpg?w=300&#038;h=267" width="300" height="267" class="size-medium wp-image-598662" /></a><p class="wp-caption-text">Gun homicide rate per 100,000 people by country. Updated: the X axis now correctly reads &#8220;Guns/100 people.&#8221; (Source: KDnuggets)</p></div>
<p>Looking at statistics about guns alone does little to answer the question. Over at KDnuggets, an online community dedicated to data mining, there <a href="http://www.kdnuggets.com/2012/12/poll-results-gun-ownership-gun-deaths-connection.html">has been some discussion</a> about the correlation between the number of guns in a country and the number of gun deaths. Excluding the United States &#8212; which easily tops the charts in terms of guns per <del>capita</del> 100 people and gun homicide rate (among countries with a per capita GDP of more than $20,000) &#8212; it&#8217;s hard to say with any statistical certainty that having more guns actually does lead to more murder by guns.</p>
<p>So maybe big data really can help solve America&#8217;s penchant for killing by helping us understand why, exactly, so many of our citizens feel so compelled to do so. Instead of trying to figure out <em>when</em> people are going to pull the trigger, let&#8217;s focus on answering <em>why </em>people are so willing to kill in a country that appears to have so much.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-10642p1.html">Shutterstock user Sascha Burkard</a>.</em></p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=597763&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=14436"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=14436" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=597763+maybe-big-data-can-quell-gun-violence-but-not-in-the-way-you-think&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/09/listening-platforms-finding-the-value-in-social-media-data/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=597763+maybe-big-data-can-quell-gun-violence-but-not-in-the-way-you-think&utm_content=dharrisstructure">Listening platforms: finding the value in social media data</a></li><li><a href="http://pro.gigaom.com/2012/05/the-importance-of-putting-the-u-and-i-in-visualization/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=597763+maybe-big-data-can-quell-gun-violence-but-not-in-the-way-you-think&utm_content=dharrisstructure">The importance of putting the U and I in visualization</a></li><li><a href="http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=597763+maybe-big-data-can-quell-gun-violence-but-not-in-the-way-you-think&utm_content=dharrisstructure">A near-term outlook for big data</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/12/28/maybe-big-data-can-quell-gun-violence-but-not-in-the-way-you-think/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/12/shutterstock_74403016.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/12/shutterstock_74403016.jpg?w=150" medium="image">
			<media:title type="html">bullet holes</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/12/homicide-rates.jpg" medium="image">
			<media:title type="html">By way of comparison, Canada has a rate of 1.6; the United Kingdom is 1.2. (Source: Wikipedia/UNODC)</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/12/gun-homicides-vs-gun-ownership-large-gdp-1.jpg?w=300" medium="image">
			<media:title type="html">Gun homicide rate per 100,000 people by country. Updated: the X axis now correctly reads &#34;Guns/100 people.&#34; (Source: KDnuggets)</media:title>
		</media:content>
	</item>
		<item>
		<title>Why Nate Silver and others predicted the election perfectly</title>
		<link>http://gigaom.com/2012/11/07/why-nate-silver-and-others-predicted-the-election-perfectly/</link>
		<comments>http://gigaom.com/2012/11/07/why-nate-silver-and-others-predicted-the-election-perfectly/#comments</comments>
		<pubDate>Wed, 07 Nov 2012 18:39:32 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[Election 2012]]></category>
		<category><![CDATA[nate silver]]></category>
		<category><![CDATA[Politics]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=581799</guid>
		<description><![CDATA[Guess what, accurately predicting the outcomes of elections really isn't a partisan affair. What Nate Silver and several others accomplished in perfectly predicting the election isn't about finding data to support their desired outcomes. It's about processing reams of imperfect data and figuring out what matters.]]></description>
				<content:encoded><![CDATA[<p>This chart <a href="http://simplystatistics.org/post/35187901781/nate-silver-does-it-again-will-pundits-finally-accept">by Rafa Irizarry at Simply Statistics</a> pretty much sums up the amount of egg on the faces of anyone who questioned Nate Silver&#8217;s prediction that President Obama had a greater than 90 percent chance of winning reelection on Tuesday night. By and large, you&#8217;ll notice, Silver&#8217;s predicted chances of victory in any given state also align nicely with the percentage vote the president received in each state. The bottom line: True data analysis <a href="http://gigaom.com/data/data-doesnt-play-politics-and-most-of-it-suggests-obama-will-win/">doesn&#8217;t care about politics</a>, it cares about being correct.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/11/silver.jpg"><img  title="silver" alt="" src="http://gigaom2.files.wordpress.com/2012/11/silver.jpg?w=604&#038;h=604" height="604" width="604" class="aligncenter size-large wp-image-581804" /></a></p>
<p>It&#8217;s worth mentioning that Silver wasn&#8217;t the only statistician to perfectly predict the presidential race, either. In terms of Electoral College votes, <a href="http://www.huffingtonpost.com/simon-jackman/pollster-predictions_b_2081013.html">Simon Jackman of Pollster did so</a>, <a href="http://frontloading.blogspot.com/">as did Josh Putnam of Davidson College</a>. Save for Florida, Sam Wang of the <a href="http://election.princeton.edu/" target="_blank">Princeton Election Consortium</a> fared very well, too, <a href="http://election.princeton.edu/">and actually nailed the popular vote split</a>. Slate <a href="http://www.slate.com/articles/news_and_politics/politics/2012/11/pundit_scorecard_checking_pundits_predictions_against_the_actual_results.html">has a nice interactive chart</a> showing how various statisticians and pundits fared in their predictions; there certainly are more predictions and models floating around that haven&#8217;t been included.</p>
<p>The important takeaway, however, is that the people who nailed the outcome <a href="http://simplystatistics.org/post/34635539704/on-weather-forecasts-nate-silver-and-the">didn&#8217;t achieve their results by cherry-picking data</a> that served their political interests. They did it because they&#8217;re professional statisticians whose success depends on accurately predicting the outcomes of events, not on cheerleading for the outcome they might personally desire or that will drive the highest ratings. Even if the data they&#8217;re working with is somewhat biased &#8212; <a href="http://gigaom.com/data/data-doesnt-play-politics-and-most-of-it-suggests-obama-will-win/#comments">as some individuals</a> and organizations suggested to me is the case &#8212; the science comes in being able to take the data sources for what they are and accurately weigh their relevancy.</p>
<blockquote class="twitter-tweet tw-align-center"><p>Wrong. Data ALWAYS does, starting w/ collection | Data doesn’t play politics &amp; says Obama win <a title="http://is.gd/piGdIb" href="http://t.co/Gd16kSEg">is.gd/piGdIb</a> by @<a href="https://twitter.com/derrickharris">derrickharris</a></p>
<p>— Liberationtech (@Liberationtech) <a href="https://twitter.com/Liberationtech/status/265966130634567680" data-datetime="2012-11-06T23:57:21+00:00">November 6, 2012</a></p></blockquote>
<p>In business, <a href="http://gigaom.com/cloud/the-biggest-obstacle-to-embracing-big-data-you/">this is the shift in thinking that&#8217;s driving the movement</a> toward big data and advanced analytics. Forward-thinking companies want to use data to make the right decisions, not to back up their predetermined decisions based largely on gut instinct. But there&#8217;s an unprecedented amount of data at their disposal &#8212; some good, some bad &#8212; which is why data scientists who can figure out what sources to use and how to use them are in such high demand right now.</p>
<p>So in 2014 and 2016, pollsters are going to keep polling, statisticians are going to keep analyzing those polls (and whatever other factors they choose to include) and, maybe, pundits and the media will pay some attention to what they&#8217;re saying. Probabilities aren&#8217;t promises etched in stone, and a vote either way can change the face of close elections like this one. But no one should be surprised when someone whose only job is to get it right does just that.</p>
<p><em>Feature image courtesy of <a href="http://www.flickr.com/photos/carolyncoles/2389407045/">Flickr user Carolyn Coles</a>.</em></p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=581799&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=760147"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=760147" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=581799+why-nate-silver-and-others-predicted-the-election-perfectly&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=581799+why-nate-silver-and-others-predicted-the-election-perfectly&utm_content=dharrisstructure">A near-term outlook for big data</a></li><li><a href="http://pro.gigaom.com/2012/09/listening-platforms-finding-the-value-in-social-media-data/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=581799+why-nate-silver-and-others-predicted-the-election-perfectly&utm_content=dharrisstructure">Listening platforms: finding the value in social media data</a></li><li><a href="http://pro.gigaom.com/2012/05/the-importance-of-putting-the-u-and-i-in-visualization/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=581799+why-nate-silver-and-others-predicted-the-election-perfectly&utm_content=dharrisstructure">The importance of putting the U and I in visualization</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/11/07/why-nate-silver-and-others-predicted-the-election-perfectly/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/11/monalisa-egg-e1323204297726.jpeg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/11/monalisa-egg-e1323204297726.jpeg?w=150" medium="image">
			<media:title type="html">monalisa egg on face</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/11/silver.jpg?w=604" medium="image">
			<media:title type="html">silver</media:title>
		</media:content>
	</item>
	</channel>
</rss>
