<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>GigaOM &#187; kaggle</title>
	<atom:link href="http://gigaom.com/tag/kaggle/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com</link>
	<description></description>
	<lastBuildDate>Sun, 19 May 2013 01:28:26 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gigaom.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0db8f6557d022075dbbf010c54d46d93?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>GigaOM &#187; kaggle</title>
		<link>http://gigaom.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gigaom.com/osd.xml" title="GigaOM" />
	<atom:link rel='hub' href='http://gigaom.com/?pushpress=hub'/>
		<item>
		<title>On Kaggle, GE finds data science solutions for patients and pilots</title>
		<link>http://gigaom.com/2013/04/03/on-kaggle-ge-finds-data-science-solutions-for-patients-and-pilots/</link>
		<comments>http://gigaom.com/2013/04/03/on-kaggle-ge-finds-data-science-solutions-for-patients-and-pilots/#comments</comments>
		<pubDate>Wed, 03 Apr 2013 13:00:54 +0000</pubDate>
		<dc:creator>Jordan Novet</dc:creator>
				<category><![CDATA[data scientists]]></category>
		<category><![CDATA[GE]]></category>
		<category><![CDATA[kaggle]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=626845</guid>
		<description><![CDATA[GE, an airline and a health system have crowdsourced data science questions using Kaggle and are now paying out $600,000 to winners of two competitions.]]></description>
				<content:encoded><![CDATA[<p>Even GE respects the wisdom of the crowd. The manufacturer joined up with Alaska Airlines, the Ochsner Health System and <a href="http://gigaom.com/2012/06/06/kaggle-is-now-crowdsourcing-data-science-creativity/">Kaggle</a> in November to ask outside data scientists and designers to help give pilots actionable data and make hospital visits and subsequent care more efficient.</p>
<p>The organizers of the first <a href="http://gigaom.com/2012/11/29/ge-needs-the-data-analytics-minds-of-the-valley-and-knows-it/">Industrial Internet Quests</a> have since received more than 3,000 submissions and were expecting to announce on Wednesday the contestants who will receive a total of $600,000. One submission for the flight competition has earned $100,000 for its developers, a five-person team from Singapore.</p>
<p>Kaggle has hosted data-science competitions for several <a href="http://www.kaggle.com/solutions/customers">other</a> brand-name companies, from Facebook to Ford. Its publicly available <a href="http://gigaom.com/2012/09/12/can-kaggle-make-data-science-a-spectator-sport/">leaderboards</a> make data science a bit like a spectator sport, and open-source education on machine learning and natural-language processing makes it <a href="http://gigaom.com/2012/10/14/why-becoming-a-data-scientist-might-be-easier-than-you-think/">possible</a> for lots of people to compete. </p>
<p>Demand is sky-high for data scientists and application developers, and farming out one-off projects is a common practice in all sorts of industries. That&#8217;s why it&#8217;s not surprising to see even big companies like GE turning to the crowd for data science solutions. And it&#8217;s why this sort of news could become more common in the future.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/03/on-kaggle-ge-finds-data-science-solutions-for-patients-and-pilots/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/04/data-scientist-job-board.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/04/data-scientist-job-board.jpg?w=150" medium="image">
			<media:title type="html">Data scientist job board</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/c00ab753df107b639e76ed4c3ab07ba7?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigajordan</media:title>
		</media:content>
	</item>
		<item>
		<title>Meet Kaggle Connect: matchmaker for data scientists and companies that need them</title>
		<link>http://gigaom.com/2013/03/05/kaggle-connect-matchmaker-for-data-scientists-and-companies-that-need-them/</link>
		<comments>http://gigaom.com/2013/03/05/kaggle-connect-matchmaker-for-data-scientists-and-companies-that-need-them/#comments</comments>
		<pubDate>Tue, 05 Mar 2013 16:00:56 +0000</pubDate>
		<dc:creator>Barb Darrow</dc:creator>
				<category><![CDATA[Anthony Goldbloom]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[kaggle]]></category>
		<category><![CDATA[Kaggle Connect]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=616635</guid>
		<description><![CDATA[Putting the right brains on the right problems is the goal of Kaggle Connect.]]></description>
				<content:encoded><![CDATA[<p>Here are two certainties about big data. One is that companies need good data scientists. The other is that identifying good data scientists ain&#8217;t easy. That&#8217;s why Kaggle, the <a href="http://gigaom.com/2012/09/12/can-kaggle-make-data-science-a-spectator-sport/">data science competition platform</a>, is launching Kaggle Connect to link proven data science performers with companies willing to pay for their expertise.</p>
<p>Everyone calls himself a data scientist now &#8212; and <a href="http://gigaom.com/2012/02/17/big-data-skills-bring-big-dough/">there&#8217;s a reason for that.</a> The title &#8220;gets you 40 percent more money,&#8221; says Kaggle CEO <a href="http://www.kaggle.com/careers/team">Anthony Goldbloom.</a> &#8220;The problem is that it&#8217;s hard to know how good someone really is until six months down the road when you realize they haven&#8217;t done anything.&#8221;</p>
<p>His argument is that folks who have done well in Kaggle competitions over the past two years &#8212; insurance actuaries, mathematicians, students, chemists &#8212; have proven they have what it takes.</p>
<p>And Kaggle bona fides are becoming currency. <a href="http://www.linkedin.com/jobs?viewJob=&amp;jobId=4902477">This job posting</a> for a <em>New York Times</em> data scientist lists participation in a Kaggle competition as a key criterion.</p>
<h2 id="connecting-the-right-data-scie">Connecting the right data scientists with the right problems</h2>
<p>With Kaggle Connect, the company is making its two top tiers of competitors &#8212; it&#8217;s an invitation-only list &#8212; available to companies on an individual basis. &#8220;If Pfizer comes to us with a problem that is maybe not well specified enough and needs more iteration than a competition would allow, we can provide a data scientist that suits that problem,&#8221; Goldbloom said.</p>
<p><a href="http://gigaom.com/2013/03/05/kaggle-connect-matchmaker-for-data-scientists-and-companies-that-need-them/kaggleranks2-2/" rel="attachment wp-att-616775"><img  alt="kaggleranks2" src="http://gigaom2.files.wordpress.com/2013/03/kaggleranks21.jpg?w=708&#038;h=428" width="708" height="428" class="aligncenter size-full wp-image-616775" /></a></p>
<p>The customer pays a subscription cost of somewhere between $30,000 and $100,000 per month to gain access to appropriate data science resources. Kaggle gets a cut of that money and the data scientist gets the rest &#8212; although Kaggle is not breaking out the percentages.</p>
<p>In the interactive chart below, click on the map to bring up the name, picture and profile of the Kaggle Connect member.<br />
<iframe src="http://kaggle.cartodb.com/tables/kaggle_connect_members/embed_map?title=true&amp;description=true&amp;search=false&amp;shareable=false&amp;cartodb_logo=true&amp;sql=&amp;zoom=0&amp;center_lat=39.36827914916011&amp;center_lon=150.46875" height="400" width="400" frameborder="0"></iframe></p>
<p>What Kaggle brings to the table is a roster of people who have performed well in its competitions. What the companies provide is a juicy problem to solve and data to use in that quest. In some ways this is an extension of what Kaggle has already done with <a href="http://gigaom.com/2012/10/23/greenplum-kaggle-play-big-data-matchmakers/">EMC&#8217;s Greenplum division</a>, although that project required the use of Greenplum&#8217;s Chorus toolset.<br />
<a href="http://gigaom.com/?attachment_id=616731" rel="attachment wp-att-616731"><img  alt="kaggleuserspecialty" src="http://gigaom2.files.wordpress.com/2013/03/kaggleuserspecialty.jpg?w=708&#038;h=395" width="708" height="395" class="aligncenter size-full wp-image-616731" /></a><br />
The top two of eight total tiers of 80,000 contestants will initially serve as the invitation-only talent pool for Kaggle Connect. That&#8217;s about 1,500 Kagglers (if that&#8217;s a word). Kaggle began running data science competitions in early 2012 and started <a href="http://gigaom.com/2012/09/12/can-kaggle-make-data-science-a-spectator-sport/">publishing its leaderboard</a> of top big data problem solvers last September.</p>
<p>We&#8217;ll see how this all proves out, but if Kaggle success is really a predictor of big data chops writ large, expect to see a lot more Kaggle boasts on resumes going forward.</p>
<p><em>Feature photo courtesy of Shutterstock user Dirk Ercken.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/05/kaggle-connect-matchmaker-for-data-scientists-and-companies-that-need-them/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/shutterstock_125574617.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/shutterstock_125574617.jpg?w=150" medium="image">
			<media:title type="html">Big Data</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4af03439988d64f816da72496325cb73?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigabarb</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/kaggleranks21.jpg" medium="image">
			<media:title type="html">kaggleranks2</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/kaggleuserspecialty.jpg" medium="image">
			<media:title type="html">kaggleuserspecialty</media:title>
		</media:content>
	</item>
		<item>
		<title>Greenplum and Kaggle launch big data matchmaking service</title>
		<link>http://gigaom.com/2012/10/23/greenplum-kaggle-play-big-data-matchmakers/</link>
		<comments>http://gigaom.com/2012/10/23/greenplum-kaggle-play-big-data-matchmakers/#comments</comments>
		<pubDate>Tue, 23 Oct 2012 12:03:13 +0000</pubDate>
		<dc:creator>Barb Darrow</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Gnip]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[kaggle]]></category>
		<category><![CDATA[tableau]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=576147</guid>
		<description><![CDATA[EMC's Greenplum division hopes to encourage users of its Chorus big data application to reach out to the Kaggle community of data scientists to do real-world work. The company also inked partnerships with Gnip and Tableau and open-sourced a version of Chorus.]]></description>
				<content:encoded><![CDATA[<p>Sometimes it&#8217;s hard for data scientists and big data sets to find each other. That&#8217;s the problem that EMC&#8217;s Greenplum division and Kaggle are taking on with a new partnership. <a href="http://gigaom.com/data/can-kaggle-make-data-science-a-spectator-sport/">Kaggle</a> is a predictive modeling platform that sponsors competitions in which data scientists compete to solve big data problems.</p>
<p>Under the new alliance, Kaggle&#8217;s community of big data eggheads can use <a href="http://gigaom.com/cloud/emc-gets-it-big-data-needs-apps-too/">Greenplum&#8217;s Chorus big data application</a> to solve real-world problems.</p>
<p>&#8220;We&#8217;ve had good adoption of Chorus and companies&#8217; internal data workers are using it to do data science so they now have the tools, but honestly they don&#8217;t have all the people they need,&#8221; said Josh Klahr,  VP of product management for Greenplum. &#8220;Now you can search the Kaggle community based on rank, expertise, location and invite them to work on your challenge using Greenplum Chorus.&#8221;</p>
<h2>Playing Yenta to big data players</h2>
<p>Kaggle ranks its participants much the way the USTA ranks tennis players. And that community is growing fast &#8212; when Kaggle started fundraising in August, there were 11,000 members; now there are close to 60,000, said Anthony Goldbloom, CEO of Kaggle, who added that this is the first such vendor partnership Kaggle has done. (<a href="http://gigaom.com/data/forget-your-fancy-data-science-try-overkill-analytics/">GigaOM has worked with Kaggle and Splunk</a> on the <a href="http://www.kaggle.com/c/predict-wordpress-likes">GigaOM WordPress Challenge: Splunk Innovation Prospect</a>.)</p>
<p>Also on the partner ecosystem front, Greenplum inked a deal that gives Chorus users access to <a href="http://gnip.com/">Gnip&#8217;s</a> historical Twitter feeds and will let Chorus users import Twitter streams into their Chorus sandbox for analysis. And finally, Chorus is partnering with <a href="http://www.tableau.com/">Tableau</a>, the popular analytics tool, so that users can provision Tableau workbooks from their Chorus data sources.</p>
<p>Big data is one area where building a broad ecosystem of data providers is incredibly important. So is putting good data scientists together with great data sets, said Ben Woo, managing director of research firm <a href="http://www.neuralytics.com/">Neuralytics</a>. &#8220;Big data is awfully short on the kinds of people who&#8217;ve done this work before and, frankly, people who give a damn. This sort of matchmaking is valuable.&#8221;</p>
<h2>Goal: Melding public and private data to spark new insights</h2>
<p>This convergence of publicly available &#8220;sentiment&#8221; data from sources like Twitter and internal business data lets data scientists ask interesting questions or find interesting questions to ask. For example, a pharmaceutical company has lots of its own data on a new drug. What it may not have is the sort of information about unforeseen side effects that might surface on Twitter or blogs after the drug is released. &#8220;If you can match Twitter feeds and patient forums, you can find out unexpected things &#8212; see that maybe people are switching from your drug to another. Analyzing that discussion can be incredibly important,&#8221; Goldbloom said.</p>
<p>In related news, Greenplum, as promised last spring, is open sourcing Chorus as the <a href="http://www.openchorus.org/">OpenChorus Project</a> under the Apache 2.0 license.</p>
<p><em><a title="Attribution License" href="http://creativecommons.org/licenses/by/2.0/">Feature photo courtesy of</a> Flickr user <a href="http://www.flickr.com/photos/kevinkrejci/">Kevin Krejci</a></em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/10/23/greenplum-kaggle-play-big-data-matchmakers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/10/6259499293_b577b94cfd_z.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/10/6259499293_b577b94cfd_z.jpg?w=150" medium="image">
			<media:title type="html">big data</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4af03439988d64f816da72496325cb73?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigabarb</media:title>
		</media:content>
	</item>
		<item>
		<title>Why becoming a data scientist might be easier than you think</title>
		<link>http://gigaom.com/2012/10/14/why-becoming-a-data-scientist-might-be-easier-than-you-think/</link>
		<comments>http://gigaom.com/2012/10/14/why-becoming-a-data-scientist-might-be-easier-than-you-think/#comments</comments>
		<pubDate>Mon, 15 Oct 2012 05:02:46 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data, data science]]></category>
		<category><![CDATA[Coursera]]></category>
		<category><![CDATA[edx]]></category>
		<category><![CDATA[kaggle]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[online education]]></category>
		<category><![CDATA[online education startups]]></category>
		<category><![CDATA[Udacity]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=572844</guid>
		<description><![CDATA[Several novice programmers who signed up for a free machine-learning class on Coursera have gone on recently to win predictive-modeling competitions. Maybe it's not that hard to mint new data scientists after all.]]></description>
				<content:encoded><![CDATA[<p>Maybe the business world has jumped the gun with all the talk about a looming skills shortage in big data and advanced analytics. There&#8217;s mounting evidence that it doesn&#8217;t take much to turn a novice programmer or statistician into a perfectly capable data scientist. Maybe all it takes is just some cheap cloud computing servers, or a few weeks studying machine learning with Stanford professor Andrew Ng on Coursera.</p>
<p>Much of this evidence comes via Kaggle, a platform where companies and organizations award prizes for the best solutions to their predictive-modeling needs. In September, for example, I <a href="http://gigaom.com/data/forget-your-fancy-data-science-try-overkill-analytics/">covered a first-time Kaggle user and admitted data science neophyte</a> named Carter S. who won a competition using a simple but effective method he dubbed &#8220;overkill analytics.&#8221;</p>
<p>Impressive, sure, but Carter builds insurance-industry risk models for a living. While he&#8217;s able to learn new techniques such as natural-language processing and social network analysis as he goes, he&#8217;s no stranger to a linear regression. But what if someone&#8217;s only formal experience with computer science was a single undergraduate programming course?</p>
<p>Ask <a href="https://www.kaggle.com/users/23759/luis-tandalla">Luis Tandalla</a>. That was his case before he took a handful of free online classes last year on <a href="https://www.coursera.org/">Coursera</a>. Yet the University of New Orleans senior recently scored his first victory in a <a href="http://www.kaggle.com/c/asap-sas/details/preliminary-winners">Kaggle competition hosted by the Hewlett Foundation</a> where he had to devise a model for accurately grading short-answer questions on exams. Not bad for a college senior who didn&#8217;t really know what artificial intelligence and machine learning were before he signed up to learn them.</p>
<p>Once Tandalla got started, he told me, he got passionate about learning more. So he also took Coursera classes on natural-language processing and probabilistic models, began studying on his own outside the online lectures and even got active on Kaggle (this was his first victory in five competitions). He&#8217;ll receive his bachelor&#8217;s degree in mechanical engineering in May 2013, but now Tandalla says he wants to pursue a master&#8217;s degree in machine learning and start his own predictive-software company.</p>
<h2>The Coursera connection</h2>
<p>Maybe Tandalla isn&#8217;t so unique after all. The second- and third-place finishers in the Hewlett Foundation competition, it turns out, also learned machine learning on Coursera. The latter, <a href="https://www.kaggle.com/users/17379/gxav-xavier-conort">Xavier Conort</a>, is a 39-year-old actuary from Singapore who just decided to become a data scientist last year and is now Kaggle&#8217;s top-ranked competitor.</p>
<div id="attachment_573215" class="wp-caption alignright" style="width: 266px"><a href="http://gigaom2.files.wordpress.com/2012/10/andrew_ng.jpg"><img  title="Andrew_Ng" alt="" src="http://gigaom2.files.wordpress.com/2012/10/andrew_ng.jpg?w=708"   class="size-full wp-image-573215" /></a><p class="wp-caption-text">Andrew Ng</p></div>
<p>Stanford professor and Coursera co-founder Andrew Ng &#8212; who teaches the machine-learning class that all three top finishers took &#8212; doesn&#8217;t think their success is just coincidence. If you&#8217;re not trying to make the types of contacts students at top universities are after, and your goal isn&#8217;t to perform advanced research, he explained, online education platforms such as Coursera (and, I&#8217;ll add, <a href="http://www.udacity.com/">Udacity</a> and <a href="https://www.edx.org/">EdX</a>), can be incredibly valuable.</p>
<p>In particular, Ng said, &#8220;Machine learning has matured to the point by where if you take one class you can actually become pretty good at applying it.&#8221; Familiarity with algebra and probabilities is certainly helpful, he added, but the only real prerequisite to his course is a basic understanding of programming.</p>
<p>And with machine learning becoming &#8220;one of the more highly sought-after skills in Silicon Valley,&#8221; Ng said, corporate recruiters say just completing a single course can significantly boost someone&#8217;s salary and job prospects at companies where such knowledge is still in short supply.</p>
<p>&#8220;I bet many students are going on to [do] great things because of these courses [even if we never hear about it],&#8221; Ng said.</p>
<h2>Why it works, and why it could change the world</h2>
<p>Ng thinks the current incarnation of online education platforms works so well because they&#8217;re essentially nurturing the already-talented students who seek them out. Some professionals, he explained, take courses to learn skills such as machine learning or iOS programming that weren&#8217;t in vogue or didn&#8217;t even exist when they earned their computer science degrees just a decade ago.</p>
<p>Furthermore, with students able to learn at their own pace, there&#8217;s a lot of valuable information disseminated in the discussion forums.</p>
<p>Free access to the best teachers around doesn&#8217;t hurt either. Ng said he couldn&#8217;t teach his course so well if he hadn&#8217;t spent so much time living in Silicon Valley learning best practices from some of the smartest computer scientists on the planet. That experience lets him spend less time teaching algorithms for the sake of algorithms and more time talking about how one might actually apply machine learning in the field.</p>
<p>Ng says that&#8217;s more important than just understanding the algorithms in a vacuum. He compares it to learning how to write a computer program rather than learning only the syntax of a programming language without being able to string commands together into something useful. This approach isn&#8217;t entirely unique among the new order of online educators: On Udacity, for example, Google VP and Stanford professor Sebastian Thrun centers the Computer Science 101 curriculum around learning Python in the context of building a working search engine.</p>
<p><iframe src="https://www.youtube.com/embed/e0WKJLovaZg?feature=player_detailpage" height="360" width="640"></iframe></p>
<p>The value of this opportunity wasn&#8217;t lost on Tandalla. He said he can feel the passion that professors have even through the pre-recorded video lectures, and it feels good knowing you&#8217;re learning from the people who literally wrote the book on the subject you&#8217;re studying.</p>
<h2>Who knows who&#8217;s the next Einstein</h2>
<p>But ultimately, minting new data scientists &#8212; even Kaggle winners &#8212; is low-hanging fruit. Ng said we don&#8217;t yet know how much impact online education platforms like Coursera can have. In all fields, there are talented people all over the world who just need an avenue to hone their skills and a chance to distinguish themselves.</p>
<p>&#8220;It makes me wonder,&#8221; Ng said, &#8220;if the next Albert Einstein is a little girl in Afghanistan who just needs [the opportunity to access quality education].&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/10/14/why-becoming-a-data-scientist-might-be-easier-than-you-think/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/05/machine-learning.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/05/machine-learning.jpg?w=150" medium="image">
			<media:title type="html">machine learning</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/10/andrew_ng.jpg" medium="image">
			<media:title type="html">Andrew_Ng</media:title>
		</media:content>
	</item>
		<item>
		<title>NASA tries to free creativity with Big Data Challenge</title>
		<link>http://gigaom.com/2012/10/03/nasa-tries-to-free-creativity-with-big-data-challenge/</link>
		<comments>http://gigaom.com/2012/10/03/nasa-tries-to-free-creativity-with-big-data-challenge/#comments</comments>
		<pubDate>Wed, 03 Oct 2012 19:51:44 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[data munging]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[InnoCentive]]></category>
		<category><![CDATA[kaggle]]></category>
		<category><![CDATA[NASA]]></category>
		<category><![CDATA[TopCoder]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=569477</guid>
		<description><![CDATA[NASA and a couple other government agencies have kicked off a series of TopCoder challenges designed to find innovative solutions to the government's big data problems. The first contest is all about making disparate, incompatible data sets usable and actually valuable across agencies.]]></description>
				<content:encoded><![CDATA[<p>Some of the U.S. government&#8217;s most research-intensive agencies want your help to come up with better ways to analyze their expansive data sets. NASA, along with the National Science Foundation and the Department of Energy, launched a competition on <a href="http://topcoder.com">TopCoder</a> called the <a href="http://community.topcoder.com/coeci/nitrd/">Big Data Challenge</a> series. Essentially, it&#8217;s a competition to crowdsource a solution to the very big problem of fragmented and incompatible federal data.</p>
<p>The <a href="http://community.topcoder.com/tc?module=ProjectDetail&amp;pj=30030561">first contest in the series</a> involves answering a question, albeit a difficult one. From the contest page:</p>
<blockquote><p>&#8220;How can we make heterogeneous (dissimilar and incompatible) data sets homogeneous (uniformly accessible, compatible, able to be grouped and/or matched) so usable information can be extracted? How can information then be converted into real knowledge that can inform critical decisions and solve societal challenges?&#8221;</p></blockquote>
<p>This is a problem that&#8217;s magnified in government agencies, but that plagues companies of all types that try to get started with big data. While future big data strategies might mandate a particular data format or other standards, the past and, often, the present is a messy pile of stuff created by different divisions within different agencies or departments. The dream of creating spectacular algorithms and beautiful visualizations, and of uncovering hidden insights, often comes true only after untold man-hours spent cleaning and <a href="http://en.wikipedia.org/wiki/Data_munging">munging data</a> (often with Hadoop) into formats that software systems can work with.</p>
<p>The registration deadline for the first contest is Oct. 13, and the submission deadline is Oct. 19. The first-place prize is $1,000. Later contests will focus on more domain-specific fields such as energy, health care and earth science.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/10/topcoder.jpg"><img  title="topcoder" src="http://gigaom2.files.wordpress.com/2012/10/topcoder.jpg?w=300&#038;h=231" alt="" width="300" height="231" class="alignleft size-medium wp-image-569538" /></a>Although the possibility of influencing big data strategies within some of the country&#8217;s most advanced agencies might be novel, crowdsourcing solutions to these types of difficult problems is becoming rather common. TopCoder exists as a competitive platform for solving application development and design issues, and the Big Data Challenge is just the latest it&#8217;s hosting for NASA via the agency&#8217;s <a href="http://community.topcoder.com/coeci/">Center of Excellence for Collaborative Innovation</a> and <a href="http://www.nasa.gov/directorates/heo/ntl/">NASA Tournament Lab</a>. There&#8217;s also general-purpose platform <a href="http://www.innocentive.com/">InnoCentive</a> and <a href="http://gigaom.com/data/can-kaggle-make-data-science-a-spectator-sport/">wildly popular data science platform Kaggle</a>.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-80956p1.html">Shutterstock user Prokhorova Nadila</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/10/03/nasa-tries-to-free-creativity-with-big-data-challenge/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/10/shutterstock_110216492.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/10/shutterstock_110216492.jpg?w=150" medium="image">
			<media:title type="html">Network of stars</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/10/topcoder.jpg?w=300" medium="image">
			<media:title type="html">topcoder</media:title>
		</media:content>
	</item>
		<item>
		<title>Forget your fancy data science, try overkill analytics</title>
		<link>http://gigaom.com/2012/09/21/forget-your-fancy-data-science-try-overkill-analytics/</link>
		<comments>http://gigaom.com/2012/09/21/forget-your-fancy-data-science-try-overkill-analytics/#comments</comments>
		<pubDate>Fri, 21 Sep 2012 17:00:24 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[kaggle]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=565355</guid>
		<description><![CDATA[Carter S. won his first-ever Kaggle competition -- our own GigaOM WordPress Challenge -- using a brute force method of data science he calls overkill analytics. Rather than spend untold hours perfecting complex models, Carter used simple algorithms and let powerful microprocessors do the rest.]]></description>
				<content:encoded><![CDATA[<p>Meet Carter S. He used to be a lawyer, but now he writes predictive models for an insurance company. Admittedly green in certain new or advanced modeling methods, he prefers to use simple algorithms and throw as much computing power as possible at problems. He <a href="http://www.overkillanalytics.net/about-overkill-analytics/">calls the technique &#8220;overkill analytics,&#8221;</a> and it just won him his first contest on Kaggle, defeating more than 80 other competitors in the <a href="http://www.kaggle.com/c/predict-wordpress-likes">GigaOM WordPress Challenge: Splunk Innovation Prospect</a> <em>(see disclosure)</em>.</p>
<p>Not only was this Carter&#8217;s first win, it was also his first contest. You can <a href="http://www.overkillanalytics.net/kaggles-wordpress-challenge-the-like-graph/">read the detailed explanation of his victory</a> on his blog, but the gist is that he didn&#8217;t get too involved with complex social graphing to determine relationships or natural language processing to determine topics readers liked. He figured out that most of what people liked came from blogs they&#8217;ve already read, and that the vast majority of posts people liked fell within a three-node radius on a simple social graph.</p>
<p>Statistically speaking, he fit a <a href="http://en.wikipedia.org/wiki/Generalized_linear_model">generalized linear model</a> and a <a href="http://en.wikipedia.org/wiki/Random_forest">random forest model</a>, then averaged the results. &#8220;I&#8217;m not sure it&#8217;s a very unique technique,&#8221; he told me, &#8220;but it&#8217;s certainly a very powerful one.&#8221;</p>
<div id="attachment_565426" class="wp-caption aligncenter" style="width: 590px"><a href="http://gigaom2.files.wordpress.com/2012/09/blog-wordpress-centralitylift-580x295.jpg"><img  title="blog-wordpress-centralitylift-580x295" src="http://gigaom2.files.wordpress.com/2012/09/blog-wordpress-centralitylift-580x295.jpg?w=708" alt=""   class="size-full wp-image-565426" /></a><p class="wp-caption-text">Source: Overkill Analytics</p></div>
<p>And therein lies the beauty of overkill analytics, a term that Carter might have coined, but that appears to be catching on &#8212; especially in the world of web companies and big data. Carter says he doesn&#8217;t want to spend a lot of time fine-tuning models, writing complex algorithms or pre-analyzing data to make it work for his purposes. Rather, he wants to utilize some simple models, reduce things to numbers and process the heck out of the data set on as much hardware as is possible.</p>
<p>It&#8217;s not about big data so much as it is about big computing power, he said. There&#8217;s still work to be done on the smaller data sets that the majority of the world deals with, but Hadoop clusters and other architectural advances let you do more to that data in less time than was previously possible. Now, Carter said, as long as you account for the effects of overprocessing data, you can create a black-box-like system and run every combination of simple techniques on data until you get the most accurate answer.</p>
<p>I <a href="http://gigaom.com/data/5-ideas-to-help-everyone-make-the-most-of-big-data/">wrote about the same general theory recently</a> in explaining why Sparked.com&#8217;s Daniel Wiesenthal believes that big data (i.e., lots and lots of data combined with new storage and processing technologies) improves the practice of data science (i.e., the application of statistical techniques to data). The gist of his theory is that although complex models are great for small data sets, simple models can close the accuracy gap when applied to large data sets. Combine that with infrastructure that can process a lot of data relatively fast and support a wide variety of jobs, and you have a simpler, faster, equally effective method.</p>
<p>Still, Carter said he didn&#8217;t get involved in Kaggle just to prove the effectiveness of overkill analytics. He does hope to get exposed to new data science techniques that haven&#8217;t yet caught on in the insurance industry, and he also wants to make a name for himself. When you work for a company with little turnover, he said, your professional network doesn&#8217;t grow too much, but doing Kaggle competitions is a great way to meet other data scientists &#8212; and <a href="http://gigaom.com/data/can-kaggle-make-data-science-a-spectator-sport/">winning is a great way to earn respect</a>.</p>
<p>Ali Ahmad (username Xali) won the separate Splunk Innovation portion of the contest. According to a statement from Splunk, he &#8220;used Splunk&#8217;s built in statistical and visualization features to map out the relationship between blogs containing YouTube videos with those that are most likely to be viral, as measured by likes and shares. As a bonus, he fed the data into an app to view the YouTube videos most commonly liked and shared via WordPress blogs!&#8221;</p>
<p><em><strong>Disclosure</strong>: Automattic, maker of WordPress.com, is backed by True Ventures, a venture capital firm that is an investor in the parent company of this blog, GigaOM. Om Malik, founder of GigaOM, is also a venture partner at True.</em></p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-674152p1.html">Shutterstock user nasirkhan</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/09/21/forget-your-fancy-data-science-try-overkill-analytics/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_86909912-e1348241105800.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_86909912-e1348241105800.jpg?w=150" medium="image">
			<media:title type="html">workflow</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/09/blog-wordpress-centralitylift-580x295.jpg" medium="image">
			<media:title type="html">blog-wordpress-centralitylift-580x295</media:title>
		</media:content>
	</item>
		<item>
		<title>Can Kaggle make data science a spectator sport?</title>
		<link>http://gigaom.com/2012/09/12/can-kaggle-make-data-science-a-spectator-sport/</link>
		<comments>http://gigaom.com/2012/09/12/can-kaggle-make-data-science-a-spectator-sport/#comments</comments>
		<pubDate>Wed, 12 Sep 2012 16:18:47 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[kaggle]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[predictive modeling]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=561817</guid>
		<description><![CDATA[Data science competition platform Kaggle is opening up the leaderboards for its invitation-only private competitions, meaning anyone can watch and see how the world's best data scientists are faring in these special challenges. Can data science actually become a spectator sport in the analytics community?]]></description>
				<content:encoded><![CDATA[<p>Updated: Don&#8217;t worry if you don&#8217;t yet have a favorite data scientist; I don&#8217;t either. But maybe that&#8217;s just because we haven&#8217;t known who to root for.</p>
<p><a href="http://kaggle.com">Kaggle</a> hopes to change that with a twist on its <a href="http://gigaom.com/cloud/kaggle-is-now-crowdsourcing-data-science-creativity/">predictive-modeling competition platform</a> that makes public the competitors in invite-only private competitions. Think of it like watching a major tournament in golf or tennis, where you can watch the best in the world shoot it out to see whose algorithms are king. Kaggle&#8217;s tagline is &#8220;We&#8217;re making data science a sport.&#8221; Maybe now it can make data science a spectator sport.</p>
<div id="attachment_561901" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2012/09/gigaom-kaggle.jpg"><img  title="gigaom kaggle" src="http://gigaom2.files.wordpress.com/2012/09/gigaom-kaggle.jpg?w=300&#038;h=155" alt="" width="300" height="155" class="size-medium wp-image-561901" /></a><p class="wp-caption-text">Top five in the GigaOM/Splunk competition.</p></div>
<p>Actually, Kaggle has been running private competitions &#8212; in which customers generally remain anonymous and keep their challenge descriptions vague except for those invited to work on the data &#8212; since about the beginning of 2012. In the past, though, even the competitors remained a mystery. Kaggle also posts leaderboards for all public competitions and a cumulative leaderboard. But as any sports fan knows, there&#8217;s nothing quite like watching a tournament where only the best of the best can play, and where the pressure is on.</p>
<p>Now, says Kaggle Founder and CEO Anthony Goldbloom, private competitions are more like running the U.S. Open in that others can watch the leaderboard and see how the invited data scientists are faring. It&#8217;s primarily a feature so other data scientists on the Kaggle platform can gauge their relative performance and get a little more motivation to step up their game and make it to the invitation-only competitions, but I think it could become a geek spectator sport under the right conditions.</p>
<p>If you&#8217;re wondering which U.S. Open he&#8217;s talking about (golf or tennis), don&#8217;t fret &#8212; had Goldbloom been asked whether Kaggle is more like golf or tennis before it launched, even he might have guessed wrong. He&#8217;d probably have guessed tennis, in which certain players excel on certain types of courts, like Roger Federer on the grass court at Wimbledon, or Rafael Nadal on the clay court at the French Open. So, someone who works in biotech might naturally prevail in those competitions, while a natural-language processing specialist might do best in competitions with lots of text to mine.</p>
<p>It turns out Kaggle is more like golf, in which a dominant player like Tiger Woods can win on pretty much any course he plays. Newcomers can still win, especially because there are plenty of good data scientists still making their way to the nascent Kaggle platform, but, Goldbloom says, the really good ones will adapt their skill sets to whatever is necessary for any given competition.</p>
<p>The first private competition open to public viewership began on Wednesday, and is somewhat unique in that the sponsor is willing to share its name and its challenge. It&#8217;s insurance provider Allstate, and it&#8217;s trying to predict customer churn. According to Goldbloom, <del>the prohibitive favorite is <a href="http://www.kaggle.com/users/7052/jason-tigg">Jason Tigg</a>, an Oxford physicist turned hedge fund manager, but</del> Indianapolis actuary <a href="http://www.kaggle.com/users/10694/shea-parkes">Shea Parkes</a> and apparent mystery man Jonathan Peters are names to watch.</p>
<p>There you have it, sports fans. Place your bets accordingly.</p>
<p><em>Note: This story was updated to reflect that Oxford physicist Jason Tigg decided not to take part in the first private competition open to public viewership.</em></p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-625468p1.html">Shutterstock user photofriday</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/09/12/can-kaggle-make-data-science-a-spectator-sport/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_68905108.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_68905108.jpg?w=150" medium="image">
			<media:title type="html">Golf tournament</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/09/gigaom-kaggle.jpg?w=300" medium="image">
			<media:title type="html">gigaom kaggle</media:title>
		</media:content>
	</item>
		<item>
		<title>GigaOM Data Challenge: Predict which stories get read, win $10K</title>
		<link>http://gigaom.com/2012/06/20/gigaom-meets-kaggle-predict-wholl-read-what-win-10k/</link>
		<comments>http://gigaom.com/2012/06/20/gigaom-meets-kaggle-predict-wholl-read-what-win-10k/#comments</comments>
		<pubDate>Wed, 20 Jun 2012 16:30:18 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[kaggle]]></category>
		<category><![CDATA[splunk]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=534422</guid>
		<description><![CDATA[In publishing, there's a constant struggle to determine who'll read what posts, what the ideal headline might be and when is the best time to publish. GigaOM is teaming with Splunk to find some answers via a Kaggle competition worth a total of $25,000 in prizes.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2012/06/shutterstock_53433448.jpg"><img title="shutterstock_53433448" src="http://gigaom2.files.wordpress.com/2012/06/shutterstock_53433448.jpg?w=300&#038;h=200" alt="" width="300" height="200" class="alignleft size-medium wp-image-534438"></a>In publishing, analytics matter a lot. There’s a constant struggle to determine who will read what posts or articles, what the ideal headline might be and when publishing makes the most sense. That’s why GigaOM is teaming with <a href="http://www.splunk.com/">Splunk</a> to help find some answers.</p>
<p>We’re <a href="https://www.kaggle.com/c/predict-wordpress-likes">hosting a competition</a> on <a href="http://kaggle.com">Kaggle’s data science platform</a> to find the best models for predicting likely readership across the WordPress <em>(see disclosure) </em>ecosystem of blogs. Here are the details:</p>
<blockquote><p>The challenge is to predict whether a particular user will like a particular WordPress blog post.  The data consists of eight weeks of posts collected by WordPress, along with anonymized user responses to each post.  This challenge is an interesting mix of natural language processing (the raw blog posts) and metadata on the blogs and users. Contestants can download the data and submit prediction through the Kaggle platform, but a <strong>new feature for this competition</strong> is that they will also have free access to a Splunk server containing all the data, which they can employ for data exploration, visualization, feature extraction and modeling.</p></blockquote>
<p>Aside from offering resources to work on the data, Splunk is also putting up a total of $25,000 in prize money. In the prediction track, the winning model will receive $10,000, second place $5,000, third place $3,000 and fourth place $2,000.</p>
<p>The remaining $5,000 is a Splunk Innovation Prize for the most innovative use of data science, whether that comes in the form of a visualization, app, business model, you name it. Entries for the latter track can be submitted through <a href="http://gigaom.com/cloud/kaggle-is-now-crowdsourcing-data-science-creativity/">Kaggle’s new Prospect platform</a>. Winners of both competitions will be announced at <a href="http://event.gigaom.com/mobilize/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=534422+gigaom-meets-kaggle-predict-wholl-read-what-win-10k&amp;utm_content=dharrisstructure">GigaOM Mobilize</a> in September.</p>
<p>You can find out <a href="https://www.kaggle.com/c/predict-wordpress-likes">more about the competition here</a>. Good luck!</p>
<p><em><strong>Disclosure:</strong> Automattic, maker of WordPress.com, is backed by True Ventures, a venture capital firm that is an investor in the parent company of this blog, GigaOM. Om Malik, founder of GigaOM, is also a venture partner at True.</em></p>
<p><em>Image courtesy of <a href="http://www.shutterstock.com/gallery-421981p1.html">Shutterstock user sukiyaki</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/06/20/gigaom-meets-kaggle-predict-wholl-read-what-win-10k/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/06/shutterstock_53433448.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/06/shutterstock_53433448.jpg?w=150" medium="image">
			<media:title type="html">shutterstock_53433448</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/06/shutterstock_53433448.jpg?w=300" medium="image">
			<media:title type="html">shutterstock_53433448</media:title>
		</media:content>
	</item>
		<item>
		<title>Kaggle is now crowdsourcing big data creativity</title>
		<link>http://gigaom.com/2012/06/06/kaggle-is-now-crowdsourcing-data-science-creativity/</link>
		<comments>http://gigaom.com/2012/06/06/kaggle-is-now-crowdsourcing-data-science-creativity/#comments</comments>
		<pubDate>Wed, 06 Jun 2012 12:00:43 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[kaggle]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[predictive analytics]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=529313</guid>
		<description><![CDATA[In a move to expand its utility beyond simply finding better answers to known statistical problems, hot data-science startup Kaggle is now letting its stable of expert data scientists compete to tell companies how they can improve their businesses using machine learning.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2012/06/data-brain.jpg"><img  title="data brain" src="http://gigaom2.files.wordpress.com/2012/06/data-brain-e1338974487390.jpg?w=300&#038;h=227" alt="" width="300" height="227" class="alignright size-medium wp-image-529345" /></a>In a move to expand its utility beyond simply <a href="http://gigaom.com/2012/03/21/prediction-competitions-adding-the-human-touch-to-big-data-problems-structure-data-2012/">finding better answers to known statistical problems</a>, hot data-science startup <a href="http://gigaom.com/2011/11/03/kaggle-funding-max-levchin/">Kaggle</a> is now letting its stable of expert data scientists compete to tell companies how they can improve their businesses with machine learning. It&#8217;s part of a natural evolution of Kaggle from a plucky startup to an IT company with legs, but it&#8217;s actually more like a prequel to Kaggle&#8217;s flagship predictive modeling competitions than it is a sequel.</p>
<p>What Kaggle has come to realize, <del>Co-Founder</del> President and Chief Scientist Jeremy Howard told me, is that very few people in the world know more about machine learning than do the roughly 40,000 individuals who compete in Kaggle&#8217;s competitions. At the same time, the big data movement is putting <a href="http://gigaom.com/cloud/is-machine-learning-coming-to-a-system-near-you/">the relatively complex but rewarding practice of machine learning</a> on the radar of more companies, and they want to know how to leverage it for their businesses.</p>
<p>But without the right knowledge, Howard said, companies new to machine learning will likely either under-reach and lose out on possible benefits or overreach and risk overwhelming themselves. Who better to educate them than the world&#8217;s largest (and, presumably, best) online data scientist community?</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/06/prospect-screenshot.jpg"><img  title="prospect-screenshot" src="http://gigaom2.files.wordpress.com/2012/06/prospect-screenshot.jpg?w=300&#038;h=281" alt="" width="300" height="281" class="size-medium wp-image-529344 alignleft" /></a>The way <a href="http://www.kaggle.com/prospect">Prospect</a> works is rather simple: Customers submit a sample of the data they want to better analyze, and competitors play around with the data and propose ideas on how to best use it. When the competition deadline comes, a winner is selected by some combination of community voting and customer selection. Customers pay their fee to Kaggle and the reward to the winner, and are free to do what they want with the advice.</p>
<p>&#8220;The worst case scenario is you spend a reasonably small amount of money and decide this isn&#8217;t a product for you,&#8221; Howard said, although he suspects most companies will end up wanting to take the next step and actually develop and implement models.</p>
<p>Inaugural Prospect customer <a href="http://www.practicefusion.com/">Practice Fusion</a>, at least, plans to do just that. The electronic health records provider is offering a $500 prize for the best idea on how to use its data, and then will conduct a $10,000 competition to actually develop a model based on that idea.</p>
<p>Howard said this process is part of the &#8220;analytics value chain&#8221; that Kaggle hopes to create as its own business model matures. Companies can use the platform to figure out how to best use their data, then build and train their models, figure out how to implement them in production and at scale, and finally maintain and improve them. &#8220;You have to get everything right in order to actually leverage your data,&#8221; he said, and Kaggle wants to help at every step.</p>
<p>Kaggle thinks it can become the go-to source for anyone needing help with machine learning because its approach forces the platform to prove its worth. Fairly or not, Howard characterizes traditional analytics consulting firms as &#8220;having quite a lot of snake oil salesmen&#8221; that often don&#8217;t &#8220;have anything to back up their pretty pictures and words.&#8221; By contrast, he says Kaggle&#8217;s competition approach &#8220;is by far the best method in the world for actually building and training [predictive] models.&#8221;</p>
<p>He acknowledges, however, that although he expects the approach to remain the best one as Kaggle builds out its value chain, it might not. But Kaggle is a data science startup, he said, so it&#8217;s not about to act solely on his intuition: &#8220;We have to test these hypotheses.&#8221;</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-803866p1.html">Shutterstock user VLADGRIN</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/06/06/kaggle-is-now-crowdsourcing-data-science-creativity/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/06/data-brain-e1338974487390.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/06/data-brain-e1338974487390.jpg?w=150" medium="image">
			<media:title type="html">data brain</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/06/data-brain-e1338974487390.jpg?w=300" medium="image">
			<media:title type="html">data brain</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/06/prospect-screenshot.jpg?w=300" medium="image">
			<media:title type="html">prospect-screenshot</media:title>
		</media:content>
	</item>
		<item>
		<title>How big data predicted Eurovision — and offended Malta</title>
		<link>http://gigaom.com/2012/05/28/big-data-eurovision/</link>
		<comments>http://gigaom.com/2012/05/28/big-data-eurovision/#comments</comments>
		<pubDate>Mon, 28 May 2012 11:02:44 +0000</pubDate>
		<dc:creator>Bobbie Johnson</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Europe]]></category>
		<category><![CDATA[Eurovision]]></category>
		<category><![CDATA[Eurovision Song Contest]]></category>
		<category><![CDATA[kaggle]]></category>
		<category><![CDATA[Martin O'Leary]]></category>
		<category><![CDATA[Sweden]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=526217</guid>
		<description><![CDATA[When a data scientist crunched enough numbers to predict that Sweden would win this weekend's Eurovision Song Contest, he felt fairly confident. But he didn't expect that the biggest noise would be the inaccurate prediction that Malta would do well -- something he's now apologized for.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2012/05/loreen-screengrab.jpg"><img src="http://gigaom2.files.wordpress.com/2012/05/loreen-screengrab.jpg?w=300&#038;h=200" alt="" title="loreen screengrab" width="300" height="200"  class="alignright size-medium wp-image-526220" /></a>On Saturday night millions of people all over Europe &#8212; all over the world, in fact &#8212; tuned in to the annual <a href="http://en.wikipedia.org/wiki/Eurovision_Song_Contest">Eurovision Song Contest</a>, a cheesy televisual explosion that many worship as a festival of camp, cross-border craziness. In a three-and-a-half hour live broadcast from Azerbaijan (yes, really), viewers from 42 countries listened to 26 songs and voted for which one they liked the most.</p>
<p>Amid all that, you might think that guessing the winner would be hard. But one man predicted the result… sort of.</p>
<p>Meet <a href="http://aoss.engin.umich.edu/people/olearym">Martin O&#8217;Leary</a>, a glaciologist and data nerd who works at the University of Michigan. O&#8217;Leary, who calls himself a &#8220;recovering mathematician,&#8221; decided to use statistical analysis on Eurovision to try and <a href="http://mewo2.github.com/nerdery/2012/05/24/eurovision-statistics-final-predictions/">understand which country would win</a>:</p>
<blockquote><p>Sweden’s going to win, unless it’s Malta, or maybe somebody else. If you average together the taste in pop music of all of Europe, you get a Hungarian. Don’t trust the scores on Saturday night, they’re just toying with your emotions.</p></blockquote>
<p>And guess what? Sweden won!</p>
<p>You can read O&#8217;Leary&#8217;s <a href="http://mewo2.github.com/nerdery/2012/05/20/ive-got-eurosong-fever-ted/">entire</a> <a href="http://mewo2.github.com/nerdery/2012/05/23/eurovision-statistics-post-semifinal-update/">series</a> <a href="http://mewo2.github.com/nerdery/2012/05/24/eurovision-statistics-final-predictions/">of</a> <a href="http://mewo2.github.com/nerdery/2012/05/27/eurovision-statistics-after-the-final/">posts</a> to understand how he arrived at that conclusion, but here&#8217;s the quick version.</p>
<p>He did it by performing a Bayesian analysis on a wide range of previous Eurovision results, taking into account a few important factors in his model. First, the recognition that while Eurovision is a song competition, the results are not really based on the quality of the song &#8212; although it can play a part. Then there&#8217;s the fact that there are semi-finals (held to whittle the number of contestants down) that allow some songs to be tested in public.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/05/martinoleary.jpg"><img src="http://gigaom2.files.wordpress.com/2012/05/martinoleary.jpg?w=300&#038;h=200" alt="" title="martin o&#039;leary" width="300" height="200"  class="alignleft size-medium wp-image-526221" /></a>And then, most importantly, there&#8217;s the recognition that Eurovision is heavily influenced by transnational politics: which countries like which other countries plays a <em>big</em> part in voting. Entrants from the Balkans, for example, tend to trade votes with each other. Greece nearly always awards maximum points to Cyprus and vice versa. Big European powers like the U.K., France and Germany perform less well than smaller countries with lots of positive sentiment toward them.</p>
<p>But while O&#8217;Leary&#8217;s number-crunching enabled him to predict Sweden&#8217;s victory &#8212; and claim a victory for data modeling &#8212; it wasn&#8217;t infallible.</p>
<p>In particular, his prediction that Malta would be in the mix seems to have caused some consternation. His guess was so exciting to the Maltese that <a href="http://www.timesofmalta.com/articles/view/20120526/local/sweden-tops-eurovision-predictions-but-is-malta-second.421407">it even made the newspapers</a>, but in the end the country&#8217;s entry came in a measly 21st out of 26.</p>
<p>This was clearly upsetting to the Maltese, so he issued an apology:</p>
<blockquote><p>This prediction caused quite a stir in Malta, with a story in the Times of Malta and over 16,000 pageviews from Malta on Saturday alone. Many took this as good evidence that Malta were going to do well in the contest, and some people were rather annoyed with me when they did not.</p>
<p>I’d like to apologise if I misled anyone. I didn’t expect anyone to take the model predictions particularly seriously, and if I had known, I would have included some more caveats and explanations of exactly what the model was predicting. Instead, I was fairly loose and jokey about the model results, and didn’t really talk about what they meant in real terms. Sorry, guys.</p></blockquote>
<p>But will they forgive him?</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/05/28/big-data-eurovision/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/05/martinoleary.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/05/martinoleary.jpg?w=150" medium="image">
			<media:title type="html">martin o&#039;leary</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/6e5c23eccd5022fef0059f01c98c2ea4?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">bobbiejohnson</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/05/loreen-screengrab.jpg?w=300" medium="image">
			<media:title type="html">loreen screengrab</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/05/martinoleary.jpg?w=300" medium="image">
			<media:title type="html">martin o&#039;leary</media:title>
		</media:content>
	</item>
	</channel>
</rss>
