<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>GigaOM &#187; machine-learning</title>
	<atom:link href="http://gigaom.com/tag/machine-learning/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com</link>
	<description></description>
	<lastBuildDate>Wed, 22 May 2013 00:19:23 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gigaom.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0db8f6557d022075dbbf010c54d46d93?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>GigaOM &#187; machine-learning</title>
		<link>http://gigaom.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gigaom.com/osd.xml" title="GigaOM" />
	<atom:link rel='hub' href='http://gigaom.com/?pushpress=hub'/>
		<item>
		<title>Google, NASA quantum computing project could bring stronger machine learning to the masses</title>
		<link>http://gigaom.com/2013/05/16/google-nasa-quantum-computing-project-could-bring-stronger-machine-learning-to-the-masses/</link>
		<comments>http://gigaom.com/2013/05/16/google-nasa-quantum-computing-project-could-bring-stronger-machine-learning-to-the-masses/#comments</comments>
		<pubDate>Thu, 16 May 2013 16:51:07 +0000</pubDate>
		<dc:creator>Jordan Novet</dc:creator>
				<category><![CDATA[Google]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[NASA]]></category>
		<category><![CDATA[quantum computing]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=646142</guid>
		<description><![CDATA[Google said Thursday it is establishing a Quantum Artificial Intelligence Lab to trigger the next phase of machine learning with the power of quantum computers. The efforts could trickle down to ordinary people.]]></description>
				<content:encoded><![CDATA[<p>It&#8217;s been almost two decades since Peter Shor came up with <a href="http://en.wikipedia.org/wiki/Shor's_algorithm">a breakthrough algorithm</a> for finding the prime factors of a number with a quantum computer, sparking great interest in quantum computing. But commercial adoption has been pretty much nonexistent. On Thursday, though, Google came forward with news that it&#8217;s launching a Quantum Artificial Intelligence Lab that will include a quantum computer, apparently making it the second company to pay for a quantum computer. The development suggests that quantum computing could finally be taking off.</p>
<p>Earlier this year Lockheed Martin <a href="http://gigaom.com/2013/03/22/lockheed-martin-wants-to-use-a-quantum-computer-to-develop-radar-aircraft-systems/">shared details</a> of its implementation of a D-Wave Systems quantum computer, which reportedly cost $10 million: The contractor is using the computer to develop new aircraft, radar and space systems.</p>
<p>Now Google is taking steps toward incorporating more quantum computing into its operations with the Quantum Artificial Intelligence Lab, which will be located at the NASA Ames Research Center in Moffett Field, Calif. Researchers from the Universities Space Research Association will be able to use the machine 20 percent of the time, Forbes <a href="http://www.forbes.com/sites/alexknapp/2013/05/16/nasa-and-google-partner-to-purchase-a-d-wave-quantum-computer/">reports</a>. That could lead to lots of interdisciplinary thinking and collaboration.</p>
<p>For Google, though, the goal of the initiative is to make strides in machine learning, according to a Thursday Google Research <a href="http://googleresearch.blogspot.com/2013/05/launching-quantum-artificial.html">blog post</a>. The best results could trickle down to end users, perhaps in search results and speech-recognition applications.</p>
<h2 id="quantum-computing-could-mean-s">Quantum computing could mean smarter smartphones</h2>
<p>Google has already assembled machine-learning algorithms that involve quantum elements, Hartmut Neven, a Google director of engineering, explained in the post:</p>
<blockquote id="quote-one-produces-very-co"><p>One produces very compact, efficient recognizers &#8212; very useful when you&#8217;re short on power, as on a mobile device. Another can handle highly polluted training data, where a high percentage of the examples are mislabeled, as they often are in the real world.
</p></blockquote>
<p>It&#8217;s not hard to imagine how quantum computing could inform machine learning on a smartphone with just a drop of battery life left. It could be that a smarter smartphone one day will take a minuscule amount of input and determine with a high probability who a user wants to talk to or what information it needs right away, rather than forcing the user to cycle through a string of commands and risking the death of the battery altogether.</p>
<p>The applications might have arisen after Google&#8217;s earlier partnership with D-Wave, which came to light in a <a href="http://googleresearch.blogspot.com/2009/12/machine-learning-with-quantum.html">different blog post</a> from Neven in 2009. </p>
<p>Google has already used machine learning to <a href="http://gigaom.com/2012/06/25/how-google-is-teaching-computers-to-see/">recognize faces and other things</a> in photos and videos. New technology Google executives talked about at the <a href="http://gigaom.com/2013/05/14/google-io-2013-roundup/">Google I/O developer conference</a> in San Francisco on Wednesday also appears to use machine learning to stitch together photos and clean them up.</p>
<p>What Google has learned so far is that the best results come from blending regular binary computing using ones and zeros with quantum-style computing. Quantum computing accommodates the space between a one and a zero with quantum bits of information, or qubits. It can express likelihood as well as take shortcuts by approximating when handling certain kinds of workloads. Given what Google has observed thus far, it could decide to build hardware combining quantum and classical computing capabilities.</p>
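<p>To make the idea of a value between one and zero more concrete, here is a minimal Python sketch of the textbook single-qubit picture: a qubit is described by two amplitudes, and the squared magnitude of each gives the likelihood of reading a 0 or a 1. This is an illustration under stated assumptions, not a description of the D-Wave machine or of the algorithms Google has built; the function name is hypothetical.</p>

```python
import math


def measurement_probabilities(alpha: complex, beta: complex) -> tuple[float, float]:
    """Return (P(0), P(1)) for a single qubit with amplitudes alpha and beta.

    The amplitudes are normalized first, so any nonzero pair is accepted.
    """
    norm = abs(alpha) ** 2 + abs(beta) ** 2
    if norm == 0:
        raise ValueError("at least one amplitude must be nonzero")
    return abs(alpha) ** 2 / norm, abs(beta) ** 2 / norm


if __name__ == "__main__":
    # A classical-looking state: always reads as 0.
    print(measurement_probabilities(1, 0))
    # An equal superposition: reads as 0 or 1 with equal likelihood.
    print(measurement_probabilities(1 / math.sqrt(2), 1 / math.sqrt(2)))
    # A state weighted toward 1.
    print(measurement_probabilities(1, 2))
```

<p>A classical bit corresponds to the first case only; the other two cases are what lets a qubit express likelihood rather than a fixed value.</p>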
<p>For now, though, Google is diving deeper into quantum computing with the D-Wave machine. The move could kick off a sort of arms race for web-scale companies to buy quantum computers and come up with new notions by way of probabilistic logic. In this way, Google could help push the development of quantum computing much as its invention of MapReduce changed the way firms do distributed data processing.</p>
<p>In any case, quantum computing has a long way to go before reaching commercial viability. That could take decades (so far it has). But because the organization at the helm of the quantum research is Google and not IBM or Bell Labs, regular people could start seeing much more of the advantages in just a few years&#8217; time, which in turn could drive commercialization.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-608548p1.html">Shutterstock user pixeldreams.eu</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/16/google-nasa-quantum-computing-project-could-bring-stronger-machine-learning-to-the-masses/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/01/shutterstock_72722758.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/01/shutterstock_72722758.jpg?w=150" medium="image">
			<media:title type="html">brain and gears</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/c00ab753df107b639e76ed4c3ab07ba7?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigajordan</media:title>
		</media:content>
	</item>
		<item>
		<title>This is why big data is the sweet spot for SaaS</title>
		<link>http://gigaom.com/2013/05/14/this-is-why-big-data-is-the-sweet-spot-for-saas/</link>
		<comments>http://gigaom.com/2013/05/14/this-is-why-big-data-is-the-sweet-spot-for-saas/#comments</comments>
		<pubDate>Wed, 15 May 2013 01:10:22 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[BloomReach]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[marketing]]></category>
		<category><![CDATA[natural language processing]]></category>
		<category><![CDATA[saas]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=645189</guid>
		<description><![CDATA[When it comes to using big data technology effectively, there's a lot to like about SaaS. When companies like BloomReach create and analyze massive web-wide data sets, they automate insights that almost no individual company could discover on its own.]]></description>
				<content:encoded><![CDATA[<p>People often ask me where the smart money is in big data. I often tell them that’s a foolish question, because I’m not an investor — but if I were, I’d look to software as a service.</p>
<p>There are two primary reasons why, the first of which is obvious: Companies are tired of managing applications and infrastructure, so something that optimizes a common task using techniques they don’t know on servers they don’t have to manage is probably compelling. It’s called cloud computing.</p>
<p>The other reason is that <a href="http://gigaom.com/2013/04/29/google-research-director-and-ai-expert-peter-norvig-elected-into-aaas/">the <em>big </em>part of big data really is important</a> if you want to get a clear picture of what’s happening in any given space. While no single end-user company can (or likely would) address search-engine optimization, for example, by building a massive store made up of data from hundreds or thousands of companies as well as the entire web, a cloud service dedicated to that specific task can.</p>
<p>From <a href="http://gigaom.com/2012/11/28/log-data-startup-sumo-logic-raises-30m/">web security</a> to <a href="http://gigaom.com/2012/06/21/how-collective-intelligence-is-reshaping-systems-management/">systems management</a>, we’re already seeing how centralized data stores provide SaaS companies a broad view into what’s happening that can then be filtered down to serve each individual customer’s specific situation. <a href="http://www.bloomreach.com/">BloomReach</a>, a SaaS startup that helps companies optimize web-page content, is another good example of this principle in action.</p>
<h2 id="how-do-you-say-cotton-maxi-dre">How do <em>you</em> say “cotton maxi dress”?</h2>
<p>Ideally, BloomReach Head of Marketing Joelle Kaufman told me, the company wants to help customers ensure they get found in web searches by making sure they’re not invisible (buried deep down), irrelevant (not saying anything meaningful on their sites) or incompatible (not speaking their consumers’ language). On Tuesday, the company <a href="http://www.bloomreach.com/buzz/media-center-pr/continuous-quality-management/">announced a new feature called Continuous Quality Management</a>, which lets customers continuously monitor their pages to ensure they’re still featuring the right products and the right terminology. It’s the latest addition to a seemingly useful service that’s built atop a big data foundation few — if any — of its customers would ever attempt to build themselves.</p>
<p>BloomReach is able to help companies optimize their sites because it’s constantly crawling the web in order to figure out how everyone else is describing their content, laying out their pages and structuring their links. Running on the Amazon Web Services cloud, BloomReach runs more than 1,000 Hadoop jobs a day that process about 5 terabytes of data and a billion data points about users’ site behavior. With the latter, co-founder and CTO Ashutosh Garg explained, the company is trying to figure out who’s visiting sites, what they’re doing, how long they’re spending there and how they’re related in terms of behavior.</p>
<p>“You need to have the right amount of data and from the right places before we can do anything with it,” he said. “… It’s a massive machine learning problem.”</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/05/br-stack.png"><img alt="BR stack" src="http://gigaom2.files.wordpress.com/2013/05/br-stack.png?w=708&#038;h=531" width="708" height="531" class="aligncenter size-large wp-image-645359"></a></p>
<p>When you consider all the possible ways something could be described or formatted, the scale of the problem becomes more evident. Simple semantic analysis like associating “desk” and “table” is easy, Garg explained, but what if someone wants a lightweight camera and you only have its exact weight listed without any indication of how it compares to other options? What if people searching for “smartphones” really mean “Android phones,” but you’re top-loading your results with BlackBerry phones and Windows phones?</p>
<p>Another of Garg’s hypotheticals has to do with consumers’ presentation biases. If, for example, they’re looking at a lot of websites that look the same or focus on the same things (e.g., megapixels for digital cameras), they’ll expect to see the same things from every site.</p>
<h2 id="10-nonillion-possibilities-cho">10 nonillion possibilities: Choose 1.</h2>
<p>From a sheer numbers perspective, things get even hairier when you’re trying to determine the relationship between any two pages in order to figure out the best path for links to take. Garg said this is what computer scientists call an <a href="http://en.wikipedia.org/wiki/NP-complete">NP-complete problem</a>, which in practice means no known method solves it efficiently, and the time a brute-force search needs grows exponentially with the amount of content you’re analyzing. So, for example, analyzing 40 pages doesn’t take 10 times as long as analyzing 4 pages; it can take many orders of magnitude longer.</p>
<p>Actually, BloomReach CEO Raj De Datta gave me another example of this problem <a href="http://gigaom.com/2012/02/22/bloomreach-wants-to-save-your-site-with-big-data/">when we spoke in early 2012</a>. Here’s how I described it then:</p>
<blockquote id="quote-if-a-company-wants-t"><p>[I]f a company wants to display just 1,000 products across 100 pages, De Datta explained, there are 10-to-the-28th-power (10 octillion) possibilities for how to do that. When it comes time to describe those products, there are 10-to-the-30th-power (10 nonillion) possibilities.</p></blockquote>
<p>If a website has a million pages, Garg said, “it will take you longer than the life of the universe to solve that problem.”</p>
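<p>To get a feel for that kind of growth, here is a minimal Python sketch. It assumes, purely for illustration, that a brute-force approach would have to consider every possible ordering of the pages; BloomReach has not described its problem in these terms, and the function name is hypothetical.</p>

```python
import math


def candidate_orderings(n_pages: int) -> int:
    """Count the orderings of n_pages pages (n factorial).

    Illustrative assumption: a brute-force search would score each ordering once.
    """
    return math.factorial(n_pages)


if __name__ == "__main__":
    for n in (4, 10, 40):
        print(f"{n:>2} pages -> {candidate_orderings(n):.3e} orderings")
```

<p>Under that assumption, going from 4 pages to 40 multiplies the work by far more than 10 or even 100, which is why exact answers quickly become impractical.</p>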
<p>Where this type of problem arises, BloomReach turns to <a href="http://en.wikipedia.org/wiki/Monte_Carlo_method">Monte Carlo simulations</a>, a favorite technique of physicists and Wall Street quants. The method involves running lots of simulations over large data sets in order to determine approximate results in a reasonable time frame. (And if all this isn’t enough computer science and cloud infrastructure for you, I suggest attending our <a href="http://event.gigaom.com/structure/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=645189+this-is-why-big-data-is-the-sweet-spot-for-saas&amp;utm_content=dharrisstructure">Structure conference</a> in June, which features a who’s who list of speakers, including Google’s Jeff Dean, Facebook’s Jay Parikh and Netflix’s Adrian Cockcroft.)</p>
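<p>As a rough illustration of the Monte Carlo idea, the Python sketch below estimates a quantity by random sampling instead of exhaustive enumeration. The scenario is a toy assumption, not a BloomReach model: 20 products are shuffled at random, and the code estimates how often one featured product lands in the first three slots. The exact answer is 3/20 = 0.15, which makes the approximation easy to check; all names and parameters are hypothetical.</p>

```python
import random


def estimate_top_slot_rate(n_products=20, top_slots=3, n_samples=50_000, seed=0):
    """Estimate how often a featured product lands in the first top_slots
    positions of a uniformly random ordering, by sampling orderings rather
    than enumerating all of them."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    products = list(range(n_products))  # product 0 is the hypothetical featured one
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(products)  # draw one random ordering
        if products.index(0) < top_slots:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    print(f"estimated rate: {estimate_top_slot_rate():.3f} (exact: 0.150)")
```

<p>Twenty products already allow about 2.4 quintillion orderings, so checking them all is out of reach, while a few tens of thousands of random samples land close to the exact figure.</p>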
<h2 id="different-queries-different-pa">Different queries, different pages</h2>
<p>Things get even trickier when you’re trying to change the content of web pages in real time as people are searching for things. This isn’t the best method for organic search, where pages need to stay pretty consistent with the indexed versions, but it can be ideal in situations such as paid search and mobile. There are millions of ways to segment buyers, Garg explained, and how accurately you assess their intent and display your content can make all the difference. Whether someone is a new or repeat visitor often matters, as does whether someone is price-conscious (e.g., the query included “cheap”) or perhaps searching for a particular brand.</p>
<div id="attachment_645358" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/05/llbean.png"><img alt="Source: BloomReach" src="http://gigaom2.files.wordpress.com/2013/05/llbean.png?w=708&#038;h=531" width="708" height="531" class="size-large wp-image-645358"></a><p class="wp-caption-text">Source: BloomReach</p></div>
<p>Around the holidays, the company actually realized something interesting: The bounce rate on queries for things like “gifts for dad” or “gifts for co-workers” was pretty high, but so was the conversion rate. The time to conversion was relatively fast, as well. It turns out, Garg explained, that people don’t like to overthink certain gifts too much, so if something is presented in a visually appealing manner and is within their price range, they’ll buy.</p>
<p>But creating these types of models involves more than meets the eye. For all the talk about machine learning (and machines do a majority of the work for BloomReach), people also play a critical role. A person might know better than a machine whether something was likely purchased as a gift, Garg explained, or they might spot the offensive content on the T-shirt the machine decided was ideal.</p>
<p>“Humans are really good at creativity, thinking through stuff,” he said.</p>
<p>Smart humans are also good at knowing when they’re overmatched, which is why SaaS is so valuable in the big data era. CMOs could try doing what BloomReach or <a href="http://gigaom.com/2012/04/24/datapop-scores-7m-for-custom-built-ads/">similar companies such as DataPop</a> are doing, or they could pay someone to do it much better. Guess which route the smart ones will take.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-54269p1.html">Shutterstock user Andrea Danti</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/14/this-is-why-big-data-is-the-sweet-spot-for-saas/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/shutterstock_119782672.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/shutterstock_119782672.jpg?w=150" medium="image">
			<media:title type="html">collective intelligence</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/br-stack.png?w=708" medium="image">
			<media:title type="html">BR stack</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/llbean.png?w=708" medium="image">
			<media:title type="html">Source: BloomReach</media:title>
		</media:content>
	</item>
		<item>
		<title>We&#8217;re witnessing the rise of the graph in big data</title>
		<link>http://gigaom.com/2013/05/14/were-witnessing-the-rise-of-the-graph-in-big-data/</link>
		<comments>http://gigaom.com/2013/05/14/were-witnessing-the-rise-of-the-graph-in-big-data/#comments</comments>
		<pubDate>Tue, 14 May 2013 14:33:33 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[graph analysis]]></category>
		<category><![CDATA[graph database]]></category>
		<category><![CDATA[GraphLab]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=645059</guid>
		<description><![CDATA[Graph databases and graph-processing applications have been popping up all over the place lately, and now they're starting to go commercial. On Tuesday, popular open source project GraphLab joined the ranks of graph startups.]]></description>
				<content:encoded><![CDATA[<p>GraphLab, a popular <a href="http://graphlab.org/">open source project</a> dedicated to graph analysis and machine learning, is trying to capitalize on the excitement around graphs by spinning off a commercial entity, <a href="http://graphlab.com/">GraphLab Inc.</a> GraphLab creator &#8212; and University of Washington machine learning professor &#8212; Carlos Guestrin will lead the new Seattle-based company, which has raised $6.75 million from Madrona Venture Group and NEA.</p>
<p>Graph analysis is among the hottest techniques around for making sense of large datasets, primarily by determining how tightly different data points are related or how similar they are. The term &#8220;graph&#8221; came into the broader lexicon along with social networks, which built social graphs to <a href="http://gigaom.com/2013/03/14/facebook-tweaks-its-algorithms-to-improve-graph-search-comment-search-coming/">assess the relationships among their millions of users</a>, but the technique has much broader uses.</p>
<div id="attachment_645089" class="wp-caption aligncenter" style="width: 677px"><a href="http://gigaom2.files.wordpress.com/2013/05/lnkdmap-1.jpg"><img  alt="My LinkedIn social graph" src="http://gigaom2.files.wordpress.com/2013/05/lnkdmap-1.jpg?w=708"   class="size-full wp-image-645089" /></a><p class="wp-caption-text">My LinkedIn social graph</p></div>
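<p>One simple way to quantify how similar two points in a graph are is to compare their neighbors. The Python sketch below uses Jaccard similarity over a toy social graph; it is a hedged illustration of the general idea, not a GraphLab algorithm, and the graph and names are invented for the example.</p>

```python
def jaccard_similarity(graph, a, b):
    """Share of neighbors that nodes a and b have in common (0.0 to 1.0)."""
    neighbors_a, neighbors_b = graph[a], graph[b]
    union = neighbors_a | neighbors_b
    if not union:
        return 0.0  # two isolated nodes share nothing
    return len(neighbors_a & neighbors_b) / len(union)


# Hypothetical toy social graph: each person maps to the set of people they know.
social_graph = {
    "ana": {"ben", "cy", "dee"},
    "ben": {"ana", "cy"},
    "cy": {"ana", "ben", "dee"},
    "dee": {"ana", "cy"},
    "eli": set(),
}

if __name__ == "__main__":
    print(jaccard_similarity(social_graph, "ben", "dee"))  # identical neighbor sets
    print(jaccard_similarity(social_graph, "ana", "cy"))   # partial overlap
    print(jaccard_similarity(social_graph, "ana", "eli"))  # no overlap
```

<p>Scores like these are the raw material for suggestions such as people you may know; doing the same thing across millions of nodes is where dedicated graph systems come in.</p>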
<p>Guestrin said GraphLab&#8217;s algorithms are used in a lot of recommender systems, but he also cites fraud detection in banking networks and intrusion detection in computer networks as potential applications. We&#8217;ve covered graphs as the analytical model of choice for everything <a href="http://gigaom.com/2013/04/22/how-hbase-converted-myspaces-mysql-champion-and-is-driving-hadoop-mainstream/">from content recommendation</a> to <a href="http://gigaom.com/2013/01/22/biotech-startup-syapse-wants-to-be-salesforce-com-for-our-genomes/">tracking lab work in genomics</a>. Really, though &#8212; especially when combined with machine learning &#8212; graph analysis <a href="http://gigaom.com/2013/01/16/has-ayasdi-turned-machine-learning-into-a-magic-bullet/">can be applied to anything</a> where there&#8217;s too much data for a person to possibly analyze the relationships between every point.</p>
<div id="attachment_601469" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/01/ayasdi-product-image-2-e1358295341371.jpg"><img  alt="One of Ayasdi's graph-like data maps" src="http://gigaom2.files.wordpress.com/2013/01/ayasdi-product-image-2-e1358295341371.jpg?w=708&#038;h=472" width="708" height="472" class="size-large wp-image-601469" /></a><p class="wp-caption-text">One of Ayasdi&#8217;s graph-like data maps</p></div>
<p>Google also famously uses <a href="http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html">a graph-processing system called Pregel</a> as part of PageRank. Although a number of graph databases and other projects have popped up in the past few years, Guestrin said GraphLab is actually a contemporary of Pregel. He and some colleagues at Carnegie Mellon built a small system for their lab about five years ago, then released it into the open-source world with few expectations that it would catch on. Now, he added, Pandora and WalmartLabs are among the project&#8217;s user base.</p>
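<p>For readers curious what a PageRank-style computation looks like, here is a minimal Python sketch written in a vertex-centric style: in each round, every node sends an equal share of its current rank along its outgoing links, and then every node updates its rank from what it received. This is a textbook simplification, not the implementation used by Google or the Pregel API; the damping factor of 0.85, the round count and all names are assumptions for illustration.</p>

```python
def pagerank(graph, damping=0.85, supersteps=50):
    """Approximate PageRank for graph, a dict mapping node -> list of outgoing links.

    Assumes every linked-to node also appears as a key in graph.
    """
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(supersteps):
        inbox = {node: 0.0 for node in graph}  # rank received during this round
        dangling = 0.0  # rank held by nodes with no outgoing links
        for node, out_links in graph.items():
            if out_links:
                share = rank[node] / len(out_links)
                for target in out_links:
                    inbox[target] += share
            else:
                dangling += rank[node]
        # Each node combines a baseline amount with what it received;
        # rank from dangling nodes is spread evenly over all nodes.
        rank = {
            node: (1 - damping) / n + damping * (inbox[node] + dangling / n)
            for node in graph
        }
    return rank


if __name__ == "__main__":
    toy_web = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(toy_web).items()):
        print(page, round(score, 3))
```

<p>In the toy graph, page c is linked to by both other pages and ends up with the highest score, while page b, which nothing links to, keeps only the baseline amount.</p>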
<p>Among those other projects are graph-processing systems and databases such as <a href="http://giraph.apache.org/">Giraph</a> (an open source, Hadoop-based Pregel clone that originated at Yahoo and is used heavily at Facebook) and <a href="http://www.neo4j.org/">Neo4j</a> (which also has a commercial arm, <a href="http://gigaom.com/2012/11/02/graph-startup-neo-raises-11m-as-specialized-databases-take-hold/">called Neo Technology</a>), as well as <a href="http://engineering.twitter.com/2012/03/cassovary-big-graph-processing-library.html">Twitter&#8217;s Cassovary</a> and fellow University of Washington project <a href="http://www.cs.washington.edu/node/4217/">Grappa</a>. Guestrin said GraphLab can work with most of them, particularly if they&#8217;re not designed to do machine learning at scale like GraphLab is. Some efforts, he noted, are focused on simply storing data in graph form (e.g., databases) or in providing simple graph analysis.</p>
<p>As for when we&#8217;ll actually see the results of the effort to commercialize GraphLab, Guestrin said it will be a while. Right now, he&#8217;s focused on the next open source release of GraphLab in July. However, the company will begin engaging with commercial users over the next several months to determine what types of features they would expect in commercial graph-analysis software.</p>
<p>The bigger question to come out of all this graph activity, though, is how big a market we&#8217;ll ultimately see for graph analysis or any other specific technique. As companies get more comfortable with big data from a technical standpoint, they&#8217;re getting more interested in the different types of analysis it allows for, too. This is evidenced by the <a href="http://gigaom.com/2013/03/07/5-reasons-why-the-future-of-hadoop-is-real-time-relatively-speaking/">quest to make Hadoop support myriad processing frameworks</a> aside from MapReduce.</p>
<p>We already have a handful of commercial graph products on the market &#8212; including an industrial grade one called <a href="http://www.yarcdata.com/">YarcData</a> from supercomputer maker Cray &#8212; but how many will there eventually be? And if graph analysis is all the rage right now, what comes next?</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/14/were-witnessing-the-rise-of-the-graph-in-big-data/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/graphics2-3_final_cartoon.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/graphics2-3_final_cartoon.jpg?w=150" medium="image">
			<media:title type="html">graphics2-3_final_cartoon</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/lnkdmap-1.jpg" medium="image">
			<media:title type="html">My LinkedIn social graph</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/ayasdi-product-image-2-e1358295341371.jpg?w=708" medium="image">
			<media:title type="html">One of Ayasdi&#039;s graph-like data maps</media:title>
		</media:content>
	</item>
		<item>
		<title>How MailChimp learned to treat data like orange juice and rethink email in the process</title>
		<link>http://gigaom.com/2013/05/05/how-mailchimp-learned-to-treat-data-like-orange-juice-and-rethink-email-in-the-process/</link>
		<comments>http://gigaom.com/2013/05/05/how-mailchimp-learned-to-treat-data-like-orange-juice-and-rethink-email-in-the-process/#comments</comments>
		<pubDate>Sun, 05 May 2013 23:09:53 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[email marketing]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[MailChimp]]></category>
		<category><![CDATA[natural language processing]]></category>
		<category><![CDATA[predictive models]]></category>
		<category><![CDATA[semantic analysis]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=642316</guid>
		<description><![CDATA[MailChimp wasn't always a big data company, but 12 years into its existence the company is using its mountains of email data to do everything from modeling spam to connecting subscribers.]]></description>
				<content:encoded><![CDATA[<p>MailChimp Chief Data Scientist John Foreman likes to talk about orange juice. On the surface, it&#8217;s a strange way to start a discussion about data, but it all starts to make sense when you peel back the rind. It&#8217;s a way of thinking that&#8217;s letting MailChimp &#8212; which sends about 35 billion emails a year on behalf of roughly 3 million users &#8212; transform itself into a data-driven business 12 years into its existence.</p>
<p>When you&#8217;re in Atlanta, as I was during a recent trip, the obvious place to start talking about orange juice and data is with Coca-Cola. Foreman can tell you all about how the beverage giant &#8212; whose headquarters tower over the city just a mile away from MailChimp&#8217;s office &#8212; <a href="http://www.businessweek.com/articles/2013-01-31/coke-engineers-its-orange-juice-with-an-algorithm">uses advanced algorithms and giant vats of different juices</a> to ensure the proper flavor of its Simply Orange line of orange juice. However, it&#8217;s something else Coca-Cola is doing that inspired the way Foreman thinks about data and that&#8217;s helping MailChimp re-imagine what it means to engage with fans, readers and customers through their inboxes.</p>
<p>Anyone familiar with how large web companies <a href="http://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/">came to pioneer the practice of what we now call &#8220;big data&#8221;</a> should appreciate the analogy. Coca-Cola, which also owns Minute Maid, produces a lot of excess pulp when it makes orange juice. For decades, presumably, it had just been throwing that pulp away, but in 2006 it decided to make use of it by launching a new product called Minute Maid Pulpy. Sold primarily in Asian countries, Pulpy <a href="http://www.ajc.com/news/business/coca-colas-minute-maid-pulpy-reaches-1-billion-in-/nQqFM/">has become a billion-dollar business</a> for Coca-Cola.</p>
<p>Once MailChimp is done with its primary business of sending emails, it has a lot of pulp of its own in the form of data. And rather than just ignoring it or writing up some cute blog posts (<a href="http://blog.mailchimp.com/author/jforeman/">which he also does</a>), Foreman and his bosses want to turn that data into revenue.</p>
<h2 id="first-things-first-making-bett">First things first: Making better orange juice</h2>
<div id="attachment_642357" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/05/20130424_121443.jpg"><img  alt="Neil Bainton" src="http://gigaom2.files.wordpress.com/2013/05/20130424_121443-e1367793432461.jpg?w=300&#038;h=200" width="300" height="200" class="size-medium wp-image-642357" /></a><p class="wp-caption-text">Neil Bainton</p></div>
<p>Actually, though, MailChimp first brought in Foreman in 2011 to help the company improve its core business of letting users build and send their emails. MailChimp&#8217;s culture was built around many things, COO Neil Bainton told me, but data wasn&#8217;t one of them. It had &#8220;various fits and starts&#8221; through the years trying to work data into its business model, and each step just added more complexity.</p>
<p>The challenges were technological as well as cultural, but Foreman had a plan, of which focus was a key aspect. Keeping a tight focus meant Foreman and his lone-developer sidekick could build what they needed to in a short timeframe. It also meant the company didn&#8217;t have to worry about some massive overnight transformation into a data-obsessed company like Google.</p>
<div id="attachment_642358" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/05/20130424_121423.jpg"><img  alt="John Foreman" src="http://gigaom2.files.wordpress.com/2013/05/20130424_121423-e1367793376856.jpg?w=300&#038;h=200" width="300" height="200" class="size-medium wp-image-642358" /></a><p class="wp-caption-text">John Foreman</p></div>
<p>&#8220;[They] don&#8217;t need to be afraid the entire culture is gonna fall down if we bring in this weird math guy,&#8221; he joked.</p>
<p>Foreman&#8217;s first project &#8212; deploying artificial intelligence models that would <a href="http://blog.mailchimp.com/project-omnivore-three-years-of-gorging-on-data/">automatically detect spammy email lists from MailChimp&#8217;s users</a> &#8212; is actually critical to the way MailChimp operates, though. It was up and running in production within a year, after a technologically challenging effort of merging separate database instances for each customer into a single environment that would let MailChimp run complex analyses across its customer base.</p>
<p>It&#8217;s such an important project, Foreman explained, because internet service and email providers keep reputation scores on the IP addresses that send email through their systems. Because MailChimp serves as the email engine for its millions of users, sending too many messages that get flagged as spam and lower MailChimp&#8217;s reputation will have a negative impact on everyone. The company used to deal with spam manually, and only after recipients began complaining about the messages they received.</p>
<p>&#8220;It used to be before we had that AI model in place that everyone had a crappier experience,&#8221; Foreman said.</p>
<h2 id="say-goodbye-to-those-90s-fans-">Say goodbye to those &#8217;90s fans, Pearl Jam</h2>
<div id="attachment_642362" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/05/bcdf-1024x864.png"><img  alt="Source: MailChimp" src="http://gigaom2.files.wordpress.com/2013/05/bcdf-1024x864.png?w=300&#038;h=253" width="300" height="253" class="size-medium wp-image-642362" /></a><p class="wp-caption-text">Source: MailChimp</p></div>
<p>Now, however, MailChimp knows some of the telltale signs of spam for which it should be on the lookout. If too high a percentage of email addresses on a given list also <a href="http://blog.mailchimp.com/aol-and-hotmail-users-spend-more-than-gmail-users-and-other-research-finds/">appear on publicly available lists</a> or those you can buy on sketchy corners of the internet, it&#8217;s probably spam. Too many old and far-more-likely-to-be-dead Earthlink or Compuserve addresses, or letters within one keystroke of each other as if someone just mashed the keyboard? Probably spam.</p>
<p>Thankfully, though, about 98 percent of the spam that MailChimp identifies is what Foreman calls &#8220;ignorant&#8221; &#8212; that is, people or companies that just don&#8217;t know the laws or best practices around sending emails. But ignorance doesn&#8217;t mean MailChimp relaxes its rules. Recently, it even flagged Pearl Jam for spammy practices because the band was trying to reconnect with old fans whose email addresses read like a who&#8217;s who list of 1990s email providers.</p>
<p>Having such a high percentage of ignorant spam actually has a positive effect on the company&#8217;s overall goal of monetizing its vast data repositories. Because the AI model automates what used to be a manual process, and because most innocent spammers will fall in line quickly once they&#8217;re notified (as opposed to nefarious spammers who constantly try to outsmart the system), MailChimp can pretty much set the model loose, forget about it and get to work on new efforts, Foreman said.</p>
<h2 id="now-about-that-pulp">Now, about that pulp</h2>
<p>Spam under control, MailChimp can focus its efforts on actually building new products with data, just like Coca-Cola did with that extra pulp. One of its first orders of business is figuring out how to help customers get to know better the people to whom they&#8217;re sending their newsletters.</p>
<p>With this in mind, the company built a service called <a href="http://wavelength.mailchimpapp.com/">Wavelength</a> that shows customers other newsletters that are similar to theirs. But the system that powers Wavelength also stores pretty much every interaction that every email address in the company&#8217;s database has with the newsletters they&#8217;re sent. That means what emails they open and when they open them, what links they click and when they click them, and what other newsletters they&#8217;re subscribed to. MailChimp also has a feature called <a href="http://kb.mailchimp.com/article/what-is-ecommerce360-and-how-does-it-work-with-mailchimp">Ecommerce360</a> that lets customers track clicks right through to conversions (marketing speak for someone actually buying something).</p>
<p>The company has been <a href="http://blog.mailchimp.com/digging-deeper-into-wavelength-and-egp-data-finding-interest-clusters-in-mailchimps-network/">playing around with this data to identify clusters of users</a> based on their behaviors and their interests &#8212; some of which Foreman has detailed on the company&#8217;s blog &#8212; and now it wants to roll it out to customers via a product MailChimp is calling ChimpQuery. Built atop <a href="http://gigaom.com/2013/03/14/google-bigquery-is-now-even-bigger/">Google&#8217;s BigQuery analytics service</a>, ChimpQuery will let customers start doing this type of clustering and segmentation on their own, while saving MailChimp the troubles of hosting that infrastructure itself. (You can play with a monstrous, interactive graph of the entire MailChimp subscriber list <a href="http://zoom.it/HD3t#full">here</a>.)</p>
<p>If you sell knitting supplies and you find out there&#8217;s a big cluster of people on your mailing list who also are interested in wedding planning and custom jewelry, there might be an opportunity to create your content with these interests in mind or even to partner with companies in those spaces.</p>
<div id="attachment_642360" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/05/marriedknit-tiff.jpg"><img  alt="A sample cluster of subscribers." src="http://gigaom2.files.wordpress.com/2013/05/marriedknit-tiff.jpg?w=708&#038;h=427" width="708" height="427" class="size-large wp-image-642360" /></a><p class="wp-caption-text">A sample cluster of subscribers.</p></div>
<p>Another topic that has been on Foreman&#8217;s mind lately is what he calls &#8220;frequency elasticity of engagement.&#8221; <a href="http://blog.mailchimp.com/sending-frequency-more-is-not-always-better/">He&#8217;s done research</a> suggesting that blasting the heck out of your email list might actually have detrimental effects in the long term (regardless of <a href="http://gigaom.com/2012/12/08/how-obamas-data-scientists-built-a-volunteer-army-on-facebook/">how the Obama campaign successfully exploited this strategy</a>) but noted that engagement also has a lot to do with content and a particular company&#8217;s given user list. MailChimp&#8217;s data could help customers figure out the ideal schedule for emailing their subscribers.</p>
<p>For example, Birchbox has really high engagement because people love the service and have to open their emails to find out what goodies they&#8217;re receiving. Emails from a company like Papa John&#8217;s, on the other hand, might sit in someone&#8217;s inbox essentially as spam until they want to order a pizza and go searching for a coupon. Everyone has to figure out what pace and engagement metrics work for them.</p>
<h2 id="reining-expectations-back-in">Reining expectations back in</h2>
<p>However, now that management is fully sold on the power of data, Foreman sometimes finds himself managing expectations rather than just pitching his ideas. COO Bainton, for example, is adamant that MailChimp start aiding its publishing-industry customers by using techniques such as natural-language processing and semantic analysis to help them personalize emails based on readers&#8217; stated and unstated interests (that is, what boxes they check when they sign up and what stuff they actually click on).</p>
<p>Foreman, well, he&#8217;s pretty sure that&#8217;s too big a challenge for MailChimp to tackle considering how many publishing customers it has. MailChimp would have to understand all those customers&#8217; industries to some degree (<a href="http://www.opencalais.com/about">open source tools</a> tend to highlight technically but not situationally relevant relationships, he said, and don&#8217;t always understand things like sarcasm) and probably the different languages they publish in, as well. Rather than understand content, he&#8217;d rather focus personalization efforts around how users are connected.</p>
<p>The company also needs to balance its ambitions with what&#8217;s legally and socially acceptable. The creep factor might be more important than what&#8217;s legal when it comes to email marketing. MailChimp determines the legality of everything it does before rolling it out, Foreman explained, but in an era of &#8220;post-modern spam&#8221; where legitimacy is in the eye of the recipient and where some people use their &#8220;spam&#8221; button as a proxy for unsubscribing, companies must be careful not to offend.</p>
<p>&#8220;The more we can tell you about that list without getting creepy is really useful,&#8221; Bainton said. However, he added, &#8220;I think expectation is more important than law.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/05/how-mailchimp-learned-to-treat-data-like-orange-juice-and-rethink-email-in-the-process/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/joyusgray-e1367794217987.png?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/joyusgray-e1367794217987.png?w=150" medium="image">
			<media:title type="html">JoyusGray</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/20130424_121443-e1367793432461.jpg?w=300" medium="image">
			<media:title type="html">Neil Bainton</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/20130424_121423-e1367793376856.jpg?w=300" medium="image">
			<media:title type="html">John Foreman</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/bcdf-1024x864.png?w=300" medium="image">
			<media:title type="html">Source: MailChimp</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/marriedknit-tiff.jpg?w=708" medium="image">
			<media:title type="html">A sample cluster of subscribers.</media:title>
		</media:content>
	</item>
		<item>
		<title>USVP, UPS and Scott McNealy pump $18M into machine-learning startup Skytree</title>
		<link>http://gigaom.com/2013/04/30/usvp-ups-and-scott-mcnealy-pump-18m-into-machine-learning-startup-skytree/</link>
		<comments>http://gigaom.com/2013/04/30/usvp-ups-and-scott-mcnealy-pump-18m-into-machine-learning-startup-skytree/#comments</comments>
		<pubDate>Tue, 30 Apr 2013 16:30:15 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[Scott McNealy]]></category>
		<category><![CDATA[Skytree]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=640909</guid>
		<description><![CDATA[Machine learning startup Skytree has raised $18 million for its software that makes short work of pattern recognition across massive datasets.]]></description>
				<content:encoded><![CDATA[<p>Machine learning is everywhere these days as companies and organizations find themselves trying to make sense of data sets far too large and complex for the human brain alone. On Tuesday, <a href="http://www.skytree.net/">Skytree</a> cashed in on the hype with an $18 million Series A round led by U.S. Venture Partners along with delivery giant UPS and Sun Microsystems co-founder and former CEO Scott McNealy. Skytree <a href="http://gigaom.com/2012/02/23/skytree-intros-machine-learning-for-the-masses/">launched in February 2012</a> with $1.5 million in seed funding.</p>
<p>Machine learning is such a hot topic right now because data volumes are becoming so large and complex that humans alone can&#8217;t query their ways through them fast enough or intelligently enough to spot latent patterns among the mess of data. It&#8217;s the algorithmic engine that <a href="http://gigaom.com/2012/06/25/how-google-is-teaching-computers-to-see/">powers a bunch of Google services</a> and <a href="http://gigaom.com/2012/06/14/netflix-analyzes-a-lot-of-data-about-your-viewing-habits/">your Netflix recommendations</a>, as well as <a href="http://gigaom.com/2012/12/05/prismatic-gets-15m-to-build-a-recommendation-engine-for-the-world/">web content-curation service Prismatic</a> and <a href="http://gigaom.com/2012/11/19/where-machine-learning-and-human-artistry-meet-your-wallet/">alternative-underwriting platform ZestFinance</a>. As we <a href="http://gigaom.com/2013/03/22/5-ways-big-data-is-going-to-blow-your-mind-and-change-your-world/">covered in some detail at this year&#8217;s Structure: Data conference</a>, machine learning is particularly powerful when its ability to correlate tens of thousands of variables is paired with human judgment about what really matters.</p>
<div id="attachment_640923" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/04/ml-2012.jpg"><img  alt="Skytree co-founder Alexander Gray (second from left) at Structure: Data 2012. (c) Pinar Ozger" src="http://gigaom2.files.wordpress.com/2013/04/ml-2012.jpg?w=300&#038;h=200" width="300" height="200" class="size-medium wp-image-640923" /></a><p class="wp-caption-text">Skytree co-founder Alexander Gray (second from left) at Structure: Data 2012. (c) Pinar Ozger</p></div>
<p>Skytree, for its part, sells a product called Skytree Server that lets users run a wide variety of machine learning algorithms across whatever data they have. It might be an oversimplification, but Skytree is essentially a souped-up version of statistical-analysis packages like SPSS or SAS that&#8217;s designed to run fast and, more importantly, without sampling across a scale-out server architecture. In March, the company also rolled out the beta version of <a href="http://www.skytree.net/adviser-beta/">a new product called Adviser</a> that can run on a laptop and walks more-novice users through the analysis of their data, including what methods were used and why, and whether the findings are statistically significant.</p>
<p>I suspect we&#8217;re just seeing the opening salvo in what will be a rush to fund machine learning startups over the next couple of years. Skytree is among a number of increasingly promising startups in the space, including (but certainly not limited to) <a href="http://gigaom.com/2013/01/16/has-ayasdi-turned-machine-learning-into-a-magic-bullet/">Ayasdi</a> and <a href="http://gigaom.com/2013/03/20/data-science-is-not-enough-we-need-data-intelligence-too/">Quid</a>. As more individuals see the promise of machine learning and get skilled in applying it to their particular problems and datasets &#8212; as UPS apparently has &#8212; it could become one of the go-to analytic methods in the big data era.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/30/usvp-ups-and-scott-mcnealy-pump-18m-into-machine-learning-startup-skytree/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/05/machine-learning.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/05/machine-learning.jpg?w=150" medium="image">
			<media:title type="html">machine learning</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/ml-2012.jpg?w=300" medium="image">
			<media:title type="html">Skytree co-founder Alexander Gray (second from left) at Structure: Data 2012. (c) Pinar Ozger</media:title>
		</media:content>
	</item>
		<item>
		<title>Google research director and AI expert Peter Norvig elected into AAAS</title>
		<link>http://gigaom.com/2013/04/29/google-research-director-and-ai-expert-peter-norvig-elected-into-aaas/</link>
		<comments>http://gigaom.com/2013/04/29/google-research-director-and-ai-expert-peter-norvig-elected-into-aaas/#comments</comments>
		<pubDate>Mon, 29 Apr 2013 16:30:13 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[Peter Norvig]]></category>
		<category><![CDATA[Semantic Web]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=635066</guid>
		<description><![CDATA[Artificial intelligence expert and Google Director of Research Peter Norvig was elected to the American Academy of Arts and Sciences last week. He's well known for a 2009 paper titled "The Unreasonable Effectiveness of Data."]]></description>
				<content:encoded><![CDATA[<p>&#8220;Simple models and a lot of data trump more elaborate models based on less data.&#8221;</p>
<p>With that line from the 2009 paper <a href="http://www.csee.wvu.edu/~gidoretto/courses/2011-fall-cp/reading/TheUnreasonable%20EffectivenessofData_IEEE_IS2009.pdf">&#8220;The Unreasonable Effectiveness of Data&#8221;</a> (co-authored with Google co-workers Alon Halevy and Fernando Pereira), Google Director of Research Peter Norvig all but guaranteed his status as one of the most-quoted &#8212; or at least most-paraphrased &#8212; people in the world of big data. Last week, Norvig &#8212; as well as Google VP of Energy Arun Majumdar &#8212; received a slightly more formal honor, as he was inducted into the American Academy of Arts and Sciences.</p>
<p>Norvig, who previously led Google&#8217;s search algorithms team and was head of computational sciences at the NASA Ames Research Center, is best known for <a href="http://norvig.com/">his work in the realm of artificial intelligence</a>. In fact, the above quote and the paper in which it appears are essentially a testament to the advances Google has been able to make in AI and machine learning thanks to the massive web page and search dataset that Google has amassed. The more examples it has of words and phrases used together in natural language, the better it can perform semantic analysis to determine what&#8217;s related to what.</p>
<p>Norvig and Majumdar are among 198 new inductees into the American Academy of Arts and Sciences&#8217; latest class. According to a <a href="http://googleresearch.blogspot.com/2013/04/two-googlers-elected-to-american.html">blog post from Google</a>, they also join seven other Google employees as members: Sergey Brin, Larry Page, Eric Schmidt, Vint Cerf, Alfred Spector, Hal Varian and <a href="http://gigaom.com/2012/12/14/ray-kurzweil-joins-google-to-work-on-machine-learning-language-processing/">Ray Kurzweil</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/29/google-research-director-and-ai-expert-peter-norvig-elected-into-aaas/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/04/norvig.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/04/norvig.jpg?w=150" medium="image">
			<media:title type="html">norvig</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>
	</item>
		<item>
		<title>Infer takes $10M to find the sales leads most likely to pay off</title>
		<link>http://gigaom.com/2013/04/23/infer-takes-10m-to-find-the-sales-leads-most-likely-to-pay-off/</link>
		<comments>http://gigaom.com/2013/04/23/infer-takes-10m-to-find-the-sales-leads-most-likely-to-pay-off/#comments</comments>
		<pubDate>Tue, 23 Apr 2013 11:30:53 +0000</pubDate>
		<dc:creator>Jordan Novet</dc:creator>
				<category><![CDATA[Infer]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[Salesforce.com]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=633308</guid>
		<description><![CDATA[A startup called Infer has taken on $10 million in venture funding to help more companies get a better idea of which leads to focus on in their sales and marketing lead lists.]]></description>
				<content:encoded><![CDATA[<p>On a good day, a sales executive can direct a salesperson to home in on the best lead in a customer-relationship-management system such as Salesforce.com, and a deal might or might not come through. But getting the most out of sales and marketing staffers every day is the sweet spot. A startup named <a href="https://www.infer.com/">Infer</a> is emerging from stealth mode with $10 million in venture funding to help more companies get to that sweet spot with a tool for identifying the most promising leads based on a user&#8217;s historical deal-making tendencies and external data about potential leads.</p>
<p>Redpoint Ventures led the Series A round, and Andreessen Horowitz, the Social+Capital Partnership, Sutter Hill Ventures and others also contributed.</p>
<p>Based in Palo Alto, Calif., the company focuses on sales operations that keep their leads in Salesforce and other cloud-based tools such as <a href="http://gigaom.com/2012/12/20/oracle-beefs-up-marketing-applications-savvy-with-871m-buy-of-eloqua/">Eloqua</a> and <a href="http://gigaom.com/2013/04/03/marketing-automation-boom-continues-with-75m-marketo-ipo/">Marketo</a>, although Infer co-founder and CEO Vik Singh said it&#8217;s also possible for the software to hook into on-premises appliances.</p>
<p>Once a company signs up with Infer &#8212; Box, Jive Software, Tableau Software, Yammer, Zendesk and other companies are already paying customers &#8212; the system inspects that company&#8217;s historical sales information to check which deals were sealed and which fell apart. That becomes training data for a model that scans &#8220;hundreds of signals of external data&#8221; to get a sense of which potential leads the company stands a chance of closing. Inputs include news articles, social-media accounts, website-traffic data, industry data, financial data, legal data, trademark data &#8212; &#8220;anything we can get that can give us more of a complete picture on who the customer is,&#8221; Singh said. Users can determine the weight of certain types of information.</p>
<p>Users also set priorities for the scoring of leads. &#8220;Do you want a model where the higher the score, the more likely (you are) to win (a deal), or do you care more about conversion, or do you care more about lifetime revenue, or deal size? We build the model based on that,&#8221; Singh said. It&#8217;s not just a neat way to prioritize leads; Singh said many Infer customers have boosted conversion rates with the tool.</p>
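<p>For readers who think better in code, here is a minimal sketch of the general technique described above: train a model on past won and lost deals, then score and rank open leads. It is not Infer&#8217;s actual method. The two signals, the sample deals, the company names and the helper functions <code>train</code>, <code>score</code> and <code>rank_leads</code> are all invented for illustration, and plain logistic regression stands in for whatever model the company really uses.</p>

```python
import math

# Hypothetical historical deals: (signals, outcome), where 1 = won and 0 = lost.
# The two signals (say, web-traffic rank and hiring growth, scaled 0 to 1)
# are assumptions made up for this example.
HISTORY = [
    ([0.9, 0.8], 1),
    ([0.8, 0.6], 1),
    ([0.7, 0.9], 1),
    ([0.2, 0.1], 0),
    ([0.3, 0.2], 0),
    ([0.1, 0.4], 0),
]


def sigmoid(z):
    """Squash any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train(history, epochs=2000, rate=0.5):
    """Fit logistic-regression weights with stochastic gradient descent."""
    weights = [0.0] * len(history[0][0])
    bias = 0.0
    for _ in range(epochs):
        for signals, outcome in history:
            predicted = sigmoid(bias + sum(w * s for w, s in zip(weights, signals)))
            error = outcome - predicted
            # Nudge the bias and each weight toward the observed outcome.
            bias += rate * error
            weights = [w + rate * error * s for w, s in zip(weights, signals)]
    return weights, bias


def score(lead, weights, bias):
    """Estimate the probability that a lead turns into a closed deal."""
    return sigmoid(bias + sum(w * s for w, s in zip(weights, lead)))


def rank_leads(leads, weights, bias):
    """Return lead names ordered from most to least promising."""
    return sorted(leads, key=lambda name: score(leads[name], weights, bias), reverse=True)


weights, bias = train(HISTORY)
open_leads = {"Acme": [0.85, 0.7], "Globex": [0.15, 0.2], "Initech": [0.5, 0.5]}
print(rank_leads(open_leads, weights, bias))
```

<p>In this toy version a high score means &#8220;likely to close.&#8221; Optimizing for deal size or lifetime revenue instead, as Singh describes, would mean training against a different target and probably using a regression model rather than a classifier.</p>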
<div id="attachment_633309" class="wp-caption alignleft" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/04/infer-salesforce-integration.jpg"><img src="http://gigaom2.files.wordpress.com/2013/04/infer-salesforce-integration.jpg?w=708&#038;h=326" alt="Infer data integrated in Salesforce.com" width="708" height="326"  class="size-large wp-image-633309" /></a><p class="wp-caption-text">Infer data integrated in Salesforce.com</p></div>
<p>Before starting Infer, Singh spent some time at Google, where he focused on machine-learning methods for automatically providing answers to questions users type into the search box. Another former Googler, one-time chief information officer Douglas Merrill, co-founded a different company that uses lots of external data to make determinations: <a href="http://gigaom.com/2012/11/19/where-machine-learning-and-human-artistry-meet-your-wallet/">ZestFinance</a>, formerly known as ZestCash, extracts information from 70,000 sources to figure out if a lender should make a loan. </p>
<p>The broader strategy the two companies have in common &#8212; making predictions based on data &#8212; has become more popular in recent years, as companies merge and grow data sets to create more than the sum of their parts. In this case, it seems that the approach could garner wide adoption as a few companies optimize the time of their sales and marketing employees and pull ahead of their competitors, and other companies might want to do the same thing to catch up.</p>
<p>Of course, if Salesforce or Oracle acquires Infer, then adoption could come even faster.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/23/infer-takes-10m-to-find-the-sales-leads-most-likely-to-pay-off/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/04/vik-singh-infer-3.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/04/vik-singh-infer-3.jpg?w=150" medium="image">
			<media:title type="html">Vik Singh infer 3</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/c00ab753df107b639e76ed4c3ab07ba7?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigajordan</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/infer-salesforce-integration.jpg?w=708" medium="image">
			<media:title type="html">Infer data integrated in Salesforce.com</media:title>
		</media:content>
	</item>
		<item>
		<title>How to hire data scientists and get hired as one</title>
		<link>http://gigaom.com/2013/04/16/how-to-hire-data-scientists-and-get-hired-as-one/</link>
		<comments>http://gigaom.com/2013/04/16/how-to-hire-data-scientists-and-get-hired-as-one/#comments</comments>
		<pubDate>Tue, 16 Apr 2013 23:56:37 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[data scientists]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[Netflix]]></category>
		<category><![CDATA[Orbitz]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=631544</guid>
		<description><![CDATA[Data scientist might be the sexiest job of the 21st century, but it's hardly an easy gig to land. Here is some advice from practitioners at Netflix, Orbitz and Hortonworks on how to get hired and even how to do the hiring.]]></description>
				<content:encoded><![CDATA[<p>As you might have heard before if you read <a href="http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation">McKinsey reports</a>, <a href="http://www.nytimes.com/2013/04/14/education/edlife/universities-offer-courses-in-a-hot-new-field-data-science.html">the New York Times</a> or <a href="http://gigaom.com/2013/01/06/why-data-scientists-matter-data-science-is-the-future-of-everything/">just about any technology news site</a>, data scientists are in high demand. Heck, the Harvard Business Review <a href="http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/">called it the sexiest job of the 21st century</a>. But landing a gig as a data scientist isn&#8217;t easy &#8212; especially a top-notch gig at a major web or e-commerce company where merely <em>talented </em>people are a dime a dozen.</p>
<p>However, companies are starting to talk openly about what they look for in data scientists, including the skills someone should have and what they&#8217;ll need to know to survive an interview. I spent a day at the <a href="http://www.predictiveanalyticsworld.com/sanfrancisco/2013/">Predictive Analytics World</a> conference on Monday and heard both Netflix and Orbitz give their two cents. That was also the day Hortonworks <a href="http://hortonworks.com/blog/hortonworks-hadoop-data-science/">published a blog post about how to build a data science team</a>.</p>
<p>Granted that &#8220;data scientist&#8221; is a nebulous term &#8212; perhaps as much so as &#8220;big data&#8221; &#8212; these tips (a mashup of all three sources) are still broadly applicable. If you want to make the leap from guy who knows data to data scientist, I suggest paying attention.</p>
<h2 id="1-know-the-core-competencies">1. Know the core competencies.</h2>
<p>For most of us, there&#8217;s readin&#8217;, &#8216;ritin&#8217; and &#8216;rithmetic. For data scientists, there&#8217;s SQL, statistics, predictive modeling and programming (probably Python). If you don&#8217;t have at least a grounding in these skills, you&#8217;re probably not getting through the door, in part because they form a common language that lets people from different backgrounds talk to each other.</p>
<p>Hortonworks&#8217; Ofer Mendelevitch describes the ideal data scientist as occupying a place on the spectrum between a software engineer and a research scientist. In distinguishing a great engineer, mathematician or data analyst from a data scientist, programming skills are probably the biggest variable. That&#8217;s because being able to write code means you&#8217;ll have an easier time testing out your hypotheses and algorithms, hacking through certain problems and generally thinking in ways that actually relate to the products your employer is building.</p>
<div id="attachment_631679" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/04/ofer1.png"><img  alt="Source: Hortonworks" src="http://gigaom2.files.wordpress.com/2013/04/ofer1.png?w=708&#038;h=77" width="708" height="77" class="size-large wp-image-631679" /></a><p class="wp-caption-text">Source: Hortonworks</p></div>
<p>Chris Pouliot, director of algorithms and analytics at Netflix, said even being able to &#8220;pseudo-code&#8221; might be good enough if someone is otherwise a strong candidate. You can pick up SQL or Python or whatever you need pretty quickly, he noted.</p>
<p>Or, hinted Orbitz VP of Advanced Analytics Sameer Chopra, you could just suck it up and learn Python now: &#8220;If you were to leave today and ask &#8216;What specific skills should I learn?&#8217;: Python.&#8221;</p>
<h2 id="2-know-a-little-more">2. Know a little more.</h2>
<p>Of course, just meeting the minimum requirements never got anybody a job (well, almost nobody). What Pouliot is <em>really</em> looking for in a candidate is: an advanced degree in a quantitative field; hands-on experience hacking data (ideally using Hive, Pig, SQL or Python); good exploratory analysis skills; the ability to work with engineering teams; and the ability to create algorithms and models rather than relying on out-of-the-box ones.</p>
<p>Chopra&#8217;s advice was to get up to speed on machine learning, especially if you want to work in Silicon Valley, <a href="http://gigaom.com/2013/03/20/its-not-skynet-yet-in-machine-learning-theres-still-a-role-for-humans/">where machine learning has exploded in popularity</a>. He&#8217;s also a big fan of honing those hacking skills because <a href="http://gigaom.com/2012/10/04/how-trifacta-wants-to-teach-humans-and-data-to-work-together/">data munging is such a valuable skill</a> when you&#8217;re dealing with so many types of data that you need to process so they work together. If you can do quality analytics across myriad data sources, Chopra said, &#8220;you can write your own ticket in this day and age.&#8221;</p>
<p>Oh, and if you&#8217;re planning to work at a startup, he added, R is almost a must-know for anyone whose job will entail statistical analysis.</p>
<h2 id="3-embrace-online-learning">3. Embrace online learning.</h2>
<p>If it all sounds a little daunting, don&#8217;t be too worried, Chopra advised. That&#8217;s because there are plenty of opportunities to learn these new skills online via both massive open online courses (he&#8217;s particularly keen on Udacity&#8217;s Computer Science 101 and <a href="http://gigaom.com/2012/10/14/why-becoming-a-data-scientist-might-be-easier-than-you-think/">Andrew Ng&#8217;s machine learning course on Coursera</a>) and universities&#8217; own online curricula. Chopra also suggested joining professional groups on LinkedIn, <a href="http://gigaom.com/2012/09/21/forget-your-fancy-data-science-try-overkill-analytics/">participating in Kaggle competitions</a> and maybe even getting out of the house by going to meetups.</p>
<p>Whatever you&#8217;re curious about, though &#8212; text mining, natural language processing, deep learning &#8212; you can probably find someone willing to teach you for free or nearly free, and any additional skills will help set you apart from the crowd.</p>
<h2 id="4-learn-to-tell-a-story">4. Learn to tell a story.</h2>
<p>Last month at Structure: Data, DJ Patil told me that one of the biggest skill shortcomings in data science <a href="http://gigaom.com/2013/03/20/big-data-is-still-hard-but-it-gets-better/">is the ability to tell a story with data</a> beyond just pointing to the numbers. Chopra agreed, noting that today&#8217;s new visualization tools make it easier to display data in formats that non-scientists might be able to (or at least want to) consume. A corollary of storytelling is good, old-fashioned communication: All the charts in the world won&#8217;t make a difference if you can&#8217;t communicate to product managers or executives why your findings matter.</p>
<p>Pouliot is a little less sold on communication skills, though &#8212; at least sometimes. If you&#8217;re an engineer primarily talking to other engineers, he told the room, you probably can speak all the jargon you want. It&#8217;s only when someone has a business-facing role that communication really becomes important.</p>
<h2 id="5-prepare-to-be-tested-aka-you">5. Prepare to be tested (aka &#8220;Your pedigree means nothing&#8221;).</h2>
<p>After you&#8217;ve learned all these skills, added them to your résumé and talked to a hiring manager about how good you are at them, it&#8217;s likely testing time. Prospective Netflix data scientists go through a battery of exercises, Pouliot says, including explaining projects they&#8217;ve worked on and questions to determine the depth of their knowledge. They&#8217;ll also be asked to devise a framework that solves a problem of the interviewer&#8217;s choice.</p>
<div id="attachment_631680" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/04/20130415_150900.jpg"><img  alt="Chris Pouliot" src="http://gigaom2.files.wordpress.com/2013/04/20130415_150900.jpg?w=300&#038;h=225" width="300" height="225" class="size-medium wp-image-631680" /></a><p class="wp-caption-text">Chris Pouliot</p></div>
<p>One thing Pouliot warned about is an over-reliance on what&#8217;s on your résumé. Right off the bat, for example, he&#8217;ll test the heck out of the skills or knowledge that someone claims, to ensure they really have them.</p>
<p>Having a Stanford degree and work experience at Google don&#8217;t necessarily make someone a shoo-in, either. Pouliot acknowledged during a quick chat after his presentation that he&#8217;s been seduced by the perfect résumé before &#8212; even going so far as to cut a few corners to get someone in for an interview &#8212; only to be disappointed in the end. Everyone has to pass the tests, he said, and some of the best applicants on paper crashed and burned very early in the process.</p>
<h2 id="6-exercise-creativity">6. Exercise creativity.</h2>
<p>It&#8217;s during the testing phase at places like Netflix that all those personal skills and experience can come into play. There&#8217;s often no right answer when it comes to answering the hypotheticals an interviewer like Pouliot might ask, and he gives bonus points for solutions he&#8217;s never seen before. &#8220;Creativity is one of the biggest things to look for when hiring data scientists,&#8221; he said. Later, he added, &#8220;Creativity is king, I think, for a great data scientist.&#8221;</p>
<h2 id="bonus-tips-for-anyone-hiring-a">Bonus tips for anyone hiring and managing data scientists</h2>
<p>Technically, Pouliot&#8217;s talk at Predictive Analytics World was about hiring data scientists, but many of the insights were probably more valuable to aspiring data scientists. Some of them, though, were definitely for management, possibly at the C-level. A few points to consider:</p>
<ul>
<li>Netflix has a standalone data science team that works closely with other departments but ultimately answers to itself. This helps the data scientists collaborate with one another, gives them upward mobility (i.e., they might never become director of marketing, but they could become director of data science) and makes it easier to manage them because everyone speaks the same language so an employee knows his boss knows his stuff.</li>
</ul>
<p style="padding-left:30px;">However, he noted, the alternative approach of embedding data scientists within other departments does bring its own benefits. That type of setup can result in a better alignment of research efforts and business needs, and it can help products get built faster because everyone is on the same page. Pouliot suggests one compromise might be to keep a centralized data science team but locate it physically near the other teams it will be interacting with most often, and another is just to ensure you have representatives from every stakeholder department present for meetings and problem-solving exercises.</p>
<ul>
<li>Actually, if you just cannot hire data scientists with all the skills you want them to have, Mendelevitch from Hortonworks suggests a similar tactic. It can be difficult to teach applied math to software engineers and vice versa, so, he writes, &#8220;[S]imply build a Hadoop data science team that combines data engineers and applied scientists, working in tandem to build your data products. Back when I was at Yahoo!, that’s exactly the structure we had: applied scientists working together with data engineers to build large-scale computational advertising systems.&#8221;</li>
</ul>
<ul>
<li>If you want to retain your good data scientists once you&#8217;ve hired them &#8212; especially in Silicon Valley where they can walk out the door and get five offers &#8212; paying them the market rate is a good start. Additionally, Pouliot said, letting them work on challenging products will keep them happy. Micro-managing them will not.</li>
</ul>
<p><em>This post was corrected 4/23 with the correct spelling of Hortonworks&#8217; Ofer Mendelevitch.</em></p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-461077p1.html">Shutterstock user Sergey Nivens</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/16/how-to-hire-data-scientists-and-get-hired-as-one/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/04/shutterstock_115502362.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/04/shutterstock_115502362.jpg?w=150" medium="image">
			<media:title type="html">data scientist at board</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/ofer1.png?w=708" medium="image">
			<media:title type="html">Source: Hortonworks</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/20130415_150900.jpg?w=300" medium="image">
			<media:title type="html">Chris Pouliot</media:title>
		</media:content>
	</item>
		<item>
		<title>5 ways big data is going to blow your mind and change your world</title>
		<link>http://gigaom.com/2013/03/22/5-ways-big-data-is-going-to-blow-your-mind-and-change-your-world/</link>
		<comments>http://gigaom.com/2013/03/22/5-ways-big-data-is-going-to-blow-your-mind-and-change-your-world/#comments</comments>
		<pubDate>Sat, 23 Mar 2013 00:00:37 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[privacy]]></category>
		<category><![CDATA[Structure Data 2013]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=623203</guid>
		<description><![CDATA[Call it whatever you want -- big data, data science, data intelligence -- but be prepared to have your mind blown. Imagination and technology are on a collision course that will change the world in profound ways.]]></description>
				<content:encoded><![CDATA[<p>Some people say big data is wallowing in the trough of disillusionment, but that&#8217;s a limited worldview. If you only look at it as an IT issue, it might be easy to see big data as little more than business intelligence on steroids. If you only see data science as a means to serving better ads, it might be easy to ask yourself what all the fuss is about.</p>
<p>If you&#8217;re like me, though, all you see are the bright lights ahead. They might be some sort of data nirvana, or they <a href="http://gigaom.com/2013/03/20/people-will-give-up-their-personal-info-if-you-give-them-a-good-reason/">might be a privacy-destroying 18-wheeler</a> bearing down on us. They might be both. But we&#8217;re going to find out, and we&#8217;re going to find out sooner rather than later.</p>
<p>This is because there are small pockets of technologists who are letting their imaginations lead the way. In a suddenly cliché way of saying it, they&#8217;re aiming for 10x improvement rather than 10 percent improvement. They can do that because they now have a base set of analytic technologies and techniques that are <a href="http://gigaom.com/2013/03/21/hadoop-applications-abound-but-hadoop-still-needs-improvement/">well positioned to solve, with <em>relatively</em> little effort</a>, whatever data problems are thrown their way.</p>
<p>Here are some themes from <a href="http://gigaom.com/2013/03/22/structuredata-2013-recap/">our just-concluded Structure: Data conference</a> that I think highlight the promise of data, but also the challenges that lie ahead.</p>
<h2 id="man-and-machine-unite">Man and machine unite</h2>
<p>Machine learning is already infiltrating nearly every aspect of our digital lives, but its ultimate promise will only be realized <a href="http://gigaom.com/2013/03/20/its-not-skynet-yet-in-machine-learning-theres-still-a-role-for-humans/">when it becomes more human</a>. That doesn&#8217;t necessarily mean making machines think like human brains (although, granted, that&#8217;s a vision currently driving billions of research dollars), but just <a href="http://gigaom.com/2013/03/20/beyond-the-like-button-putting-social-networks-to-work-for-us/">letting people better interact with the systems and models</a> <a href="http://gigaom.com/2013/03/20/six-ideas-from-entrepreneurs-for-solving-your-big-data-problems/">trying to discover the hidden patterns</a> in everything around us.</p>
<p>Whatever shape it takes, the results will be revolutionary. We&#8217;ll <a href="http://gigaom.com/2013/03/20/how-aetna-uses-patient-data-to-prevent-diabetes-and-heart-attacks/">treat diseases</a> once thought untreatable, <a href="http://gigaom.com/2013/03/20/without-human-input-augmentation-algorithms-alone-are-making-us-dumber/">tackle difficult socio-economic and cultural issues</a>, and learn to experience the world around us in entirely new ways. Maybe that consumer-experience scourge known as advertising might actually become helpful rather than annoying.</p>
<p>That would really be something.</p>
<iframe src="http://new.livestream.com/accounts/74987/events/1927733/videos/14300738/player?autoPlay=false&amp;height=360&amp;mute=false&amp;width=640" height="360" width="640" frameborder="0" scrolling="no"></iframe>
<h2 id="data-science-or-data-intellige">Data science, or data intelligence?</h2>
<p>I&#8217;m not sure there needs to be <a href="http://gigaom.com/2013/03/20/data-science-is-not-enough-we-need-data-intelligence-too/">a distinction between <i>data science</i> and <i>data intelligence</i></a>, but the latter does connote a grander goal. It&#8217;s about trying to solve meaningful problems rather than just serving ads; about trying to understand why things happen just as well as when they&#8217;ll happen. This means learning to work with smaller, messier data than we might like &#8212; certainly smaller and messier than the data sets underneath most of the massive web-company data science undertakings.</p>
<p>But just think about being able to go beyond predictive models and into a world of preventative &#8212; or even professorial &#8212; models. If you know what I like, where I go and who my friends are, it might be fairly easy to predict what I want to buy. Figuring out how my decision to buy something might affect my overall well-being and then telling me why? That&#8217;s a little more difficult and a lot more beneficial.</p>
<iframe src="http://new.livestream.com/accounts/74987/events/1927733/videos/14301242/player?autoPlay=false&amp;height=360&amp;mute=false&amp;width=640" height="360" width="640" frameborder="0" scrolling="no"></iframe>
<h2 id="telling-stories-with-data">Telling stories with data</h2>
<p>Have you ever looked at a chart and wondered what the heck it was supposed to be telling you? Or downloaded a report of your Facebook activity only to ask yourself <a href="http://gigaom.com/2012/12/18/what-well-see-in-2013-in-data/">if all the disparate data points come together to paint a bigger picture</a>? Or <a href="http://gigaom.com/2013/03/21/its-not-enough-to-just-have-information-intelligence-requires-context/">tried &#8212; and failed &#8212; to stop a terrorist</a> before his movement to recruit an army of followers gained critical mass?</p>
<p>A big problem with a lot of data analysis right now is that it still treats data points as entities unto themselves, largely disconnected from those around them. However, data needs context in order to be really useful; <a href="http://gigaom.com/2013/03/20/big-data-is-still-hard-but-it-gets-better/">it&#8217;s context that turns disparate data points into a story</a>. Don&#8217;t just tell me how many steps I took today or the time of day I&#8217;m most active on Facebook, but tell me how that relates to the rest of my life.</p>
<p>And don&#8217;t just tell me that someone said he wants to kill Americans. Rather, tell me a story about how much more frequently he&#8217;s saying it and how much more inflammatory his words are becoming.</p>
<iframe src="http://new.livestream.com/accounts/74987/events/1927733/videos/14369011/player?autoPlay=false&amp;height=360&amp;mute=false&amp;width=640" height="360" width="640" frameborder="0" scrolling="no"></iframe>
<h2 id="the-internet-of-things-knows-a">The internet of things knows all</h2>
<p>The mobile phone in your pocket <a href="http://gigaom.com/2013/03/20/even-the-cia-is-struggling-to-deal-with-the-volume-of-real-time-social-data/">is tracking your every movement</a> and can also monitor the sounds that are surrounding you. That fitness tracker you&#8217;re wearing is identifying you by how you walk. Your smart meter data <a href="http://gigaom.com/2012/05/14/how-smart-analytics-could-thwart-terrorist-attacks/">shows when you&#8217;re home, when you&#8217;re away and when you&#8217;re in the shower</a>. Sensors in everything <a href="http://gigaom.com/2012/02/10/bits-meet-bite-check-out-the-connected-toothbrush/">from toothbrushes</a> to cars are quantifying every aspect of our lives.</p>
<p>All this data <a href="http://gigaom.com/2013/03/20/if-you-think-big-data-is-big-now-just-wait-for-the-internet-of-things/">can still be a lot to deal with in terms of its volume, velocity and variety</a>, and we&#8217;re still not quite sure what we would do with it even if the right tools were in place. But <a href="http://gigaom.com/2013/02/01/the-increasingly-blurry-line-between-big-data-and-big-brother/">all sorts of entrepreneurs</a>, powerful institutions and intelligence agents have ideas. The technological pieces <a href="http://gigaom.com/2013/03/21/no-not-every-database-was-created-equal-heres-how-theyre-stand-out/">are coming along nicely, too</a>. Just sayin&#8217; &#8230;</p>
<iframe src="http://new.livestream.com/accounts/74987/events/1927733/videos/14306067/player?autoPlay=false&amp;height=360&amp;mute=false&amp;width=640" height="360" width="640" frameborder="0" scrolling="no"></iframe>
<h2 id="this-semantic-life">This semantic life</h2>
<p>The semantic web lives on; only it&#8217;s spreading well beyond our search engines and even our web browsers. Soon enough, we&#8217;ll <a href="http://gigaom.com/2013/02/07/the-future-of-search-is-gravitational-content-will-come-to-you/">be able to surface relevant content and people</a> simply by highlighting a passage of text in whatever we&#8217;re reading &#8212; web page or not &#8212; on any type of device. When we speak to our devices, <a href="http://gigaom.com/2013/03/21/why-nuance-sees-the-semantic-web-as-a-key-to-smarter-natural-language-interfaces/">they&#8217;ll not only know what we&#8217;re saying</a>, but also what we really want even <a href="http://gigaom.com/2013/03/21/how-search-can-solve-big-data-problems/">without the help of specific commands or keywords</a>.</p>
<p>That&#8217;s a powerful proposition in a world where we increasingly expect our interactions to be hands-free and our answers to come as fast as our questions. Of course, what&#8217;s powerful in the hands of consumers driving in their cars or sitting on their couches is <a href="http://gigaom.com/2013/02/08/watson-now-officially-fighting-cancer-in-hospitals-from-the-cloud/">even more powerful in the hands of doctors</a> trying to diagnose difficult diseases or aid workers trying to lend a helping hand in places where they don&#8217;t know the customs or even speak the language.</p>
<iframe src="http://new.livestream.com/accounts/74987/events/1927733/videos/14366290/player?autoPlay=false&amp;height=360&amp;mute=false&amp;width=640" height="360" width="640" frameborder="0" scrolling="no"></iframe>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-661822p1.html">Shutterstock user GrandeDuc</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/22/5-ways-big-data-is-going-to-blow-your-mind-and-change-your-world/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/shutterstock_85269745.jpg?w=106" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/shutterstock_85269745.jpg?w=106" medium="image">
			<media:title type="html">mind-blowing</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>
	</item>
		<item>
		<title>It&#8217;s not Skynet yet: In machine learning, there&#8217;s still a role for humans</title>
		<link>http://gigaom.com/2013/03/20/its-not-skynet-yet-in-machine-learning-theres-still-a-role-for-humans/</link>
		<comments>http://gigaom.com/2013/03/20/its-not-skynet-yet-in-machine-learning-theres-still-a-role-for-humans/#comments</comments>
		<pubDate>Wed, 20 Mar 2013 15:07:22 +0000</pubDate>
		<dc:creator>Ki Mae Heussner</dc:creator>
				<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[Structure Data 2013]]></category>
		<category><![CDATA[StructureData2013]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=622280</guid>
		<description><![CDATA[Even though a perception persists that machines can increasingly solve complex problems and process large amounts of data on their own, machine learning experts say humans still play a key role.
]]></description>
				<content:encoded><![CDATA[<p>If you’ve ever seen any of <a href="http://en.wikipedia.org/wiki/Skynet_(Terminator)"><i>The Terminator</i> films</a>, you’re familiar with Skynet, the <a href="http://gigaom.com/2012/03/04/why-the-cloud-has-me-fearing-wall-e-more-than-skynet/">self-aware computing system</a> at odds with humanity. But, even though a perception persists that machines can increasingly solve complex problems and process large amounts of data on their own, machine learning experts say humans still play a very important role.</p>
<p>Human intervention is critical at multiple layers, from choosing which algorithms to apply, to creating features, to crafting the entire structure within which a machine will learn, said Scott Brave, founder and CTO of Baynote, at <a href="http://event.gigaom.com/structuredata/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=622280+its-not-skynet-yet-in-machine-learning-theres-still-a-role-for-humans&amp;utm_content=kimaeheussner">GigaOM’s Structure: Data conference</a> Wednesday.</p>
<p>Down the road, he said, there will be more opportunities for machine-man collaboration, as data scientists observe what the machines may be learning and then add new inputs and ideas to the system.</p>
<p>“A lot of times we forget that even though it’s big data, the amount of data that the machine has access to pales in comparison to the amount of data we’re absorbing and have access to,” he said. “We’re building intuitions and holistic pictures in our minds and we see these connections that the machine might not even have the possibility of seeing because it doesn’t have the right data.”</p>
<p>Humans have a powerful role in figuring out the sources of data to give the machine and projecting their intuition, he added.</p>
<p>Still, Timothy Estes, founder and CEO of Digital Reasoning, pointed out that there are three key areas in which machine bests man – and, over time, they could give rise to some interesting social and cultural questions.</p>
<p>Humans will never be able to consume the sheer amount of data machines can process (unless it’s with some “Ray Kurzweil-style” merging of man and machine); humans weren’t designed to receive thousands of inputs at once; and we’re ill-equipped to create a unified model of knowledge across that scale of information and make judgements from it, Estes said.</p>
<p>Given that, he said, he predicts a social debate over whether to adopt a “Google”-like model of artificial intelligence, in which the machine simply tells you what to do next, or a software model that assumes more human agency.</p>
<p>“I believe we’re going to see that [debate] play out in the next decade between the software-centric model – a personal empowerment model – and a collective model,” he said. “And that’s the Skynet problem… you get a computer with intentionality that has access to data and the next thing you know you’re looking for a robot coming back from the future.”</p>
<p>Check out <a href="http://gigaom.com/2013/03/20/structuredata-2013-live-coverage/">the rest of our Structure:Data 2013 coverage here</a>, and a video embed of the session follows below:</p>
<p><span class="embed-youtube" style="text-align:center; display: block;"><iframe class="youtube-player" type="text/html" width="560" height="315" src="http://www.youtube.com/embed/1Drkg9e9ZTo?version=3&amp;rel=1&amp;fs=1&amp;showsearch=0&amp;showinfo=1&amp;iv_load_policy=1&amp;wmode=transparent" frameborder="0"></iframe></span><br>
A transcription of the video follows on the next page.</p>
<p><a href="http://gigaom.com/2013/03/20/its-not-skynet-yet-in-machine-learning-theres-still-a-role-for-humans/2/">Go to page 2 (of 2) on GigaOM.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/20/its-not-skynet-yet-in-machine-learning-theres-still-a-role-for-humans/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/structure-data-2013-jan-puzicha-recommind-timothy-estes-digital-reasoning-scott-brave-baynote.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/structure-data-2013-jan-puzicha-recommind-timothy-estes-digital-reasoning-scott-brave-baynote.jpg?w=150" medium="image">
			<media:title type="html">Structure Data 2013 Jan Puzicha Recommind Timothy Estes Digital Reasoning Scott Brave Baynote</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/7467db695203dccb9119d2430d0c5246?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">kimaeheussner</media:title>
		</media:content>
	</item>
	</channel>
</rss>
