<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>GigaOM &#187; data science</title>
	<atom:link href="http://gigaom.com/tag/data-science/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com</link>
	<description></description>
	<lastBuildDate>Wed, 19 Jun 2013 18:11:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gigaom.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0db8f6557d022075dbbf010c54d46d93?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>GigaOM &#187; data science</title>
		<link>http://gigaom.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gigaom.com/osd.xml" title="GigaOM" />
	<atom:link rel='hub' href='http://gigaom.com/?pushpress=hub'/>
		<item>
		<title>Accel Partners putting another $100M toward big data apps</title>
		<link>http://gigaom.com/2013/06/17/accel-partners-putting-another-100m-toward-big-data-apps/</link>
		<comments>http://gigaom.com/2013/06/17/accel-partners-putting-another-100m-toward-big-data-apps/#comments</comments>
		<pubDate>Tue, 18 Jun 2013 04:00:03 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Accel Partners]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=658345</guid>
		<description><![CDATA[Accel has launched its Big Data Fund 2, a followup on the equally large fund the venture capital firm started in November 2011. Rather than seeking products that target data scientists, it wants those targeting business users.]]></description>
				<content:encoded><![CDATA[<p>Venture capital firm Accel Partners is doubling down on its big data investments, announcing on Monday evening that it&#8217;s launching its second $100 million fund dedicated to analytic software and applications. The aptly named Big Data Fund 2 follows on <a href="http://gigaom.com/2011/11/08/accel-forms-100m-fund-to-feed-big-data-apps/">the firm&#8217;s initial Big Data Fund</a> that it announced in November 2011.</p>
<p>Since then, Accel has put a name on the types of companies it&#8217;s seeking to fund with the new allocation &#8212; namely, those selling what it calls &#8220;data-driven software.&#8221; That&#8217;s a fancy way of saying that it&#8217;s not looking to fund infrastructure-level software such as Hadoop or NoSQL databases, but rather software that leverages these technologies and others in order to make analytics simpler. It wants to fund startups targeting business users rather than data scientists.</p>
<div id="attachment_614655" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/02/1z5o3444.jpg"><img  alt="Structure 2011: Avery Lyford – Chairman Elect, Churchill Club; Michael Goguen – Partner, Sequoia Capital; Satish Dharmaraj – Partner, Redpoint Ventures; Ping Li – Partner, Accel Partners; John Vrionis – Managing Director, Lightspeed Venture Partners" src="http://gigaom2.files.wordpress.com/2013/02/1z5o3444.jpg?w=300&#038;h=200" width="300" height="200" class="size-medium wp-image-614655" /></a><p class="wp-caption-text">Accel Partner Ping Li (second from right) at Structure 2011. (c) Pinar Ozger</p></div>
<p>This type of company isn&#8217;t too difficult to come by anymore. Just about everywhere you look, someone is trying to put a big data spin on an old problem or invent some new methods for doing business intelligence. Accel has recently funded a number of them including RelateIQ, <a href="http://gigaom.com/2012/11/19/opower-the-big-data-energy-player-to-beat/">Opower</a>, <a href="http://gigaom.com/2012/11/28/log-data-startup-sumo-logic-raises-30m/">Sumo Logic</a>  and <a href="http://gigaom.com/2013/02/06/exclusive-causata-raises-7-5m-and-steps-up-its-game-in-targeted-ads/">Causata</a>. Among the non-Accel-funded startups GigaOM has covered in just the past few months are <a href="http://gigaom.com/2013/01/16/has-ayasdi-turned-machine-learning-into-a-magic-bullet/">Ayasdi</a>, <a href="http://gigaom.com/2013/05/31/wise-io-wants-to-make-machine-learning-available-to-all/">Wise.io</a>, <a href="http://gigaom.com/2013/06/10/spinnakr-brings-data-science-spin-to-tracking-web-traffic/">Spinnakr</a>, <a href="http://gigaom.com/2013/03/17/statwing-wants-to-make-your-data-and-armchair-quarterback-dreams-come-true/">Statwing</a> and <a href="http://gigaom.com/2013/05/14/this-is-why-big-data-is-the-sweet-spot-for-saas/">BloomReach</a>.</p>
<p>All this interest in data-driven software is no doubt inspired by the proven utility and wildly successful initial public offerings by enterprise data software companies such as <a href="http://gigaom.com/2012/04/19/splunk-ipo-kills-lives-up-to-expectations/">Splunk</a> and <a href="http://gigaom.com/2013/05/17/tableau-closes-day-1-as-a-2-9-billion-public-company-up-64-percent/">Tableau</a>. Entrepreneurs can see the value in rethinking legacy business software or processes for the era of big data and cloud computing, and investors have dollar signs in their eyes as they <a href="http://gigaom.com/2013/01/18/alchemist-accelerator-shows-off-as-enterprise-investment-picks-up/">try to get a piece of the most-promising companies</a>.</p>
<p>As with all trends, much of this startup and investing activity will prove to be overkill, but there&#8217;s no denying the promise that the right products have for everyone involved. Businesses really are hurting for better ways to make sense of all the data they&#8217;re generating and being exposed to, and they&#8217;ll pay handsomely to software vendors that can solve the problem.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/06/17/accel-partners-putting-another-100m-toward-big-data-apps/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/shutterstock_125574617.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/shutterstock_125574617.jpg?w=150" medium="image">
			<media:title type="html">Big Data</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/1z5o3444.jpg?w=300" medium="image">
			<media:title type="html">Structure 2011: Avery Lyford – Chairman Elect, Churchill Club; Michael Goguen – Partner, Sequoia Capital; Satish Dharmaraj – Partner, Redpoint Ventures; Ping Li – Partner, Accel Partners; John Vrionis – Managing Director, Lightspeed Venture Partners</media:title>
		</media:content>
	</item>
		<item>
		<title>WalmartLabs keeps getting smarter with Inkiru acquisition</title>
		<link>http://gigaom.com/2013/06/10/walmartlabs-keeps-getting-smarter-with-inkiru-acquisition/</link>
		<comments>http://gigaom.com/2013/06/10/walmartlabs-keeps-getting-smarter-with-inkiru-acquisition/#comments</comments>
		<pubDate>Mon, 10 Jun 2013 23:28:38 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[ad-targeting]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[Inkiru]]></category>
		<category><![CDATA[predictive analytics]]></category>
		<category><![CDATA[targeted-advertising]]></category>
		<category><![CDATA[WalmartLabs]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=656522</guid>
		<description><![CDATA[WalmartLabs has acquired a predictive analytics startup called Inkiru to bolster its ability to create better customer experiences through data. The division of Walmart was created in 2011 on a foundation of big data.]]></description>
				<content:encoded><![CDATA[<p>Walmart, it seems, will not go gently into that good night when it comes to the company&#8217;s fight against e-commerce giant Amazon. It offered more evidence of its longevity on Monday, as WalmartLabs, the company&#8217;s division dedicated to developing new technologies for the web and mobile worlds, acquired a predictive analytics startup called <a href="http://www.inkiru.com/">Inkiru</a>.</p>
<p>Inkiru, which has created software for real-time predictive analytics for things like customer targeting and credit risk, seems like a fine fit with the WalmartLabs mission. On mobile devices, for example, being able to deliver deals to customers at the right time and the right place is critical. Here&#8217;s how WalmartLabs characterized the fit in <a href="http://walmartlabs.blogspot.com/2013/06/we-predict-big-data-will-move-much.html">a press release announcing the acquisition</a>:</p>
<blockquote id="quote-inkiru-has-developed"><p>
&#8220;Inkiru has developed an active learning system that combines real-time predictive intelligence, big data analytics and a customizable decision engine to inform and streamline business decisions. &#8230;<br />
&#8220;Inkiru’s predictive analytics platform will enable us to further accelerate the big data capabilities that @WalmartLabs has propelled forward at scale…including site personalization, search, fraud prevention and marketing. Walmart’s data scientists will now be able to work with big data directly and create impact faster than ever before.&#8221;</p></blockquote>
<p>Not that WalmartLabs hasn&#8217;t been focused on building a more data-driven company since its inception. The division was created after the acquisition of social media startup Kosmix &#8212; which was co-founded by big data pioneer Anand Rajaraman &#8212; in 2011. Subsequent acquisitions include <a href="http://gigaom.com/2011/09/14/what-media-companies-can-learn-from-walmart/">OneRiot,</a> <a href="http://gigaom.com/2013/05/14/wal-mart-gets-paas-and-social-software-chops-through-oneops-tasty-labs-buys/">Tasty Labs and a cloud computing startup called OneOps</a>.</p>
<p>All of these new capabilities around social, behavioral and mobile data are likely critical to Walmart as it attempts to keep relevant against Amazon and other e-commerce companies that have digital data in their DNA. Walmart <a href="http://gigaom.com/2013/03/27/why-apple-ebay-and-walmart-have-some-of-the-biggest-data-warehouses-youve-ever-seen/">has perfected data analysis</a> and the big-box experience in its brick-and-mortar stores, but mastering the digital experience and even fusing that with the in-store experience takes a new set of skills that it seems determined to acquire.</p>
<p>To hear more about what WalmartLabs is up to from a data perspective, check out this interview with Stephen O&#8217;Sullivan as he talks about how the company is building and open sourcing big data tools.</p>
<span class='embed-youtube' style='text-align:center; display: block;'><iframe class='youtube-player' type='text/html' width='604' height='370' src='http://www.youtube.com/embed/friUWJ8ff08?version=3&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;wmode=transparent' frameborder='0'></iframe></span>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-461077p1.html">Shutterstock user Sergey Nivens</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/06/10/walmartlabs-keeps-getting-smarter-with-inkiru-acquisition/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/06/shutterstock_134968730-e1370906684208.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/06/shutterstock_134968730-e1370906684208.jpg?w=150" medium="image">
			<media:title type="html">analytics</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>
	</item>
		<item>
		<title>First, they gave us targeted ads. Now, data scientists think they can change the world</title>
		<link>http://gigaom.com/2013/06/01/first-they-gave-us-targeted-ads-now-data-scientists-think-they-can-change-the-world/</link>
		<comments>http://gigaom.com/2013/06/01/first-they-gave-us-targeted-ads-now-data-scientists-think-they-can-change-the-world/#comments</comments>
		<pubDate>Sat, 01 Jun 2013 15:00:34 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[DataKind]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[nonprofit]]></category>
		<category><![CDATA[predictive analytics]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=652808</guid>
		<description><![CDATA[Sure, a lot of data scientists spend their days trying to optimize ads or movie recommendations, but a growing number are spending their free time tackling bigger causes.]]></description>
				<content:encoded><![CDATA[<p><em>&#8220;The best minds of my generation are thinking about how to make people click ads &#8230; That sucks.&#8221;</em></p>
<p><em>- Jeff Hammerbacher, co-founder and chief scientist, Cloudera</em></p>
<p>Well, something has to pay the bills. Thankfully, there&#8217;s also a sweeping trend in the data science world right now around bringing those skills to bear on some really meaningful problems, from the effects of tree pruning to mapping humanitarian crises around the world. I don&#8217;t know about you, but I&#8217;m willing to sacrifice a little digital privacy if it means saving some lives.</p>
<p>We&#8217;ve already covered some of these efforts, including <a href="http://gigaom.com/2013/04/08/why-saving-the-world-with-data-means-finding-your-inner-ceo/">the SumAll Foundation&#8217;s work on modern-day slavery</a> and future work on child pornography. Closely related is the effort &#8212; led by Google.org&#8217;s deep pockets &#8212; <a href="http://gigaom.com/2013/04/10/this-might-be-the-best-thing-anyone-can-do-with-data/">to create an international hotline network</a> for reporting human trafficking and collecting data. Microsoft, in particular Microsoft Research&#8217;s danah boyd, has been active in helping fight child exploitation using technology.</p>
<p>This week, I came across two new efforts on different ends of the spectrum. One is <a href="http://about.activityinfo.org/">ActivityInfo</a>, which describes itself on its website as &#8220;an online humanitarian project monitoring tool&#8221; &#8212; developed by Unicef and a consulting firm called <a href="http://www.bedatadriven.com">BeDataDriven</a> &#8212; that &#8220;helps humanitarian organizations to collect, manage, map and analyze indicators.&#8221; That partnership actually seems fairly well established (the ActivityInfo website claims it&#8217;s used by more than 75 organizations across more than 15,000 sites), although I came across it via a blog post about why BeDataDriven <a href="http://googlecloudplatform.blogspot.com/2013/05/building-humanitarian-project.html">decided to build the database on Google&#8217;s cloud</a>.</p>
<div id="attachment_653527" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/06/map-editor.png"><img  alt="ActivityInfo's map editor." src="http://gigaom2.files.wordpress.com/2013/06/map-editor.png?w=708&#038;h=330" width="708" height="330" class="size-large wp-image-653527" /></a><p class="wp-caption-text">ActivityInfo&#8217;s map editor.</p></div>
<p>The other effort I came across is <a href="http://datakind.org">DataKind</a>, specifically its work helping the New York City Department of Parks and Recreation, or NYC Parks, quantify the benefits of a strategic tree-pruning program. Founded by renowned data scientists Drew Conway and Jake Porway (who&#8217;s also the host of the National Geographic channel&#8217;s <a href="http://channel.nationalgeographic.com/channel/the-numbers-game/"><em>The Numbers Game</em></a>), DataKind exists for the sole purpose of helping non-profit organizations and small government agencies solve their most-pressing data problems. It accomplishes this goal by hosting weekend-long DataDives &#8212; essentially hackathons for data scientists &#8212; as well as by facilitating longer-term engagements between volunteer data scientists or DataKind staff and organizations.</p>
<h2 id="saving-money-by-proving-what-e">Saving money by proving what every landscaper knows</h2>
<p>One of those volunteers is Brian Dalessandro, VP of data science for display advertising platform Media6Degrees. He met Porway at a data-industry function in New York in late 2012, was sold on DataKind&#8217;s vision (&#8220;[Jake's] very convincing that you should be passionate about it, too,&#8221; Dalessandro said) and got involved with his first DataDive shortly thereafter. The beneficiary organization: NYC Parks, which wanted help quantifying the benefits of tree pruning and the neighborhoods most at risk of tree damage from storms.</p>
<div id="attachment_653525" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/06/delassandro.jpg"><img  alt="Dalessandro tackling storm damage at the DataDive." src="http://gigaom2.files.wordpress.com/2013/06/delassandro.jpg?w=708&#038;h=531" width="708" height="531" class="size-large wp-image-653525" /></a><p class="wp-caption-text">Dalessandro tackling storm damage at the DataDive.</p></div>
<p>The benefits of mapping the neighborhoods in peril are pretty obvious, but doesn&#8217;t everyone already know that pruning keeps trees healthier and reduces the risk of falling limbs and other accidents? Kind of, Dalessandro explained. Up to this point, all of the evidence has been anecdotal, which isn&#8217;t always enough when it comes to new expenditures in tight city budgets.</p>
<p>&#8220;They knew what they wanted to solve,&#8221; Dalessandro recalled, &#8220;they just didn&#8217;t know if they had the right ingredients to solve it.&#8221;</p>
<p>NYC Parks came to the DataDive with three datasets it hoped would do the trick &#8212; a census of every public tree in the city; a log of every work order on those trees; and a log of when each city block&#8217;s trees were pruned. After scraping some weather data and figuring out a working definition of &#8220;risk&#8221; that was both quantifiable and satisfied the department&#8217;s needs, Dalessandro and some others were able to solve the storm-prediction problem. Quantifying the effects of pruning turned out to be a hairier problem, though.</p>
<p>So, for the next four months, Dalessandro went to work during his spare time trying to solve it. Most of the work went to formatting the datasets so he could actually work with them like they were the same thing. This is actually a common issue with government agencies and non-profits, Porway noted, because they&#8217;re usually collecting data for accounting or reporting purposes rather than for statistical analysis.</p>
<p>Once the data was ready to go, though, Dalessandro was able to rework some existing code, which he had previously written to predict whether ads actually caused people to buy products, and do the actual analysis. &#8220;Instead of people converting, there&#8217;s trees and limbs falling off,&#8221; he analogized.</p>
<div id="attachment_653526" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/06/equation.png"><img  alt="You know, classic parks department stuff. Source: Brian Dalessandro" src="http://gigaom2.files.wordpress.com/2013/06/equation.png?w=708&#038;h=546" width="708" height="546" class="size-large wp-image-653526" /></a><p class="wp-caption-text">You know, classic parks department stuff. Source: Brian Dalessandro</p></div>
<p>In the end, he found that pruning reduces hazardous work orders the following year on the blocks pruned by 22 percent. The next steps are to put his results into a business context, presumably to make a case for a better-planned and more-comprehensive pruning system. If it&#8217;s cheaper than sending out crews to fix damage, that&#8217;s probably not a bad idea.</p>
<h2 id="can-you-solve-bigger-problems-">Can you solve bigger problems without targeting a few ads?</h2>
<p>As easy as it is to rip data science in the name of advertising, though, it seems like having that high-pressure business experience actually really helps with data volunteerism. One of SumAll&#8217;s missions is to teach the non-profits it works with to think about businesses in terms of what key performance indicators they want to track. Porway said DataKind is quite focused on teaching organizations to think like data scientists, even if that just means structuring their data consistently so they can analyze it if they need to.</p>
<p>For his part, Dalessandro is excited to volunteer again, in part because he likes putting his well-honed technological skills to work in the name of the greater good. At previous jobs, he said, volunteering meant spending eight hours at the park pulling weeds or something equally mundane. However, he said, if someone needs a type of predictive model that he could build in his sleep, he could deliver truly meaningful results in just a couple hours.</p>
<p>If there&#8217;s a dark lining to this silver cloud, though, it&#8217;s that there will always be more problems than people to solve them. That doesn&#8217;t dissuade Porway, though, who sees a growing movement every time hundreds of people show up at a DataKind event, new chapters popping up overseas and the work being done by his peers in other organizations. Besides, he said, while some people are tackling difficult problems, there are lots of organizations who could benefit even from simple things like visualizations.</p>
<p>And free help is probably a better option than trying to bring those skills inside an organization. &#8220;Trying to hire data scientists to do this,&#8221; Porway said, &#8220;would be a Herculean task given how rare they are.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/06/01/first-they-gave-us-targeted-ads-now-data-scientists-think-they-can-change-the-world/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/06/delassandro1.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/06/delassandro1.jpg?w=150" medium="image">
			<media:title type="html">delassandro</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/06/map-editor.png?w=708" medium="image">
			<media:title type="html">ActivityInfo&#039;s map editor.</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/06/delassandro.jpg?w=708" medium="image">
			<media:title type="html">Dalessandro tackling storm damage at the DataDive.</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/06/equation.png?w=708" medium="image">
			<media:title type="html">You know, classic parks department stuff. Source: Brian Dalessandro</media:title>
		</media:content>
	</item>
		<item>
		<title>If you&#8217;re disappointed with big data, you&#8217;re not paying attention</title>
		<link>http://gigaom.com/2013/05/28/if-youre-disappointed-with-big-data-youre-not-paying-attention/</link>
		<comments>http://gigaom.com/2013/05/28/if-youre-disappointed-with-big-data-youre-not-paying-attention/#comments</comments>
		<pubDate>Tue, 28 May 2013 20:42:13 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[machine-learning]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=645895</guid>
		<description><![CDATA[The big data skeptics have been getting louder over the past few months, but the message doesn't resonate too loudly. No one said big data was perfect data.]]></description>
				<content:encoded><![CDATA[<p>There has been a backlash lately against big data. From <a href="http://radar.oreilly.com/2013/05/another-serving-of-data-skepticism.html">O&#8217;Reilly Media</a> to the <a href="http://www.newyorker.com/online/blogs/elements/2013/04/steamrolled-by-big-data.html">New Yorker</a>, from <a href="http://www.wired.com/opinion/2013/02/big-data-means-big-errors-people/">Nassim Taleb</a> to <a href="http://blogs.hbr.org/cs/2013/04/the_hidden_biases_in_big_data.html">Kate Crawford</a>, everyone is treating big data like a piñata. Gartner has <a href="http://blogs.gartner.com/svetlana-sicular/big-data-is-falling-into-the-trough-of-disillusionment/">dropped it into the &#8220;trough of disillusionment.&#8221;</a> I call B.S. on all of it.</p>
<p>It might be provocative to call into question one of the hottest tech movements in generations, but it&#8217;s not really fair. That&#8217;s because how companies and people benefit from big data, data science or whatever else they choose to call the movement toward a data-centric world is directly related to what they expect going in. Arguing that big data isn&#8217;t all it&#8217;s cracked up to be is a strawman, pure and simple &#8212; because no one should think it&#8217;s magic to begin with.</p>
<h2 id="correlation-versus-causation-v">Correlation versus causation versus &#8220;what&#8217;s good enough for the job&#8221;</h2>
<p>One of the biggest complaints &#8212; or, in some cases, proposed facts &#8212; about big data is that it <a href="http://gigaom.com/2013/03/25/liking-curly-fries-might-not-mean-youre-smart-when-correlation-isnt-enough/">relies more on correlation than causation</a> in order to find its vaunted insights. To the extent that&#8217;s true, it&#8217;s a fair criticism. Only I&#8217;m not certain how often it&#8217;s true for things that really matter.</p>
<p>Honestly, for song or product recommendations, who really cares?</p>
<p>But in areas like medicine, finance and even marketing, people are becoming much more concerned with finding out &#8220;why&#8221; once they&#8217;ve found out &#8220;what.&#8221; If you&#8217;re a police department trying to figure out a strategy for stopping people on the street, for example, even a strong correlation between race and certain crimes probably won&#8217;t be enough to justify harassing minorities. Oncologists might <a href="http://gigaom.com/2013/05/20/new-algorithm-maps-cancer-cells-like-nodes-on-a-social-network/">benefit from seeing the similarities among cells in a biopsy</a>, but targeting certain markers doesn&#8217;t guarantee you can cure someone&#8217;s cancer.</p>
<p>Or if you&#8217;re a retail store, <a href="http://gigaom.com/2011/11/22/big-data-reveals-mac-users-book-pricier-hotels/">knowing that Mac users who visit your site</a> tend to buy more-expensive products might make you want to show them more-expensive products. Some deeper digging &#8212; perhaps even via direct questions &#8212; would show they&#8217;re really concerned with craftsmanship. <a href="http://gigaom.com/2013/04/22/how-a-star-trek-convention-explains-the-secret-to-selling-more-stuff/">The more you learn beyond what a clustering algorithm can tell you</a>, the better you can connect with customers.</p>
<p>This is why some people call the process of asking interesting questions of data &#8220;exploratory analytics.&#8221; Data analysts can send out a virtual Christopher Columbus to see what&#8217;s doing inside their data. If they find something potentially valuable, they dig in further. <a href="http://gigaom.com/2012/11/20/a-startup-asks-what-if-you-didnt-have-to-analyze-data-at-all/">Correlations are just a notice</a> that there might be something worth looking at here.</p>
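<p>A minimal sketch of why a correlation is only a notice and not an answer. It uses synthetic data and a made-up hidden factor (the names <code>hidden</code>, <code>x</code> and <code>y</code> are illustrative assumptions, not taken from any of the studies linked above): two measurements track each other closely only because both depend on a third variable, and the relationship fades once that variable is accounted for.</p>
<pre><code class="language-python">import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical hidden factor that drives both observed variables.
hidden = [rng.gauss(0, 1) for _ in range(n)]
x = [h + rng.gauss(0, 0.5) for h in hidden]
y = [h + rng.gauss(0, 0.5) for h in hidden]

raw = pearson(x, y)

# By construction the hidden factor enters x and y with weight 1,
# so subtracting it leaves only the independent noise terms.
x_rest = [xi - h for xi, h in zip(x, hidden)]
y_rest = [yi - h for yi, h in zip(y, hidden)]
controlled = pearson(x_rest, y_rest)

print(f"correlation of x and y:             {raw:.2f}")
print(f"after accounting for hidden factor: {controlled:.2f}")
</code></pre>
<p>With these settings the first figure should land near 0.8 and the second near zero. Neither variable causes the other, which is the kind of thing an analyst only learns by digging in further.</p>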
<div id="attachment_649835" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/05/screen-shot-2013-05-20-at-9-42-09-am2.png"><img  alt="Clusters show where oncologists should start investigating." src="http://gigaom2.files.wordpress.com/2013/05/screen-shot-2013-05-20-at-9-42-09-am2.png?w=708&#038;h=403" width="708" height="403" class="size-large wp-image-649835" /></a><p class="wp-caption-text">Clusters show where oncologists should start investigating. Source: Columbia University</p></div>
<p>And even in the realm of machine learning &#8212; where algorithms are tearing through datasets trying to discover complex patterns humans could never spot &#8212; very few people are seriously suggesting we take the machines at their word. In case after case after case, the story is the same: machines do the heavy lifting but <a href="http://gigaom.com/2013/03/20/its-not-skynet-yet-in-machine-learning-theres-still-a-role-for-humans/">humans still play critical roles</a> in training the models by correcting mistakes or adding judgment into an otherwise entirely logical process.</p>
<h2 id="web-data-is-only-part-of-big-d">Web data is only part of big data</h2>
<p>There&#8217;s another idea floating around, too, which is that web-derived data &#8212; be it from social media, search queries or some other place &#8212; is somehow synonymous with big data. Critics are quick to point out that there are biases in this type of data and that we shouldn&#8217;t abandon traditional methods of qualitative, non-digital research in favor of methods that use this fast, easy web data. Of course these critics are right.</p>
<p>But who is really suggesting we do away with traditional forms of research? Social media data shouldn&#8217;t usurp traditional customer service or market research data that&#8217;s still useful, <a href="http://gigaom.com/2013/02/14/googles-flu-snafu-and-the-reliability-of-web-data/">nor should the Centers for Disease Control start relying on Google Flu Trends</a> at the expense of traditional flu-tracking methodologies. Web and social data are just <a href="http://gigaom.com/2012/10/02/why-the-trick-to-twitter-as-a-data-source-is-more-data/"><em>one more</em> source of data</a> to factor into decisions, albeit a potentially voluminous and high-velocity one.</p>
<p>Even if they&#8217;re biased or perhaps even slightly misleading, though, these new data types are still valuable, even for social science research. They are a source of new, large, and arguably unfiltered insights into attitudes and behaviors that were previously difficult to track in the wild. I&#8217;m thinking of the researchers who <a href="http://gigaom.com/2012/08/02/big-data-as-a-tool-for-detecting-and-punishing-bullies/">identified new insights into bullying</a> by studying Twitter activity, and of those who have <a href="http://www.floatingsheep.org/2013/05/hatemap.html">mapped racist tweets across the United States</a>.</p>
<div id="attachment_649834" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/05/hatemap.jpg"><img  alt="Floating Sheep's Hate Map" src="http://gigaom2.files.wordpress.com/2013/05/hatemap.jpg?w=708&#038;h=478" width="708" height="478" class="size-large wp-image-649834" /></a><p class="wp-caption-text">Floating Sheep&#8217;s Hate Map</p></div>
<p>The drawbacks should be pretty easy to overcome. Demographic or other biases might be relatively easy to spot when information is also tagged with geodata and perhaps profile information, for example. And assuming the data is mostly indicative of macro trends, there&#8217;s definitely value in being able to track it by the day, hour or minute and see trends shaping up in something far closer to real time than traditional research methods would allow.</p>
<h2 id="its-not-all-about-insights">It&#8217;s not all about insights</h2>
<p>Which brings me to another point, this one about the idea that big data is all about finding out new things through exploration. Sure, that can be the case if you&#8217;re starting to analyze entirely new data sources (like social media data) or using entirely new techniques, and it&#8217;s a very compelling reason to get started down the big data path. But <a href="http://gigaom.com/2013/01/02/why-big-data-might-be-more-about-automation-than-insights/">sometimes big data is just about automation</a>.</p>
<p>Technologies like Hadoop, for example, aren&#8217;t designed to write you better models &#8212; they&#8217;re designed to process a lot more data a lot faster. If your models still work, Hadoop should help you run them better against a much larger dataset. That might lead to more accurate models and faster answers, but it won&#8217;t necessarily lead to some &#8220;a ha&#8221; moment &#8212; like that you&#8217;ve been doing business all wrong for all these years.</p>
<p>If you&#8217;re a law firm, analyzing e-discovery files faster and more accurately might be reward enough in itself. Or maybe you&#8217;re just trying to get a better view of customers or products by putting all the data you&#8217;ve collected on them over the years into one place. The point is that these are valuable objectives even if they don&#8217;t involve finding a needle in the haystack.</p>
<p>I think <a href="http://gigaom.com/2013/05/05/how-mailchimp-learned-to-treat-data-like-orange-juice-and-rethink-email-in-the-process/">MailChimp is a great example of this</a>. It used big data techniques to discover some interesting things about the characteristics of spam, but the bigger goal was automating the spam-detection process. Those insights don&#8217;t directly affect the bottom line, but they did free up resources to help apply data science in other areas that could.</p>
<h2 id="lower-your-expectations-or-at-">Lower your expectations. Or at least know them</h2>
<p>Like anything in IT, <a href="http://gigaom.com/2013/03/21/getting-beyond-the-cult-of-big-data/">big data is almost destined to be a money pit if you go into it without a plan</a>. I&#8217;ve heard stories of large-enterprise CIOs deploying Hadoop clusters &#8212; sometimes numerous flavors of Hadoop clusters &#8212; just because they felt obligated to. I assume there are companies trying desperately to hire data scientists with no real idea what types of problems they&#8217;ll be trying to solve. That&#8217;s crazy.</p>
<p>In some ways, this type of thinking ties back to the idea that new digital data sources somehow overtake a company&#8217;s legacy data in terms of importance. Without any actual plan of attack, proposing &#8220;We&#8217;ll use social media&#8221; as a solution to finding out more about consumers is about as useful as proposing &#8220;We&#8217;ll use Hadoop&#8221; as a solution to a question about a big data strategy. Both might very well be parts of any given plan, but they need to be used for what they&#8217;re good for.</p>
<p><a href="http://gigaom.com/2013/05/07/with-300m-earmarked-for-tech-innovation-metlife-wants-to-remake-insurance/">One major takeaway from</a> my recent interview with MetLife, for example, was how fast the company was able to move on a new data-centric project because it approached it with a plan in place about the types of data and technology it needed. I don&#8217;t think it&#8217;s surprising, either, to hear the team at Infochimps say that while customers often approach thinking they need Hadoop, it turns out they usually need to <a href="http://gigaom.com/2012/08/07/infochimps-makes-its-big-data-for-developers-platform-real-time/">begin with something a little less industrial-strength</a>.</p>
<p>So, no, new data types, technologies for processing them and techniques for analyzing them aren&#8217;t going to change the world through their mere existence. At the worst, they&#8217;re just bigger, shinier and arguably better versions of what we already had. At the best, however &#8212; and used appropriately &#8212; they really could make a big difference.</p>
<p>Big data will never equal perfect data, but it can definitely point us in the right direction. I suggest not throwing the baby out with the bathwater.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-94807p1.html">Shutterstock user alri</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/28/if-youre-disappointed-with-big-data-youre-not-paying-attention/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/shutterstock_51620866-e1369767261728.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/shutterstock_51620866-e1369767261728.jpg?w=150" medium="image">
			<media:title type="html">data plot</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/screen-shot-2013-05-20-at-9-42-09-am2.png?w=708" medium="image">
			<media:title type="html">Clusters show where oncologists should start investigating.</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/hatemap.jpg?w=708" medium="image">
			<media:title type="html">Floating Sheep&#039;s Hate Map</media:title>
		</media:content>
	</item>
		<item>
		<title>Black box software: a problem for science that extends to big data</title>
		<link>http://gigaom.com/2013/05/16/black-box-software-a-problem-for-science-that-extends-to-big-data-2/</link>
		<comments>http://gigaom.com/2013/05/16/black-box-software-a-problem-for-science-that-extends-to-big-data-2/#comments</comments>
		<pubDate>Thu, 16 May 2013 18:00:44 +0000</pubDate>
		<dc:creator>Amanda Alvarez</dc:creator>
				<category><![CDATA[big data analytics]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[ecology]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[scientific computing]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=646192</guid>
		<description><![CDATA[Blind trust in black box, or click-and-run, software is a growing problem in science, and the concern extends to big data and high performance computing.]]></description>
				<content:encoded><![CDATA[<p>You probably don’t need to know how a calculator makes two plus two equal four, or how your favorite smartphone app works, but the way the background software is implemented can make a big difference to the output. Slight rounding errors or slow load times in these cases might be annoying, but when you scale up to big data modeling, for instance, you might want to take a closer look at the software running your calculations before you click go.</p>
<p>Blind trust in black box, or click-and-run, software is a growing problem in science, according to a <a href="http://www.sciencemag.org/lookup/doi/10.1126/science.1231535">commentary published Thursday in the journal <i>Science</i></a>, and the concern extends beyond formal research to other domains that use high performance computing.</p>
<p>The researchers who addressed the “troubling trend in scientific software use” were motivated by a growing unease that the abundance of powerful software is letting scientists derive answers without a thorough understanding of what the software is doing. Software snafus have been responsible for some high-profile <a href="http://www.ligo-wa.caltech.edu/~michael.landry/calibration/S5/getsignright.pdf">data misinterpretations and retractions</a>.</p>
<p>This wouldn’t normally cause a blip on the average citizen’s radar, but now a lot of these scientific conclusions have real-world implications, from climate modeling and weather forecasting to high volume financial trading. In any domain using big data, misplaced trust in the power of software can be problematic, particularly when the decision makers don’t know what the software they are using is doing, said lead author Lucas Joppa of Microsoft Research.</p>
<p>So what does ecology have to do with any of this? Joppa is an ecologist by training, and works on computational techniques in that field that may also have applications for big data more broadly. He and his colleagues surveyed scientists in a sub-field of ecology &#8212; species distribution modeling (SDM) &#8212; to find out how they choose software and how well they understand its inner workings.</p>
<p>“Lots of SDM techniques are only available as computational methods, but there is a lot of discourse going on in the literature about whether the methods themselves are correct,” said Joppa. Scientists use SDM to forecast where plants and animals will be in the future given current numbers, known habitats, and climate change. It’s a niche area of research, but the disquieting survey results should be noted in any domain where forecasting is done by plugging data into software.</p>
<p>Only 8 percent of the more than 400 scientists who responded had validated their modeling software against other methods. “The number speaks for itself,” said Joppa. “The real crux of the problem is the results from software being published in a peer-reviewed journal, versus the software itself having been peer-reviewed,” which is rare. Software packages, whether proprietary or not, are often black box systems that can’t be opened and inspected. Even if you can get under the proverbial hood, like with open source software, said Joppa, most people will still have no idea what they are looking at, or how to judge its quality.</p>
<p><img  alt="catch 22" src="http://gigaom2.files.wordpress.com/2013/05/91201888.jpg?w=347&#038;h=231" width="347" height="231" class="alignleft" /></p>
<p>To top it all off, trying to gain confidence in what your software is doing leads to a massive computational catch-22: how do you know the software is giving you the right answer if you can’t get the answer without running the software? The level of confusion over what algorithms are doing in the SDM field is illustrated by a debate over <a href="http://methodsblog.wordpress.com/2013/02/20/some-big-news-about-maxent/">which of two statistical techniques is superior</a>. It turns out, Joppa explained, that the two techniques were mathematically equivalent, but the ways they were implemented in software resulted in big predictive differences.</p>
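<p>The general phenomenon is easy to reproduce outside ecology. The sketch below is not the SDM case Joppa described; it is a generic, assumed illustration in which two formulas for variance are algebraically identical but give very different answers in floating-point arithmetic. The function names and the sample data are made up for the example.</p>
<pre><code class="language-python">def variance_one_pass(xs):
    """Population variance as mean(x^2) - mean(x)^2."""
    n = len(xs)
    mean_of_squares = sum(x * x for x in xs) / n
    mean = sum(xs) / n
    return mean_of_squares - mean * mean


def variance_two_pass(xs):
    """Population variance as the mean squared deviation from the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


# Small spread riding on a large offset: the exact variance is 22.5.
data = [1e9 + d for d in (4.0, 7.0, 13.0, 16.0)]

print("one-pass:", variance_one_pass(data))
print("two-pass:", variance_two_pass(data))
</code></pre>
<p>On paper both functions compute the same quantity. In practice the one-pass version subtracts two numbers near 10<sup>18</sup>, where double-precision values are spaced 128 apart, so the true answer of 22.5 is lost to rounding, while the two-pass version recovers it. Running both and comparing them is also a small instance of the cross-validation that only 8 percent of the surveyed scientists reported doing.</p>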
<p>This sort of mix-up isn’t surprising given the messy nature of software development (if you can even call it that) in research environments. Joppa lauded efforts like Software Carpentry that teach scientists basic software fundamentals for better programming, and said the days of getting a doctorate by merely pushing a button are over.</p>
<p>“Scientists themselves can learn a bare minimum of software engineering,” said Joppa. On the flip side, he said computer science students should have more exposure to scientific methods. “People with traditional software engineering training become uncomfortable with the way scientists want to work with software, where the design and specs are constantly changing. The way that scientific software is built is fundamentally different from consumer apps.”</p>
<p>Developers of scientific software, like MathWorks or SAS, may want to watch this space. If Joppa’s suggestions are implemented, journals may start requiring that even proprietary software be opened up for inspection and peer review. Nearly half of the surveyed ecologists report using the free statistical language R as their primary software, so maybe there is hope yet, both for open, inspectable code, and for computational science becoming more accessible while yielding trustworthy, high-impact results.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/16/black-box-software-a-problem-for-science-that-extends-to-big-data-2/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/146799217.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/146799217.jpg?w=150" medium="image">
			<media:title type="html">black box</media:title>
		</media:content>

		<media:content url="http://2.gravatar.com/avatar/e37323b74d1f383817d82c9f906b7bcf?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">neuroamanda</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/91201888.jpg?w=708" medium="image">
			<media:title type="html">catch 22</media:title>
		</media:content>
	</item>
		<item>
		<title>This is why big data is the sweet spot for SaaS</title>
		<link>http://gigaom.com/2013/05/14/this-is-why-big-data-is-the-sweet-spot-for-saas/</link>
		<comments>http://gigaom.com/2013/05/14/this-is-why-big-data-is-the-sweet-spot-for-saas/#comments</comments>
		<pubDate>Wed, 15 May 2013 01:10:22 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[BloomReach]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[marketing]]></category>
		<category><![CDATA[natural language processing]]></category>
		<category><![CDATA[saas]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=645189</guid>
		<description><![CDATA[When it comes to using big data technology effectively, there's a lot to like about SaaS. When companies like BloomReach create and analyze massive web-wide data sets, they automate insights that almost no individual company could discover on its own.]]></description>
				<content:encoded><![CDATA[<p>People often ask me where the smart money is in big data. I often tell them that’s a foolish question, because I’m not an investor — but if I were, I’d look to software as a service.</p>
<p>There are two primary reasons why, the first of which is obvious: Companies are tired of managing applications and infrastructure, so something that optimizes a common task using techniques they don’t know on servers they don’t have to manage is probably compelling. It’s called cloud computing.</p>
<p>The other reason is that <a href="http://gigaom.com/2013/04/29/google-research-director-and-ai-expert-peter-norvig-elected-into-aaas/">the <em>big </em>part of big data really is important</a> if you want to get a really clear picture of what’s happening in any given space. While no single end-user company can (or likely would) address search-engine optimization, for example, by building a massive store comprised of data from hundreds or thousands of companies as well as the entire web, a cloud service dedicated to that specific task can.</p>
<p>From <a href="http://gigaom.com/2012/11/28/log-data-startup-sumo-logic-raises-30m/">web security</a> to <a href="http://gigaom.com/2012/06/21/how-collective-intelligence-is-reshaping-systems-management/">systems management</a>, we’re already seeing how centralized data stores provide SaaS companies a broad view into what’s happening that can then be filtered down to serve each individual customer’s specific situation. <a href="http://www.bloomreach.com/">BloomReach</a>, a SaaS startup that helps companies optimize web-page content, is another good example of this principle in action.</p>
<h2 id="how-do-you-say-cotton-maxi-dre">How do <em>you</em> say, “cotton maxi dress”</h2>
<p>Ideally, BloomReach Head of Marketing Joelle Kaufman told me, the company wants to help customers ensure they get found in web searches by making sure they’re not invisible (buried deep down), irrelevant (not saying anything meaningful on their sites) or incompatible (not speaking their consumers’ language). On Tuesday, the company <a href="http://www.bloomreach.com/buzz/media-center-pr/continuous-quality-management/">announced a new feature called Continuous Quality Management</a>, which lets customers continuously monitor their pages to ensure they’re still featuring the right products and the right terminology. It’s the latest addition to a seemingly useful service that’s built atop a big data foundation few — if any — of its customers would ever attempt to build themselves.</p>
<p>BloomReach is able to help companies optimize their sites because it’s constantly crawling the web in order to figure out how everyone else is describing their content, laying out their pages and structuring their links. Running on the Amazon Web Services cloud, BloomReach runs more than 1,000 Hadoop jobs a day that process about 5 terabytes of data and a billion data points about users’ site behavior. With the latter, co-founder and CTO Ashutosh Garg explained, the company is trying to figure out who’s visiting sites, what they’re doing, how long they’re spending there and how they’re related in terms of behavior.</p>
<p>“You need to have the right amount of data and from the right places before we can do anything with it,” he said. “… It’s a massive machine learning problem.”</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/05/br-stack.png"><img alt="BR stack" src="http://gigaom2.files.wordpress.com/2013/05/br-stack.png?w=708&#038;h=531" width="708" height="531" class="aligncenter size-large wp-image-645359"></a></p>
<p>When you consider all the possible ways something could be described or formatted, the scale of the problem becomes more evident. Simple semantic analysis like associating “desk” and “table” is easy, Garg explained, but what if someone wants a lightweight camera and you only have its exact weight listed without any indication of how it compares to other options? What if people searching for “smartphones” really mean “Android phones,” but you’re top-loading your results with BlackBerry phones and Windows phones?</p>
<p>Another of Garg’s hypotheticals has to do with consumers’ presentation biases. If, for example, they’re looking at a lot of websites that look the same or focus on the same things (e.g., megapixels for digital cameras), they’ll expect to see the same things from every site.</p>
<h2 id="10-nonillion-possibilities-cho">10 nonillion possibilities: Choose 1.</h2>
<p>From a sheer numbers perspective, things get even hairier when you’re trying to determine the relationship between any two pages in order to figure out the best path for links to take. Garg said this is what computer scientists call an <a href="http://en.wikipedia.org/wiki/NP-complete">NP-complete problem</a>, which in practice means the time it takes to compute an exact answer grows far faster than the amount of content you’re analyzing. So, for example, analyzing 40 pages doesn’t take 10 times as long as analyzing 4 pages, but more like 100 times as long, and the gap keeps widening as the site grows.</p>
<p>Actually, BloomReach CEO Raj De Datta gave me another example of this problem <a href="http://gigaom.com/2012/02/22/bloomreach-wants-to-save-your-site-with-big-data/">when we spoke in early 2012</a>. Here’s how I described it then:</p>
<blockquote id="quote-if-a-company-wants-t"><p>[I]f a company wants to display just 1,000 products across 100 pages, De Datta explained, there are 10-to-the-28th-power (10 octillion) possibilities for how to do that. When it comes time to describe those products, there are 10-to-the-30th-power (10 nonillion) possibilities.</p></blockquote>
<p>If a website has a million pages, Garg said, “it will take you longer than the life of the universe to solve that problem.”</p>
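<p>Some back-of-the-envelope arithmetic shows why brute force is off the table even for the smaller case quoted above. The sketch assumes a hypothetical machine that can evaluate one billion layouts per second (that rate is an illustrative assumption, not a BloomReach figure) and uses the commonly cited 13.8 billion years as the age of the universe.</p>
<pre><code class="language-python">SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9  # widely cited estimate, used as a yardstick


def years_to_enumerate(possibilities, checks_per_second=1e9):
    """Years needed to examine every possibility once at a fixed rate.

    checks_per_second is an assumed, illustrative rate, not a measured one.
    """
    return possibilities / checks_per_second / SECONDS_PER_YEAR


for exponent in (28, 30):
    years = years_to_enumerate(10 ** exponent)
    multiple = years / AGE_OF_UNIVERSE_YEARS
    print(f"10^{exponent} options: {years:.1e} years, "
          f"about {multiple:,.0f}x the age of the universe")
</code></pre>
<p>Under those assumptions, checking all 10 octillion product layouts would take roughly 23 times the age of the universe, and the 10 nonillion description options roughly 2,300 times.</p>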
<p>Where this type of problem arises, BloomReach turns to <a href="http://en.wikipedia.org/wiki/Monte_Carlo_method">Monte Carlo simulations</a>, a favorite technique of physicists and Wall Street quants. The method involves running lots of randomized simulations over large data sets in order to determine approximate results in a reasonable time frame. (And if all this isn’t enough computer science and cloud infrastructure for you, I suggest attending our <a href="http://event.gigaom.com/structure/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=645189+this-is-why-big-data-is-the-sweet-spot-for-saas&amp;utm_content=dharrisstructure">Structure conference</a> in June, which features a who’s who list of speakers, including Google’s Jeff Dean, Facebook’s Jay Parikh and Netflix’s Adrian Cockcroft.)</p>
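<p>As a rough illustration of the idea, and not a description of how BloomReach actually does it, the sketch below samples random product layouts and keeps the best one found instead of enumerating every arrangement. The scoring function, the <code>affinity</code> matrix and all function names are hypothetical stand-ins invented for the example.</p>
<pre><code class="language-python">import random


def layout_score(layout, affinity):
    """Hypothetical score: total affinity between products sharing a page."""
    total = 0.0
    for page in layout:
        for i in range(len(page)):
            for j in range(i + 1, len(page)):
                total += affinity[page[i]][page[j]]
    return total


def random_layout(n_products, n_pages, rng):
    """Shuffle the products and deal them evenly across pages.

    Assumes n_products is a multiple of n_pages.
    """
    products = list(range(n_products))
    rng.shuffle(products)
    per_page = n_products // n_pages
    return [products[k * per_page:(k + 1) * per_page] for k in range(n_pages)]


def monte_carlo_search(affinity, n_pages, n_samples, seed=0):
    """Score n_samples random layouts and return the best one seen."""
    rng = random.Random(seed)
    best_layout, best_score = None, float("-inf")
    for _ in range(n_samples):
        layout = random_layout(len(affinity), n_pages, rng)
        score = layout_score(layout, affinity)
        if score > best_score:
            best_layout, best_score = layout, score
    return best_layout, best_score


# Toy data: six products in two natural groups (0-2 and 3-5).
affinity = [[1.0 if i // 3 == j // 3 else 0.0 for j in range(6)]
            for i in range(6)]
layout, score = monte_carlo_search(affinity, n_pages=2, n_samples=2000)
print(layout, score)
</code></pre>
<p>For this toy data the best possible score is 6.0, and a couple of thousand samples find it easily. On a real catalog, random sampling gives no guarantee of hitting the optimum; the trade is an approximate answer in bounded time rather than an exact one that never arrives.</p>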
<h2 id="different-queries-different-pa">Different queries, different pages</h2>
<p>Things get even trickier when you’re trying to change the content of web pages in real time as people are searching for things. This isn’t the best method for organic search, where pages need to stay pretty consistent with the indexed versions, but it can be ideal in situations such as paid search and mobile. There are millions of ways to segment buyers, Garg explained, and how accurately you assess their intent and display your content can make all the difference. Whether someone is a new or repeat visitor often matters, as does whether someone is price-conscious (e.g., the query included “cheap”) or perhaps searching for a particular brand.</p>
<div id="attachment_645358" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/05/llbean.png"><img alt="Source: BloomReach" src="http://gigaom2.files.wordpress.com/2013/05/llbean.png?w=708&#038;h=531" width="708" height="531" class="size-large wp-image-645358"></a><p class="wp-caption-text">Source: BloomReach</p></div>
<p>Around the holidays, the company actually realized something interesting: The bounce rate on queries for things like “gifts for dad” or “gifts for co-workers” was pretty high, but so was the conversion rate. The time to conversion was relatively fast, as well. It turns out, Garg explained, that people don’t like to overthink certain gifts too much, so if something is presented in a visually appealing manner and is within their price range, they’ll buy.</p>
<p>But creating these types of models involves more than meets the eye. For all the talk about machine learning (and machines do a majority of the work for BloomReach), people also play a critical role. A person might know better than a machine whether something was likely purchased as a gift, Garg explained, or they might spot the offensive content on the T-shirt the machine decided was ideal.</p>
<p>“Humans are really good at creativity, thinking through stuff,” he said.</p>
<p>Smart humans are also good at knowing when they’re overmatched, which is why SaaS is so valuable in the big data era. CMOs could try doing what BloomReach or <a href="http://gigaom.com/2012/04/24/datapop-scores-7m-for-custom-built-ads/">similar companies such as DataPop</a> are doing, or they could pay someone to do it much better. Guess which route the smart ones will take.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-54269p1.html">Shutterstock user Andrea Danti</a>.</em></p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=645189&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=241135"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=241135" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=645189+this-is-why-big-data-is-the-sweet-spot-for-saas&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/report/sector-roadmap-social-customer-service-in-2013/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=645189+this-is-why-big-data-is-the-sweet-spot-for-saas&utm_content=dharrisstructure">Sector RoadMap: Social customer service in 2013</a></li><li><a href="http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=645189+this-is-why-big-data-is-the-sweet-spot-for-saas&utm_content=dharrisstructure">A near-term outlook for big data</a></li><li><a href="http://pro.gigaom.com/2012/06/cloud-computing-infrastructure-2012-and-beyond/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=645189+this-is-why-big-data-is-the-sweet-spot-for-saas&utm_content=dharrisstructure">Cloud computing infrastructure: 2012 and beyond</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/14/this-is-why-big-data-is-the-sweet-spot-for-saas/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/shutterstock_119782672.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/shutterstock_119782672.jpg?w=150" medium="image">
			<media:title type="html">collective intelligence</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/br-stack.png?w=708" medium="image">
			<media:title type="html">BR stack</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/llbean.png?w=708" medium="image">
			<media:title type="html">Source: BloomReach</media:title>
		</media:content>
	</item>
		<item>
		<title>We&#8217;re witnessing the rise of the graph in big data</title>
		<link>http://gigaom.com/2013/05/14/were-witnessing-the-rise-of-the-graph-in-big-data/</link>
		<comments>http://gigaom.com/2013/05/14/were-witnessing-the-rise-of-the-graph-in-big-data/#comments</comments>
		<pubDate>Tue, 14 May 2013 14:33:33 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[graph analysis]]></category>
		<category><![CDATA[graph database]]></category>
		<category><![CDATA[GraphLab]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=645059</guid>
		<description><![CDATA[Graph databases and graph-processing applications have been popping up all over the place lately, and now they're starting to go commercial. On Tuesday, popular open source project GraphLab joined the ranks of graph startups.]]></description>
				<content:encoded><![CDATA[<p>GraphLab, a popular <a href="http://graphlab.org/">open source project</a> dedicated to graph analysis and machine learning, is trying to capitalize on the excitement around graphs by spinning off a commercial entity, <a href="http://graphlab.com/">GraphLab Inc.</a> GraphLab creator &#8212; and University of Washington machine learning professor &#8212; Carlos Guestrin will lead the new Seattle-based company, which has raised $6.75 million from Madrona Venture Group and NEA.</p>
<p>Graph analysis is among the hottest techniques around for making sense of large datasets, primarily by determining how tightly different data points are related or how similar they are. The term &#8220;graph&#8221; came into the broader lexicon along with social networks, which built social graphs to <a href="http://gigaom.com/2013/03/14/facebook-tweaks-its-algorithms-to-improve-graph-search-comment-search-coming/">assess the relationships among their millions of users</a>, but the technique has much broader uses.</p>
<div id="attachment_645089" class="wp-caption aligncenter" style="width: 677px"><a href="http://gigaom2.files.wordpress.com/2013/05/lnkdmap-1.jpg"><img  alt="My LinkedIn social graph" src="http://gigaom2.files.wordpress.com/2013/05/lnkdmap-1.jpg?w=708"   class="size-full wp-image-645089" /></a><p class="wp-caption-text">My LinkedIn social graph</p></div>
<p>Guestrin said GraphLab&#8217;s algorithms are used in a lot of recommender systems, but he also cites fraud detection in banking networks and intrusion detection in computer networks as potential applications. We&#8217;ve covered graphs as the analytical model of choice for everything <a href="http://gigaom.com/2013/04/22/how-hbase-converted-myspaces-mysql-champion-and-is-driving-hadoop-mainstream/">from content recommendation</a> to <a href="http://gigaom.com/2013/01/22/biotech-startup-syapse-wants-to-be-salesforce-com-for-our-genomes/">tracking lab work in genomics</a>. Really, though &#8212; especially when combined with machine learning &#8212; graph analysis <a href="http://gigaom.com/2013/01/16/has-ayasdi-turned-machine-learning-into-a-magic-bullet/">can be applied to anything</a> where there&#8217;s too much data for a person to possibly analyze the relationships between every point.</p>
<div id="attachment_601469" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/01/ayasdi-product-image-2-e1358295341371.jpg"><img  alt="One of Ayasdi's graph-like data maps" src="http://gigaom2.files.wordpress.com/2013/01/ayasdi-product-image-2-e1358295341371.jpg?w=708&#038;h=472" width="708" height="472" class="size-large wp-image-601469" /></a><p class="wp-caption-text">One of Ayasdi&#8217;s graph-like data maps</p></div>
<p>Google also famously uses <a href="http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html">a graph-processing system called Pregel</a> as part of PageRank. Although a number of graph databases and other projects have popped up in the past few years, Guestrin said GraphLab is actually a contemporary of Pregel. He and some colleagues at Carnegie Mellon built a small system for their lab about five years ago, then released it into the open-source world with few expectations that it would catch on. Now, he added, Pandora and WalmartLabs are among the project&#8217;s user base.</p>
<p>Among those other projects are graph databases such as <a href="http://giraph.apache.org/">Giraph</a> (an open source, Hadoop-based Pregel clone developed at Facebook) and <a href="http://www.neo4j.org/">Neo4j</a> (which also has a commercial arm, <a href="http://gigaom.com/2012/11/02/graph-startup-neo-raises-11m-as-specialized-databases-take-hold/">called Neo Technology</a>), as well as <a href="http://engineering.twitter.com/2012/03/cassovary-big-graph-processing-library.html">Twitter&#8217;s Cassovary</a> and fellow University of Washington project <a href="http://www.cs.washington.edu/node/4217/">Grappa</a>. Guestrin said GraphLab can work with most of them, particularly if they&#8217;re not designed to do machine learning at scale like GraphLab is. Some efforts, he noted, are focused on simply storing data in graph form (e.g., databases) or in providing simple graph analysis.</p>
<p>As for when we&#8217;ll actually see the results of the effort to commercialize GraphLab, Guestrin said it will be a while. Right now, he&#8217;s focused on the next open source release of GraphLab in July. However, the company will begin engaging with commercial users over the next several months to determine what types of features they would expect in commercial graph-analysis software.</p>
<p>The bigger question to come out of all this graph activity, though, is how big a market we&#8217;ll ultimately see for graph-analysis or any other specific technique. As companies get more comfortable with big data from a technical standpoint, they&#8217;re getting more interested in the different types of analysis it allows for too. This is evidenced by the <a href="http://gigaom.com/2013/03/07/5-reasons-why-the-future-of-hadoop-is-real-time-relatively-speaking/">quest to make Hadoop support myriad processing frameworks</a> aside from MapReduce.</p>
<p>We already have a handful of commercial graph products on the market &#8212; including an industrial grade one called <a href="http://www.yarcdata.com/">YarcData</a> from supercomputer maker Cray &#8212; but how many will there eventually be? And if graph analysis is all the rage right now, what comes next?</p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=645059&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=229602"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=229602" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=645059+were-witnessing-the-rise-of-the-graph-in-big-data&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=645059+were-witnessing-the-rise-of-the-graph-in-big-data&utm_content=dharrisstructure">A near-term outlook for big data</a></li><li><a href="http://pro.gigaom.com/2012/01/12-tech-leaders-resolutions-for-2012/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=645059+were-witnessing-the-rise-of-the-graph-in-big-data&utm_content=dharrisstructure">12 tech leaders’ resolutions for 2012</a></li><li><a href="http://pro.gigaom.com/2011/11/connected-world-the-consumer-technology-revolution/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=645059+were-witnessing-the-rise-of-the-graph-in-big-data&utm_content=dharrisstructure">Connected world: the consumer technology revolution</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/14/were-witnessing-the-rise-of-the-graph-in-big-data/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/graphics2-3_final_cartoon.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/graphics2-3_final_cartoon.jpg?w=150" medium="image">
			<media:title type="html">graphics2-3_final_cartoon</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/lnkdmap-1.jpg" medium="image">
			<media:title type="html">My LinkedIn social graph</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/ayasdi-product-image-2-e1358295341371.jpg?w=708" medium="image">
			<media:title type="html">One of Ayasdi&#039;s graph-like data maps</media:title>
		</media:content>
	</item>
		<item>
		<title>Why 3 celebrity data scientists are willing to work for free &#8212; for you</title>
		<link>http://gigaom.com/2013/05/08/why-3-celebrity-data-scientists-are-willing-to-work-for-free-for-you/</link>
		<comments>http://gigaom.com/2013/05/08/why-3-celebrity-data-scientists-are-willing-to-work-for-free-for-you/#comments</comments>
		<pubDate>Wed, 08 May 2013 16:58:30 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hilary Mason]]></category>
		<category><![CDATA[Mortar Data]]></category>
		<category><![CDATA[recommendation engines]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=643353</guid>
		<description><![CDATA[Hadoop startup Mortar Data is offering to build recommendation systems for 10 companies, with help from Hilary Mason, Drew Conway and Max Shron. It's part of a bigger plan to democratize the science behind online recommendations.]]></description>
				<content:encoded><![CDATA[<p>Hadoop-in-the-cloud startup Mortar Data is on a mission to bring recommendation engines to the masses, and it has recruited three well-known data scientists to aid its cause. On Wednesday, the company will start accepting applications <a href="http://mortardata.com/">on its website</a> from companies that would like to have Mortar Data &#8212; as well as Bit.ly&#8217;s <a href="http://www.hilarymason.com/">Hilary Mason</a>, IA Ventures Scientist-in-Residence <a href="http://drewconway.com/">Drew Conway</a> and freelancer (and former OKCupid data scientist) <a href="http://shron.net/about">Max Shron</a> &#8212; build a custom recommendation system for them.</p>
<p>The way it works, said Mortar Co-founder and CEO K Young, is that his company will choose eight companies (in addition to the two it has been working with already) to implement custom systems based on their specific needs and businesses. Mason, Conway and Shron will split their time among the 10 total companies, but will be much more than advisers &#8212; they&#8217;ll actually dig into the data and work hands-on to ensure the right techniques and algorithms are applied in the right places.</p>
<p>The applicant companies will keep any custom code, but the ultimate goal from Mortar&#8217;s perspective is to learn some best practices and create reusable building blocks that will let anyone create recommendation engines without pre-existing data science knowledge. Recommendation engines <a href="http://gigaom.com/2013/01/29/you-might-also-like-to-know-how-online-recommendations-work/">are commonplace on large web sites</a> (Netflix, Spotify, iTunes, Google, Amazon, <a href="http://gigaom.com/2013/03/03/how-and-why-linkedin-is-becoming-an-engineering-powerhouse/">LinkedIn</a>, Eventbrite and the list goes on) but smaller companies can sometimes struggle to do them, or to do them well. Young hopes Mortar can establish an open source reference architecture of sorts that makes it easy to implement everything from building data pipelines to the actual algorithms that power recommendations.</p>
<p>&#8220;They&#8217;re really common and they&#8217;re really useful, but they&#8217;re really hard,&#8221; he said. &#8220;That&#8217;s why [a reference implementation] hasn&#8217;t been done before.&#8221;</p>
<div id="attachment_643436" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/05/gernres-support-1.jpg"><img  alt="They can get pretty complex, as evidence by this Netflix example." src="http://gigaom2.files.wordpress.com/2013/05/gernres-support-1.jpg?w=708&#038;h=358" width="708" height="358" class="size-large wp-image-643436" /></a><p class="wp-caption-text">They can get pretty complex, as evidence by this Netflix example.</p></div>
<p>Presently, Young explained, anyone wanting to build a recommendation system probably knows some of the algorithms to begin with and then gets to work researching how to implement them with specific processing frameworks (e.g., MapReduce) and on their specific data. Alternatively, they might have to hire a consultant who helps them build the recommendation engine. Either way, he noted, they&#8217;re probably not open sourcing it at the end because it&#8217;s presumed too valuable a competitive edge.</p>
<p>Mortar Data&#8217;s recommendation framework will be based on Pig, Python and Java, <a href="http://gigaom.com/2012/11/28/mortar-data-wants-to-become-a-hadoop-developers-best-friend/">just like the company&#8217;s flagship platform</a> for creating Hadoop jobs. Those languages will make the implementation more accessible and customizable by more people, Young said.</p>
<p>Really, he added, any web site or service that has multiple customers and deals with multiple entities &#8212; be they restaurants, songs, dating profiles, artisan necklaces, what have you &#8212; should have some sort of recommendation engine to help provide a more-intelligent customer experience. &#8220;It should become so ubiquitous that any service you go to knows enough about you to put forward the things you actually want to see,&#8221; Young said.</p>
<p>There is, however, one catch to Mortar&#8217;s plans as they stand: Because the service is hosted on Amazon Web Services, anyone interested in having Mason, Conway, Shron and Mortar work on their systems must have their data in AWS or be able to move it there. The initial reference implementation will likely be AWS-centric, too, but Young hopes contributors will use it and share methods for running it atop other platforms.</p>
<p><em>Feature image of Hilary Mason at Structure: Data 2011 courtesy of Pinar Ozger (www.pinarozger.com).</em></p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=643353&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=658618"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=658618" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=643353+why-3-celebrity-data-scientists-are-willing-to-work-for-free-for-you&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/08/why-3-celebrity-data-scientists-are-willing-to-work-for-free-for-you/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/05/hilarymason.jpeg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/05/hilarymason.jpeg?w=150" medium="image">
			<media:title type="html">hilarymason</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/gernres-support-1.jpg?w=708" medium="image">
			<media:title type="html">They can get pretty complex, as evidence by this Netflix example.</media:title>
		</media:content>
	</item>
		<item>
		<title>Four ways data scientists are using digital art to humanize data</title>
		<link>http://gigaom.com/2013/05/01/four-ways-data-scientists-are-using-digital-art-to-humanize-data/</link>
		<comments>http://gigaom.com/2013/05/01/four-ways-data-scientists-are-using-digital-art-to-humanize-data/#comments</comments>
		<pubDate>Wed, 01 May 2013 16:10:06 +0000</pubDate>
		<dc:creator>Amanda Alvarez</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[data visualization]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=641094</guid>
		<description><![CDATA[The growing pains of big data were apparent at the Data 2.0 Summit on Tuesday in San Francisco. Here is a selection of visualization tools that came up at the meeting.]]></description>
				<content:encoded><![CDATA[<p>The growing pains of big data were apparent at the <a href="http://data2summit.com/">Data 2.0 Summit</a> on Tuesday in San Francisco.</p>
<p>During one panel, the assertion that data science is dead was indeed debated. Along with the habitual tension between end user requirements for businesses and consumers and the “elitist” ideas of data scientists and engineers, other themes explored included increasing accessibility to data, as well as changing behaviors and encouraging better decision-making with data. Everyone from sales and marketing people to fitness enthusiasts, it turns out, can be motivated by pretty pictures.</p>
<p>As IBM’s Alan Keahey put it during a panel, “there is a hunger for friendly data,” and visualization can help to humanize those threatening terabytes. Here is a selection of new, and new-to-us, visualization tools that came up at the meeting.</p>
<p><strong>Bringing climate change home: <a href="http://databasin.org/">Databasin.org</a></strong></p>
<p style="text-align:left;">A mapping and analytics platform from the Conservation Biology Institute that has 10,000 datasets on everything you need to understand how extreme weather will impact natural resources, renewable energy, and endangered species. Here is <a href="http://databasin.org/datasets/638a938ba0f84e238b342337f7616ecd">one projection</a> of maximum temperatures in 2080.</p>
<p style="text-align:left;"><img  alt="world-map-climate-change-databasin" src="http://gigaom2.files.wordpress.com/2013/04/world-map-climate-change-databasin.png?w=367&#038;h=258" width="367" height="258" class="aligncenter  wp-image-641097" /></p>
<p><strong><a href="http://www.sparkvis.com/">Sparkvis</a> by Chloe Fan</strong></p>
<p>This app is for the quantified self junkie who loves to interpret their burned calories as abstract art. The research behind the colorful display of Fitbit (see disclosure) data is explained <a href="http://www.chloefan.com/static/files/2012-UbiComp-Fan.pdf">here</a>. <i>Image via <a href="http://quantifiedself.com/2012/05/spark-visualizing-physical-activity-using-abstract-ambient-art/">QuantifiedSelf.com</a></i></p>
<p><img  alt="sparkvis-fitbit-visualization" src="http://gigaom2.files.wordpress.com/2013/04/sparkvis-fitbit-visualization.png?w=421&#038;h=242" width="421" height="242" class="aligncenter size-medium wp-image-641098" /></p>
<p><strong><a href="http://disqus.com/gravity/">Disqus Gravity</a></strong></p>
<p>The commenting platform’s diverse content is brought together in an interactive and live visualization. Pulling from about 500 sites that use Disqus, Gravity brings together the “small” data of individual comments within the context of 11 content categories. Another visualization, <a href="http://map.labs.disqus.com/">Orbital</a>, shows realtime comments geolocated on a spinning globe.</p>
<p><img  alt="disqus-gravity-visualization" src="http://gigaom2.files.wordpress.com/2013/04/disqus-gravity-visualization.png?w=396&#038;h=231" width="396" height="231" class="aligncenter size-medium wp-image-641103" /></p>
<p><strong><a href="http://www-958.ibm.com/software/analytics/manyeyes/">IBM Many Eyes</a></strong></p>
<p>Originally conceived by visualization guru Martin Wattenberg and colleagues in 2007, Many Eyes lets you plug in any dataset and generate nifty figures. Here, for example, is the <a href="http://www-958.ibm.com/software/analytics/manyeyes/visualizations/distribution-of-us-foreign-aid-ove-3">distribution of U.S. foreign aid</a> over a 60-year period.</p>
<p><img  alt="many-eyes-visualization-foreign-aid" src="http://gigaom2.files.wordpress.com/2013/04/many-eyes-visualization-foreign-aid.png?w=444&#038;h=297" width="444" height="297" class="aligncenter size-medium wp-image-641104" /></p>
<p><em>Disclosure: Fitbit is backed by True Ventures, a venture capital firm that is an investor in the parent company of GigaOM. Om Malik, founder of GigaOM, is also a venture partner at True.</em></p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=641094&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=105954"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=105954" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=641094+four-ways-data-scientists-are-using-digital-art-to-humanize-data&utm_content=neuroamanda">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/12/big-data-2013-key-trends-and-companies-to-watch/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=641094+four-ways-data-scientists-are-using-digital-art-to-humanize-data&utm_content=neuroamanda">Big data 2013: key trends and companies to watch</a></li><li><a href="http://pro.gigaom.com/2012/05/the-importance-of-putting-the-u-and-i-in-visualization/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=641094+four-ways-data-scientists-are-using-digital-art-to-humanize-data&utm_content=neuroamanda">The importance of putting the U and I in visualization</a></li><li><a href="http://pro.gigaom.com/2012/04/sector-roadmap-hadoop-platforms-2012/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=641094+four-ways-data-scientists-are-using-digital-art-to-humanize-data&utm_content=neuroamanda">2012: The Hadoop infrastructure market booms</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/01/four-ways-data-scientists-are-using-digital-art-to-humanize-data/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/04/visualization-examples.png?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/04/visualization-examples.png?w=150" medium="image">
			<media:title type="html">visualization-examples</media:title>
		</media:content>

		<media:content url="http://2.gravatar.com/avatar/e37323b74d1f383817d82c9f906b7bcf?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">neuroamanda</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/world-map-climate-change-databasin.png?w=300" medium="image">
			<media:title type="html">world-map-climate-change-databasin</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/sparkvis-fitbit-visualization.png?w=300" medium="image">
			<media:title type="html">sparkvis-fitbit-visualization</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/disqus-gravity-visualization.png?w=300" medium="image">
			<media:title type="html">disqus-gravity-visualization</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/many-eyes-visualization-foreign-aid.png?w=300" medium="image">
			<media:title type="html">many-eyes-visualization-foreign-aid</media:title>
		</media:content>
	</item>
		<item>
		<title>USVP, UPS and Scott McNealy pump $18M into machine-learning startup Skytree</title>
		<link>http://gigaom.com/2013/04/30/usvp-ups-and-scott-mcnealy-pump-18m-into-machine-learning-startup-skytree/</link>
		<comments>http://gigaom.com/2013/04/30/usvp-ups-and-scott-mcnealy-pump-18m-into-machine-learning-startup-skytree/#comments</comments>
		<pubDate>Tue, 30 Apr 2013 16:30:15 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[Scott McNealy]]></category>
		<category><![CDATA[Skytree]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=640909</guid>
		<description><![CDATA[Machine learning startup Skytree has raised $18 million for its software that makes short work of pattern recognition across massive datasets.]]></description>
				<content:encoded><![CDATA[<p>Machine learning is everywhere these days as companies and organizations find themselves trying to make sense of data sets far too large and complex for the human brain alone. On Tuesday, <a href="http://www.skytree.net/">Skytree</a> cashed in on the hype with an $18 million Series A round led by U.S. Venture Partners along with delivery giant UPS and Sun Microsystems co-founder and former CEO Scott McNealy. Skytree <a href="http://gigaom.com/2012/02/23/skytree-intros-machine-learning-for-the-masses/">launched in February 2012</a> with $1.5 million in seed funding.</p>
<p>Machine learning is such a hot topic right now because data volumes are becoming so large and complex that humans alone can&#8217;t query their ways through them fast enough or intelligently enough to spot latent patterns among the mess of data. It&#8217;s the algorithmic engine that <a href="http://gigaom.com/2012/06/25/how-google-is-teaching-computers-to-see/">powers a bunch of Google services</a> and <a href="http://gigaom.com/2012/06/14/netflix-analyzes-a-lot-of-data-about-your-viewing-habits/">your Netflix recommendations</a>, as well as <a href="http://gigaom.com/2012/12/05/prismatic-gets-15m-to-build-a-recommendation-engine-for-the-world/">web content-curation service Prismatic</a> and <a href="http://gigaom.com/2012/11/19/where-machine-learning-and-human-artistry-meet-your-wallet/">alternative-underwriting platform ZestFinance</a>. As we <a href="http://gigaom.com/2013/03/22/5-ways-big-data-is-going-to-blow-your-mind-and-change-your-world/">covered in some detail at this year&#8217;s Structure: Data conference</a>, machine learning is particularly powerful when its ability to correlate tens of thousands of variables is paired with human judgment about what really matters.</p>
<div id="attachment_640923" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/04/ml-2012.jpg"><img  alt="Skytree co-founder Alexander Gray (second from left) at Structure: Data 2012. (c) Pinar Ozger" src="http://gigaom2.files.wordpress.com/2013/04/ml-2012.jpg?w=300&#038;h=200" width="300" height="200" class="size-medium wp-image-640923" /></a><p class="wp-caption-text">Skytree co-founder Alexander Gray (second from left) at Structure: Data 2012. (c) Pinar Ozger</p></div>
<p>Skytree, for its part, sells a product called Skytree Server that lets users run a wide variety of machine learning algorithms across whatever data they have. It might be an oversimplification, but Skytree is essentially a souped-up version of statistical-analysis packages like SPSS or SAS that&#8217;s designed to run fast and, more importantly, without sampling, across a scale-out server architecture. In March, the company also rolled out the beta version of <a href="http://www.skytree.net/adviser-beta/">a new product called Adviser</a> that can run on a laptop and walks less-experienced users through the analysis of their data, including what methods were used and why, and whether the findings are statistically significant.</p>
<p>I suspect we&#8217;re just seeing the opening salvo in what will be a rush to fund machine learning startups over the next couple of years. Skytree is among a number of increasingly promising startups in the space, including (but certainly not limited to) <a href="http://gigaom.com/2013/01/16/has-ayasdi-turned-machine-learning-into-a-magic-bullet/">Ayasdi</a> and <a href="http://gigaom.com/2013/03/20/data-science-is-not-enough-we-need-data-intelligence-too/">Quid</a>. As more individuals see the promise of machine learning and get skilled in applying it to their particular problems and datasets (as UPS apparently has), it could become one of the go-to analytic methods in the big data era.</p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=640909&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=167527"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=167527" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=640909+usvp-ups-and-scott-mcnealy-pump-18m-into-machine-learning-startup-skytree&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=640909+usvp-ups-and-scott-mcnealy-pump-18m-into-machine-learning-startup-skytree&utm_content=dharrisstructure">A near-term outlook for big data</a></li><li><a href="http://pro.gigaom.com/report/how-big-data-analytics-drives-competitive-advantage/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=640909+usvp-ups-and-scott-mcnealy-pump-18m-into-machine-learning-startup-skytree&utm_content=dharrisstructure">How big data analytics drives competitive advantage</a></li><li><a href="http://pro.gigaom.com/report/the-new-economics-of-enterprise-data-warehousing/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=640909+usvp-ups-and-scott-mcnealy-pump-18m-into-machine-learning-startup-skytree&utm_content=dharrisstructure">How data warehousing is now a cost-effective solution for businesses</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/30/usvp-ups-and-scott-mcnealy-pump-18m-into-machine-learning-startup-skytree/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/05/machine-learning.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/05/machine-learning.jpg?w=150" medium="image">
			<media:title type="html">machine learning</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/ml-2012.jpg?w=300" medium="image">
			<media:title type="html">Skytree co-founder Alexander Gray (second from left) at Structure: Data 2012. (c) Pinar Ozger</media:title>
		</media:content>
	</item>
	</channel>
</rss>
