<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>GigaOM &#187; algorithms</title>
	<atom:link href="http://gigaom.com/tag/algorithms/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com</link>
	<description></description>
	<lastBuildDate>Wed, 19 Jun 2013 02:58:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gigaom.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0db8f6557d022075dbbf010c54d46d93?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>GigaOM &#187; algorithms</title>
		<link>http://gigaom.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gigaom.com/osd.xml" title="GigaOM" />
	<atom:link rel='hub' href='http://gigaom.com/?pushpress=hub'/>
		<item>
		<title>First, they gave us targeted ads. Now, data scientists think they can change the world</title>
		<link>http://gigaom.com/2013/06/01/first-they-gave-us-targeted-ads-now-data-scientists-think-they-can-change-the-world/</link>
		<comments>http://gigaom.com/2013/06/01/first-they-gave-us-targeted-ads-now-data-scientists-think-they-can-change-the-world/#comments</comments>
		<pubDate>Sat, 01 Jun 2013 15:00:34 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[DataKind]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[nonprofit]]></category>
		<category><![CDATA[predictive analytics]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=652808</guid>
		<description><![CDATA[Sure, a lot of data scientists spend their days trying to optimize ads or movie recommendations, but a growing number are spending their free time tackling bigger causes.]]></description>
				<content:encoded><![CDATA[<p><em>&#8220;The best minds of my generation are thinking about how to make people click ads &#8230; That sucks.&#8221;</em></p>
<p><em>- Jeff Hammerbacher, co-founder and chief scientist, Cloudera</em></p>
<p>Well, something has to pay the bills. Thankfully, there&#8217;s also a sweeping trend in the data science world right now around bringing those skills to bear on some really meaningful problems, from the effects of tree pruning to mapping humanitarian crises around the world. I don&#8217;t know about you, but I&#8217;m willing to sacrifice a little digital privacy if it means saving some lives.</p>
<p>We&#8217;ve already covered some of these efforts, including <a href="http://gigaom.com/2013/04/08/why-saving-the-world-with-data-means-finding-your-inner-ceo/">the SumAll Foundation&#8217;s work on modern-day slavery</a> and future work on child pornography. Closely related is the effort &#8212; led by Google.org&#8217;s deep pockets &#8212; <a href="http://gigaom.com/2013/04/10/this-might-be-the-best-thing-anyone-can-do-with-data/">to create an international hotline network</a> for reporting human trafficking and collecting data. Microsoft, in particular Microsoft Research&#8217;s danah boyd, has been active in helping fight child exploitation using technology.</p>
<p>This week, I came across two new efforts on different ends of the spectrum. One is <a href="http://about.activityinfo.org/">ActivityInfo</a>, which describes itself on its website as &#8220;an online humanitarian project monitoring tool&#8221; &#8212; developed by Unicef and a consulting firm called <a href="http://www.bedatadriven.com">BeDataDriven</a> &#8212; that &#8220;helps humanitarian organizations to collect, manage, map and analyze indicators.&#8221; That partnership actually seems fairly well established (the ActivityInfo website claims it&#8217;s used by more than 75 organizations across more than 15,000 sites), although I came across it via a blog post about why BeDataDriven <a href="http://googlecloudplatform.blogspot.com/2013/05/building-humanitarian-project.html">decided to build the database on Google&#8217;s cloud</a>.</p>
<div id="attachment_653527" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/06/map-editor.png"><img  alt="ActivityInfo's map editor." src="http://gigaom2.files.wordpress.com/2013/06/map-editor.png?w=708&#038;h=330" width="708" height="330" class="size-large wp-image-653527" /></a><p class="wp-caption-text">ActivityInfo&#8217;s map editor.</p></div>
<p>The other effort I came across is <a href="http://datakind.org">DataKind</a>, specifically its work helping the New York City Department of Parks and Recreation, or NYC Parks, quantify the benefits of a strategic tree-pruning program. Founded by renowned data scientists Drew Conway and Jake Porway (who&#8217;s also the host of the National Geographic Channel&#8217;s <a href="http://channel.nationalgeographic.com/channel/the-numbers-game/"><em>The Numbers Game</em></a>), DataKind exists for the sole purpose of helping non-profit organizations and small government agencies solve their most pressing data problems. It accomplishes this goal by hosting weekend-long DataDives &#8212; essentially hackathons for data scientists &#8212; as well as by facilitating longer-term engagements between volunteer data scientists or DataKind staff and organizations.</p>
<h2 id="saving-money-by-proving-what-e">Saving money by proving what every landscaper knows</h2>
<p>One of those volunteers is Brian Dalessandro, VP of data science for display advertising platform Media6Degrees. He met Porway at a data-industry function in New York in late 2012, was sold on DataKind&#8217;s vision (&#8220;[Jake's] very convincing that you should be passionate about it, too,&#8221; Dalessandro said) and got involved with his first DataDive shortly thereafter. The beneficiary organization: NYC Parks, which wanted help quantifying the benefits of tree pruning and the neighborhoods most at risk of tree damage from storms.</p>
<div id="attachment_653525" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/06/delassandro.jpg"><img  alt="Dalessandro tackling storm damage at the DataDive." src="http://gigaom2.files.wordpress.com/2013/06/delassandro.jpg?w=708&#038;h=531" width="708" height="531" class="size-large wp-image-653525" /></a><p class="wp-caption-text">Dalessandro tackling storm damage at the DataDive.</p></div>
<p>The benefits of mapping the neighborhoods in peril are pretty obvious, but doesn&#8217;t everyone already know that pruning keeps trees healthier and reduces the risk of falling limbs and other accidents? Kind of, Dalessandro explained. Up to this point, all of the evidence has been anecdotal, which isn&#8217;t always enough when it comes to new expenditures in tight city budgets.</p>
<p>&#8220;They knew what they wanted to solve,&#8221; Dalessandro recalled, &#8220;they just didn&#8217;t know if they had the right ingredients to solve it.&#8221;</p>
<p>NYC Parks came to the DataDive with three datasets it hoped would do the trick &#8212; a census of every public tree in the city; a log of every work order on those trees; and a log of when each city block&#8217;s trees were pruned. After scraping some weather data and figuring out a working definition of &#8220;risk&#8221; that was both quantifiable and satisfied the department&#8217;s needs, Dalessandro and some others were able to solve the storm-prediction problem. Quantifying the effects of pruning turned out to be a hairier problem, though.</p>
<p>So, for the next four months, Dalessandro spent his spare time trying to solve it. Most of the work went into formatting the datasets so he could actually work with them as if they were a single dataset. This is actually a common issue with government agencies and non-profits, Porway noted, because they&#8217;re usually collecting data for accounting or reporting purposes rather than for statistical analysis.</p>
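<p>To make that cleanup step concrete, here is a minimal Python sketch of rolling three sources like the ones NYC Parks brought (a tree census, a work-order log and a pruning log) into one record per block. The field names, the sample values and the inconsistently formatted block ID are all invented for illustration; the article does not describe the real schemas or how Dalessandro actually reconciled them.</p>

```python
from collections import defaultdict

# Hypothetical records: every field name and value here is invented.
tree_census = [
    {"tree_id": 1, "block_id": "B-001"},
    {"tree_id": 2, "block_id": "B-001"},
    {"tree_id": 3, "block_id": "b-002 "},  # same kind of key, formatted differently
]
work_orders = [
    {"tree_id": 1, "year": 2011, "hazardous": True},
    {"tree_id": 3, "year": 2011, "hazardous": False},
    {"tree_id": 3, "year": 2012, "hazardous": True},
]
pruning_log = [
    {"block_id": "B-001", "year_pruned": 2010},
]


def normalize_block_id(raw):
    """Trim whitespace and upper-case so each block has exactly one key."""
    return raw.strip().upper()


def build_block_table(census, orders, prunings):
    """Combine the three sources into one summary record per city block."""
    tree_to_block = {t["tree_id"]: normalize_block_id(t["block_id"]) for t in census}
    blocks = defaultdict(lambda: {"trees": 0, "hazardous_orders": 0, "years_pruned": []})

    # Count trees per block from the census.
    for block in tree_to_block.values():
        blocks[block]["trees"] += 1

    # Work orders are logged per tree, so map each one back to its block.
    for order in orders:
        block = tree_to_block.get(order["tree_id"])
        if block is not None and order["hazardous"]:
            blocks[block]["hazardous_orders"] += 1

    # Pruning is logged per block already; only the key needs cleaning.
    for entry in prunings:
        blocks[normalize_block_id(entry["block_id"])]["years_pruned"].append(entry["year_pruned"])

    return dict(blocks)


block_table = build_block_table(tree_census, work_orders, pruning_log)
for block_id, summary in sorted(block_table.items()):
    print(block_id, summary)
```

<p>Once every source is keyed the same way, questions about a block (how many trees, how many hazardous work orders, when it was last pruned) become simple lookups instead of manual cross-referencing.</p>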
<p>Once the data was ready to go, though, Dalessandro was able to rework some existing code, which he had previously written to predict whether ads actually caused people to buy products, and do the actual analysis. &#8220;Instead of people converting, there&#8217;s trees and limbs falling off,&#8221; he analogized.</p>
<div id="attachment_653526" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/06/equation.png"><img  alt="You know, classic parks department stuff. Source: Brian Dalessandro" src="http://gigaom2.files.wordpress.com/2013/06/equation.png?w=708&#038;h=546" width="708" height="546" class="size-large wp-image-653526" /></a><p class="wp-caption-text">You know, classic parks department stuff. Source: Brian Dalessandro</p></div>
<p>In the end, he found that pruning reduces hazardous work orders the following year on the blocks pruned by 22 percent. The next steps are to put his results into a business context, presumably to make a case for a better-planned and more-comprehensive pruning system. If it&#8217;s cheaper than sending out crews to fix damage, that&#8217;s probably not a bad idea.</p>
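<p>For readers wondering what a figure like that measures, the following Python sketch computes a relative reduction in hazardous work orders between pruned and unpruned blocks. It is a deliberately naive comparison with invented counts, not the analysis described above: Dalessandro adapted code built to test whether ads cause purchases, which suggests the real work accounted for differences between blocks that this sketch ignores. The function and field names are hypothetical.</p>

```python
def mean_hazardous_orders(blocks):
    """Average number of hazardous work orders per block in the following year."""
    return sum(b["hazardous_orders_next_year"] for b in blocks) / len(blocks)


def relative_reduction(pruned_blocks, unpruned_blocks):
    """Fraction by which pruned blocks fall below the unpruned baseline."""
    baseline = mean_hazardous_orders(unpruned_blocks)
    pruned = mean_hazardous_orders(pruned_blocks)
    return (baseline - pruned) / baseline


# Invented counts, chosen only to keep the arithmetic easy to follow.
pruned_blocks = [{"hazardous_orders_next_year": n} for n in (3, 4, 5, 4)]
unpruned_blocks = [{"hazardous_orders_next_year": n} for n in (5, 6, 4, 5)]

reduction = relative_reduction(pruned_blocks, unpruned_blocks)
print(f"{reduction:.0%} fewer hazardous work orders on pruned blocks")
```

<p>With these made-up numbers the pruned blocks average 4 orders against a baseline of 5, a 20 percent relative reduction. The same calculation, fed a department's real counts and crew costs, is the kind of input a budget case would start from.</p>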
<h2 id="can-you-solve-bigger-problems-">Can you solve bigger problems without targeting a few ads?</h2>
<p>As easy as it is to rip data science in the name of advertising, though, it seems like having that high-pressure business experience actually really helps with data volunteerism. One of SumAll&#8217;s missions is to teach the non-profits it works with to think like businesses in terms of what key performance indicators they want to track. Porway said DataKind is quite focused on teaching organizations to think like data scientists, even if that just means structuring their data consistently so they can analyze it if they need to.</p>
<p>For his part, Dalessandro is excited to volunteer again, in part because he likes putting his well-honed technological skills to work in the name of the greater good. At previous jobs, he said, volunteering meant spending eight hours at the park pulling weeds or something equally mundane. However, he said, if someone needs a type of predictive model that he could build in his sleep, he could deliver truly meaningful results in just a couple hours.</p>
<p>If there&#8217;s a dark lining to this silver cloud, though, it&#8217;s that there will always be more problems than people to solve them. That doesn&#8217;t dissuade Porway, though, who sees a growing movement every time hundreds of people show up at a DataKind event, new chapters popping up overseas and the work being done by his peers in other organizations. Besides, he said, while some people are tackling difficult problems, there are lots of organizations that could benefit even from simple things like visualizations.</p>
<p>And free help is probably a better option than trying to bring those skills inside an organization. &#8220;Trying to hire data scientists to do this,&#8221; Porway said, &#8220;would be a Herculean task given how rare they are.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/06/01/first-they-gave-us-targeted-ads-now-data-scientists-think-they-can-change-the-world/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/06/delassandro1.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/06/delassandro1.jpg?w=150" medium="image">
			<media:title type="html">delassandro</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/06/map-editor.png?w=708" medium="image">
			<media:title type="html">ActivityInfo&#039;s map editor.</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/06/delassandro.jpg?w=708" medium="image">
			<media:title type="html">Dalessandro tackling storm damage at the DataDive.</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/06/equation.png?w=708" medium="image">
			<media:title type="html">You know, classic parks department stuff. Source: Brian Dalessandro</media:title>
		</media:content>
	</item>
		<item>
		<title>New algorithm maps cancer cells like nodes on a social network</title>
		<link>http://gigaom.com/2013/05/20/new-algorithm-maps-cancer-cells-like-nodes-on-a-social-network/</link>
		<comments>http://gigaom.com/2013/05/20/new-algorithm-maps-cancer-cells-like-nodes-on-a-social-network/#comments</comments>
		<pubDate>Mon, 20 May 2013 20:58:53 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[cancer research]]></category>
		<category><![CDATA[graph analysis]]></category>
		<category><![CDATA[health care]]></category>
		<category><![CDATA[medical research]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=647256</guid>
		<description><![CDATA[A group of researchers from Columbia and Stanford have created a method for turning complex cellular datasets into visualizations that map the similarities between tens of thousands of cells within a tissue sample.]]></description>
				<content:encoded><![CDATA[<p>Oftentimes, the best way to get a sense of your data is to look at it. A bunch of numbers or words might not mean anything sitting within a table, but they start to make a lot more sense when they’re turned into a chart. In fields like mass cytometry, though, where doctors might want to analyze dozens of biological markers for each of tens of thousands of cells in a tissue sample, creating an easy-to-understand chart is easier said than done.</p>
<p>That’s why a group of researchers from Columbia University and Stanford University developed an algorithm that can do just that, turning those cells into something that resembles your social graph. This lets researchers see how the various cells are related to each other so they know, for example, where to focus cancer treatment and what to track as that treatment progresses.</p>
<p>The idea of representing large or complex data as a graph is nothing new, but it has taken on more prominence thanks to the rise of social media and those ubiquitous social graphs that map out who’s connected to whom. As we highlighted recently, however, <a href="http://gigaom.com/2013/05/14/were-witnessing-the-rise-of-the-graph-in-big-data/">graph analysis is becoming more popular</a> outside the realm of social networks, and is being applied to problems that are more complex than just figuring out simple relationships within a network. In cases such as medical research, especially, graphs can provide a very effective way of seeing how potentially hundreds of thousands of data points spanning perhaps hundreds of variables are similar to each other.</p>
<p>That’s exactly what the team at Columbia and Stanford has done with a new algorithm that they’ve demonstrated within the realm of mass cytometry. According to <a href="http://newsroom.cumc.columbia.edu/2013/05/20/computational-tool-translates-complex-data-into-simplified-2-dimensional-images/">a press release announcing the research</a> (which is <a href="http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.2594.html">available via paid download</a> at Nature Biotechnology):</p>
<blockquote id="quote-the-method-called-vi"><p>“The method, called viSNE (visual interactive Stochastic Neighbor Embedding), is based on a sophisticated algorithm that translates high-dimensional data (e.g., a dataset that includes many different simultaneous measurements from single cells) into visual representations similar to two-dimensional ‘scatter plots’ ….</p>
<p>“The viSNE software can analyze measurements of dozens of molecular markers. In the two-dimensional maps that result, the distance between points represents the degree of similarity between single cells. The maps can reveal clearly defined groups of cells with distinct behaviors (e.g., drug resistance) even if they are only a tiny fraction of the total population. This should enable the design of ways to physically isolate and study these cell subpopulations in the laboratory.”</p></blockquote>
<p>I assume they say <em>similar</em> to scatter plots because the algorithm is analyzing data across more than two dimensions, although the resulting chart is essentially the same (i.e., data points with similar characteristics will form clusters).</p>
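<p>For a rough sense of the idea behind the name, stochastic neighbor embedding methods start by turning distances in the original high-dimensional space into probabilities that one point would pick another as its neighbor, then look for a two-dimensional layout whose neighbor probabilities match. The Python sketch below shows only that first step on invented four-marker measurements. It uses one fixed bandwidth for every point, a simplifying assumption (these methods normally tune it per point), and it is not the viSNE software or the researchers’ code.</p>

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two measurement vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neighbor_probabilities(points, sigma=1.0):
    """Row i holds the probability that point i picks each other point as a neighbor.

    Uses a Gaussian kernel with one shared bandwidth `sigma` (an assumption made
    here for brevity), normalized so each row sums to 1. A point never picks itself.
    """
    rows = []
    for i, p in enumerate(points):
        weights = [
            0.0 if i == j else math.exp(-squared_distance(p, q) / (2 * sigma ** 2))
            for j, q in enumerate(points)
        ]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


# Invented marker readings for four cells: two similar pairs.
cells = [
    (0.1, 0.2, 0.1, 0.0),
    (0.2, 0.1, 0.0, 0.1),
    (3.0, 3.1, 2.9, 3.0),
    (3.1, 3.0, 3.0, 2.9),
]

probabilities = neighbor_probabilities(cells)
for row in probabilities:
    print([round(p, 3) for p in row])
```

<p>In each row nearly all of the probability lands on the cell with similar readings, which is why similar cells end up close together once a layout is fitted to these values.</p>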
<div id="attachment_647346" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/05/screen-shot-2013-05-20-at-9-42-09-am.png"><img alt="The results of viSNE, showing cell densities in diagnosis and relapse samples." src="http://gigaom2.files.wordpress.com/2013/05/screen-shot-2013-05-20-at-9-42-09-am.png?w=708&#038;h=403" width="708" height="403" class="size-large wp-image-647346"></a><p class="wp-caption-text">The results of viSNE, showing cell densities in diagnosis and relapse samples.</p></div>
<p>Whether or not they’re technically similar, this research <a href="http://gigaom.com/2013/01/16/has-ayasdi-turned-machine-learning-into-a-magic-bullet/">seems similar to what Ayasdi is doing</a> with its new data-analysis software based on a technique called topological data analysis. In both cases, though, the algorithms aren’t necessarily concerned with how data points interact with one another (like in network graphs), but rather what similar characteristics the points share. Ayasdi’s software has been used in cancer research, too, including on datasets spanning hundreds of patients and tens of thousands of variables.</p>
<p>In theory — although not likely in practice considering the complexity of the datasets medical researchers are dealing with — these approaches are similar to clustering approaches that are also popular among data scientists working with web companies. In areas such as e-commerce or <a href="http://gigaom.com/2013/05/05/how-mailchimp-learned-to-treat-data-like-orange-juice-and-rethink-email-in-the-process/">email management</a>, for example, where there isn’t a strong social element, companies can broadly break customers into distinct groups based on their behavior or interests.</p>
<div id="attachment_642360" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/05/marriedknit-tiff.jpg"><img alt="A sample cluster of subscribers." src="http://gigaom2.files.wordpress.com/2013/05/marriedknit-tiff.jpg?w=708&#038;h=427" width="708" height="427" class="size-large wp-image-642360"></a><p class="wp-caption-text">A sample cluster of MailChimp subscribers.</p></div>
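<p>As a generic illustration of that kind of customer grouping, here is a small k-means sketch in plain Python. K-means is one common clustering technique, chosen here as an example; the article does not say which methods the companies mentioned actually use. The behavior values (emails opened and purchases per month) and the starting centroids are invented.</p>

```python
def centroid_of(points):
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def nearest_index(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and recomputing centroids."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for point in points:
            groups[nearest_index(point, centroids)].append(point)
        # Keep the old centroid if a group ends up empty.
        centroids = [centroid_of(g) if g else c for g, c in zip(groups, centroids)]
    labels = [nearest_index(point, centroids) for point in points]
    return labels, centroids


# Invented behavior: (emails opened per month, purchases per month).
customers = [(1, 0), (2, 1), (1, 1), (9, 4), (10, 5), (8, 5)]

labels, centroids = k_means(customers, centroids=[customers[0], customers[3]])
print(labels)
```

<p>The labels split the list into an occasional group and a highly engaged group, each of which could then receive different messaging.</p>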
<p>Of course, curing cancer is a slightly more compelling — and difficult — goal than targeted advertising. The algorithms have to be precise so as not to miss similarities hidden within the mass of data. In the case of viSNE, the researchers say they’ve been able to spot small groups of cells (like 20 out of tens of thousands) that might be able to survive chemotherapy and increase the likelihood of a recurring tumor.</p>
<p>But we probably shouldn’t be too quick to discount the work that web companies do as somehow less valuable than that of cancer researchers, for example. The big data era arguably started with the web, and web companies have generated some of the most important data-analysis techniques and technologies around today (see, for example, Google’s Jeff Dean, with whom I’ll be speaking at our <a href="http://event.gigaom.com/structure/schedule/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=647256+new-algorithm-maps-cancer-cells-like-nodes-on-a-social-network&amp;utm_content=dharrisstructure">Structure conference</a> next month). As <a href="http://gigaom.com/2012/11/27/why-data-is-the-key-to-better-medicine-and-maybe-a-cure-for-cancer/">medical researchers start generating more and more data</a> via cytometry, genome sequencing and even electronic medical records, it will be critical for individuals in all fields to keep track of what data scientists in other fields are doing and <a href="http://gigaom.com/2013/03/26/how-researchers-are-fighting-lung-cancer-using-pagerank/">figure out how that might apply to their own work</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/20/new-algorithm-maps-cancer-cells-like-nodes-on-a-social-network/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/screen-shot-2013-05-20-at-9-42-09-am1-e1369079018409.png?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/screen-shot-2013-05-20-at-9-42-09-am1-e1369079018409.png?w=150" medium="image">
			<media:title type="html">Screen-Shot-2013-05-20-at-9.42.09-AM</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/screen-shot-2013-05-20-at-9-42-09-am.png?w=708" medium="image">
			<media:title type="html">The results of viSNE, showing cell densities in diagnosis and relapse samples.</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/marriedknit-tiff.jpg?w=708" medium="image">
			<media:title type="html">A sample cluster of subscribers.</media:title>
		</media:content>
	</item>
		<item>
		<title>The Google Now dilemma: Yes, it&#8217;s kind of creepy &#8212; but it&#8217;s also incredibly useful</title>
		<link>http://gigaom.com/2013/05/03/the-google-now-dilemma-yes-its-kind-of-creepy-but-its-also-incredibly-useful/</link>
		<comments>http://gigaom.com/2013/05/03/the-google-now-dilemma-yes-its-kind-of-creepy-but-its-also-incredibly-useful/#comments</comments>
		<pubDate>Fri, 03 May 2013 19:30:03 +0000</pubDate>
		<dc:creator>Mathew Ingram</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[Anticipatory search]]></category>
		<category><![CDATA[augmented reality]]></category>
		<category><![CDATA[Creepy]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Google Now]]></category>
		<category><![CDATA[Mobile]]></category>
		<category><![CDATA[nexus]]></category>
		<category><![CDATA[Predictive Search]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[the Nexus]]></category>
		<category><![CDATA[web search]]></category>
		<category><![CDATA[web services]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=642114</guid>
		<description><![CDATA[There's no question the kind of data collection Google has to do in the background to power its Google Now service can be a little intrusive -- perhaps too intrusive for some. But it also makes the results extremely useful.]]></description>
				<content:encoded><![CDATA[<p>One of the reasons I decided to make the switch from using an iPhone to an Android phone &#8212; in addition to <a href="http://gigaom.com/2013/01/15/why-im-thinking-of-ditching-my-precious-iphone-for-an-android/">the freedom it allowed me</a> from Apple&#8217;s walled garden &#8212; was that I was interested in trying out Google&#8217;s version of &#8220;augmented reality&#8221; search, namely Google Now. Although I&#8217;ve used it periodically over the past few months, the utility of it really started to hit home while I was on a recent trip to Europe and relied on my smartphone as a lifeline. </p>
<p>While there is something undeniably creepy about <a href="http://www.google.com/landing/now/">the Google Now service</a>, I have to admit that it is also very useful &#8212; so much so that I couldn&#8217;t imagine going on a trip without it. I&#8217;m already imagining how it and other kinds of <a href="http://www.technologyreview.com/news/514346/the-data-made-me-do-it/">&#8220;anticipatory data&#8221; services</a> (including Google News updates) might work through Google Glass.</p>
<h2 id="useful-information-when-you-ne">Useful information when you need it</h2>
<p>It&#8217;s not that Google Now is really all that revolutionary, in the sense of being surprising or magical or having whiz-bang special effects: it just <a href="http://www.theverge.com/2012/10/29/3569684/google-now-android-4-2-knowledge-graph-neural-networks">collects a broad range</a> of information about you and your activity from your search history, your calendar, your email, web services you are signed into, and so on, and then uses that to show you information that is relevant to what you are doing or where you happen to be (Google recently <a href="http://googleblog.blogspot.it/2013/04/google-now-on-your-iphone-and-ipad-with.html">introduced it for iOS</a> as well as Android).</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/05/google-now.png"><img src="http://gigaom2.files.wordpress.com/2013/05/google-now.png?w=708" alt="Google Now"    class="aligncenter size-full wp-image-642115" /></a></p>
<p>In a way, that could be part of the reason Google Now is so appealing &#8212; it doesn&#8217;t try to impress you, it just works silently in the background, in more or less the way you would expect it to. That in itself is something to be grateful for.</p>
<p>The first time I noticed myself depending on it (or at least noticing how useful it was) came when I was getting ready for my flight to Italy: sliding upwards from the home button on the Nexus 4 showed a series of Google Now &#8220;cards,&#8221; and <a href="http://www.google.ca/landing/now/#tab=flights">the first one said that my flight</a> had been delayed by an hour. Since I was panicking at that point about how much I still had to do before leaving for the airport, that information was incredibly helpful. I could take a bit more time and relax.</p>
<p>Meanwhile, the second Google Now card <a href="http://www.google.ca/landing/now/#tab=traffic">showed the traffic</a> on the highway and told me that I should probably give myself more time than usual to get to the airport &#8212; and when I got closer to the time of my departure, a third card showed my boarding pass information, including boarding time and the gate number (Google Now got that info from my calendar, but it also supports <a href="http://www.google.ca/landing/now/#tab=boarding-pass">scannable boarding passes</a> for a limited number of airlines).</p>
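<p>The value of those cards comes from combining signals that would otherwise need separate lookups. As a toy illustration only, the Python sketch below merges a flight delay and a traffic estimate into a single time to leave for the airport. The function name, the two-hour airport buffer, the date and all of the times are assumptions made up for this example; nothing here reflects how Google Now is actually implemented.</p>

```python
from datetime import datetime, timedelta


def leave_by(scheduled_departure, delay, drive_time, airport_buffer=timedelta(hours=2)):
    """Latest time to leave home: actual departure minus airport buffer and drive time."""
    actual_departure = scheduled_departure + delay
    return actual_departure - airport_buffer - drive_time


# Hypothetical trip: an evening flight with a normally 40-minute drive.
scheduled = datetime(2013, 4, 20, 18, 0)

# Without any updates.
original_plan = leave_by(scheduled, delay=timedelta(0), drive_time=timedelta(minutes=40))

# With a one-hour flight delay and heavier traffic (a 60-minute drive).
updated_plan = leave_by(scheduled, delay=timedelta(hours=1), drive_time=timedelta(minutes=60))

print("Original plan:", original_plan.strftime("%H:%M"))
print("Updated plan: ", updated_plan.strftime("%H:%M"))
```

<p>In this made-up case the delay more than offsets the slower drive, so the traveler gains 40 minutes, which is the sort of conclusion that is tedious to work out by hand while packing.</p>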
<p><a href="http://gigaom2.files.wordpress.com/2013/05/google-now2.png"><img src="http://gigaom2.files.wordpress.com/2013/05/google-now2.png?w=708" alt="Google Now2"    class="aligncenter size-full wp-image-642116" /></a></p>
<h2 id="not-revolutionary-but-evolutio">Not revolutionary, but evolutionary</h2>
<p>Again, none of this information was specific to Google Now, or derived magically by Google search trickery: I could have easily found out about my flight being delayed by using a service like FlightStats, or by checking the website for the airline or the airport itself &#8212; and I could have checked the traffic on any number of sites. But the point is that doing these things would take time, and I was already pressed for time. Seeing it all displayed in front of me in a simple way, without me having to do anything, was exactly the kind of thing a virtual assistant is good for.</p>
<p>Google Now continued to perform this kind of function while I was travelling (once I got a local SIM card, of course, so that I wouldn&#8217;t <a href="http://gigaom.com/2012/10/19/thanks-to-telecom-oligopolies-its-always-raining-in-the-cloud/">get robbed by my carrier</a> for roaming charges). It told me that my connecting flight in Munich was on time, which allowed me to prepare for possibly not making my connection &#8212; and once I arrived in Italy, it informed me of the weather, the traffic from the airport in Rome, and also showed me <a href="http://www.google.ca/landing/now/#tab=nearby-photo-spots">photos of nearby sights</a> that I might want to visit.</p>
<p>These latter aspects were also very useful for someone visiting a foreign country: I didn&#8217;t have much use for them while I was at home, but they instantly became much more important when I was travelling. Like the flight information or traffic, I could have found that content myself by doing a web search &#8212; but it was much handier to have it displayed for me automatically. And I started to imagine what it might be like to simply <a href="http://gigaom.com/2012/06/27/with-google-now-google-search-is-getting-ready-for-project-glass/">look at something like the Colosseum with Google Glass</a> and have information about it appear in front of my eyes. Geeky? Yes. But also hugely useful.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/05/google-now3.png"><img src="http://gigaom2.files.wordpress.com/2013/05/google-now3.png?w=708" alt="Google Now3"    class="aligncenter size-full wp-image-642117" /></a></p>
<h2 id="the-privacy-tradeoff-is-worth-">The privacy tradeoff is worth it</h2>
<p>The part that clearly disturbs some people about Google Now <a href="http://www.telegraph.co.uk/technology/mobile-app-reviews/10032788/Google-Now-for-iOS-review-straddling-the-creepy-line.html">is the data collection</a> that is involved in making it work: the tracking of your web searches, your calendar appointments, your location via GPS, the photos you have posted, the flights you are preparing to take, and so on. There&#8217;s no question that this is invasive &#8212; and some users will undoubtedly decide that it&#8217;s not worth the tradeoff, and choose to keep the information to themselves. I think the benefits outweigh the disadvantages.</p>
<p>Are there ways Google could use this information that I might not like? Of course there are. But I trust that Google is aware enough of the dangers &#8212; both legal and commercial &#8212; of engaging in that kind of behavior that they will avoid it. While some may choose to see Google&#8217;s ambitions in this area as evil, I think the company&#8217;s goal remains the same: <a href="http://www.blindfiveyearold.com/google-evil-plan">to provide services that encourage users</a> to spend more time on the internet and produce more data that improves Google&#8217;s search and/or advertising algorithms. And I am okay with that.</p>
<p>In return for providing some anonymized data and behavior patterns, I get access to a personalized assistant that is not only more unobtrusive than any human version would be, but is also faster and completely free. That&#8217;s a pretty good bargain.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/03/the-google-now-dilemma-yes-its-kind-of-creepy-but-its-also-incredibly-useful/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/13-03-12-google_now.png?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/13-03-12-google_now.png?w=150" medium="image">
			<media:title type="html">13.03.12-Google_Now</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/0bdf7ab171ade0708a11fa3378e6d8cb?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">Mathew</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/google-now.png" medium="image">
			<media:title type="html">Google Now</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/google-now2.png" medium="image">
			<media:title type="html">Google Now2</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/google-now3.png" medium="image">
			<media:title type="html">Google Now3</media:title>
		</media:content>
	</item>
		<item>
		<title>Peter Thiel&#8217;s latest investments: better search and cellular nanotechnology</title>
		<link>http://gigaom.com/2013/04/17/peter-thiels-latest-investments-better-search-and-cellular-nanotechnology/</link>
		<comments>http://gigaom.com/2013/04/17/peter-thiels-latest-investments-better-search-and-cellular-nanotechnology/#comments</comments>
		<pubDate>Wed, 17 Apr 2013 12:00:47 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[breakout labs]]></category>
		<category><![CDATA[nanotechnology]]></category>
		<category><![CDATA[natural language processing]]></category>
		<category><![CDATA[Peter Thiel]]></category>
		<category><![CDATA[semantic search]]></category>
		<category><![CDATA[SkyPhrase]]></category>
		<category><![CDATA[Stealth Biosciences]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=631740</guid>
		<description><![CDATA[Thiel Foundation subsidiary Breakout Labs has funded two new startups called SkyPhrase and Stealth Biosciences that, respectively, are trying to reinvent natural language processing and improve our ability to interact with individual cells.<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=631740&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Breakout Labs, an offshoot of PayPal co-founder Peter Thiel&#8217;s eponymous Thiel Foundation, has funded its first two startups of the year: SkyPhrase and Stealth Biosciences. The former is trying to improve data analysis and interaction via better natural language processing, while the latter is trying to improve our health by literally sticking straws into our cells.</p>
<p><a href="https://skyphrase.com/">SkyPhrase</a> is a very early-phase company that, according to its web site, has &#8220;made breakthroughs in algorithms that enable computers to understand more complex language with greater precision than has ever been possible.&#8221; The goal is to improve search functionality but also to give developers a new, easy way to incorporate natural language processing into their apps. The company was founded by Rensselaer Polytechnic Institute Professor Nick Cassimatis.</p>
<p>In January, MIT Technology Review reporter Rachel Metz <a href="http://www.technologyreview.com/news/510056/startup-brings-better-understanding-of-tricky-questions-to-the-web/">covered the company and actually reviewed an early version</a> of the technology as applied to searching through tweets and emails. It wasn&#8217;t yet trained to do what she wanted with tweets but, she wrote, did a &#8220;decent&#8221; job searching through emails. Part of what makes it work appears to be its ability to understand conjunctions, even if it doesn&#8217;t yet have semantic capabilities: &#8220;I could search for, say, &#8216;e-mails from Bob Loblaw in December and January about recipes with a PDF,&#8217; or &#8216;e-mails from Bob Loblaw or Tobias Funke about cookies in December,&#8217;&#8221; Metz explained.</p>
<div id="attachment_631746" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/04/nanostraws_sem.png"><img  alt="Nanostraws in a cell" src="http://gigaom2.files.wordpress.com/2013/04/nanostraws_sem.png?w=300&#038;h=200" width="300" height="200" class="size-medium wp-image-631746" /></a><p class="wp-caption-text">Nanostraws in a cell</p></div>
<p>Breakout Labs&#8217; other new investment, <a href="http://stealthbiosciences.com/">Stealth Biosciences</a>, is a team of Stanford professors, executives and entrepreneurs that has invented a way to get materials into and out of individual cells and to monitor their activity via electric probe. Called Nanostraws and Stealth Electrodes, respectively, the company&#8217;s two techniques do just what they sound like they do: Nanostraws let doctors inject or extract material from cells with the aim of advancing research and delivering personalized medicine, while the electrodes &#8220;automate long-term intracellular electrical recordings of neurons and heart cells.&#8221;</p>
<p>Stealth Biosciences, in particular, seems like a heady endeavor, but that&#8217;s exactly what Breakout Labs is all about. <a href="http://gigaom.com/2011/10/25/peter-thiel-breakout-labs/">Launched in 2011</a>, the organization aims to fund projects too early in their lives to attract traditional venture capital. Those funded aren&#8217;t giving up large equity stakes in their companies, but are expected to return a &#8220;modest portion&#8221; of their revenues to the program to fund the next generation of Breakout Labs investments. Other investments thus far include Modern Meadow &#8212; a company <a href="http://gigaom.com/2012/08/16/cue-the-protein-printer-peter-thiel-invests-in-artificial-meat/">trying to create artificial meat using 3-D printers</a> &#8212; and AVEtec, a Canadian startup <a href="http://gigaom.com/2012/12/16/peter-thiel-funds-tornado-power-seriously/">trying to harness the power of tornadoes for good</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/17/peter-thiels-latest-investments-better-search-and-cellular-nanotechnology/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/04/electrode-band-schematic.png?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/04/electrode-band-schematic.png?w=150" medium="image">
			<media:title type="html">electrode band schematic</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/nanostraws_sem.png?w=300" medium="image">
			<media:title type="html">Nanostraws in a cell</media:title>
		</media:content>
	</item>
		<item>
		<title>Politics and personalization have more in common than you think</title>
		<link>http://gigaom.com/2013/04/01/politics-and-personalization-have-more-in-common-than-you-think/</link>
		<comments>http://gigaom.com/2013/04/01/politics-and-personalization-have-more-in-common-than-you-think/#comments</comments>
		<pubDate>Mon, 01 Apr 2013 19:29:54 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[content curation]]></category>
		<category><![CDATA[Politics]]></category>
		<category><![CDATA[recommendation engines]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=625945</guid>
		<description><![CDATA[New research suggests that a phenomenon called biased assimilation makes people view new, inconclusive evidence in ways that support existing biases, leading to increased polarization on topics such as politics or even what we read online. <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=625945&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>FOX News and Prismatic might have more in common than meets the eye. From politics to products, our innate biases affect the way we view the information with which we&#8217;re presented, which means anyone trying to spread a message or effect change via content must do more than just crunch some data.</p>
<p>Aiming to figure out why America is becoming more politically polarized despite traditional beliefs that societies naturally move toward the middle, a group of Stanford researchers <a href="http://www.pnas.org/content/early/2013/03/27/1217220110.abstract?sid=84b01476-faf1-4407-ac83-a20ec77df1cd">considered how our natural biases affect the way we interpret information</a>. What they found is that people tend to view the world through red- or blue-colored glasses: when we see inconclusive information, <a href="http://onlinelibrary.wiley.com/doi/10.1111/j.1751-9004.2009.00203.x/abstract">we interpret it in ways that support our natural political biases</a> and ignore the aspects that don&#8217;t. So if you show the exact same piece of inconclusive information to a group of people, it will likely lead to more polarization rather than to general consensus on the meaning.</p>
<p>It turns out, this phenomenon extends beyond clearly biased media such as FOX or MSNBC and into more objective content sources on the web. When the researchers applied their model to online recommendation engines, they found that pieces of content most-relevant to users are &#8220;always polarizing,&#8221; whereas pieces of information that are merely similar to something someone already likes are only polarizing if the person is already biased. In short: While they&#8217;re able to ignore or at least view objectively less-important stuff, even pretty middle-of-the-road people will take a hard stance on stuff that matters to them.</p>
<p>Of course, how one reacts to research like this largely depends on what one is trying to accomplish. The researchers involved appear to be all about moving people toward the middle on some issues, which is why they created a federal-budget app called Widescope that lets people configure their own budgets and then shows them the similarities with the various budget proposals floating around Washington, D.C. They&#8217;ve also looked into creating social systems that counteract polarization by using trusted information sources (<a href="http://www.eurekalert.org/pub_releases/2013-03/ssoe-anm032913.php">a press release explaining the research suggests</a> Rush Limbaugh or Rachel Maddow) to present information that biased individuals might otherwise be inclined to dismiss.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/04/widescope.png"><img  alt="widescope" src="http://gigaom2.files.wordpress.com/2013/04/widescope.png?w=708&#038;h=418" width="708" height="418" class="aligncenter size-large wp-image-626112" /></a></p>
<p>Applied generally to the web, this approach might help <a href="http://gigaom.com/2013/01/02/why-big-data-might-be-more-about-automation-than-insights/">mitigate some of the effects of the hyper-personalized experience</a> that&#8217;s now possible. You know, the kind of thing that happens when you fill up RSS readers with sources you like, follow like-minded people on Twitter, and sign up for <a href="http://gigaom.com/2012/10/02/prismatics-bradford-cross-first-we-understand-media-then-the-world/">services that use machine learning</a> to surface even more of the same content based on that homogeneous reading activity. Or when you <a href="http://gigaom.com/2013/01/29/you-might-also-like-to-know-how-online-recommendations-work/">keep searching for the same stuff on Amazon</a> or viewing the same types of movies on Netflix.</p>
<p>Services that go beyond &#8220;injecting serendipity&#8221; into their content feeds could actually try to broaden users&#8217; minds by surfacing content that&#8217;s in some ways very different from or counterintuitive to what a simple interest graph might show. I&#8217;m not sure how this would look algorithmically, but I&#8217;m envisioning, for example, a semi-regular insertion of content from sources or genres considered the opposite of a reader&#8217;s norms but that touch upon topics they&#8217;re interested in. Or vice versa.</p>
<p>I genuinely believe most web startups trying to tackle the problem of content curation want to be as helpful as possible, are aware of issues such as biased assimilation and are at least considering methods for counteracting it in order to give users a broader view beyond just what those users <em>think</em> they want to see.</p>
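<p>One way to picture that &#8220;semi-regular insertion&#8221; is the minimal Python sketch below. It is only an illustration under stated assumptions: the function name, the <code>every</code> parameter and the idea of a pre-selected pool of contrasting items are all hypothetical, and nothing here describes how any real service builds its feed.</p>

```python
def inject_contrasting(ranked_items, contrasting_items, every=4):
    """Return the ranked feed with one contrasting item after every `every` regular items.

    Both inputs are plain lists. `contrasting_items` is assumed to have been
    chosen elsewhere (same topics, unfamiliar sources); this function only
    handles the interleaving.
    """
    mixed = []
    pool = iter(contrasting_items)
    for position, item in enumerate(ranked_items, start=1):
        mixed.append(item)
        if position % every == 0:
            extra = next(pool, None)  # None once the contrasting pool runs dry
            if extra is not None:
                mixed.append(extra)
    return mixed


# Hypothetical feed: five personalized stories, one story from an unfamiliar source
feed = inject_contrasting(["a", "b", "c", "d", "e"], ["X"], every=2)
print(feed)
```

<p>The hard part, picking which items count as contrasting for a given reader, is deliberately left out of the sketch.</p>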
<p>On the other hand, if you wanna lock people into their current beliefs or their current content-consumption habits, that&#8217;s probably a lot easier to do. And sadly, for some politicians and special interest groups, that probably suits them just fine.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-79400p1.html">Shutterstock user Kutlayev Dmitry</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/01/politics-and-personalization-have-more-in-common-than-you-think/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/04/shutterstock_79579165.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/04/shutterstock_79579165.jpg?w=150" medium="image">
			<media:title type="html">diverging tracks</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/widescope.png?w=708" medium="image">
			<media:title type="html">widescope</media:title>
		</media:content>
	</item>
		<item>
		<title>3 shades of latency: How Netflix built a data architecture around timeliness</title>
		<link>http://gigaom.com/2013/03/28/3-shades-of-latency-how-netflix-built-a-data-architecture-around-timeliness/</link>
		<comments>http://gigaom.com/2013/03/28/3-shades-of-latency-how-netflix-built-a-data-architecture-around-timeliness/#comments</comments>
		<pubDate>Thu, 28 Mar 2013 19:00:17 +0000</pubDate>
		<dc:creator>Jordan Novet</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Netflix]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=625379</guid>
		<description><![CDATA[Netflix computes information in different ways, depending on how soon the data needs to get served up to customers or evaluated internally. The nuanced approach extends to Facebook, LinkedIn and other webscale companies.<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=625379&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Like other companies operating at webscale, Netflix knows that processing and serving up lots of data &#8212; some to customers, some for use on the backend &#8212; doesn&#8217;t have to happen either right away or never. It&#8217;s more like a gray area, and Netflix detailed the uses for three shades of gray &#8212; online, offline and nearline processing &#8212; <a href="http://techblog.netflix.com/2013/03/system-architectures-for.html">in a post on its tech blog on Wednesday</a>.</p>
<div id="attachment_625387" class="wp-caption alignright" style="width: 282px"><a href="http://gigaom2.files.wordpress.com/2013/03/netflix-machinelearningarchitecture-v3.jpg"><img  alt="The Netflix way of processing data online, offline and nearline." src="http://gigaom2.files.wordpress.com/2013/03/netflix-machinelearningarchitecture-v3.jpg?w=272&#038;h=300" width="272" height="300" class="size-medium wp-image-625387" /></a><p class="wp-caption-text">The Netflix way of processing data online, offline and nearline.</p></div>
<p>The whole point of its data architecture is to tackle latency by pointing workloads and tasks toward systems designed to work at their speed. People love to think about Hadoop when they think about web data, but the reality is that relying solely on batch processing means data can get stale and applications probably don&#8217;t include the newest user input.</p>
<p>Netflix uses online processing for receiving information from users in real time and serving up responses right away, such as looking at a new rating or some other customer action to change the set of movies shown to the customer. Real-time processing works best when algorithms are relatively simple and when data is on the smaller side. The data feeding in to computations must also be available right away.</p>
<p>Nearline processing happens when the data needs to be computed in real time but can be stored for serving up at a later point in time. This option makes sense when computations are more complex and are amenable to a more-traditional database-oriented approach. Netflix uses a variety of databases, including MySQL, the NoSQL Cassandra database and its own homemade EVCache system.</p>
<p>Offline processing in Netflix&#8217;s world might also be called batch processing &#8212; think bigger and longer-term Hadoop jobs. It also fits for compute-heavy projects to train new models that will come into use at a later date. And it&#8217;s a backup for situations when real-time processing isn&#8217;t possible.</p>
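<p>The three tiers described above can be summarized as a small decision rule. The Python sketch below is a simplification for illustration only: the <code>Workload</code> fields and the <code>choose_tier</code> function are hypothetical names, and the rule is a reading of this article, not Netflix&#8217;s actual code.</p>

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    must_answer_in_request: bool  # result has to come back while the user waits
    reacts_to_live_events: bool   # computed as events arrive, served later


def choose_tier(workload):
    """Map a workload to one of the three tiers described in the article."""
    if workload.must_answer_in_request:
        return "online"    # simple algorithms, small data, immediate response
    if workload.reacts_to_live_events:
        return "nearline"  # computed in real time, stored for later serving
    return "offline"       # batch jobs such as model training


# Hypothetical workloads, named after the examples in the text
examples = [
    Workload("re-rank after a new rating", True, True),
    Workload("update stored recommendations", False, True),
    Workload("train a new model", False, False),
]
for w in examples:
    print(w.name, "goes to", choose_tier(w))
```

<p>A fuller version would also fall back to precomputed offline results when the online path cannot answer in time, which is the backup role the article mentions.</p>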
<p>This online-nearline-offline approach is fairly common among web companies that understand that different applications can tolerate different latencies. LinkedIn has <a href="http://gigaom.com/2013/03/03/how-and-why-linkedin-is-becoming-an-engineering-powerhouse/">built its data infrastructure with the same general theory in mind</a>. Facebook, too, has thought deeply about this. The social network <a href="http://gigaom.com/2013/03/05/facebook-kisses-dram-goodbye-builds-memcached-for-flash/">recently detailed a new memcached-like data store called McDipper</a> that forgoes DRAM for flash in order to cut costs for tasks that can live with slightly higher latency.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/28/3-shades-of-latency-how-netflix-built-a-data-architecture-around-timeliness/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/signalsandmodels-v3.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/signalsandmodels-v3.jpg?w=150" medium="image">
			<media:title type="html">SignalsAndModels-v3</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/c00ab753df107b639e76ed4c3ab07ba7?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigajordan</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/netflix-machinelearningarchitecture-v3.jpg?w=272" medium="image">
			<media:title type="html">The Netflix way of processing data online, offline and nearline.</media:title>
		</media:content>
	</item>
		<item>
		<title>How researchers are fighting lung cancer using PageRank</title>
		<link>http://gigaom.com/2013/03/26/how-researchers-are-fighting-lung-cancer-using-pagerank/</link>
		<comments>http://gigaom.com/2013/03/26/how-researchers-are-fighting-lung-cancer-using-pagerank/#comments</comments>
		<pubDate>Tue, 26 Mar 2013 18:45:27 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Cancer]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[graph databases]]></category>
		<category><![CDATA[graph processing]]></category>
		<category><![CDATA[mathematics]]></category>
		<category><![CDATA[medical research]]></category>
		<category><![CDATA[pagerank]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=624307</guid>
		<description><![CDATA[Medical researchers are using a mathematical process similar to Google PageRank in order to identify organs most likely to spread lung cancer throughout the human body.<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=624307&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Google&#8217;s PageRank algorithm has forever changed the way we access information by putting the best stuff first, and now researchers are <a href="http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0034637">using the same mathematical models that Google uses to fight the spread of lung cancer</a> within the human body. While there&#8217;s no &#8220;best&#8221; when it comes to cancer cells, the aim is to identify tumors more likely to metastasize and then hit them with targeted treatment before the cells have a chance to spread.</p>
<p>The researchers &#8212; who come from the University of Southern California, Scripps Clinic, the Scripps Research Institute, the University of California, San Diego Moores Cancer Center and Memorial Sloan-Kettering &#8212; combined autopsy data from 163 cancer cases (all from before the advent of radiation therapy in order to analyze the natural spread) with applied mathematics in order to carry out their study. What they found, <a href="http://www.scripps.edu/news/press/2013/20130325lung_cancer.html">according to a press release about the research</a>, is that</p>
<blockquote id="quote-metastatic-lung-canc"><p>metastatic lung cancer does not progress in a single direction from primary tumor site to distant locations, which has been the traditional medical view. Instead &#8230; cancer cell movement around the body likely occurs in more than one direction at a time.</p></blockquote>
<div id="attachment_624447" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/03/journal-pone-0034637-g001.png"><img  alt="How cancer cells spread. Source: PLOS One" src="http://gigaom2.files.wordpress.com/2013/03/journal-pone-0034637-g001.png?w=300&#038;h=297" width="300" height="297" class="size-medium wp-image-624447" /></a><p class="wp-caption-text">How cancer cells spread. Source: PLOS One</p></div>
<p>Moreover, they found certain organs tend to spread cancer cells more aggressively, while others tend to act as sponges for cancer cells. These sponge organs might still grow tumors; they just don&#8217;t disperse the cells.</p>
<h2 id="the-pagerank-analogy">The PageRank analogy</h2>
<p>The mathematics involved here, called Markov chain models, is <a href="http://en.wikipedia.org/wiki/PageRank">similar to what Google uses</a> to determine which web pages are the highest quality for any given search query. But whereas Google uses the number and quality of links to determine the probability of a web surfer landing on any given page, these researchers are trying to predict the PageRank of tumors, if you will. So, generally speaking, a kidney would likely have a higher PageRank than a liver because the kidney is more likely to spread cancer cells throughout the body (or, in web-search terms, generate a lot of links to itself).</p>
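<p>For readers who want to see the idea in miniature, here is a rough sketch of how a Markov chain produces a PageRank-style ranking. It is not the researchers&#8217; model: the three sites, the transition probabilities and the names <code>SITES</code>, <code>TRANSITIONS</code>, <code>step</code> and <code>stationary_distribution</code> are all invented for illustration. The sketch simply pushes a probability distribution through the chain over and over until it settles, and the settled values act as the ranking.</p>

```python
# Hypothetical three-site example; the transition probabilities below are
# made up for illustration and are NOT taken from the PLOS One paper.
SITES = ["lung", "lymph nodes", "liver"]

# TRANSITIONS[i][j] is the assumed probability of moving from site i to site j.
# Each row sums to 1, which is what makes this a Markov chain.
TRANSITIONS = [
    [0.2, 0.5, 0.3],
    [0.4, 0.4, 0.2],
    [0.1, 0.1, 0.8],
]


def step(distribution, transitions):
    """Advance the distribution one step: new[j] = sum_i old[i] * P[i][j]."""
    size = len(distribution)
    return [
        sum(distribution[i] * transitions[i][j] for i in range(size))
        for j in range(size)
    ]


def stationary_distribution(transitions, iterations=200):
    """Approximate the long-run distribution by repeated stepping."""
    size = len(transitions)
    distribution = [1.0 / size] * size  # start from a uniform guess
    for _ in range(iterations):
        distribution = step(distribution, transitions)
    return distribution


ranking = stationary_distribution(TRANSITIONS)
for site, score in sorted(zip(SITES, ranking), key=lambda pair: -pair[1]):
    print(f"{site}: {score:.3f}")
```

<p>With these made-up numbers, the site that ends up holding the largest share of the long-run probability ranks first, which is the same intuition behind a web page with a high PageRank.</p>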
<div id="attachment_624441" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/03/journal-pone-0034637-g009.png"><img  alt="The network path of cancer cells from lung to liver. Source: PLOS One" src="http://gigaom2.files.wordpress.com/2013/03/journal-pone-0034637-g009.png?w=708&#038;h=596" width="708" height="596" class="size-large wp-image-624441" /></a><p class="wp-caption-text">The network path of cancer cells from lung to liver. Source: PLOS One</p></div>
<p>As data volumes proliferate and relationships between data points become more complex, Markov models are actually becoming pretty popular. Netflix <a href="http://gigaom.com/2012/06/14/netflix-analyzes-a-lot-of-data-about-your-viewing-habits/">uses them in order to predict the movies</a> users will want to watch next.</p>
<p>The weighted connections between various states or web pages or whatever someone is ranking are <a href="http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html">often expressed as the nodes and edges of a graph</a>. Graphs, of course, have become part of the everyday web lexicon thanks to the various <a href="http://gigaom.com/2013/03/14/facebook-tweaks-its-algorithms-to-improve-graph-search-comment-search-coming/">social graphs</a> and <a href="http://gigaom.com/2012/03/15/the-personalized-web-is-just-an-interest-graph-away/">interest graphs</a> that analyze who we&#8217;re connected to (and how) and the types of topics we browse online.</p>
<h2 id="the-web-as-a-data-science-prov">The web as a data science proving ground</h2>
<p>So in the end, perhaps, the most important contribution of the World Wide Web won&#8217;t be the revolution in how we access information, but the web&#8217;s function as a proving ground for advanced statistical methods involving very large and complex data sets like those found in the medical world. Already, for example, another group of medical researchers has used a Markov variant in order to <a href="http://gigaom.com/2013/02/11/researchers-say-ai-prescribes-better-treatment-than-doctors/">create a model they think can prescribe better treatment plans</a> because it analyzes the costs and patient outcomes usually associated with a given treatment for a given symptom.</p>
<div id="attachment_624480" class="wp-caption alignleft" style="width: 307px"><a href="http://gigaom2.files.wordpress.com/2013/03/cholera-copy.jpg"><img  alt="Tracking a cholera outbreak across a river network. Source: Physical Review Letters" src="http://gigaom2.files.wordpress.com/2013/03/cholera-copy.jpg?w=708"   class="size-full wp-image-624480" /></a><p class="wp-caption-text">Tracking a cholera outbreak across a river network. Source: Physical Review Letters</p></div>
<p>Last year, a group of Swiss researchers developed an algorithm that, having access to a relatively small amount of data, <a href="http://gigaom.com/2012/08/13/an-algorithm-for-tracking-viruses-and-twitter-rumors-to-their-source/">can track anything from Twitter rumors to disease outbreaks</a> back to their source. A company called Syapse <a href="http://gigaom.com/2013/01/22/biotech-startup-syapse-wants-to-be-salesforce-com-for-our-genomes/">uses the graph structure to chart the relationships</a> among words across different medical specialties.</p>
<p>One would also be remiss in ignoring the computing and data-storage innovation spurred by the web that has <a href="http://gigaom.com/2012/11/27/why-data-is-the-key-to-better-medicine-and-maybe-a-cure-for-cancer/">improved our ability to handle massive amounts of genetic and other data</a>. As the lung cancer researchers <a href="http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0034637">explain in their paper</a>:</p>
<blockquote id="quote-one-of-the-strengths2"><p>One of the strengths of such a statistical approach is that we need not offer specific biomechanical, genetic, or biochemical reasons for the spread from one site to another, those reasons presumably will become available through more research on the interactions between CTCs and their microenvironment. We [have created] a quantitative and computational framework for the seed-and-soil hypothesis as an ensemble based first step, [that] then can be further refined primarily by using larger, better, and more targeted databases such as ones that focus on specific genotypes or phenotypes, or by more refined modeling of the correlations between the trapping of a CTC at a specific site, and the probability of secondary tumor growth at that location.</p></blockquote>
<p>Long story short, the more data we have and the more easily we can analyze and map it, the better we can treat, and perhaps even cure, cancer and other complicated diseases.</p>
<p><em>Feature image is a network map of how lung cancer spreads between organs, where each numbered node correlates with a specific organ.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/26/how-researchers-are-fighting-lung-cancer-using-pagerank/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/journal-pone-0034637-g003.png?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/journal-pone-0034637-g003.png?w=150" medium="image">
			<media:title type="html">journal.pone.0034637.g003</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/journal-pone-0034637-g001.png?w=300" medium="image">
			<media:title type="html">How cancer cells spread. Source: PLOS One</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/journal-pone-0034637-g009.png?w=708" medium="image">
			<media:title type="html">The network path of cancer cells from lung to liver. Source: PLOS One</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/cholera-copy.jpg" medium="image">
			<media:title type="html">Tracking a cholera outbreak across a river network. Source: Physical Review Letters</media:title>
		</media:content>
	</item>
		<item>
		<title>How energy data will impact the smart grid</title>
		<link>http://pro.gigaom.com/report/how-energy-data-will-impact-the-smart-grid/</link>
		<comments>http://pro.gigaom.com/report/how-energy-data-will-impact-the-smart-grid/#comments</comments>
		<pubDate>Tue, 19 Mar 2013 21:40:19 +0000</pubDate>
		<dc:creator><a href="http://pro.gigaom.com/members/adamlesser/" rel="author">Adam Lesser</a></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[3G]]></category>
		<category><![CDATA[algorithms]]></category>
		<category><![CDATA[analytics software]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[AutoGrid]]></category>
		<category><![CDATA[automation]]></category>
		<category><![CDATA[Bidgely]]></category>
		<category><![CDATA[Broadband]]></category>
		<category><![CDATA[cellular-networks]]></category>
		<category><![CDATA[clean energy investment]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Comcast]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Demand Response]]></category>
		<category><![CDATA[demand-side energy management]]></category>
		<category><![CDATA[deregulation]]></category>
		<category><![CDATA[distribution management]]></category>
		<category><![CDATA[dynamic pricing technology]]></category>
		<category><![CDATA[EcoFactor]]></category>
		<category><![CDATA[Ecologic Analytics]]></category>
		<category><![CDATA[Electric power]]></category>
		<category><![CDATA[Electric power distribution]]></category>
		<category><![CDATA[electrical grid]]></category>
		<category><![CDATA[eMeter]]></category>
		<category><![CDATA[ENBALA Power Networks]]></category>
		<category><![CDATA[Energy]]></category>
		<category><![CDATA[energy efficiency]]></category>
		<category><![CDATA[energy hogs]]></category>
		<category><![CDATA[energy management]]></category>
		<category><![CDATA[energy management system]]></category>
		<category><![CDATA[energy savings]]></category>
		<category><![CDATA[Energy Storage]]></category>
		<category><![CDATA[energy visualization technology]]></category>
		<category><![CDATA[energy-data]]></category>
		<category><![CDATA[energy-smart technologies]]></category>
		<category><![CDATA[grid sensors]]></category>
		<category><![CDATA[grid-balancing software]]></category>
		<category><![CDATA[home energy consumption]]></category>
		<category><![CDATA[Home Energy Management]]></category>
		<category><![CDATA[Landis+Gyr]]></category>
		<category><![CDATA[MDMS]]></category>
		<category><![CDATA[Nest]]></category>
		<category><![CDATA[networks]]></category>
		<category><![CDATA[openADR]]></category>
		<category><![CDATA[OPower]]></category>
		<category><![CDATA[Powerit Solutions]]></category>
		<category><![CDATA[public utilities commissions]]></category>
		<category><![CDATA[Reliant]]></category>
		<category><![CDATA[Renewable Energy]]></category>
		<category><![CDATA[renewable energy integration]]></category>
		<category><![CDATA[renewable-energy-generation]]></category>
		<category><![CDATA[Siemens]]></category>
		<category><![CDATA[smart devices]]></category>
		<category><![CDATA[smart energy]]></category>
		<category><![CDATA[Smart Grid]]></category>
		<category><![CDATA[smart meter]]></category>
		<category><![CDATA[smart meters]]></category>
		<category><![CDATA[software as a service]]></category>
		<category><![CDATA[software-based solutions]]></category>
		<category><![CDATA[software-services]]></category>
		<category><![CDATA[state-grid-corporation]]></category>
		<category><![CDATA[Sustainable energy]]></category>
		<category><![CDATA[system operators]]></category>
		<category><![CDATA[Tendril]]></category>
		<category><![CDATA[The HEM]]></category>
		<category><![CDATA[thermostat hardware]]></category>
		<category><![CDATA[toyota]]></category>
		<category><![CDATA[U.S. Energy Information Administration]]></category>
		<category><![CDATA[utilities]]></category>
		<category><![CDATA[venture]]></category>
		<category><![CDATA[venture capital funding]]></category>
		<category><![CDATA[Vinod Khosla]]></category>
		<category><![CDATA[wastewater treatment]]></category>
		<category><![CDATA[wi-fi]]></category>

		<guid isPermaLink="false">http://pro.gigaom.com/?post_type=go-report&#038;p=171585/</guid>
		<description><![CDATA[The deployment of smart meters combined with the growth of cloud computing infrastructure has created opportunities to build business models around the volume of emerging energy data. Those who use data to solve customer problems and leverage decades of software development and advances in big data will attract investment dollars.]]></description>
				<content:encoded><![CDATA[<p>The deployment of smart meters combined with the growth of cloud computing infrastructure has created opportunities to build business models around the volume of emerging energy data. Those who use data to solve customer problems and leverage decades of software development and advances in big data will attract investment dollars.</p>
]]></content:encoded>
			<wfw:commentRss>http://pro.gigaom.com/report/how-energy-data-will-impact-the-smart-grid/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://pro.gigaom.com/wp-content/uploads/2013/03/smartmeter.jpg?w=150" />
		<media:content url="http://pro.gigaom.com/wp-content/uploads/2013/03/smartmeter.jpg?w=150" medium="image">
			<media:title type="html">smartmeter</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4f3860069d181dbeeb398304f5940a9e?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigaedit</media:title>
		</media:content>
	</item>
		<item>
		<title>Researchers&#8217; algorithm intends to get more work out of cloud database servers</title>
		<link>http://gigaom.com/2013/03/12/researchers-algorithm-intends-to-get-more-work-out-of-cloud-database-servers/</link>
		<comments>http://gigaom.com/2013/03/12/researchers-algorithm-intends-to-get-more-work-out-of-cloud-database-servers/#comments</comments>
		<pubDate>Tue, 12 Mar 2013 19:30:14 +0000</pubDate>
		<dc:creator>Jordan Novet</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Virtual machines]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=619710</guid>
		<description><![CDATA[Researchers at the Massachusetts Institute of Technology have developed an algorithm for predicting workloads, which cloud providers can use to distribute workloads across servers in a more efficient way.]]></description>
				<content:encoded><![CDATA[<p>Researchers at the Massachusetts Institute of Technology have <a href="http://web.mit.edu/newsoffice/2013/making-cloud-computing-more-efficient-0312.html">developed an algorithm</a> that aims to make database cloud infrastructure more efficient by pushing more similar workloads onto fewer servers, rather than distributing them as widely as possible.</p>
<p>The premise is surprising, given that many database companies make a point of divvying up the responsibility of processing to keep latency low. But if cloud providers build on and adopt the researchers&#8217; <a href="http://dbseer.org/">DBSeer</a> algorithm, it could improve cloud database performance.</p>
<p>Infrastructure-as-a-Service (IaaS) providers run virtual machines on servers. That might not be the most efficient approach for databases, because resources aren&#8217;t shared among the applications running on any given server, the researchers argued in a recent <a href="http://people.csail.mit.edu/barzan/papers/cidr_2013.pdf">paper</a>. It might be better to observe current workloads, predict the needs of future workloads and bring together the different sorts of loads on different servers. Then cloud providers could adjust service-level agreements to promise a certain level of latency rather than charge customers based on the number and size of virtual machines, the researchers noted.</p>
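<p>To make the consolidation idea concrete, here is a loose sketch of packing predicted workloads onto as few servers as possible. It is not the DBSeer algorithm, which is about predicting resource needs in the first place; it is a generic first-fit-decreasing heuristic, and the workload names, load figures, the capacity value and the <code>consolidate</code> function are all assumptions made up for illustration.</p>

```python
# Hypothetical predicted peak CPU demand per database workload, as a
# percentage of one server. These figures are invented for illustration.
PREDICTED_LOAD = {
    "orders-db": 45,
    "analytics-db": 30,
    "sessions-db": 25,
    "inventory-db": 20,
    "logs-db": 10,
}

SERVER_CAPACITY = 80  # assumed usable share of a server, leaving headroom


def consolidate(predicted_load, capacity):
    """Pack workloads onto servers with a first-fit-decreasing pass."""
    servers = []  # each entry: {"free": remaining capacity, "workloads": [...]}
    # Place the largest workloads first; they are the hardest to fit.
    for name, load in sorted(predicted_load.items(), key=lambda kv: -kv[1]):
        if load > capacity:
            raise ValueError(f"{name} does not fit on a single server")
        for server in servers:
            if server["free"] >= load:
                server["free"] -= load
                server["workloads"].append(name)
                break
        else:
            # No existing server had room, so open a new one.
            servers.append({"free": capacity - load, "workloads": [name]})
    return servers


placement = consolidate(PREDICTED_LOAD, SERVER_CAPACITY)
for index, server in enumerate(placement, start=1):
    print(index, server["workloads"], "free:", server["free"])
```

<p>In this toy case the five workloads fit on two servers rather than five virtual machines&#8217; worth of hardware. A real system would need predictions for memory, disk and lock contention as well, which is where the researchers&#8217; modeling work comes in.</p>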
<p>DBSeer might also be of interest to database appliance and server vendors. Teradata is incorporating the algorithm into proprietary software. Meanwhile, one of the MIT researchers, Carlo Curino, now works at Microsoft, and Chinese webscale server vendor Quanta funded the research.</p>
<p>So far, DBSeer, which is <a href="https://github.com/barzan/dbseer">available on GitHub</a>, has only been shown to accurately predict workload needs for transactional MySQL databases. More research would be necessary to apply the algorithm to other database management systems.</p>
<p>The change in thinking could make good financial sense. The more hardware in use inside a cloud provider&#8217;s data centers, the more expensive it is for customers. If the appliances could work more efficiently, costs could drop.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/pic-7222396/stock-photo-network-administrator-shot-with-long-exposure-while-working-in-server-room.html">Shutterstock user Johann Helgason</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/12/researchers-algorithm-intends-to-get-more-work-out-of-cloud-database-servers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/shutterstock_7222396.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/shutterstock_7222396.jpg?w=150" medium="image">
			<media:title type="html">shutterstock_7222396</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/c00ab753df107b639e76ed4c3ab07ba7?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigajordan</media:title>
		</media:content>
	</item>
		<item>
		<title>The big data world is operating at 1 percent</title>
		<link>http://gigaom.com/2013/03/10/the-big-data-world-is-operating-at-1-percent/</link>
		<comments>http://gigaom.com/2013/03/10/the-big-data-world-is-operating-at-1-percent/#comments</comments>
		<pubDate>Sun, 10 Mar 2013 17:30:13 +0000</pubDate>
		<dc:creator>Gurjeet Singh, Guest Contributor</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[Ayasdi]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[Structure Data 2013]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=618443</guid>
		<description><![CDATA[We talk a lot about big data, but only analyze 1 percent of what's available. In order to take advantage of the other 99 percent, we need to reconsider how we do big data.]]></description>
				<content:encoded><![CDATA[<p>Many would be shocked to know that researchers analyze and gather insights from only 1 percent of the world’s data. That 1 percent of analyzed data has been the only driver of innovation and insights in what we now know as “big data.” The other 99 percent of the 1 quintillion bytes of data that is collected every day (according to a recent study from IDC) remains untouched.</p>
<p>We all know that big data has so much promise. For a very large number of problems today, the effective use of data is a bottleneck. The drug discovery problem is more about data than chemistry. The discovery of new energy sources is more about data than geology. It’s the same for tracking terrorists, detecting fraud, and more.</p>
<p>Today, we recognize that these, and many other critical global issues, are all data problems. This fact alone has given rise to a huge investment in big data, created the hottest job title around (data scientist) and propelled the valuations of private data analytics providers into the billions. However, imagine the endless possibilities when the world is operating on the insights gathered from 100 percent of its data.</p>
<h2 id="realizations">Realizations</h2>
<p>Where do you start when you have a data set as large as the human genome, for example, or a goal as ambitious as President Obama’s recent call to map the human brain? To achieve the breakthroughs we need to address the world’s most perplexing problems, we need to fundamentally change the way we gain knowledge from data. Here’s what we need to start thinking about:</p>
<ul><li><strong>Starting with queries is a dead end:</strong> Queries are not inherently bad. In fact, they are essential once you know what question to ask. That’s the key: the flaw is starting with queries in the hope that they will uncover a needle in the massive digital haystack. (Spoiler alert: they won’t.)</li>
<li><strong>Data has a cost:</strong> Storing data is no longer expensive, in most cases. Even querying large amounts of data is becoming more cost effective with tools like Hadoop and Amazon’s Redshift. This is just the hard cost side of the equation, though.</li>
<li><strong>Insights are value:</strong> The only reason we bear the cost is that we believe data holds insights that unlock value. Ultimately, the undiscovered insights that organizations miss have a much higher cost in terms of being able to solve big problems quickly, accelerate innovation and drive growth. The cost of data collection can be high, but the cost of ineffectual analysis is even higher. The tools for getting at insights don’t exist today. Today, we rely on very smart human beings to come up with hypotheses and use our tools to validate (or invalidate) those hypotheses. This is a flawed strategy since it relies on (arguably smart) guesswork.</li>
<li><strong>You have the right data today:</strong> There’s often a belief that, “If we only had more data, we could get the answer we’re looking for.” Far too much time and money is wasted collecting new data when more can be done with the data already at hand. For example, Ayasdi recently published a study in Nature Scientific Reports that shows important new insights from a 12-year-old breast cancer study that had been thoroughly analyzed for over a decade.</li>
</ul><h2 id="big-data-is-the-beginning-not-">Big data is the beginning, not the end</h2>
<p>I’m very concerned that the growing hype around the term big data has set us all up for disappointment. Query-based analysis is fine for a certain class of problems, but it will never be able to deliver on the expectations the market has for big data.</p>
<p>We are on the cusp of critical breakthroughs in cancer research, energy exploration, drug discovery, financial fraud detection and more. It would be a crime if the passion, interest and dollars invested to solve critical global problems like these were sidetracked by a “big data bubble.”</p>
<p>We can and should expect more from data analysis, and we need to recognize the capabilities that the next generation of solutions must be able to deliver:</p>
<ul><li><strong>Empower domain experts:</strong> The world cannot produce data scientists fast enough to scale to the size of the problem set. Let’s stop developing tools just for them. Instead, we need to develop tools for the business users: biologists, geologists, security analysts and the like. They understand the context of the business problem better than anyone, but might not be up to date with the latest in technology or mathematics.</li>
<li><strong>Accelerate discovery:</strong> We need to get to critical insights faster. The promise of big data is to “operate at the speed of thought.” It turns out that the speed of thought is not that fast. If we depend on this approach, then we will never get to the critical insights quickly enough because we’ll never be able to ask all of the questions of all of the data.</li>
<li><strong>Marriage of man and machine:</strong> To get to those insights faster, we need to invest in machine intelligence. We need machines to do more of the heavy lifting when it comes to finding the clusters, connections and relationships between data points that give business users a much better starting point to begin discovering insights. In fact, algorithmic discovery approaches can solve these problems by looking for rare but statistically significant signals in large datasets that humans would never be able to find. For example, in a recent study, previously unreported drug side effects were found by algorithmically searching through web search engine logs.</li>
<li><strong>Analyze data in all its forms:</strong> It’s understood that researchers need to analyze both structured and unstructured data. We need to recognize the diversity and depth of unstructured data: text in all languages, voice, video and facial recognition.</li>
</ul><p>When it comes to the evolution of big data, we’ve only begun to scratch the surface. It stands to reason that if we continue to analyze 1 percent of data, then we’ll only tap into 1 percent of its potential. If we’re able to analyze the other 99 percent, then think about all of the ways that we can change the world. We can accelerate economic growth, cure cancer and other diseases, reduce the risk of terrorist attacks, and take on many of the other big-ticket challenges that we face.</p>
<p>That’s something that we can all rally around.</p>
<p><em>Gurjeet Singh is the co-founder and CEO of <a href="http://www.ayasdi.com/">Ayasdi</a></em><em>, an insight discovery platform built on topological data analysis technology. He will be speaking at <a href="http://event.gigaom.com/structuredata/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=618443+the-big-data-world-is-operating-at-1-percent&amp;utm_content=gigaguest">Structure: Data</a>, March 20-21 in New York.</em></p>
<p><em>Have an idea for a post you’d like to contribute to GigaOM? Click <a href="http://gigaom.com/2012/11/28/have-an-idea-for-a-great-guest-post-heres-what-you-need-to-know/">here for our guidelines</a> and contact info.</em></p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-56831p1.html">Shutterstock user Sergey Lavrentev</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/10/the-big-data-world-is-operating-at-1-percent/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/shutterstock_77074264.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/shutterstock_77074264.jpg?w=150" medium="image">
			<media:title type="html">needle in a haystack</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4411542bbd7a2a9a2fc2a1b38809e45c?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigaguest</media:title>
		</media:content>
	</item>
	</channel>
</rss>