<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>GigaOM &#187; R</title>
	<atom:link href="http://gigaom.com/tag/r/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com</link>
	<description></description>
	<lastBuildDate>Wed, 19 Jun 2013 13:47:17 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gigaom.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0db8f6557d022075dbbf010c54d46d93?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>GigaOM &#187; R</title>
		<link>http://gigaom.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gigaom.com/osd.xml" title="GigaOM" />
	<atom:link rel='hub' href='http://gigaom.com/?pushpress=hub'/>
		<item>
		<title>We don&#8217;t need more data scientists &#8212; just make big data easier to use</title>
		<link>http://gigaom.com/2012/12/22/we-dont-need-more-data-scientists-just-simpler-ways-to-use-big-data/</link>
		<comments>http://gigaom.com/2012/12/22/we-dont-need-more-data-scientists-just-simpler-ways-to-use-big-data/#comments</comments>
		<pubDate>Sat, 22 Dec 2012 20:00:30 +0000</pubDate>
		<dc:creator>Scott Brave, Baynote</dc:creator>
				<category><![CDATA[baynote]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[CMS]]></category>
		<category><![CDATA[data architecture]]></category>
		<category><![CDATA[data scientists]]></category>
		<category><![CDATA[Guest Post]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[scott brave]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=596109</guid>
		<description><![CDATA[Sure, more data scientists would be great. But Scott Brave, of Baynote, says the better solution is to create analytics products that are so easy to use that you don't even need a data scientist.
]]></description>
				<content:encoded><![CDATA[<p>Virtually any article today about big data inevitably turns to the notion that the country is suffering from a crucial shortage of data scientists. A much-talked-about 2011 <a href="http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/big_data_the_next_frontier_for_innovation">McKinsey &amp; Co. survey</a> pointed out that many organizations lack both the skilled personnel needed to mine big data for insights and the structures and incentives required to use big data to make informed decisions and act on them.</p>
<p>What seems to be missing from all of these discussions, though, is a dialogue about how to steer around this bottleneck and make big data <i>directly</i> accessible to business leaders. We have done it before in the software industry, and we can do it again.</p>
<p>To accomplish this goal, it&#8217;s helpful to understand the data scientist&#8217;s role in big data. Currently, big data is a melting pot of distributed data architectures and tools like Hadoop, NoSQL, Hive and R. In this highly technical environment, data scientists serve as the gatekeepers and mediators between these systems and the people who run the business &#8211; the domain experts.</p>
<p>Although it is difficult to generalize, the data scientist serves three main roles: data architecture, machine learning, and analytics. These roles are important, but not every company needs a highly specialized data team of the sort you&#8217;d find at Google or Facebook. The solution, then, lies in creating fit-to-purpose products that abstract away as much of the technical complexity as possible, so that the power of big data can be put into the hands of business users.</p>
<p>By way of example, think back to the web content management revolution at the turn of the century. Websites were all the rage, but the domain experts were continually banging their heads against the wall – we had an IT bottleneck. Every new piece of content had to be scheduled and sometimes hard-coded by the IT elite. So how was it resolved? We generalized and abstracted the basic needs into web content management systems and made them easy for non-techies to use. As long as you didn&#8217;t need anything too crazy, the problem was solved easily, and the bottleneck averted.</p>
<p>Let&#8217;s dig a little deeper into the three main roles of today&#8217;s data scientist, using online commerce as a backdrop.</p>
<h2>Data Architecture</h2>
<p>The key to reducing complexity is to limit scope. Nearly every ecommerce business is interested in capturing user behavior – engagements, purchases, offline transactions and social data – and almost every one of them has a catalog and customer profiles.</p>
<p>Limiting scope to this basic functionality would allow us to create templates for the standard data inputs, making both data capture and connecting the pipes much simpler. We&#8217;d also need to find meaningful ways to package the different data architectures and tools, which currently include Hadoop, HBase, Hive, Pig, Cassandra and Mahout. These packages should be fit for purpose. It comes down to the 80/20 rule: 80 percent of big data use cases (which is all most ecommerce businesses need) can be achieved with 20 percent of the effort and technology.</p>
<h2>Machine Learning</h2>
<p>Surely we need data scientists in machine learning, right? Well, if you have very customized needs, perhaps. But most of the standard challenges that require big data, like recommendation engines and personalization systems, can be abstracted out. For example, a large part of the job of a data scientist is crafting &#8220;features,&#8221; which are meaningful combinations of input data that make machine learning effective. As much as we&#8217;d like to think that all data scientists have to do is plug data into the machine and hit &#8220;go,&#8221; the reality is people need to help the machine by giving it useful ways of looking at the world.</p>
<p>On a per domain basis, however, feature creation could be templatized, too. Every commerce site has a notion of buy flow and user segmentation, for example. What if domain experts could directly encode their ideas and representations of their domains into the system, bypassing the data scientists as middleman and translator?</p>
<h2>Analytics</h2>
<p>It&#8217;s never easy to automatically surface the most valuable insights from data. There are ways to provide domain-specific lenses, however, that allow business experts to experiment – much like a data scientist. This seems to be the easiest problem to solve, as there are a variety of domain-specific analytics products already on the market.</p>
<p>But these products are still more constrained and less accessible to domain experts than they could be. There is definitely room for a friendlier interface. We also need to take into consideration how the machine learns from the results that analytics deliver. This is the critical feedback loop, and business experts want to provide modifications into that loop. This is another opportunity to provide a templatized interface.</p>
<p>As we learned in the CMS space, these solutions won&#8217;t solve every problem every time. But applying a technology solution to the broader set of data issues will relieve the data scientist bottleneck. Once domain experts are able to work directly with machine learning systems, we may enter a new age of big data where we learn from each other. Maybe then, big data will actually solve more problems than it creates.</p>
<p><em>Scott Brave is co-founder and CTO of <a href="http://www.baynote.com">Baynote</a>, an e-tail and e-commerce advisory business. </em><em>He is also an editor of the &#8220;International Journal of Human-Computer Studies” (Amsterdam: Elsevier) and co-author of “Wired for speech: How voice activates and advances the human-computer relationship” (Cambridge, MA: MIT Press).</em></p>
<p><em>Photo courtesy of <a href="http://www.shutterstock.com/gallery-461077p1.html">Sergey Nivens</a>/Shutterstock.com</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/12/22/we-dont-need-more-data-scientists-just-simpler-ways-to-use-big-data/feed/</wfw:commentRss>
		<slash:comments>60</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/12/shutterstock_115491706.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/12/shutterstock_115491706.jpg?w=150" medium="image">
			<media:title type="html">shutterstock_115491706</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4411542bbd7a2a9a2fc2a1b38809e45c?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigaguest</media:title>
		</media:content>
	</item>
		<item>
		<title>How 0xdata wants to help everyone become data scientists</title>
		<link>http://gigaom.com/2012/08/14/how-0xdata-wants-to-help-everyone-become-data-scientists/</link>
		<comments>http://gigaom.com/2012/08/14/how-0xdata-wants-to-help-everyone-become-data-scientists/#comments</comments>
		<pubDate>Tue, 14 Aug 2012 19:45:04 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[0xdata]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[HDFS]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[Platfora]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=552457</guid>
		<description><![CDATA[Although it's still a work in progress, 0xdata thinks it has the answer to the problem of doing advanced statistical analysis at scale: Build on HDFS for scale, use the widely known R programming language and hide it all under a simple interface.]]></description>
				<content:encoded><![CDATA[<p>There&#8217;s a trend afoot in the big data space <a href="http://gigaom.com/cloud/want-to-ditch-your-data-scientists-heres-are-7-startups-that-can-help/">to turn data science from black magic into child&#8217;s play</a>, and one of the newest companies trying to pull off this technological alchemy is <a href="http://www.0xdata.com/index.html">0xdata</a>. The bootstrapped startup, pronounced &#8220;hexadata,&#8221; is the brainchild of former DataStax engineer and Platfora co-founder SriSatish Ambati, and it&#8217;s trying to blend Hadoop, R and Google BigQuery into the ultimate tool for statistical analysis. Scientists, data analysts or whoever ultimately uses the product only need to be experts in their domains, not in statistics.</p>
<p>At its core, <a href="http://www.0xdata.com/faq.html">0xdata&#8217;s flagship product, called H2O</a>, is a statistical analysis engine that uses the Hadoop Distributed File System (HDFS) as its storage platform, but the goal is to make it <a href="http://gigaom.com/cloud/google-opens-up-its-biq-query-data-analytics-service-to-all/">as simple as using a Google service such as BigQuery</a>. Users will interact with H2O via a simple web-search-like bar and standard <a href="http://www.r-project.org/">R statistical-analysis</a> syntax, but H2O will run machine-learning algorithms behind the scenes. Alternatively, users can call out to H2O from Microsoft Excel or the <a href="http://rstudio.org/">RStudio</a> integrated development environment using a REST API.</p>
<div id="attachment_552941" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2012/08/big_banner-copy.jpg"><img  title="big_banner copy" src="http://gigaom2.files.wordpress.com/2012/08/big_banner-copy.jpg?w=300&#038;h=114" alt="" width="300" height="114" class="size-medium wp-image-552941" /></a><p class="wp-caption-text">Although BigQuery is a SQL service hosted by Google, 0xdata follows a similar theory on simplicity.</p></div>
<p>However they choose to leverage the product, Ambati said, the scale of the underlying data and the complexity of running advanced analysis are details that need to be hidden. It&#8217;s the same theory that underlies Platfora, the company Ambati co-founded last year with his former DataStax colleague Ben Werther, although their approaches appear to be different. Whereas Platfora is <a href="http://gigaom.com/cloud/platfora-gets-5-7m-to-make-hadoop-mainstream/">trying to disrupt the data warehouse market</a> by building a next-generation user experience atop Hadoop, 0xdata is trying to change the way users interact with popular statistical software such as R.</p>
<p>But either way, Ambati says of new data-analysis products, &#8220;[There are] no bragging rights for making it simple. If you don&#8217;t do that, you won&#8217;t be able to go forward.&#8221;</p>
<p>0xdata is also putting a focus on speed, both in terms of how fast it processes data and how fast it lets users react. Google search changed our thinking around how many questions people can ask successively, Ambati explained, and data analysts should have the same experience. That&#8217;s why H2O provides approximate results at every step in the analysis process. Rather than wait for the entire job to run and the exact results to be computed, users can get a general idea of results and, if they&#8217;re completely outside the expected range, kill the job and start over sooner.</p>
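<p>The idea of reporting approximate results at every step can be illustrated with a minimal Python sketch. This is not H2O code: the function names, the chunked input and the tolerance check are all assumptions made for illustration. It keeps a running mean as each chunk of data arrives and stops as soon as the estimate drifts outside an expected range.</p>

```python
import math


def running_means(chunks):
    """Yield the mean of every value seen so far, once per non-empty chunk."""
    total = 0.0
    count = 0
    for chunk in chunks:
        if not chunk:
            continue  # an empty chunk adds no information
        total += sum(chunk)
        count += len(chunk)
        yield total / count


def estimate_or_abort(chunks, expected, tolerance):
    """Return (estimate, chunks_used, completed).

    Hypothetical helper: gives up as soon as the running estimate is
    more than `tolerance` away from `expected`.
    """
    estimate = None
    chunks_used = 0
    for estimate in running_means(chunks):
        chunks_used += 1
        within_range = math.isclose(estimate, expected, rel_tol=0.0, abs_tol=tolerance)
        if not within_range:
            return estimate, chunks_used, False
    return estimate, chunks_used, True


if __name__ == "__main__":
    # The first chunk already averages 42, far from the expected 10,
    # so the remaining chunks are never read.
    print(estimate_or_abort([[40, 44], [38, 42], [41, 39]], expected=10, tolerance=5))
```

<p>A production engine would likely report an error bound alongside each estimate. The sketch only shows the control flow of checking a partial answer and abandoning the job early.</p>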
<p>But it will be a while before the public gets a chance to see whether H2O lives up to its promises. Ambati said the product is just four months into development and won&#8217;t have its first set of algorithms available for another few months. His team of eight engineers has &#8220;built a lot of cool stuff,&#8221; but now it needs to round out the process and turn its code for H2O into an actual product.</p>
<p>Still, having decided to tackle data as a system, Ambati and his team are having a lot of fun. &#8220;We are live-and-die-with-infrastructure people,&#8221; he said, but for a bunch of folks who spent a lot of time learning math, it&#8217;s like going back to their days as computer science students.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-11418p1.html">Shutterstock user Bruce Rolff</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/08/14/how-0xdata-wants-to-help-everyone-become-data-scientists/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/08/shutterstock_107081264-e1344971009541.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/08/shutterstock_107081264-e1344971009541.jpg?w=150" medium="image">
			<media:title type="html">shutterstock_107081264</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/08/big_banner-copy.jpg?w=300" medium="image">
			<media:title type="html">big_banner copy</media:title>
		</media:content>
	</item>
		<item>
		<title>Data for doctors: Big data meets a big business</title>
		<link>http://gigaom.com/2011/08/08/data-for-doctors-big-data-meets-a-big-business/</link>
		<comments>http://gigaom.com/2011/08/08/data-for-doctors-big-data-meets-a-big-business/#comments</comments>
		<pubDate>Tue, 09 Aug 2011 00:00:14 +0000</pubDate>
		<dc:creator>Stacey Higginbotham</dc:creator>
				<category><![CDATA[@CNN]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloud Storage]]></category>
		<category><![CDATA[data analysis]]></category>
		<category><![CDATA[data collection]]></category>
		<category><![CDATA[Enterprise]]></category>
		<category><![CDATA[karmasphere]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[project-dallas]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Revolution Analytics]]></category>
		<category><![CDATA[SAS]]></category>
		<category><![CDATA[tableau]]></category>
		<category><![CDATA[tibco]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=389452</guid>
		<description><![CDATA[Forget the division between structured and unstructured data. For the benefits of the big data era to reach businesses' bottom lines or to change behaviors, companies will have to figure out how to bring the results of Hadoop analytics to HR and middle managers.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2010/09/istock_000001007494xsmall-1.jpg"><img  title="istock_000001007494xsmall (1)" src="http://gigaom2.files.wordpress.com/2010/09/istock_000001007494xsmall-1.jpg?w=708" alt=""   class="alignleft size-full wp-image-156144" /></a> While it&#8217;s fashionable to focus on data, one has to remember that what matters is what you can do with the data and how it can help grow a business. That is the rallying cry in the web world from Twitter to Facebook but also in an unexpected place: a Seattle hospital.</p>
<p>Ted Corbett, the director of knowledge management at Seattle Children&#8217;s Hospital, is using software from a company called Tableau to draw smart inferences from the 10 terabytes of data locked up in his servers and warehousing appliances. The hospital, which employs 5,000 people, uses the visualizations and easy access to data hidden away in multiple places to cut down on waste, reduce errors in medicine and help plan clinical trials. As organizations seek to store, analyze and derive insights from their data, companies are creating software to help them make sense of it all &#8212; because it&#8217;s not how big the data is, but what you can do with it.</p>
<h2><strong>Digestible data </strong></h2>
<p><a href="http://www.tableausoftware.com/">Tableau</a> is one of several companies attempting to funnel massive amounts of data into a more human understanding. Others include <a href="http://www.bigdatacloud.com/2011/05/20/karmasphere-joins-ibm-to-help-clients-adopt-big-data-analytics/">Karmasphere</a>, <a href="http://gigaom.com/cloud/the-data-whisperer-norman-nie-of-revolution-analytics/">Revolution Analytics</a>, TIBCO and SAS. Corbett explains that offering employees access to data via a simplified dashboard has helped the hospital better schedule its time in operating rooms and eliminate waste from the supply chain by improving the information needed for the hospital to implement some of its lean manufacturing tenets.</p>
<p>&#8220;Nurses and doctors, when they need test tubes or syringes, they would hoard things so when it got busy they would have them, but knowing whether you are pulling enough of the right supplies and having them available is a way to save on costs. So far we&#8217;ve saved $3 million out of the supply chain, and using Tableau we can find new ways to eke out more savings,&#8221; Corbett said.</p>
<h2><strong>Tapping data for real insights</strong></h2>
<p>Visualizations are one way companies are mining their existing data. Other companies are making products that integrate into existing ways of managing data such as Karmasphere or even Microsoft with its <a href="http://www.microsoft.com/windowsazure/features/marketplace/">data marketplace product</a>, where a manager or analyst can buy access to data and import it into Excel. Instead of creating more frivolous infographics, these products are able to help businesses tap into existing data better and make some kind of meaning from it.</p>
<p>These companies may not have huge adoption today, but as the amount of data companies analyze expands, such solutions should become more popular. Corbett notes that he&#8217;s expecting to see up to a tenfold increase in data over the next few years thanks to things such as M2M communications within the hospital, personal genomics in medicine and electronic health records. &#8220;We&#8217;ll have an expansion in machine data &#8212; everything is Wi-Fi-enabled and pumping data all around the hospital &#8212; and there will be genomic data and electronic records,&#8221; Corbett said. He estimates that each patient record contains a &#8220;couple of gigs of data per patient,&#8221; which can add up.</p>
<h2><strong>Data tools for the masses</strong></h2>
<p>It&#8217;s helpful to have tools that researchers and statisticians can use to make sense of the massive piles of data they have to sift through. But for big data to really become a game-changing force in business, companies will have to develop tools for the common man &#8212; or at least the middle manager. Much like broadband, computers, electricity and other big changes in productivity, it has to filter down to the masses to really change the world.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2011/08/08/data-for-doctors-big-data-meets-a-big-business/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2010/09/istock_000001007494xsmall-1.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2010/09/istock_000001007494xsmall-1.jpg?w=150" medium="image">
			<media:title type="html">istock_000001007494xsmall (1)</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/aee37121e18bf76bb9fee4494bab237a?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">shigginbotham</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2010/09/istock_000001007494xsmall-1.jpg" medium="image">
			<media:title type="html">istock_000001007494xsmall (1)</media:title>
		</media:content>
	</item>
		<item>
		<title>How OkCupid Demystifies Dating With Big Data</title>
		<link>http://gigaom.com/2011/02/11/okcupid-demystifies-dating-with-big-data/</link>
		<comments>http://gigaom.com/2011/02/11/okcupid-demystifies-dating-with-big-data/#comments</comments>
		<pubDate>Fri, 11 Feb 2011 21:59:33 +0000</pubDate>
		<dc:creator>Mike Minelli</dc:creator>
				<category><![CDATA[@CNN]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[OkCupid]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=297098</guid>
		<description><![CDATA[The interesting story behind OkCupid, the online dating site recently acquired by Match.com, is OkTrends, its blog that analyzes the site's wealth of data to shed light on our love lives. But the interesting story behind OkTrends is its use of R to power those analytics.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2011/02/iphone-heart.png"><img title="iphone-heart" src="http://gigaom2.files.wordpress.com/2011/02/iphone-heart.png?w=300&#038;h=200" alt="" width="300" height="200" class="size-medium wp-image-297008 alignright"></a>The <em>Boston Globe</em> describes OkCupid as the “Google of online dating.” That’s a good description. With more than 3.5 million active members and more than 7 million unique logins per month, OkCupid, <a href="http://techcrunch.com/2011/02/02/match-com-acquires-online-dating-site-okcupid-for-50-million-in-cash/">which was recently acquired by Match.com</a>, probably is what it claims to be — the fastest-growing free online dating site on the planet.</p>
<p>As much as I’m in favor of anything that improves the chances of finding true love (or any kind of love, for that matter), what fascinates me the most about OkCupid is <a href="http://blog.okcupid.com/">OkTrends</a>, the company’s official blog.</p>
<p><strong>OkTrends Brings Big Data to the Masses</strong><br>
Written largely by Christian Rudder, the company’s co-founder and editorial director, OkTrends is a veritable treasure chest of sexy social insight generated through a highly creative mash-up of off-the-shelf and DIY analytic techniques.</p>
<p>A recent post, “Big Lies People Tell in Online Dating,” reveals some pretty amazing truths behind the tall tales that are told regularly in the quest for romance. With unflinching mathematical precision, accompanied by several convincing charts, the post shows how people routinely lie about their height, income and looks. This revelation isn’t exactly surprising, but it’s fascinating to see how Rudder and his merry crew of quants use the power of analytics to strip away the mythology and tell the real story.</p>
<p>One of the most interesting aspects of the blog is the size of the samples. As Rudder noted in the blog’s inaugural post back in June 2009:</p>
<blockquote><p>… a word about statistical validity: the best questions on OkCupid have been answered over a million times. Therefore we have unique insights into the American mindset. A quick comparison:</p></blockquote>
<p><a href="http://gigaom2.files.wordpress.com/2011/02/okcupid.png"><img title="OkCupid chart" src="http://gigaom2.files.wordpress.com/2011/02/okcupid.png?w=300&#038;h=181" alt="OkCupid chart" width="300" height="181" class="aligncenter size-medium wp-image-297111"></a></p>
<p>Old media could only get 3,050 people to answer a poll about Obama. And it was enough to call the election with confidence. OkCupid, on the other hand, can ask the world’s most personal questions and get hundreds of thousands of answers.</p>
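<p>The arithmetic behind that comparison can be sketched in a few lines of Python. The function below is illustrative rather than anything OkCupid published. It uses the standard approximation for the 95 percent margin of error of a proportion and assumes a simple random sample, which a self-selected pool of dating-site members is not, so a larger sample narrows the sampling error but does nothing to remove selection bias.</p>

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion.

    proportion=0.5 is the worst case; z=1.96 is the usual 95% critical value.
    """
    return z * math.sqrt(proportion * (1.0 - proportion) / sample_size)


if __name__ == "__main__":
    # 3,050 is the poll size quoted above; 500,000 is the survey size
    # mentioned later in the article.
    for n in (3050, 500000):
        print(n, round(100 * margin_of_error(n), 2), "percentage points")
```

<p>Under those assumptions, a poll of 3,050 people carries a margin of error of roughly 1.8 percentage points, while 500,000 responses bring it down to about 0.14.</p>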
<p><strong>Open Source Is the Key to Deeper Insights</strong><br>
The analytic techniques used to crunch the numbers and surface the patterns vary. In the early days, when OkCupid had fewer members and the data sets were smaller, Excel sufficed. But as the site’s membership grew rapidly into the millions, it was not uncommon for a survey to generate responses from 500,000 members. It became clear to Rudder that Excel by itself could not handle data sets of that size; a more robust solution was required. Recently, OkCupid added statistical packages written in R to its mix of analytic tools.</p>
<p>R is an open-source language designed specifically for statistical analysis (see disclosure below). Unlike some of the more widely used proprietary analytic tools, programs written in R can handle the larger and more complex data sets generated by OkCupid’s growing base of users. That makes R a good choice for data scientists who are interested in pushing the envelope of traditional statistical analysis.</p>
<p>“R helps us get a quick overview of the data, which can save us a tremendous amount of time,” says Rudder. “If we had to do everything in Excel, it would take forever.”</p>
<p>Rudder’s crew uses R to visualize big data quickly, something they couldn’t do with Excel. “R lets us get a ‘zoomed-out’ view of what’s going on with the data, which helps us decide quickly if the tack we’re taking with the data is yielding something interesting. Once we figure out what we’re looking for, and we start narrowing our focus, then we can move into Excel,” says Rudder.</p>
<p>Rudder’s team has also benefitted from the open-source community. “If I have some data and I have an idea for analyzing that data, the chances are good that someone in the R community has already written a program that does what I need,” says Max Shron, a data scientist at OkCupid. “That’s the nice thing about open source. People just go ahead and write programs. It’s a tremendous time saver.”</p>
<p>Shron and Rudder recently used R packages to analyze data for <a href="http://blog.okcupid.com/index.php/gay-sex-vs-straight-sex/">a study comparing gay and straight dating habits</a>. “There was a tremendous amount of data,” says Rudder. “When we were looking at things like sex partners and messaging, R was very helpful. We could see the patterns very quickly and we knew we had something to write about.”</p>
<p>The low cost of open-source software is also a factor. “If we had to buy a package from a major vendor, even just one license for Max, we’d blow our budget for software. It’s much easier just doing some of this stuff in R,” says Rudder.</p>
<p>R is not a panacea for every data challenge, at least not yet, says Rudder. “I am a long-term Excel user; I’ve been using it since I was 12. So when it comes down to generating the final graphics, I usually like Excel. But I feel that with the right amount of work, R could probably do anything.”</p>
<p>OkTrends shows how big data and analytic science can hold a mirror to society, revealing its strengths and weaknesses. It’s gratifying to know that many of the advanced analytic processes running “under the hood” at OkTrends are hand-crafted from open-source components created by a worldwide community of people writing code for love, not money.</p>
<p>For more insights from the big data landscape, come to GigaOM’s <a href="http://event.gigaom.com/bigdata/?utm_source=tech&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=297098+okcupid-demystifies-dating-with-big-data&amp;utm_content=gigaguest">Structure: Big Data</a> conference on March 23 in New York City.</p>
<p><em>Mike Minelli is an executive VP at <a href="http://www.revolutionanalytics.com">Revolution Analytics</a>, a company founded in 2007 to foster R analytics by creating programs to make it easier for data scientists to analyze large amounts of data. Note: Neither Minelli nor Revolution Analytics has a business relationship with OkCupid.</em></p>
<p><strong>Related content from GigaOM Pro (subscription req’d):</strong></p>
<ul><li><a href="http://pro.gigaom.com/2011/01/big-data-2011-preview/?utm_source=tech&amp;utm_medium=editorial&amp;utm_content=gigaguest&amp;utm_campaign=intext&amp;utm_term=297098+okcupid-demystifies-dating-with-big-data">Big Data 2011 Preview</a></li>
<li><a href="http://pro.gigaom.com/2011/01/big-data-arm-and-legal-troubles-transformed-infrastructure-in-q4/?utm_source=tech&amp;utm_medium=editorial&amp;utm_content=gigaguest&amp;utm_campaign=intext&amp;utm_term=297098+okcupid-demystifies-dating-with-big-data">Big Data, ARM and Legal Troubles Transformed Infrastructure in Q4</a></li>
<li><a href="http://pro.gigaom.com/2010/11/report-health-cares-climb-to-the-cloud/?utm_source=tech&amp;utm_medium=editorial&amp;utm_content=gigaguest&amp;utm_campaign=intext&amp;utm_term=297098+okcupid-demystifies-dating-with-big-data">Report: Health Care’s Climb To the Cloud</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2011/02/11/okcupid-demystifies-dating-with-big-data/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2011/02/iphone-heart.png?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2011/02/iphone-heart.png?w=150" medium="image">
			<media:title type="html">iphone-heart</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4411542bbd7a2a9a2fc2a1b38809e45c?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigaguest</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/02/iphone-heart.png?w=300" medium="image">
			<media:title type="html">iphone-heart</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/02/okcupid.png?w=300" medium="image">
			<media:title type="html">OkCupid chart</media:title>
		</media:content>
	</item>
		<item>
		<title>The Data Whisperer: Norman Nie of Revolution Analytics</title>
		<link>http://gigaom.com/2011/02/02/the-data-whisperer-norman-nie-of-revolution-analytics/</link>
		<comments>http://gigaom.com/2011/02/02/the-data-whisperer-norman-nie-of-revolution-analytics/#comments</comments>
		<pubDate>Wed, 02 Feb 2011 20:00:04 +0000</pubDate>
		<dc:creator>Stacey Higginbotham</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Revolution Analytics]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=293132</guid>
		<description><![CDATA[Norman Nie helped create SPSS, one of the first companies to take advantage of the data that computers enabled researchers and businesses to track. He spoke with me about why we need to speak to our data and how that conversation can change the way we innovate.
]]></description>
				<content:encoded><![CDATA[<p>Norman Nie helped create SPSS, one of the first companies to take advantage of the massive amounts of data that computers enabled researchers and businesses to track. After almost 40 years at SPSS, which was <a href="http://www-03.ibm.com/press/us/en/pressrelease/27936.wss">later acquired by IBM</a>, Nie refocused his energy on academia. But in 2009 he was pulled out of his ivory tower by the challenge of building a business around the open source R language at <a href="http://www.revolutionanalytics.com/">Revolution Analytics</a>. Now as CEO and chairman, Nie is evangelizing Revolution R as a modern replacement for SPSS, SAS and other older statistical analysis languages.</p>
<p>In the video below, he talks about why we need to speak to our data and how that conversation can change the way we innovate and do business.</p>
<span class="embed-youtube" style="text-align:center; display: block;"><iframe class="youtube-player" type="text/html" width="604" height="370" src="http://www.youtube.com/embed/3RmLjXmzcN0?version=3&amp;rel=1&amp;fs=1&amp;showsearch=0&amp;showinfo=1&amp;iv_load_policy=1&amp;wmode=transparent" frameborder="0"></iframe></span>
<p>He also covers the changes that have taken place in his 40-odd years of working with data. Essentially, the introduction of scalable architectures such as Hadoop and cheaper computing have enabled people to look at data in real time. Combine that with a means of analyzing it, such as Revolution R, and it’s possible to offer people who aren’t statisticians a way to interact with big data and use that information to inform their daily decision-making. For the full story, watch the video, and make plans to come see Lee Edlefsen, chief scientist at Revolution Analytics, at our <a href="http://event.gigaom.com/bigdata/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=293132+the-data-whisperer-norman-nie-of-revolution-analytics&amp;utm_content=shigginbotham">Structure: Big Data conference</a> taking place on March 23 in New York City.</p>
<p><strong>Related content from GigaOM Pro (subscription req’d):</strong></p>
<ul><li><a href="http://pro.gigaom.com/2011/01/big-data-2011-preview/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_content=shigginbotham&amp;utm_campaign=intext&amp;utm_term=293132+the-data-whisperer-norman-nie-of-revolution-analytics">Big Data 2011 Preview</a></li>
<li><a href="http://pro.gigaom.com/2011/01/big-data-arm-and-legal-troubles-transformed-infrastructure-in-q4/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_content=shigginbotham&amp;utm_campaign=intext&amp;utm_term=293132+the-data-whisperer-norman-nie-of-revolution-analytics">Big Data, ARM and Legal Troubles Transformed Infrastructure in Q4</a></li>
<li><a href="http://pro.gigaom.com/2010/11/report-health-cares-climb-to-the-cloud/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_content=shigginbotham&amp;utm_campaign=intext&amp;utm_term=293132+the-data-whisperer-norman-nie-of-revolution-analytics">Report: Health Care’s Climb To the Cloud</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2011/02/02/the-data-whisperer-norman-nie-of-revolution-analytics/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2011/02/norman.png?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2011/02/norman.png?w=150" medium="image">
			<media:title type="html">norman</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/aee37121e18bf76bb9fee4494bab237a?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">shigginbotham</media:title>
		</media:content>
	</item>
	</channel>
</rss>
