<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>GigaOM &#187; Structure Big Data</title>
	<atom:link href="http://gigaom.com/tag/structure-big-data/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com</link>
	<description></description>
	<lastBuildDate>Tue, 21 May 2013 07:22:43 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gigaom.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0db8f6557d022075dbbf010c54d46d93?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>GigaOM &#187; Structure Big Data</title>
		<link>http://gigaom.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gigaom.com/osd.xml" title="GigaOM" />
	<atom:link rel='hub' href='http://gigaom.com/?pushpress=hub'/>
		<item>
		<title>Is big data new, or have we forgotten its old heroes?</title>
		<link>http://gigaom.com/2012/03/11/is-big-data-new-or-have-we-forgotten-its-old-heroes/</link>
		<comments>http://gigaom.com/2012/03/11/is-big-data-new-or-have-we-forgotten-its-old-heroes/#comments</comments>
		<pubDate>Sun, 11 Mar 2012 19:00:51 +0000</pubDate>
		<dc:creator>Robert Greene, Versant Corporation</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Structure Big Data]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=496872</guid>
		<description><![CDATA[Seemingly overnight, big data became the behemoth to conquer. But the truth is, tried and true technologies have been tackling the problem for years. Versant's Robert Greene gives respect to three unsung heroes of big data.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom.com/2012/03/11/is-big-data-new-or-have-we-forgotten-its-old-heroes/greene_is-big-data-really-new_superhero_image/" rel="attachment wp-att-496887"><img title="Greene_Is Big Data Really New_superhero_image" src="http://gigaom2.files.wordpress.com/2012/03/greene_is-big-data-really-new_superhero_image1.jpg?w=253&#038;h=300" alt="" width="253" height="300" class="alignleft size-medium wp-image-496887"></a>In the past few years, big data has essentially gone from zero to hero in the enterprise tech world. Except for one small thing: it hasn’t, really. Many seem to have forgotten that big data was around, and being put to good use, well before it became the buzzword du jour.</p>
<p>Without question, <a href="http://www.versant.com/index.aspx?utm_source=GigaOM&amp;utm_medium=byline&amp;utm_term=enterprise%2Bdata&amp;utm_content=SEOlink&amp;utm_campaign=GigaOM%2BByline">enterprise data</a> volumes have grown immensely, and organizations have indeed begun to recognize the value hidden in these larger stores. According to a 2011 study by the <a href="http://in.reuters.com/article/2011/08/18/idUS115775+18-Aug-2011+GNW20110818">Aberdeen Group</a>, organizations that effectively integrate complex data are able to use up to 50 percent larger data sets for business intelligence and analytics, to integrate external unstructured data into business processes twice as successfully, and to slash error incidences almost in half. The connection between a company’s success and its ability to leverage big data is very clear. In that sense, the media firestorm around big data has been completely valid.</p>
<p><strong>Don’t believe the NoSQL hype</strong></p>
<p>However, much of the buzz treats big data as though it remains to be conquered. The hype surrounding NoSQL, for instance, makes it seem like it’s the only database management system that can effectively manage big data and that, without it, immense value would remain untapped.</p>
<p>The first iterations of NoSQL provided knee-jerk solutions for companies such as <a href="http://www.amazon.com/">Amazon</a> and <a href="http://www.ebay.com/">eBay</a> that needed to solve a crushing scaling problem – and <em>fast</em>. Although it solved the scaling issues, NoSQL isn’t the best solution for today’s complex enterprise-class applications. For one, NoSQL technologies all use their own proprietary coding interfaces. So moving to NoSQL creates headaches and a drain on resources, because database administrators and programmers have to learn new skill sets.</p>
<p>Additionally, in <a href="http://hadoop.apache.org/">Hadoop</a>, for example, links between data are not automated as they are in relational or object-oriented engines, and they must be manually joined by the developer writing some custom code to do the set operation. Plus, these newer technologies don’t yet play well with enterprise-class management and monitoring protocols, making them a high risk factor for mission-critical applications. Since these are only a few of NoSQL’s pitfalls, thinking that a solution to big data problems still doesn’t exist is in some ways warranted.</p>
<p>The reality, however, is quite the opposite.</p>
<p><strong>Other technologies have been tackling big data for years</strong></p>
<p>The zero-to-hero perception of big data neglects the fact that many companies and industries jumped on the big data bandwagon long ago. When the amount and complexity of the data became too much for relational database management systems (RDBMSs) to handle, big data pioneers began using somewhat more obscure technologies such as <a href="http://www.versant.com/index.aspx?utm_source=GigaOM&amp;utm_medium=byline&amp;utm_term=object%2Boriented%2Bsystems&amp;utm_content=SEOlink&amp;utm_campaign=GigaOM%2BByline">object oriented systems</a> and databases (ODB).</p>
<p>At this point, you might be thinking, “Okay, but can a lesser-known technology really tackle big data better than the latest cutting-edge innovation?” Based on the following three examples, I’d say the answer is “yes.”</p>
<p><strong>1. Big data on the rails</strong></p>
<p>The U.S. Federal Railroad Administration, expecting rail freight traffic to double by 2020, created the <a href="http://www.versant.com/solution/GE_Transportation_Case_Study.aspx?utm_source=GigaOM&amp;utm_medium=byline&amp;utm_term=RailEdge%2BMovement%2BPlanner&amp;utm_content=link&amp;utm_campaign=GigaOM%2BByline">RailEdge Movement Planner</a> application to perform analysis of highly complex object models more than 30 times faster than an RDB. In shipping, time really is money: fuel consumption and delivery times are determined almost exclusively by the availability of an aging infrastructure. Yet there is money to be made by mastering minutiae and effecting change in real time rather than relying on <em>predicted</em> outcomes.</p>
<p>RailEdge organizes minute details and readings from a vast network of information sensors and physical items (the number of engines and cars per train, payloads, rail traffic, congestion at depots, etc.), all against the backdrop of time. That is worth repeating. Analyzing all this data against time creates really big data. RailEdge has improved average train velocity and fuel efficiency and saved about $200 million in annual capital and expenses.</p>
<p><strong>2. Big data in the air</strong></p>
<p>Processing airline tickets poses an even bigger big data challenge. The massive amount of transactional throughput of <a href="http://www.versant.com/solution/Sabre.aspx?utm_source=GigaOM&amp;utm_medium=byline&amp;utm_term=Travelocity.com&amp;utm_content=link&amp;utm_campaign=GigaOM%2BByline">Travelocity.com</a> and other online ticketing services puts huge pressure on databases to handle every detail quickly and with perfect accuracy.</p>
<p>To solve this problem, Travelocity uses the <a href="http://151.193.182.63/products/market/inventory.htm">SabreSonic Inventory System</a>, the world’s most popular ticketing inventory solution. The big-data needs of the participating airlines (30, at last count) require an object-oriented system to maintain high performance and minimize IT infrastructure costs.</p>
<p>Harnessing big data to quickly and accurately process millions of transactions per day has saved Travelocity.com money and boosted the brand’s reputation. The system allowed them to switch from using multi-million-dollar mainframe hardware to relatively low-cost commodity infrastructure without sacrificing performance. Even more impressive? Reliability: Since turning the system on almost four years ago, it has never gone offline. Ever.</p>
<p><strong>3. Big data on ice</strong></p>
<p>My favorite story about the forgotten heroes of big data is one that delivered value beyond mere dollars and cents.</p>
<p>Tracking the effects of Arctic ice sheets on the world’s climate is an intricate process that requires scientists to monitor incredibly minute pieces of both historical and contemporary data in petabyte volumes. <a href="http://www.versant.com/solution/NSIDC.aspx?utm_source=GigaOM&amp;utm_medium=byline&amp;utm_term=NSIDC&amp;utm_content=link&amp;utm_campaign=GigaOM%2BByline">The National Snow and Ice Data Center</a>’s (NSIDC) scientists needed to process billions of complex data objects to analyze how changes in Greenland’s ice sheet over time have affected global climate. To boot, the system also needed to enable real-time queries of the data to answer new questions about the ice sheet’s changes as the researchers made new inferences.</p>
<p>Traditional databases can’t do this. Hadoop can’t do this. In those systems, the relationships have to be rebuilt for every query. The only way for the NSIDC to do this was to drive an object-oriented model deep into the database’s architecture.</p>
<p>Without the system that the NSIDC developed, processing this amount of information would have taken years, rendering any results a matter of historical record rather than actionable intelligence.</p>
<p>The need to harness big data to create real value is certainly growing by leaps and bounds across every industry. But while today’s media hype surrounds predictive analytics and NoSQL technologies, we shouldn’t forget that tried and true technologies are out there that have been the silent heroes of big data for years.</p>
<p><em>Robert Greene, vice president of technology at </em><a href="http://www.versant.com/index.aspx"><em>Versant Corporation</em></a><em>, has more than 15 years of experience working on high-performance, mission-critical software systems. He provides the technical direction for Versant’s database technology, used by Fortune 1000 companies, including Dow Jones, Ericsson and China Telecom. </em></p>
<p><em><a title="Attribution License" href="http://creativecommons.org/licenses/by/2.0/">Some rights reserved</a> by <a href="http://www.flickr.com/photos/creative_tools/">Creative Tools</a>.</em></p>
<p>For more on this big data phenomenon, be sure to check out GigaOM’s <a href="http://event.gigaom.com/structuredata/?utm_source=tech&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=496872+is-big-data-new-or-have-we-forgotten-its-old-heroes&amp;utm_content=aprilkilcrease">Structure:Data Conference</a> in New York City on March 21 and 22.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/03/11/is-big-data-new-or-have-we-forgotten-its-old-heroes/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/03/greene_is-big-data-really-new_superhero_image1.jpg?w=126" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/03/greene_is-big-data-really-new_superhero_image1.jpg?w=126" medium="image">
			<media:title type="html">Greene_Is Big Data Really New_superhero_image</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/f61183cf1974afda4981596f4a1e7cde?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">aprilkilcrease</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/03/greene_is-big-data-really-new_superhero_image1.jpg?w=253" medium="image">
			<media:title type="html">Greene_Is Big Data Really New_superhero_image</media:title>
		</media:content>
	</item>
		<item>
		<title>Why Big Data Will Need Big Gear</title>
		<link>http://gigaom.com/2011/03/30/why-big-data-will-need-big-gear/</link>
		<comments>http://gigaom.com/2011/03/30/why-big-data-will-need-big-gear/#comments</comments>
		<pubDate>Thu, 31 Mar 2011 00:31:26 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[@NYT]]></category>
		<category><![CDATA[arista-networks]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Cisco Systems]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[low latency]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[storage]]></category>
		<category><![CDATA[Structure Big Data]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=324283</guid>
		<description><![CDATA[Hardware rarely comes up in discussions about big data, save for those centered on data warehouse appliances. But the omission hardly means hardware is irrelevant. In fact, big gear might become a big deal as companies look to bolster the performance of their big data systems.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2010/11/speed.jpg"><img title="speed" src="http://gigaom2.files.wordpress.com/2010/11/speed.jpg?w=300&#038;h=225" alt="" width="300" height="225" class="alignleft size-medium wp-image-258459"></a>Hardware is often treated as a second-class citizen in discussions about big data, whereas software innovations such as Hadoop and the latest and greatest predictive analytics algorithms reign supreme, but that won’t be the case forever.</p>
<p>Wednesday, GigaOM Pro published a report I wrote about <a href="http://pro.gigaom.com/2011/03/defining-hadoop-the-players-technologies-and-challenges-of-2011/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_term=324283+why-big-data-will-need-big-gear&amp;utm_content=dharrisstructure&amp;utm_campaign=intext">the fast-growing ecosystem of Hadoop vendors, projects and users</a>. The underlying theme of that report, and right now for big data in general, is software: the products, algorithms and languages that help companies process, analyze and act upon their mountains of information. In my experience, hardware rarely comes up in discussions about big data, and when it does, it’s generally centered on data warehouse appliances. But the omission hardly means hardware is irrelevant. In fact, big gear might become a big deal as companies look to bolster the performance of their big data systems.</p>
<p>Perhaps nowhere is this more clear than at the network level. As Sun Microsystems Co-Founder <a href="http://gigaom.com/2011/03/23/andy-bechtolsheim-arista-networks/">Andy Bechtolsheim pointed out in a fireside chat at Structure: Big Data</a>, the network has become the bottleneck in many systems as server hardware has continued to become faster and more powerful. This shouldn’t be a surprising sentiment, considering Bechtolsheim’s current company, Arista Networks, sells high-performance networking gear, including a <a href="http://www.theregister.co.uk/2011/03/28/arista_7050s_7124sx_switches/">new pair of low-latency switches</a> just announced this week, but it’s also true. It’s why, for example, mobile-app-analytics startup Flurry recently <a href="http://gigaom.com/cloud/got-big-data-youre-gonna-need-a-faster-network/">upgraded its network infrastructure</a> with Arista gear to handle skyrocketing network traffic, including across its growing Hadoop cluster.</p>
<p>When you’re moving terabytes of data across hundreds or thousands of nodes, or from one environment to another (e.g., from a Hadoop cluster to an analytic database) you can’t afford to have it lagging across the network for too long. If you’re looking to process and analyze in real time, latency needs to be as close to zero as possible.</p>
<p><a href="http://gigaom2.files.wordpress.com/2011/03/cisco-c260.png"><img title="Cisco C260" src="http://gigaom2.files.wordpress.com/2011/03/cisco-c260.png?w=300&#038;h=210" alt="" width="300" height="210" class="alignright size-medium wp-image-324317"></a>Storage can be critical for better big data performance, too, as <a href="http://newsroom.cisco.com/dlls/2011/prod_033011.html">Cisco highlighted Wednesday with its new C260 M2 servers</a> designed for OLTP and data warehouse applications. The new box can house up to a terabyte of memory and 16 solid-state or hard-disk drives (up to 9.6TB in total), which are important factors in its target use cases, as they must not only be able to store large amounts of data, but also be able to access it in a hurry to answer time-sensitive queries or feed applications. This is why large systems vendors such as Oracle, IBM and EMC sell massive appliances armed to the teeth with storage and memory capacity and running either transactional or analytic database software.</p>
<p>But Cisco’s server is somewhat unique in that it’s, well, a server, as opposed to a full-on appliance. It doesn’t seem like too big of a stretch to think some company that wants to seriously improve the performance of its Hadoop cluster would transition to the C260 M2 to get the performance gains of so much memory and SSD capacity while still being able to add to the cluster one rackmount server at a time. Further, a smaller cluster in terms of nodes means even less latency, because there are fewer network points for data to traverse as it crosses the system.</p>
<p>In their current states, many Hadoop clusters and other big data systems probably are just fine in terms of compute, network and storage performance. That’s why, as I explain in my report, and <a href="http://gigaom.com/cloud/yahoo-suggests-mapreduce-overhaul-to-improve-hadoop-performance/">as I’ve reported here before</a>, much work is underway to improve software performance for tools like Hadoop, while still focusing on less-expensive commodity gear. But that might change as companies start to rely more on analytics to make everyday business decisions.</p>
<p>Much like banks have spent untold millions on finely tuned software <em>and</em> hardware to carry out their risk analyses and high-frequency trading applications, more-mainstream companies also will start feeling the pressure to build analytic systems that deliver results in as close to real time as possible, even for batch processes. Certainly, Arista and Cisco aren’t alone in seeing this trend, and I suspect we’ll see a lot more network and server vendors targeting big data applications beyond OLTP and analytic databases as the trend gains momentum.</p>
<p><em>Image courtesy of Flickr user <a href="http://www.flickr.com/photos/laserstars/908946494/in/photostream/" target="_blank">jpctalbot</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2011/03/30/why-big-data-will-need-big-gear/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2010/11/speed-e1318892183924.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2010/11/speed-e1318892183924.jpg?w=150" medium="image">
			<media:title type="html">speed</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2010/11/speed.jpg?w=300" medium="image">
			<media:title type="html">speed</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/cisco-c260.png?w=300" medium="image">
			<media:title type="html">Cisco C260</media:title>
		</media:content>
	</item>
		<item>
		<title>Why Big Data Startups Should Take a Narrow View</title>
		<link>http://gigaom.com/2011/03/28/why-big-data-startups-should-take-a-narrow-view/</link>
		<comments>http://gigaom.com/2011/03/28/why-big-data-startups-should-take-a-narrow-view/#comments</comments>
		<pubDate>Mon, 28 Mar 2011 23:03:46 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Startups]]></category>
		<category><![CDATA[Structure Big Data]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=322998</guid>
		<description><![CDATA[One of the statements that struck me most from Structure: Big Data was CA CTO Donald Ferguson's notion that big data represents a "very promising" opportunity for startups, particularly those targeting specific use cases. I think he's right, particularly with regard to the latter part.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2011/03/tunnel-vision.jpg"><img title="tunnel vision" src="http://gigaom2.files.wordpress.com/2011/03/tunnel-vision.jpg?w=300&#038;h=225" alt="" width="300" height="225" class="alignleft size-medium wp-image-323067"></a>Looking back on last week’s <a href="http://event.gigaom.com/bigdata/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=322998+why-big-data-startups-should-take-a-narrow-view&amp;utm_content=dharrisstructure">Structure Big Data conference</a>, one of the statements that struck me most was <a href="http://gigaom.com/2011/03/23/donald-ferguson-ca-technologies/">CA CTO Donald Ferguson’s</a> notion that big data represents a “very promising” opportunity for startups, particularly those targeting specific use cases. I think he’s right, particularly with regard to the latter part: the market for horizontally focused products is filling up fast with both startups and large vendors, so innovative companies might look at how to best tune big data tools for specific industries.</p>
<p>As I <a href="http://gigaom.com/cloud/as-big-data-takes-off-the-hadoop-wars-begin/">explained in detail last week</a>, Hadoop has become popular among companies of all sizes, but most products at this point target broad use cases across industries. Yes, there’s still room for startups to get in here, but the door looks to be closing fast. It’s not just Hadoop, either; other techniques, from traditional data warehouses to, arguably, predictive analytics, all are nearing the saturation point in terms of vendors selling the core technologies. Even a step up the stack from the core Hadoop layer are vendors like <a href="http://datameer.com">Datameer</a> selling familiar-looking interfaces that abstract the complexities of processing and analyzing data with Hadoop.</p>
<p>But Ferguson made a particularly poignant, if not novel, observation: analyzing social media data is not the same, either in technique or in purpose, as analyzing user data to feed a recommendation engine for a site like Netflix. And herein lies the opportunity. Organizations keep on hearing about big data and about how big an opportunity it is, but even though the technology to capitalize on this opportunity is getting democratized, organizations still face a big challenge to hire personnel that understand not only the technology, but also how to ask the right questions. Sure, analyzing social media data to find out what consumers like or how they might act sounds great, but actually being able to do it accurately is another issue. It’s a situation just begging for startups to fill the void between big data tools and actually using them for a particular task.</p>
<p>Whether the focus is by industry (e.g., tools for financial services, retail, etc.) or by use case (e.g., sentiment analysis, recommendation engines, etc.), one can easily envision an emerging class of companies tuning technologies like Hadoop or predictive analytics software to directly address these discrete classes of users. Organizations won’t necessarily need <a href="http://gigaom.com/2010/12/16/wanted-data-scientists-to-turn-information-into-gold/">data scientists to “turn information into gold”</a> if the data scientists employed by their software vendors have already done most of the work. Think about it like functions within spreadsheet applications tuned to specific industries, or like how PaaS startups took cloud computing a step further by <a href="http://gigaom.com/cloud/dotclouds-paas-for-the-masses-gets-it-800k/">configuring infrastructure with the push of a button</a>. Just feed the application some data, push a button, and get results: no Ph.D. required.</p>
<p>To a degree, this is already starting to happen, but primarily by large vendors using their existing software (e.g., <a href="http://www.sas.com/software/customer-intelligence/social-media-analytics/">SAS for social media</a>) and in the form of fairly limited-scope analytic technologies (e.g., <a href="http://gigaom.com/cloud/ravel-hopes-to-open-source-graph-databases/">graph databases</a>), but I think these are just baby steps toward what could be a huge opportunity. Companies of all types want to be the next Yahoo or Facebook in terms of big data, and there are plenty of companies willing to help them do that in terms of infrastructure. The real opportunity now is in helping companies figure out how to use it.</p>
<p><em>Image courtesy of <a href="http://www.geograph.org.uk/profile/40">Pam Brophy</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2011/03/28/why-big-data-startups-should-take-a-narrow-view/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2011/03/tunnel-vision.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2011/03/tunnel-vision.jpg?w=150" medium="image">
			<media:title type="html">tunnel vision</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/tunnel-vision.jpg?w=300" medium="image">
			<media:title type="html">tunnel vision</media:title>
		</media:content>
	</item>
		<item>
		<title>Structure, Nasdaq and More: Why GigaOM Loves New York</title>
		<link>http://gigaom.com/2011/03/27/structure-nasdaq-and-more-why-gigaom-loves-new-york/</link>
		<comments>http://gigaom.com/2011/03/27/structure-nasdaq-and-more-why-gigaom-loves-new-york/#comments</comments>
		<pubDate>Sun, 27 Mar 2011 18:21:07 +0000</pubDate>
		<dc:creator>Nicole Solis, Managing Editor</dc:creator>
				<category><![CDATA[GigaOm]]></category>
		<category><![CDATA[Nasdaq]]></category>
		<category><![CDATA[New York Times]]></category>
		<category><![CDATA[Structure Big Data]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=321717</guid>
		<description><![CDATA[The GigaOM team spent the past week in New York City for a number of events, including our first (and definitely not last) East Coast conference, Structure Big Data. It's all part of our plan to expand our footprint into new markets, including New York.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2011/03/group-1.jpg"><img title="The GigaOM staff in front of Nasdaq" src="http://gigaom2.files.wordpress.com/2011/03/group-1.jpg?w=300&#038;h=200" alt="The GigaOM staff in front of Nasdaq, March 22, 2011, by Pinar Ozger" width="300" height="200" class="alignleft size-medium wp-image-322568"></a><br>
What a week! The GigaOM team spent the past week in New York City for a number of events, including our first (and definitely not last) East Coast conference, <a href="http://event.gigaom.com/bigdata/?utm_source=tech&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=321717+structure-nasdaq-and-more-why-gigaom-loves-new-york&amp;utm_content=nsolisgigaom">Structure Big Data</a>. The conference is part of our plan to expand our footprint into new markets, including New York.</p>
<p>We kicked off the week with <a href="http://omis.me/2011/03/23/times-square-is-fun-photos-from-nasdaq-closing-bell-ceremony/">Om joining Nasdaq’s VP, David Wicks</a>, to <a href="http://www.nasdaq.com/marketsite/marketsite-events-detail.aspx?fn=201103-close03222011.txt">ring the closing bell</a>. This is just the start of a closer partnership between GigaOM and the folks at Nasdaq.</p>
<p>That evening, we held a reception at the New York Times building, bringing together media executives, data scientists and journalists. A bunch of bloggers walked among the <em>Times</em> reporters’ Pulitzers — an interesting juxtaposition of old and new media.</p>
<p>All of this was a precursor to our main event: <a href="http://event.gigaom.com/bigdata/?utm_source=tech&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=321717+structure-nasdaq-and-more-why-gigaom-loves-new-york&amp;utm_content=nsolisgigaom">Structure Big Data</a>, held at the Chelsea Piers. And it was a great success — not just in terms of attendance, though it was standing room only in the main hall. The sold-out event was also the first time that people from the Silicon Valley and New York tech communities, as well as leaders in finance, media and other New York-based industries, came together in one place to discuss how they were all using big data to gain valuable insights into their businesses. If you couldn’t make it, you can see some of the highlights in our photo gallery below or <a href="http://gigaom.com/2011/03/23/structure-big-data-live-coverage/">read through our coverage</a> (with video clips) of the entire event.</p>
<p>Structure Big Data may have been our first big step into New York, but expect more to come. Last fall, we hired <a href="http://gigaom.com/author/oryankim/">Ryan Kim</a>, a crackerjack tech reporter from the <em>San Francisco Chronicle</em>, to be our “man in New York,” covering local startups and the growing app economy. In February, we hosted a Media Meetup, along with Automattic (see disclosure), to gather together the New York tech media, startup founders and venture capitalists.</p>
<p>This is just the beginning for us. It’s easy for tech publications to myopically focus on Silicon Valley, but increasingly, many of the exciting ideas and innovations are developing elsewhere. Our readers rely on us to find the next big ideas, wherever they are. That’s why more of our staff will be joining Ryan in New York, including Om himself, who will return to his former life as a bicoastal writer. We’re looking forward to telling more of New York’s great tech stories. In the meantime, enjoy the photo gallery from Structure Big Data and a behind-the-scenes video from Nasdaq.</p>

<div class="flex-video"><div id="ooyala-video_439e489aff00d400f0aabe2ec4d74647" class="video-player ooyala-video" width="600" height="338"><p>
			<a href="http://gigaom.com/2011/03/27/structure-nasdaq-and-more-why-gigaom-loves-new-york/"><img src="http://ak.c.ooyala.com/h2djZjMjpDmWtGgKRHUkus0dq-6BwAOw/Ut_HKthATH4eww8X5iMDoxOm9pO9a5tR" alt="Ooyala Video Thumbnail"></a><br><a href="http://gigaom.com/2011/03/27/structure-nasdaq-and-more-why-gigaom-loves-new-york/">Watch this video for free</a> on <a href="http://gigaom.com/">GigaOM</a>
		</p></div></div>
<p><em>Automattic, maker of WordPress.com, is backed by True Ventures, a venture capital firm that is an investor in the parent company of this blog, GigaOm. Om Malik, founder of GigaOm, is also a venture partner at True.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2011/03/27/structure-nasdaq-and-more-why-gigaom-loves-new-york/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2011/03/group-1.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2011/03/group-1.jpg?w=150" medium="image">
			<media:title type="html">The GigaOM staff in front of Nasdaq</media:title>
		</media:content>

		<media:content url="http://2.gravatar.com/avatar/562164ecbc2c4b27fc6878c8d55c5c7d?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">nsolisgigaom</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/group-1.jpg?w=300" medium="image">
			<media:title type="html">The GigaOM staff in front of Nasdaq</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/bmf_7713.jpg?w=150" medium="image">
			<media:title type="html">Om in front of Nasdaq, March 22, 2011, by Pinar Ozger</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/mc_032211_hires-2.jpg?w=150" medium="image">
			<media:title type="html">Om at Nasdaq closing bell ceremony, March 22, 2011</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/bmf_7516.jpg?w=150" medium="image">
			<media:title type="html">Team GigaOM just after ringing the closing bell at Nasdaq</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o0504.jpg?w=100" medium="image">
			<media:title type="html">Structure Big Data 2011</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/jeff-jonas1.jpg?w=150" medium="image">
			<media:title type="html">Jeff Jonas, IBM, at Structure Big Data 2011</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o0530.jpg?w=150" medium="image">
			<media:title type="html">The audience at Structure Big Data 2011</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o13071.jpg?w=150" medium="image">
			<media:title type="html">Michelle Munson, Aspera, at Structure Big Data 2011</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o9998.jpg?w=150" medium="image">
			<media:title type="html">Exhibit hall at Structure Big Data 2011</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o12491.jpg?w=150" medium="image">
			<media:title type="html">The Master Data Wranglers panel at Structure Big Data 2011</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o0056.jpg?w=150" medium="image">
			<media:title type="html">Exhibit hall at Structure Big Data 2011</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o07351.jpg?w=150" medium="image">
			<media:title type="html">Andy Bechtolsheim of Arista Networks at Structure Big Data 2011</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o0494.jpg?w=150" medium="image">
			<media:title type="html">Structure Big Data 2011 at Chelsea Piers</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o01701.jpg?w=150" medium="image">
			<media:title type="html">Moe Khosravy of Windows Azure DataMarket and Flip Kromer of Infochimps discuss data markets with Stacey Higginbotham at Structure Big Data 2011</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o0043.jpg?w=150" medium="image">
			<media:title type="html">Exhibit hall at Structure Big Data 2011</media:title>
		</media:content>
	</item>
		<item>
		<title>Meet Mapr, a Competitor to Hadoop Leader Cloudera</title>
		<link>http://gigaom.com/2011/03/24/meet-mapr-a-competitor-to-hadoop-leader-cloudera/</link>
		<comments>http://gigaom.com/2011/03/24/meet-mapr-a-competitor-to-hadoop-leader-cloudera/#comments</comments>
		<pubDate>Thu, 24 Mar 2011 17:04:47 +0000</pubDate>
		<dc:creator>Om Malik</dc:creator>
				<category><![CDATA[@NYT]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[Structure Big Data]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=321713</guid>
		<description><![CDATA[Mapr, a stealth-mode start-up with about 30 employees, is developing a version of Hadoop and plans to compete with the likes of Cloudera. The company is likely to launch later this year and has been funded by Lightspeed Venture Partners and NEA.]]></description>
				<content:encoded><![CDATA[<div id="attachment_321320" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2011/03/1z5o0588.jpg"><img title="Cloudera's Amr Awadallah, Pervasive Software's Mike Hoskins, 10gen's Dwight Merriman, Yahoo's Todd Papaioannou, and DataStax's Ben Werther" src="http://gigaom2.files.wordpress.com/2011/03/1z5o0588.jpg?w=300&#038;h=200" alt="Cloudera's Amr Awadallah, Pervasive Software's Mike Hoskins, 10gen's Dwight Merriman, Yahoo's Todd Papaioannou, and DataStax's Ben Werther" width="300" height="200" class="size-medium wp-image-321320"></a><p class="wp-caption-text">The Hadoop and Beyond Panel at Structure Big Data</p></div>
<p>Hadoop, the open-source file system and MapReduce implementation for massive-scale data, <a href="http://gigaom.com/cloud/hadoop-cluster-mapreduce-distributed-file-system/">was the talk of the conference Wednesday</a> at our <a href="http://event.gigaom.com/bigdata/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=321713+meet-mapr-a-competitor-to-hadoop-leader-cloudera&amp;utm_content=om">Structure Big Data conference in New York</a>. From new Hadoop distributions to end-customers’ plans, Hadoop was all anyone could talk about. One of the companies whose name cropped up in conversations was a stealth-mode company called <a href="http://www.maprtech.com/">Mapr</a>, which is building a proprietary version of Hadoop and is likely to launch later this year.</p>
<p>Mapr, based in <del datetime="2011-03-25T01:11:49+00:00">Saratoga</del> San Jose, Calif., has been in the works for nearly two years. <a href="http://sec.gov/Archives/edgar/data/1468803/000146880309000001/xslFormDX01/primary_doc.xml">Securities and Exchange Commission filings show</a> the company has raised about $9 million in funding from <a href="http://www.lightspeedvp.com/TeamMember.aspx?m=6">Barry Eggers</a> of Lightspeed Venture Partners and Peter Sonsini of New Enterprise Associates. On its web site, the company says it’s “engineering game changing Map/Reduce related technologies.” Its ambitions aren’t limited by that somewhat ambiguous statement.</p>
<h2><strong>People Behind Mapr</strong>:</h2>
<ul><li><strong>M.C. <del datetime="2011-03-27T16:57:18+00:00">Srinivas</del> Srivas,</strong> an ex-Googler, is the founder and CTO of the company.</li>
<li><strong>John Schroeder,</strong> formerly of Lightspeed VC and former CEO of Calista Technologies (acquired by Microsoft) and Rainfinity (acquired by EMC), is the CEO and co-founder of Mapr.</li>
<li>The company has close to 30 employees, many of them based in India.</li>
<li><strong>Ted Dunning,</strong> chief scientist at Site Tuner and Veoh Networks, is the chief application architect at Mapr. He created the recommendation engine for Musicmatch, a music service that was popular before iTunes came on the scene. He is also one of the key guys behind the Apache Mahout data-mining project.</li>
</ul><h2><strong>What Is Mapr Doing</strong>?</h2>
<p>They are said to be building a proprietary replacement for the Hadoop Distributed File System that’s allegedly three times faster than the current open-source version. It comes with snapshots and no NameNode single point of failure (SPOF), and is supposed to be API-compatible with HDFS, so it can be a drop-in replacement.</p>
<h2><strong>The Road Ahead</strong></h2>
<p>Mapr might have an edge over Apache Hadoop in the interim, but Apache is working to improve the HDFS architecture in its distribution, and should have its own snapshot feature sometime in 2012. Also, Appistry sells a NameNode-free HDFS alternative based on its distributed CloudIQ Storage offering. As for the speed advantage, I don’t have any details for now, but if you have some thoughts, please share them with us.</p>
<p>On a broader canvas, I think Mapr is up against a whole lot of major competitors. Cloudera has a lead in the commercial marketplace, and the Apache Hadoop distribution on which it’s based keeps improving thanks to upgrades from contributors like Facebook and Yahoo. Apache Hadoop gives companies more control over their data, as they are not held hostage by a vendor, and surveys and anecdotal evidence alike suggest that Apache Hadoop is still the most widely used version.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2011/03/24/meet-mapr-a-competitor-to-hadoop-leader-cloudera/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2011/03/1z5o0588.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o0588.jpg?w=150" medium="image">
			<media:title type="html">Cloudera&#039;s Amr Awadallah, Pervasive Software&#039;s Mike Hoskins, 10gen&#039;s Dwight Merriman, Yahoo&#039;s Todd Papaioannou, and DataStax Ben Werther</media:title>
		</media:content>

		<media:content url="http://2.gravatar.com/avatar/89c6ff98059617751fcf312690965fa0?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">om</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o0588.jpg?w=300" medium="image">
			<media:title type="html">Cloudera&#039;s Amr Awadallah, Pervasive Software&#039;s Mike Hoskins, 10gen&#039;s Dwight Merriman, Yahoo&#039;s Todd Papaioannou, and DataStax Ben Werther</media:title>
		</media:content>
	</item>
		<item>
		<title>The Hurdles for Moving Big Data &#8216;Round the World</title>
		<link>http://gigaom.com/2011/03/23/munson-aspera-hanafi-alloy/</link>
		<comments>http://gigaom.com/2011/03/23/munson-aspera-hanafi-alloy/#comments</comments>
		<pubDate>Wed, 23 Mar 2011 22:59:41 +0000</pubDate>
		<dc:creator>Katie Fehrenbacher</dc:creator>
				<category><![CDATA[Alloy Ventures]]></category>
		<category><![CDATA[Ammar Hanafi]]></category>
		<category><![CDATA[aspera]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[fasp]]></category>
		<category><![CDATA[infrastructure]]></category>
		<category><![CDATA[Michelle Munson]]></category>
		<category><![CDATA[Structure Big Data]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=321453</guid>
		<description><![CDATA[Underlying all the useful applications, like Hadoop, that have emerged out of the big data ecosystem, there's a fundamental assumption: The data companies want will be accessible when they want and need it, explained Michelle Munson, CEO and co-founder of Aspera.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2011/03/1z5o1307.jpg"><img src="http://gigaom2.files.wordpress.com/2011/03/1z5o1307.jpg?w=210&#038;h=140" alt="Michelle Munson from Aspera at Structure Big Data 2011" title="Michelle Munson from Aspera at Structure Big Data 2011" width="210" height="140"  class="alignleft size-thumbnail wp-image-321494" /></a>Underlying all the useful and inspiring applications, like Hadoop, that have emerged out of the big data ecosystem is a fundamental assumption: The data companies want will be accessible when they want and need it. That requires the ability to transfer files at the speeds people expect, and it is one of the constraints of the big data world, explained Michelle Munson, CEO and co-founder of Aspera.</p>
<p>Aspera has built a proprietary <a href="http://www.asperasoft.com/en/technology/fasp_overview_1/fasp_technology_overview_1">high-speed file-transport technology, fasp</a>, that helps data move across networks with issues like over-burdened WANs. Aspera&#8217;s customers are primarily large companies dealing with big data, including digital media companies sending content among supply-chain partners, life sciences researchers sending genome-sequencing data among institutes and government intelligence customers sending video files between agencies.</p>
<p>Munson said current Internet infrastructure lacks three qualities:</p>
<ol>
<li>availability</li>
<li>geographic independence</li>
<li>security</li>
</ol>
<p>While all these issues need to be addressed in the fundamental architecture itself, the constraint has created an opportunity for Aspera&#8217;s transfer product. The reliability of Internet services is going up, which creates an expectation that this data will be available quickly, said Ammar Hanafi, general partner with Alloy Ventures.</p>
<p>While consumer web services can easily meet customer expectations, Aspera&#8217;s customers are a different story. &#8220;Our customers are moving many gigabytes and larger [quantities] of data that has to be chunked up and then distributed,&#8221; said Munson. But beyond making delivery as fast as the consumer web, the company has learned that its file-transfer tech can provide something else: predictability. &#8220;After solving the bottleneck, then you can offer customers predictability,&#8221; which helps manage their expectations, Munson said.</p>
<p>At the end of the day, it&#8217;s a physics problem, both Munson and Hanafi said. TCP, the transport protocol used on IP networks, just doesn&#8217;t perform all that well for moving big data long distances. That&#8217;s a big opportunity both for startups like Aspera and for big data infrastructure companies.</p>
<p><iframe width="560" height="340" src="http://cdn.livestream.com/embed/gigaombigdata?layout=4&amp;clip=pla_3db15a04-38ba-4175-bdf1-40a828c8d368&amp;autoplay=false" style="border:0;outline:0" frameborder="0" scrolling="no"></iframe></p>
<div style="font-size: 11px;padding-top:10px;text-align:center;width:560px">Watch <a href="http://www.livestream.com/?utm_source=lsplayer&amp;utm_medium=embed&amp;utm_campaign=footerlinks" title="live streaming video">live streaming video</a> from <a href="http://www.livestream.com/gigaombigdata?utm_source=lsplayer&amp;utm_medium=embed&amp;utm_campaign=footerlinks" title="Watch gigaombigdata at livestream.com">gigaombigdata</a> at livestream.com</div>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2011/03/23/munson-aspera-hanafi-alloy/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2011/03/1z5o1307.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o1307.jpg?w=150" medium="image">
			<media:title type="html">Michelle Munson from Aspera at Structure Big Data 2011</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/0c61eb5d3c638c5b371fc84afd2831b4?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">katiefehren</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o1307.jpg?w=210" medium="image">
			<media:title type="html">Michelle Munson from Aspera at Structure Big Data 2011</media:title>
		</media:content>
	</item>
		<item>
		<title>Reducing Data Latency Leads to Faster Decisions</title>
		<link>http://gigaom.com/2011/03/23/reducing-data-latency-leads-to-faster-decisions/</link>
		<comments>http://gigaom.com/2011/03/23/reducing-data-latency-leads-to-faster-decisions/#comments</comments>
		<pubDate>Wed, 23 Mar 2011 22:15:48 +0000</pubDate>
		<dc:creator>Kevin C. Tofel</dc:creator>
				<category><![CDATA[Structure Big Data]]></category>
		<category><![CDATA[Sybase]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=321404</guid>
		<description><![CDATA[Estimates say that 90 percent of all data was created in the last two years alone. That staggering figure can lead to analysis paralysis for some organizations, but those that can sift through, analyze and take action on information faster than others will enjoy competitive advantages.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2011/03/1z5o1154.jpg"><img src="http://gigaom2.files.wordpress.com/2011/03/1z5o1154.jpg?w=300&#038;h=200" alt="Irfan Khan, Sybase, at Structure Big Data 2011" title="Irfan Khan, Sybase, at Structure Big Data 2011" width="300" height="200"  class="alignleft size-medium wp-image-321464" /></a>Big data is getting bigger, with some estimates suggesting that 90 percent of all data was created in the last two years alone. That staggering figure can lead to analysis paralysis for some organizations, but those that can sift through, analyze and take action on information faster than others will have a competitive advantage. Irfan <del datetime="2011-03-28T16:49:54+00:00">Kahn</del> Khan, the CTO at Sybase, outlined strategies at the Structure Big Data conference that can reduce data latency and increase the value of faster decisions.</p>
<p><del datetime="2011-03-28T16:49:54+00:00">Kahn</del> Khan says the challenge lies mainly in three latency areas: data, analysis and decision making, each of which takes time between the initial trigger event and the final action event. And as more time passes at each of these hotspots, the value of any decision made from the data is reduced. As if that weren&#8217;t bad enough, IT shops are slowed by other challenges such as the adoption of employee-owned mobile devices, the mobile commerce revolution and pressure to boost worker productivity.</p>
<p>What&#8217;s needed to respond to these challenges, according to <del datetime="2011-03-28T16:49:54+00:00">Kahn</del> Khan, are new technology stacks, deployment models and shifts in both data and application paradigms. For starters, the cost of faster DRAM is now approaching that of high-end storage, so DRAM should be considered for holding data. Using column stores can shrink information arrays severalfold, while adopting complex event processing (CEP) engines can bring real-time analytics. Data analysis also needs to be pushed down closer to where the data is actually stored (a tighter merging of application stores and data tiers, says <del datetime="2011-03-28T16:49:54+00:00">Kahn</del> Khan), and the use of in-memory analytics can also diminish information latency, leading to faster decision making in the enterprise.</p>
<p><iframe width="560" height="340" src="http://cdn.livestream.com/embed/gigaombigdata?layout=4&amp;clip=pla_792874d1-8473-43be-91ab-2b78ab2a40c0&amp;autoplay=false" style="border:0;outline:0" frameborder="0" scrolling="no"></iframe></p>
<div style="font-size: 11px;padding-top:10px;text-align:center;width:560px"><a href="http://www.livestream.com/gigaombigdata?utm_source=lsplayer&amp;utm_medium=embed&amp;utm_campaign=footerlinks" title="Watch gigaombigdata">gigaombigdata</a> on livestream.com. <a href="http://www.livestream.com/?utm_source=lsplayer&amp;utm_medium=embed&amp;utm_campaign=footerlinks" title="Broadcast Live Free">Broadcast Live Free</a></div>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2011/03/23/reducing-data-latency-leads-to-faster-decisions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2011/03/1z5o1154.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o1154.jpg?w=150" medium="image">
			<media:title type="html">Irfan Khan, Sybase, at Structure Big Data 2011</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/6cbb45abac59965c2626e40155358d1b?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">Kevin C. Tofel</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o1154.jpg?w=300" medium="image">
			<media:title type="html">Irfan Khan, Sybase, at Structure Big Data 2011</media:title>
		</media:content>
	</item>
		<item>
		<title>Data Science Toolkit Brings Big Data Analysis to the People</title>
		<link>http://gigaom.com/2011/03/23/pete-warden-openheatmap-data-science-toolkit/</link>
		<comments>http://gigaom.com/2011/03/23/pete-warden-openheatmap-data-science-toolkit/#comments</comments>
		<pubDate>Wed, 23 Mar 2011 20:27:43 +0000</pubDate>
		<dc:creator>Janko Roettgers</dc:creator>
				<category><![CDATA[Amazon EC2]]></category>
		<category><![CDATA[Structure Big Data]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=321359</guid>
		<description><![CDATA[Pete Warden got famous for scraping 220 million Facebook profiles and then analyzing this data to unearth U.S.-wide user connection trends. Now he wants you to be able to do the same with a new web service released at GigaOM's Structure Big Data conference today.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2011/03/1z5o1088.jpg"><img src="http://gigaom2.files.wordpress.com/2011/03/1z5o1088.jpg?w=300&#038;h=200" alt="Pete Warden, OpenHeatMap, at Structure Big Data 2011" title="Pete Warden, OpenHeatMap, at Structure Big Data 2011" width="300" height="200"  class="alignleft size-medium wp-image-321400" /></a>Pete Warden has been analyzing big data on the cheap for years, and he wants you to be able to do the same. Warden, who got famous for scraping 220 million Facebook profiles, unveiled his Data Science Toolkit at <a href="http://gigaom.com/2011/03/23/structure-big-data-live-coverage/">GigaOM’s Structure Big Data conference</a> in New York today, allowing anyone to automate the conversions and analysis needed to make sense of massive amounts of data. </p>
<p>For example, Data Science Toolkit offers OCR functionality to convert PDFs or scanned image files to text files, and it can extract geographic locations from news articles and other types of unstructured data or find political district and neighborhood information for any given location. <a href="http://www.datasciencetoolkit.org/">Data Science Toolkit is available as a web service</a> online, but it can also be downloaded and run on Amazon EC2 or as a virtual machine.</p>
<p>Explaining the motivation for this release, Warden said during his talk that he has been living off of ramen noodles for years, but that didn’t stop him from getting creative with data analysis. &#8220;I’ve had my service living off this same kind of budget,&#8221; he said, adding: &#8220;You can hire a hundred servers from Amazon for $10 an hour.&#8221;</p>
<p>That’s exactly what Warden did last year after <a href="http://gigaom.com/2010/02/08/the-7-somewhat-united-states-of-facebook/">crawling Facebook and scraping 500 million pages</a> that represented about 220 million users. He let Amazon’s servers loose on the scraped data, and 10 hours later had it boiled down to a database-ready format. &#8220;That was about a hundred bucks,&#8221; recollected Warden. </p>
<p>The result of these efforts was a massive analysis of friendship relationships on Facebook, which got him interviews with NPR, 500,000 visitors to his blog &#8212; and <a href="http://gigaom.com/2010/04/01/facebook-data-deleted-after-lawsuit-threat/">an angry call from Facebook’s chief legal counsel</a>, who didn’t like what Warden was doing with the company’s data. </p>
<p>The conflict with Facebook lasted several months and ended up costing Warden $3,000 in legal fees. &#8220;Big data? Cheap. Lawyers? Not so cheap,&#8221; he quipped. That episode may be one reason that Warden released the Data Science Toolkit under the GPL &#8212; no lawyers necessary to use it.</p>
<p><iframe width="560" height="340" src="http://cdn.livestream.com/embed/gigaombigdata?layout=4&amp;clip=pla_b6c19059-7812-4c56-81cf-e8738cd5c5da&amp;autoplay=false" style="border:0;outline:0" frameborder="0" scrolling="no"></iframe>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2011/03/23/pete-warden-openheatmap-data-science-toolkit/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2011/03/1z5o1088.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o1088.jpg?w=150" medium="image">
			<media:title type="html">Pete Warden, OpenHeatMap, at Structure Big Data 2011</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/08bc62ecf138202f06b74dfa01376e74?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">jroettgers</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o1088.jpg?w=300" medium="image">
			<media:title type="html">Pete Warden, OpenHeatMap, at Structure Big Data 2011</media:title>
		</media:content>
	</item>
		<item>
		<title>How Google Uses Data to Make a Better Google</title>
		<link>http://gigaom.com/2011/03/23/alfred-spector-google/</link>
		<comments>http://gigaom.com/2011/03/23/alfred-spector-google/#comments</comments>
		<pubDate>Wed, 23 Mar 2011 19:53:27 +0000</pubDate>
		<dc:creator>Kevin C. Tofel</dc:creator>
				<category><![CDATA[Alfred Spector]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Google Maps]]></category>
		<category><![CDATA[location]]></category>
		<category><![CDATA[Structure Big Data]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=321290</guid>
		<description><![CDATA[Google may have more distributed data than any other company, but it still takes user input to create smarter machines. Google's Voice Search speech recognition, for example, began to improve when the service started to train itself and improve accuracy through the use of end-user data.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2011/03/1z5o0788.jpg"><img src="http://gigaom2.files.wordpress.com/2011/03/1z5o0788.jpg?w=300&#038;h=200" alt="Alfred Spector, Google, at Structure Big Data 2011" title="Alfred Spector, Google, at Structure Big Data 2011" width="300" height="200"  class="alignleft size-medium wp-image-321346" /></a>Making sense of vast amounts of data is made easier through processor improvements, faster networks and a growing amount of cloud storage capacity, but there&#8217;s another factor that&#8217;s accelerating the ability to sift through information: user communities. At the Structure Big Data event on Wednesday, Alfred Spector, a VP of Research and Special Initiatives at Google, illustrated how to combine low-level user data with the massive information stores and cloud computing services offered by his company.</p>
<p>Perhaps the most prominent example is Google&#8217;s geographic data, used in both the Google Maps and Earth products. The company harvests global information to create products that are useful in their own right, but each can be supplemented with localized user data. A modern data management web app makes it easy for Google to host and manage data tables or personalized maps, and to allow collaboration on and publication of them. For example, Google Maps data combined with information from hospitals and doctors can easily show which nearby health-care providers have flu vaccines available.</p>
<p>Making large amounts of data usable and modifiable by end users has the potential to create solutions that Google hasn&#8217;t envisioned yet. What it has already done is enable what Spector calls a &#8220;hybrid intelligence,&#8221; in which users and computers do more together than either could do individually. Scientists who track global warming may have access only to limited datasets that show a small part of the overall situation. Google Earth, however, can augment its base data with sensor information from various satellites and datapoints, providing a more holistic view of global warming.</p>
<p>This user community and data combination approach is leading to smarter machines as well. The voice search features offered by Google are becoming more accurate due to speech recognition data provided by users. In effect, the speech service is training itself because it&#8217;s learning from all of the incoming data.</p>
<p>Just as they can with Google Maps data, end users can leverage these smarter machines as well. Spector said that a spam-killing blog moderator could be created by end users if they train the system with both good blog posts and spam comments. Those inputs, combined with Google&#8217;s prediction APIs and Python scripts, would effectively create an intelligent automated moderator that could continuously improve its own performance.</p>
<p><iframe width="560" height="340" src="http://cdn.livestream.com/embed/gigaombigdata?layout=4&amp;clip=pla_62eded76-b3a6-4cbe-a346-a52b795aec37&amp;autoplay=false" style="border:0;outline:0" frameborder="0" scrolling="no"></iframe>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2011/03/23/alfred-spector-google/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2011/03/1z5o0788.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o0788.jpg?w=150" medium="image">
			<media:title type="html">Alfred Spector, Google, at Structure Big Data 2011</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/6cbb45abac59965c2626e40155358d1b?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">Kevin C. Tofel</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o0788.jpg?w=300" medium="image">
			<media:title type="html">Alfred Spector, Google, at Structure Big Data 2011</media:title>
		</media:content>
	</item>
		<item>
		<title>Building a Better Starbucks With Big Data</title>
		<link>http://gigaom.com/2011/03/23/netezza-jim-baum/</link>
		<comments>http://gigaom.com/2011/03/23/netezza-jim-baum/#comments</comments>
		<pubDate>Wed, 23 Mar 2011 15:17:34 +0000</pubDate>
		<dc:creator>Stacey Higginbotham</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[data warehousing]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Jim Baum]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Structure Big Data]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=320980</guid>
		<description><![CDATA[Data isn't the solution to business problems. Pulling data into applications and using it to make decisions and improve the user experience is the way to solve business problems, said Jim Baum, the CEO of Netezza, at Structure Big Data.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2011/03/1z5o9964.jpg"><img src="http://gigaom2.files.wordpress.com/2011/03/1z5o9964.jpg?w=300&#038;h=200" alt="Jim Baum, IBM Netezza, at Structure Big Data 2011" title="Jim Baum, IBM Netezza, at Structure Big Data 2011" width="300" height="200"  class="alignleft size-medium wp-image-321063" /></a>Data isn&#8217;t the solution to business problems. Pulling data into applications and using it to make decisions and improve the user experience is the way to solve business problems, said Jim Baum, the CEO of Netezza, an IBM company. Speaking at the Structure Big Data event in New York today, Baum explained that the technology to deal with big data is the hard problem many big and small companies are trying to solve today, but once the infrastructure is in place the industry needs to turn to integrating that data into products.</p>
<p>But in order to use data to help deliver an impact for consumers, such as regionally customized clothing stores or a Starbucks that keeps serving hot caramel apple ciders through the summer months during colder years, the physical distribution chain has to change. Baum pointed out that grocery stores do a masterful job of crunching data about their customers and what they are buying in order to predict inventory, but that inventory is still delivered on pallets in trucks and then loaded and unloaded by people. </p>
<p>He doesn&#8217;t see a huge wave of change coming anytime soon in translating the data gains made in the digital world to the physical world, because many of those businesses aren&#8217;t appealing to venture capitalists, who are looking for less capital-intensive deals.  </p>
<p>Much of his talk was about data interacting with people and enabling people to make faster and better decisions, but he also looked ahead to how machines can use data to interact and react in real time without human intervention. He gave the example of smart grids, which would involve appliances talking to the electric grid and utilities to determine the optimal time to use energy. For example, your dishwasher may wait until power is cheaper or the load on the electric grid is lighter before running. His thesis was that data can make the world more efficient, help companies make more money and improve the user experience, but we&#8217;ve got to solve the technology issues first. </p>
<p><iframe width="560" height="340" src="http://cdn.livestream.com/embed/gigaombigdata?layout=4&amp;clip=pla_b8a12885-96de-4de0-9914-5e9d23ffc877&amp;autoplay=false" style="border:0;outline:0" frameborder="0" scrolling="no"></iframe>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2011/03/23/netezza-jim-baum/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2011/03/1z5o9964.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o9964.jpg?w=150" medium="image">
			<media:title type="html">Jim Baum, IBM Netezza, at Structure Big Data 2011</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/aee37121e18bf76bb9fee4494bab237a?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">shigginbotham</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/03/1z5o9964.jpg?w=300" medium="image">
			<media:title type="html">Jim Baum, IBM Netezza, at Structure Big Data 2011</media:title>
		</media:content>
	</item>
	</channel>
</rss>
