<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>GigaOM &#187; database</title>
	<atom:link href="http://gigaom.com/tag/database/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com</link>
	<description></description>
	<lastBuildDate>Thu, 20 Jun 2013 05:49:14 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gigaom.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0db8f6557d022075dbbf010c54d46d93?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>GigaOM &#187; database</title>
		<link>http://gigaom.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gigaom.com/osd.xml" title="GigaOM" />
	<atom:link rel='hub' href='http://gigaom.com/?pushpress=hub'/>
		<item>
		<title>Now anyone can buy the NSA&#8217;s database tech</title>
		<link>http://gigaom.com/2013/06/19/now-anyone-can-buy-the-nsas-database-tech/</link>
		<comments>http://gigaom.com/2013/06/19/now-anyone-can-buy-the-nsas-database-tech/#comments</comments>
		<pubDate>Wed, 19 Jun 2013 16:18:43 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Accumulo]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Cybersecurity]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[sqrrl]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=658967</guid>
		<description><![CDATA[Sqrrl Enterprise, a commercial version of the National Security Agency's Accumulo database technology, is now generally available. As one might expect, it's all about security and analytics at a massive scale.]]></description>
				<content:encoded><![CDATA[<p>Say what you will about the National Security Agency, but you can&#8217;t say it doesn&#8217;t know how to share &#8212; or how to build technology that can scale. In fact, Accumulo, the petabyte-scale database technology the agency built, has been available as an open-source project for a couple of years. Now, however, a more-polished version of Accumulo is up for sale to the general public thanks to a Cambridge, Mass.-based startup called <a href="http://www.sqrrl.com/">Sqrrl</a>.</p>
<p>On Wednesday the company announced the general availability of its product, Sqrrl Enterprise, which is a cleaned-up and more-functional version of the open source Accumulo software. That means users will get an experience a lot more similar to what NSA data analysts get than what the core database code allows.</p>
<p>How do we know this? Because Sqrrl&#8217;s co-founder and CTO Adam Fuchs helped build Accumulo and the applications that run on top of it during his previous life working for the spy agency. (If you want to know more about the history of Accumulo and the types of massive graph analyses the NSA is using it for, you can check out my coverage of the NSA citizen-spying scandal from two weeks ago, <a href="http://gigaom.com/2013/06/06/heres-how-the-nsa-analyzes-all-that-call-data/">here</a> and <a href="http://gigaom.com/2013/06/07/under-the-covers-of-the-nsas-big-data-effort">here</a>.) So, instead of just downloading an open-source take on Google&#8217;s BigTable data store, Sqrrl users get things like built-in analytic functions and search; support for JSON data structures; and data encryption both at rest and in motion.</p>
<div id="attachment_659096" class="wp-caption aligncenter" style="width: 650px"><a href="http://gigaom2.files.wordpress.com/2013/06/sqrrl.jpg"><img  alt="The Sqrrl architecture" src="http://gigaom2.files.wordpress.com/2013/06/sqrrl.jpg?w=708"   class="size-full wp-image-659096" /></a><p class="wp-caption-text">The Sqrrl architecture</p></div>
<p>It&#8217;s the latter features around security that Sqrrl co-founder and VP of business development Ely Kahn said have many early Sqrrl users most excited. Health care companies, in particular, highlight an ideal use case for security features like those that Sqrrl provides. Because of its cell-level security and access control, Kahn explained, providers can try to do new things around data sharing while still staying compliant with regulations such as HIPAA and the data requirements that come along with the Affordable Care Act.</p>
<p>But the applications of Accumulo and Sqrrl could be much broader across industries. Because it&#8217;s based on Hadoop, Sqrrl gives companies peace of mind when it comes to storing big data securely, Kahn said; worries about security have been a big reason that many companies are afraid to run Hadoop in production. And Sqrrl&#8217;s analytic capabilities make it easier to analyze all that data, including log files and network data that could help a company track down the causes of any cyberattacks it might suffer.</p>
<p>At this point, said Kahn, who was previously director of cybersecurity strategy at the National Security Staff in the White House, that should be a major concern. For most organizations, he said, it&#8217;s not a question of whether they&#8217;ve been breached but &#8220;a question of whether they know that they know they&#8217;ve been breached.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/06/19/now-anyone-can-buy-the-nsas-database-tech/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_110961494.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_110961494.jpg?w=150" medium="image">
			<media:title type="html">Database rows</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/06/sqrrl.jpg" medium="image">
			<media:title type="html">The Sqrrl architecture</media:title>
		</media:content>
	</item>
		<item>
		<title>Stealth-mode 28msec wants to build a Tower of Babel for databases</title>
		<link>http://gigaom.com/2013/06/11/stealth-mode-28msec-wants-to-build-a-tower-of-babel-for-databases/</link>
		<comments>http://gigaom.com/2013/06/11/stealth-mode-28msec-wants-to-build-a-tower-of-babel-for-databases/#comments</comments>
		<pubDate>Tue, 11 Jun 2013 13:00:01 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[28msec]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[launchpad]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Structure]]></category>
		<category><![CDATA[Structure 2013]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=656262</guid>
		<description><![CDATA[28msec is about to exit stealth mode and take the covers off its database platform that lets users query data from any source in real time.]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.28msec.com/">28msec</a> is not your average database startup but, then again, neither is its mission. The company — still in stealth mode (until our <a href="http://event.gigaom.com/structure/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=656262+stealth-mode-28msec-wants-to-build-a-tower-of-babel-for-databases&amp;utm_content=dharrisstructure">Structure Launchpad event</a> on June 20) after about seven years of existence — has created a data-processing platform that it says can take and analyze data from any source, and then deliver the results in real time.</p>
<p>The company took so long to officially launch, CEO Eric Kish told me, because the product took such a long time to build. The 28msec history goes like this: The early investors are database industry veterans (one was employee No. 6 at Oracle) who, at some point in 2006, envisioned an explosion in data formats and databases. Their solution was to create a platform able to extract data from any of these sources, transform it into a standard format, and then let users analyze it using a single query language that looks a lot like the SQL they already know. 28msec is based on the open source <a href="http://www.jsoniq.org/">JSONiq</a> query language and <a href="http://www.zorba-xquery.com/">Zorba</a> query processor and will be available as a cloud service.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/06/structure_launchpad_in-article.png"><img alt="Structure_Launchpad" src="http://gigaom2.files.wordpress.com/2013/06/structure_launchpad_in-article.png?w=708"   class="aligncenter size-full wp-image-654130"></a></p>
<p>That’s about all Kish is willing to spill right now with regard to the technology.</p>
<p>As for the company itself, it has been staffed thus far primarily by Ph.D.s in query technologies from ETH Zurich in Switzerland, where co-founder Donald Kossmann is a professor. Every year since 28msec was founded, it has hired one of his graduates to help build the product. The company brought on Kish, a serial entrepreneur, as CEO in 2012.</p>
<p>28msec was originally based in Zurich, but is in the process of shifting its base to Palo Alto, where Kish lives. It has raised $5.5 million in capital from friends and family, and already has paying customers.</p>
<p>As for the name, 28msec, it’s a reference to the time it takes for a database to access data stored on a hard disk. After the headquarters, maybe that name will be the next thing to change given the prevalence of flash and RAM as database storage media. “Seven years later,” Kish acknowledged, “it’s not relevant anymore.”</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/06/11/stealth-mode-28msec-wants-to-build-a-tower-of-babel-for-databases/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/06/json-xml-e1370889222654.png?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/06/json-xml-e1370889222654.png?w=150" medium="image">
			<media:title type="html">json-xml</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/06/structure_launchpad_in-article.png" medium="image">
			<media:title type="html">Structure_Launchpad</media:title>
		</media:content>
	</item>
		<item>
		<title>Under the covers of the NSA&#8217;s big data effort</title>
		<link>http://gigaom.com/2013/06/07/under-the-covers-of-the-nsas-big-data-effort/</link>
		<comments>http://gigaom.com/2013/06/07/under-the-covers-of-the-nsas-big-data-effort/#comments</comments>
		<pubDate>Sat, 08 Jun 2013 02:15:19 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Accumulo]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Cybersecurity]]></category>
		<category><![CDATA[data privacy]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[intelligence]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[NSA]]></category>
		<category><![CDATA[privacy]]></category>
		<category><![CDATA[spying]]></category>
		<category><![CDATA[sqrrl]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=655599</guid>
		<description><![CDATA[There's much debate still to be had over the NSA's recently uncovered data-collection practices, but some of the technologies underlying them are out in the open. Here's what we know already.]]></description>
				<content:encoded><![CDATA[<p>The <a href="http://gigaom.com/2013/06/07/through-a-prism-darkly-tracking-the-ongoing-nsa-surveillance-story/">NSA&#8217;s data collection practices</a> have much of America &#8212; and certainly the tech community &#8212; on edge, but sources familiar with the agency&#8217;s technology are saying the situation isn&#8217;t as bad as it seems. Yes, the agency has a lot of data and can do some powerful analysis, but, the argument goes, there are strict limits in place around how the agency can use it and who has access. Whether that&#8217;s good enough is still an open debate, but here&#8217;s what we know about the technology that&#8217;s underpinning all that data.</p>
<h2 id="what-is-accumulo">What is Accumulo?</h2>
<p>The technological linchpin to everything the NSA is doing from a data-analysis perspective is <a href="http://en.wikipedia.org/wiki/Apache_Accumulo">Accumulo</a> &#8212; an open-source database the agency built in order to store and analyze huge amounts of data. Adam Fuchs knows Accumulo well because he helped build it during a nine-year stint with the NSA; he&#8217;s now co-founder and CTO of a company called <a href="http://www.sqrrl.com/">Sqrrl</a> that sells a commercial version of the database system. I spoke with him earlier this week, days before news broke of the NSA collecting data from Verizon and the country&#8217;s largest web companies.</p>
<div id="attachment_655914" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/06/fuchs.jpg"><img  alt="fuchs" src="http://gigaom2.files.wordpress.com/2013/06/fuchs.jpg?w=300&#038;h=173" width="300" height="173" class="size-medium wp-image-655914" /></a><p class="wp-caption-text">Adam Fuchs</p></div>
<p>The NSA began building Accumulo in late 2007, Fuchs said, because they were trying to do automated analysis for tracking and discovering new terrorism suspects. &#8220;We had a set of applications that we wanted to develop and we were looking for the right infrastructure to build them on,&#8221; he said.</p>
<p>The problem was those technologies weren&#8217;t available. He liked what projects like HBase were doing by using Hadoop to mimic Google&#8217;s famous BigTable data store, but they still weren&#8217;t up to the NSA&#8217;s requirements around scalability, reliability or security. So, the agency began work on a project called CloudBase, which eventually was renamed Accumulo.</p>
<p>Now, Fuchs said, &#8220;It&#8217;s operating at thousands-of-nodes scale&#8221; within the NSA&#8217;s data centers. There are multiple instances, each storing tens of petabytes (1 petabyte equals 1,000 terabytes or 1 million gigabytes) of data, and Accumulo is the backend of the agency&#8217;s most widely used analytical capabilities. Accumulo&#8217;s ability to handle data in a variety of formats (a characteristic called <a href="http://stackoverflow.com/questions/15589184/what-does-being-schema-less-mean-for-a-nosql-database">&#8220;schemaless&#8221;</a> in database jargon) means the NSA can store data from numerous sources all within the database and add new analytic capabilities in days or even hours.</p>
<p>&#8220;It&#8217;s quite critical,&#8221; he added.</p>
<h2 id="what-the-nsa-can-and-cant-do-w">What the NSA can and can&#8217;t do with all this data</h2>
<p>As I <a href="http://gigaom.com/2013/06/06/heres-how-the-nsa-analyzes-all-that-call-data/">explained on Thursday</a>, Accumulo is especially adept at analyzing trillions of data points in order to build massive graphs that can detect the connections between them and the strength of the connections. Fuchs didn&#8217;t talk about the size of the NSA&#8217;s graph, but he did say the database is designed to handle months&#8217; or years&#8217; worth of information and let analysts move from query to query very fast. When you&#8217;re talking about analyzing call records, it&#8217;s easy to see where this type of analysis would be valuable in determining how far a suspected terrorist&#8217;s network might spread and who might be involved.</p>
<p>Stewart Baker, former NSA general counsel under George W. Bush, <a href="http://www.skatingonstilts.com/skating-on-stilts/2013/06/the-fisa-court-order-flap-take-a-deep-breath.html">wrote on his blog Thursday</a> that this type of data could also be used for general pattern recognition &#8212; the kinds of stuff that targeted advertisers love to do. Only, instead of the system serving someone an ad because of what they&#8217;ve been searching for and the operating system they&#8217;re using, Baker presented the hypothetical of &#8220;[an] American who makes a call to Yemen at 11 a.m., Sanaa time, hangs up after a few seconds, and then gets a call from a different Yemeni number three hours later.&#8221;</p>
<p>The big legal question here is around probable cause and whether the government should further investigate this caller based on call patterns similar to those of known terrorists, but the big data question is around false positives. Baker&#8217;s hypothetical might appear pretty cut and dried but, data scientist <a href="http://www.linkedin.com/in/turian">Joseph Turian</a> explains, call records in general probably don&#8217;t offer too strong of a signal and could lead to situations where innocent behavior patterns look a lot like nefarious ones. &#8220;But once you start connecting the dots with other pieces of information you have from other sources,&#8221; he said via email, &#8220;you can start making more predictions.&#8221;</p>
<p>This is where a program like PRISM, the NSA&#8217;s reported effort to collect data straight from the likes of Google, Facebook and Apple, could come into play. If you&#8217;re able to tie a name or web account to a phone number, you can figure out all sorts of information. If you can prove that certain people are radical Islamists, for example, you can start to infer more things about the others in that social graph.</p>
<p>And if Sqrrl&#8217;s capabilities are any indicator of what Accumulo is supporting within the NSA, the agency can perform a lot of simpler functions on its data as well. In addition to graph processing, said Ely Kahn, Sqrrl&#8217;s co-founder and VP of business development, their product includes pre-packaged analytic capabilities around SQL queries and full-text search, and also supports streaming data. This means Sqrrl&#8217;s version can support any number of interesting use cases &#8212; from processing data as it hits the system to keeping a massive index that can be searched in the same way someone searches the web.</p>
<h2 id="how-much-data-is-the-nsa-colle">How much data is the NSA collecting? Follow the money</h2>
<p>We&#8217;re not quite sure how much data the two programs that came to light this week are actually collecting, but the evidence suggests it&#8217;s not that much &#8212; at least from a volume perspective. Take the PRISM program that&#8217;s gathering data from web properties including Google, Facebook, Microsoft, Apple, Yahoo and AOL. It seems the NSA would have to be selective in what it grabs.</p>
<p>Assuming it includes every cost associated with running the program, the $20 million per year allocated to PRISM, <a href="http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/">according to the slides published by the</a> <em>Washington Post</em>, wouldn&#8217;t be nearly enough to store all the raw data &#8212; much less new datasets created from analyses &#8212; from such large web properties. Yahoo alone, I&#8217;m told, was spending over $100 million a year to operate its <a href="http://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/">approximately 42,000-node Hadoop environment</a>, consisting of hundreds of petabytes, a few years ago. Facebook users <a href="http://gigaom.com/2012/08/22/facebook-is-collecting-your-data-500-terabytes-a-day/">are generating more than 500 terabytes of new data</a> every day.</p>
<p>Using about the least-expensive option around for mass storage &#8212; cloud storage provider Backblaze&#8217;s <a href="http://gigaom.com/2013/02/20/it-turns-out-a-lot-of-companies-like-building-their-own-storage-gear/">open source storage pod designs</a> &#8212; just storing 500 terabytes of Facebook data a day would cost more than $10 million in hardware alone over the course of a year. Using higher-performance hard drives or other premium gear &#8212; things Backblaze eschews because it&#8217;s concerned primarily about cost and scalability rather than performance &#8212; would cost even more.</p>
<p>Even at the Backblaze price point, though, which is pocket change for the NSA, the agency would easily run over $20 million trying to store too many emails, chats, Skype calls, photos, videos and other types of data from the other companies it&#8217;s working with.</p>
<p>Actually, it&#8217;s possible the intelligence community is taking advantage of the Backblaze designs. In September 2011, Backblaze CEO Gleb Budman says, he met with CIA representatives who discussed that agency&#8217;s five-year plan &#8220;to centralize data services into a large private cloud&#8221; and how Backblaze&#8217;s technology might fit into it. Its plans for analyzing this data, as illustrated in the slide below (and <a href="http://gigaom.com/2013/03/20/even-the-cia-is-struggling-to-deal-with-the-volume-of-real-time-social-data/">discussed by CIA CTO Ira &#8220;Gus&#8221; Hunt at Structure: Data</a> in March), seem to mirror what the NSA has in mind.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/06/cia-big-data.jpg"><img  alt="cia big data" src="http://gigaom2.files.wordpress.com/2013/06/cia-big-data.jpg?w=708&#038;h=531" width="708" height="531" class="aligncenter size-large wp-image-655904" /></a>Whatever type of gear the NSA is using, though, and however much it&#8217;s spending on the Verizon data or PRISM specifically, we do know the agency is spending a lot of money on its data infrastructure. There are those dozens (at least) of petabytes of overall data in Accumulo, and the agency is famously building a 1-million-square-foot, $1.5 billion data center in Utah. It <a href="http://www.datacenterknowledge.com/archives/2013/06/06/nsa-to-build-860-million-hpc-center-in-maryland/">recently began construction on a 600,000-square-foot, $860 million facility</a> in Maryland.</p>
<h2 id="policies-are-in-place">Policies are in place</h2>
<p>Sqrrl&#8217;s Kahn &#8212; who previously served as director of cybersecurity strategy at the National Security Staff in the White House &#8212; says even with all the effort it&#8217;s putting into data collection and analysis, the NSA really is concerned about privacy. Not only are there strict administrative and legal limitations in place about when the agency can actually search through collected data (something Stewart Baker <a href="http://www.skatingonstilts.com/skating-on-stilts/2013/06/stewart-baker-fisa-nsa-law.html">explains in more detail</a> in a Friday blog post), but Accumulo itself was designed with privacy in mind.</p>
<p>The system itself is designed to make sure there&#8217;s not a free-for-all on data, another individual familiar with Accumulo said.</p>
<p>It has what Kahn and Sqrrl CTO Fuchs described as &#8220;cell-level&#8221; security, meaning administrators can manage access to individual pieces of data within a table. Furthermore, Fuchs explained, those policies stick with the data as it&#8217;s transformed as part of the analysis process, so someone prohibited from seeing it won&#8217;t be able to see it just because it&#8217;s now part of a different dataset. When data would come into the NSA from the CIA, he said, there were policies in place around who could see it, and Accumulo helped enforce them.</p>
<p>Even agencies within the Department of Homeland Security are using or experimenting with Accumulo, Kahn added, because <a href="http://gigaom.com/2012/04/11/cispa-isnt-sopa-but-it-isnt-ideal-and-it-might-become-law/">proposed legislation</a> would put them in charge of ensuring privacy as cybersecurity data changes hands between the government and private corporations.</p>
<p>It&#8217;s ironic, he acknowledged, but Accumulo actually flips the presumed paradigm that stricter security and privacy regulations mean less sharing. That might be a shallow victory for citizens concerned about their civil liberties, but data collection and sharing don&#8217;t seem likely to stop any time soon. At least it&#8217;s something.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/06/07/under-the-covers-of-the-nsas-big-data-effort/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/02/shutterstock_37622056.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/02/shutterstock_37622056.jpg?w=150" medium="image">
			<media:title type="html">sql statement</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/06/fuchs.jpg?w=300" medium="image">
			<media:title type="html">fuchs</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/06/cia-big-data.jpg?w=708" medium="image">
			<media:title type="html">cia big data</media:title>
		</media:content>
	</item>
		<item>
		<title>Here&#8217;s how the NSA analyzes all that call data</title>
		<link>http://gigaom.com/2013/06/06/heres-how-the-nsa-analyzes-all-that-call-data/</link>
		<comments>http://gigaom.com/2013/06/06/heres-how-the-nsa-analyzes-all-that-call-data/#comments</comments>
		<pubDate>Thu, 06 Jun 2013 22:13:39 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Accumulo]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data privacy]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[graph analysis]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[intelligence]]></category>
		<category><![CDATA[NSA]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=654984</guid>
		<description><![CDATA[How does the NSA analyze all the data it's collecting from cell phone users? With a massive database system built with just such scale and workloads in mind.]]></description>
				<content:encoded><![CDATA[<p>The National Security Agency might not have the names of Verizon&#8217;s wireless customers, but the agency probably can figure out what they&#8217;re up to if it&#8217;s so inclined. The metadata Verizon has provided the NSA &#8212; phone numbers, numbers called, duration of calls, location &#8212; is a veritable treasure trove to an organization with the right analytic skills and the right tools. The NSA has both.</p>
<p>There are numerous methods the NSA could use to extract some insights from what must be a mind-blowing number of phone calls and text messages, but graph analysis is likely the king. As <a href="http://gigaom.com/2013/05/14/were-witnessing-the-rise-of-the-graph-in-big-data/">we&#8217;ve explained numerous times over the past few months</a>, graph analysis is ideal for identifying connections among pieces of data. It&#8217;s what powers social graphs, product recommendations and even some fairly complex medical research.</p>
<div id="attachment_645089" class="wp-caption aligncenter" style="width: 677px"><a href="http://gigaom2.files.wordpress.com/2013/05/lnkdmap-1.jpg"><img  alt="My LinkedIn social graph" src="http://gigaom2.files.wordpress.com/2013/05/lnkdmap-1.jpg?w=708"   class="size-full wp-image-645089" /></a><p class="wp-caption-text">My LinkedIn social graph</p></div>
<p>But now graph analysis has really come to the fore as a tool for fighting crime (or intruding on civil liberties, however you want to look at it). The NSA is storing all those Verizon (and, presumably, other carrier) records in a <a href="http://en.wikipedia.org/wiki/Apache_Accumulo">massive database system called Accumulo</a>, which it built itself (on top of Hadoop) a few years ago because there weren&#8217;t any other options suitable for its scale and requirements around stability or security. The NSA is currently storing tens of petabytes of data in Accumulo.</p>
<p><strong>For a more thorough description of Accumulo and the NSA infrastructure, read our post <a href="http://gigaom.com/2013/06/07/under-the-covers-of-the-nsas-big-data-effort/">&#8220;Under the covers of the NSA&#8217;s big data effort.&#8221;</a></strong></p>
<p>In graph parlance, vertices are the individual data points (e.g., phone numbers or social network users) and edges are the connections among them. In late May, the NSA <a href="http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf">released a slide presentation</a> detailing how fast Accumulo is able to process a 4.4-trillion-node, 70-trillion-edge graph. By way of comparison, the graph behind Facebook&#8217;s Graph Search feature <a href="https://www.facebook.com/notes/facebook-engineering/under-the-hood-indexing-and-ranking-in-graph-search/10151361720763920">contains billions of nodes and trillions of edges</a>. (In the low trillions, from what I understand.)</p>
<p>So, yes, the NSA is able to easily analyze the call and text-message records of hundreds of millions of mobile subscribers. It&#8217;s also <a href="http://www.datacenterknowledge.com/archives/2013/06/06/nsa-to-build-860-million-hpc-center-in-maryland/">building out some massive data center real estate</a> to support all the data it&#8217;s collecting.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/06/nsa.jpg"><img  alt="nsa" src="http://gigaom2.files.wordpress.com/2013/06/nsa.jpg?w=708&#038;h=530" width="708" height="530" class="aligncenter size-large wp-image-655344" /></a></p>
<p>How might a graph analysis work within the NSA? The easy answer, <a href="http://arstechnica.com/tech-policy/2013/06/white-house-spying-on-us-citizens-critical-tool-for-fighting-terror/#p3">which the government has acknowledged</a>, is to figure out who else is in contact with suspected terrorists. If there&#8217;s a strong connection between you and Public Enemy No. 1, the NSA will find out and get to work figuring out who you are. That could be via a search warrant or wiretap authorization, or it could conceivably <a href="http://gigaom.com/2013/03/28/when-theres-no-such-thing-as-anonymous-data-does-privacy-just-mean-security/">figure out who someone likely is by using location data</a>.</p>
<p>Having such a big database of call records also provides the NSA with an easy way to go back and find out information about someone should their number pop up in a future investigation. Assuming the number is somewhere in their index, agents can track it down and get to work figuring out who it&#8217;s related to and from where it has been making calls.</p>
<p>Presumably, agents could begin with location data, too. If a bomb went off at Location X, bringing up all the numbers making calls from towers in that area might be a good starting point for investigation. Tracking someone&#8217;s movement from location data could be helpful, too.</p>
<p>If this all sounds a little creepy, maybe it should. After all, the world&#8217;s biggest, baddest intelligence agency can pretty much figure out who you are, who you know and where you go. And unlike web and retail companies that <a href="http://gigaom.com/2013/06/05/will-the-latest-nsa-surveillance-scandal-be-a-wake-up-call-for-the-power-of-data/">collect and analyze so much data about us</a>, the government can put you in jail.</p>
<p>It might be even creepier when you consider <a href="http://gigaom.com/2012/01/24/supreme-court-sidesteps-digital-privacy-for-now/">how much other data law enforcement agencies can collect about you </a>without a warrant.</p>
<p>However, as someone familiar with NSA policy told me, the good news is that the vast majority of people are still anonymous even in this sea of data: There&#8217;s just too much data to care until someone pops up in the bad guys&#8217; networks or gets on the agency&#8217;s radar.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/06/06/heres-how-the-nsa-analyzes-all-that-call-data/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/06/nsa1.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/06/nsa1.jpg?w=150" medium="image">
			<media:title type="html">nsa</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/lnkdmap-1.jpg" medium="image">
			<media:title type="html">My LinkedIn social graph</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/06/nsa.jpg?w=708" medium="image">
			<media:title type="html">nsa</media:title>
		</media:content>
	</item>
		<item>
		<title>IBM throws its weight behind MongoDB for mobile apps</title>
		<link>http://gigaom.com/2013/06/04/ibm-throws-its-weight-behind-mongodb-for-mobile-apps/</link>
		<comments>http://gigaom.com/2013/06/04/ibm-throws-its-weight-behind-mongodb-for-mobile-apps/#comments</comments>
		<pubDate>Tue, 04 Jun 2013 21:17:35 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[10Gen]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=654192</guid>
		<description><![CDATA[IBM and 10gen are collaborating on a standard that would make it easier to write applications that can access data from both MongoDB and relational systems such as IBM DB2.]]></description>
				<content:encoded><![CDATA[<p>IBM helped propel SQL, Linux and Java into the mainstream, and now it&#8217;s looking to do the same for MongoDB. The company said it&#8217;s working with MongoDB creator 10gen on a new standard that will let mobile apps built atop the NoSQL database connect with data stored in business-critical systems.</p>
<p>At its core, the new standard &#8212; which encompasses the MongoDB API, data representation (<a href="http://bsonspec.org/">BSON</a>), query language and wire protocol &#8212; appears to be all about establishing a way for mobile and other next-generation applications to connect with enterprise database systems such as IBM&#8217;s popular DB2 database and its WebSphere eXtreme Scale data grid. MongoDB <a href="http://gigaom.com/2013/04/09/mongodb-ftw-fast-growing-10gen-hires-first-cfo/">is already immensely popular among web and mobile developers</a> who must deal with semi- and unstructured data, but its lack of transactional integrity (<a href="http://www.quora.com/MongoDB/Which-companies-have-moved-away-from-MongoDB-and-why">among other things</a>) means MongoDB isn&#8217;t often deployed for &#8220;mission-critical&#8221; applications that require ACID compliance and consistent performance.</p>
<p>In theory, the new standard would let MongoDB-based applications easily and securely access mission-critical database systems. This could usher in a new wave of flexible applications that add significant value by spanning multiple data systems. According to a press release, &#8220;Customers can begin to use these new features later this summer by pairing eXtreme Scale with MongoDB, and by running their MongoDB applications on DB2 directly.&#8221;</p>
<p>The companies are also seeking participation from other parties interested in developing standard methods for interacting with MongoDB.</p>
<p>However, there&#8217;s a bigger shift at play here than the development of a new database standard, and it has everything to do with <a href="http://gigaom.com/2013/06/04/why-ibm-desperately-needed-to-buy-softlayer/">IBM&#8217;s planned acquisition of cloud provider SoftLayer</a>, also announced on Tuesday. If IBM wants to remain relevant as server sales and application platforms move to the cloud, it has to embrace the new business and application-development models that come along with cloud computing. IBM&#8217;s stable of enterprise developers might not be <a href="http://gigaom.com/2013/06/03/google-takes-on-parse-with-new-service-for-mobile-app-backends/">deploying mobile apps on Parse or Google</a> any time soon, but they will look for alternative platforms if IBM doesn&#8217;t at least try to keep up with a changing landscape.</p>
<p>Coincidentally, SoftLayer and 10gen <a href="http://gigaom.com/2012/12/04/new-slick-mongodb-managed-service-from-softlayer-and-10gen/">already have a strong partnership</a> around hosting MongoDB applications in the cloud.</p>
<p>If IBM is still an IT kingmaker, that bodes very well for MongoDB, as well as for the OpenStack cloud computing platform that IBM is also backing. If IBM&#8217;s influence in this realm is slipping, though, one could argue that IBM needs MongoDB and OpenStack more than they need it.</p>
<p>I am awaiting comment from IBM and/or 10gen for more details on the scope of their partnership and this standard, and will update when I hear more.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/06/04/ibm-throws-its-weight-behind-mongodb-for-mobile-apps/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/06/mongodb-e1370373696392.png?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/06/mongodb-e1370373696392.png?w=150" medium="image">
			<media:title type="html">mongodb</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>
	</item>
		<item>
		<title>Why Hortonworks is riding a faster Hive to the bitter end</title>
		<link>http://gigaom.com/2013/05/29/why-hortonworks-is-riding-a-faster-hive-to-the-bitter-end/</link>
		<comments>http://gigaom.com/2013/05/29/why-hortonworks-is-riding-a-faster-hive-to-the-bitter-end/#comments</comments>
		<pubDate>Wed, 29 May 2013 23:14:51 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[data warehouse]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[ebay]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[SQL on Hadoop]]></category>
		<category><![CDATA[Teradata]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=650170</guid>
		<description><![CDATA[While the rest of the Hadoop world is trying to distance itself from Hive with new interactive engines, Hortonworks is trying to make it faster. It might actually be a sound strategy.]]></description>
				<content:encoded><![CDATA[<p>Hortonworks isn’t about to get off the Apache Hadoop elephant just because everyone around it is now trying to <a href="http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/">ride impalas</a>. The company released version 1.3 of its Hortonworks Data Platform on Wednesday, a major aspect of which is an improved iteration of <a href="http://hive.apache.org/">Apache Hive</a> that the company claims runs 50 times faster than the previous version. Over the next year or so, Hortonworks expects to improve the speed of Hive by 100x its previous limits, even as its competitors are all but leaving Hive in the dust in favor of newer, faster analytic systems.</p>
<p>If you’re unfamiliar with Hive, it’s a project that Facebook developed in 2008 to make Hadoop function more like a traditional enterprise data warehouse. Hive stores data inside the Hadoop Distributed File System in structured format, and then allows users to query it using a language very similar to SQL. Until very recently, Hive has been the de facto method for querying (in a traditional sense) data stored in Hadoop, and it has proven immensely popular as more companies have begun tackling their big data woes with Hadoop.</p>
<h2 id="hive-wasnt-built-for-speed">Hive wasn’t built for speed</h2>
<p>However, as more companies got used to Hadoop, they also began to notice its shortcomings. One of them is around MapReduce, a powerful but not-exactly-speedy method of processing data that requires running the job across every node in the cluster in order to find the right data. Although the Hive interface is that of a SQL query, it relies on MapReduce as the processing engine.</p>
<p>(For more on how Hadoop and its flavor of MapReduce came to be, read <a href="http://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/">this post on the history of Hadoop</a>. To see me speak with Google Fellow and MapReduce creator Jeff Dean about how far Google has moved from a MapReduce-centric computing model, <a href="http://event.gigaom.com/structure/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=650170+why-hortonworks-is-riding-a-faster-hive-to-the-bitter-end&amp;utm_content=dharrisstructure">come to Structure next month</a>.)</p>
<p>Users wanted faster, more-interactive query processing on top of Hadoop, similar to what they had grown accustomed to with data warehouse systems such as Teradata, Greenplum and Netezza. Hadoop vendors such as Cloudera (with Impala), MapR (with <a href="http://gigaom.com/2012/08/17/for-fast-interactive-hadoop-queries-drill-may-be-the-answer/">Drill</a>), IBM (with <a href="http://gigaom.com/2013/05/06/look-ibm-is-doing-sql-on-hadoop-too/">Big SQL</a>) — as well as <a href="http://gigaom.com/2013/02/21/sql-is-whats-next-for-hadoop-heres-whos-doing-it/">a spate of startups</a> — have obliged with their own new technologies that in various ways blend the familiarity of SQL with the scalability of Hadoop. EMC Greenplum, now Pivotal, has <a href="http://gigaom.com/2013/02/25/emc-to-hadoop-competition-see-ya-wouldnt-wanna-be-ya/">transplanted its existing database system</a> inside of Hadoop.</p>
<p>Even <a href="http://www.qubole.com/">Qubole,</a> a cloud-based startup from Hive creators Ashish Thusoo and Joydeep Sen Sarma, is <a href="http://gigaom.com/2013/04/23/hadoop-startup-qubole-raises-7m-for-hive-as-a-service/">keeping an eye on how projects such as Impala and Shark</a> (from <a href="http://gigaom.com/2013/04/17/welcome-to-berkeley-where-hadoop-isnt-nearly-fast-enough/">the University of California, Berkeley’s AMPLab</a>) might factor into its plans.</p>
<h2 id="giving-hive-a-better-stinger">Giving Hive a better “Stinger”</h2>
<p>Hortonworks, the Yahoo spinoff dedicated to driving the Apache Hadoop bus, is sticking with Hive. But it has a plan, and a point.</p>
<p>Essentially, VP of Products Bob Page told me during a recent briefing, “It just makes more sense from our view to have everything done in one place.” He means that Hive is the method by which most people are already comfortable using SQL to access Hadoop data, so there’s no use rocking the boat by adding yet another technology into the mix. Hortonworks will just make Hive faster to the point (100x) where it’s at least in the ballpark of what these entirely new systems are capable of doing, but where users still use the same tools for interactive and batch queries.</p>
<p>It has in place a three-phase plan, under the <a href="http://hortonworks.com/stinger/">“Stinger” codename</a>, in order to make this happen. The first phase, now available as part of the Hive 0.11 release, is a new set of analytic functions and a columnar file format that Page says has resulted in a 50x performance increase over the previous version. The next phase is to move <del>YARN</del> Hive off of MapReduce and onto a still-under-development processing framework called <a href="http://wiki.apache.org/incubator/TezProposal">Tez</a>.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/05/stinger.png"><img alt="stinger" src="http://gigaom2.files.wordpress.com/2013/05/stinger.png?w=708"   class="aligncenter size-full wp-image-650283"></a>“You’ll see phase two come to bear later this year,” Page said, once <a href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html">YARN</a> — a new resource manager that lets Hadoop clusters run multiple processing engines simultaneously — is ready for production.</p>
<p>The third phase is a whole new vector query engine for Hive and new tools for intelligent query planning. Page didn’t have a target date in mind for that phase, except to note that “we’re not talking about a five-year cycle.”</p>
<h2 id="sql-isnt-the-end-game-for-hado">SQL isn’t the end game for Hadoop</h2>
<p>It would be easy to dismiss Page’s and Hortonworks’ optimism about Stinger as a sweet lemons type of rationalization — the company was founded around Apache Hadoop and can’t really go about developing entirely new products outside that foundation — but they also appear to have their eyes focused on a future where SQL isn’t too big a differentiator.</p>
<p>SQL is how people who have worked with data for the last 30 years can see where Hadoop fits in their environment, Page said, but the compelling thing about Hadoop “is it really unlocks a new way about how one thinks about storing and processing data.” Once YARN is ready to go, he added, there will be new avenues of innovation in areas like graph analysis and stream processing.</p>
<p>Page comes from a place of credibility when he talks about this evolution in thinking. Before coming to Hortonworks in March, he was vice president of analytics platform and delivery at eBay, <a href="http://gigaom.com/2012/01/31/under-the-covers-of-ebays-big-data-operation/">a company that knows its way around big data</a>. When people get all their data in one place, they want to do more things with it, he explained. The thinking becomes less about using Hadoop to lower cost and more about “How do I use Hadoop to increase my top line?”</p>
<p>Besides, Page noted (echoing the sentiment of just about everybody else in the Hadoop space, <a href="http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/">including Cloudera CEO Mike Olson</a>), even as companies turn Hadoop into their primary data store, it’s difficult to see Hadoop ever entirely replacing high-value relational data warehouse systems like Teradata. One could argue, then, that there’s no real purpose in trying too hard to match those systems in terms of capabilities.</p>
<p>At eBay, he said, they ran an in-depth analysis to see if it was economically or technologically feasible to collapse its big data workloads onto a single system. eBay has dozens of petabytes stored in Hadoop and <a href="http://gigaom.com/2013/03/27/why-apple-ebay-and-walmart-have-some-of-the-biggest-data-warehouses-youve-ever-seen/">possibly more within various Teradata appliances</a>. The result: “We just couldn’t find a way in which we could justify collapsing everything we do into one system.”</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-486163p1.html">Shutterstock user vblinov</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/29/why-hortonworks-is-riding-a-faster-hive-to-the-bitter-end/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/shutterstock_57327433.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/shutterstock_57327433.jpg?w=150" medium="image">
			<media:title type="html">wasps nest</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/stinger.png" medium="image">
			<media:title type="html">stinger</media:title>
		</media:content>
	</item>
		<item>
		<title>As robots get smarter, they&#8217;ll be pouring our coffee (and beer)</title>
		<link>http://gigaom.com/2013/05/29/as-robots-get-smarter-theyll-be-pouring-our-coffee-and-beer/</link>
		<comments>http://gigaom.com/2013/05/29/as-robots-get-smarter-theyll-be-pouring-our-coffee-and-beer/#comments</comments>
		<pubDate>Wed, 29 May 2013 17:43:38 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[internet of things]]></category>
		<category><![CDATA[robotics]]></category>
		<category><![CDATA[robots]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=650079</guid>
		<description><![CDATA[Researchers at Cornell University have created a robot capable of predicting human gestures. In theory, smarter robots are better at everything, from pouring drinks without spilling to just seeming more human.]]></description>
				<content:encoded><![CDATA[<p>Humanoid robots. Useful in theory, not so much in practice. They&#8217;re kind of creepy, too. But if you&#8217;re desperate to have a silicon-powered helper pour your beer, have no fear &#8212; that day might be closer than you think.</p>
<p>Scientists are hard at work creating robots that are able to sense and predict human actions, which should make them better at performing tasks and make them look more natural while doing so. Researchers at Cornell University have <a href="http://news.cornell.edu/stories/2013/04/think-ahead-robots-anticipate-human-actions">trained a robot to recognize</a> objects in its line of sight, as well as certain human actions, and then assign probabilities to the next set of possible actions. The video below (ignore the campiness of it) shows the robot in action trying to refill a cup of coffee &#8212; it must recognize the book and coffee cup on a person&#8217;s desk, predict which one he&#8217;s going to pick up, predict him taking a drink, then predict him setting the cup down for long enough for a pour to occur.</p>
<span class='embed-youtube' style='text-align:center; display: block;'><iframe class='youtube-player' type='text/html' width='604' height='370' src='http://www.youtube.com/embed/xaa_wEkCvG0?version=3&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;wmode=transparent' frameborder='0'></iframe></span>
<p>It&#8217;s kind of like IBM&#8217;s Watson system, only it&#8217;s predicting human actions instead of the correct answers to questions. In fact, both rely on databases full of relevant information &#8212; human movements and objects in the case of the robot &#8212; in order to make predictions. And instead of accepting natural-language queries like Watson does, the Cornell robot uses a Microsoft Kinect camera to visually detect what&#8217;s going on. (Interesting side note: Microsoft researchers used a Kinect and machine learning to <a href="http://gigaom.com/2013/03/12/microsofts-vision-of-our-future-is-big-screens-and-big-data/">train an elevator in the company&#8217;s research building</a> to detect the difference between someone who intends to get on and someone who&#8217;s stopping to chat in front of it.)</p>
<p>Teaching robots to predict human gestures isn&#8217;t just about saving us a trip to the coffee pot, refrigerator or elevator control panel, though. As Stacey Higginbotham explained last week when <a href="http://gigaom.com/2013/05/20/how-to-make-a-less-creepy-robot-simple-just-add-big-data/">covering a similar experiment by Disney Research</a>, the more human-like the robot, the more comfortable people will be interacting with it. Important if you&#8217;re in the theme-park business, yes, but also if you&#8217;re trying to automate some of the caregiving functions that aging baby boomers will require in the next couple decades.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-519241p1.html">Shutterstock user Ociacia</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/29/as-robots-get-smarter-theyll-be-pouring-our-coffee-and-beer/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/shutterstock_119131771-e1369849209563.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/shutterstock_119131771-e1369849209563.jpg?w=150" medium="image">
			<media:title type="html">robot hand</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>
	</item>
		<item>
		<title>How to make a less creepy robot? Simple, just add data</title>
		<link>http://gigaom.com/2013/05/20/how-to-make-a-less-creepy-robot-simple-just-add-big-data/</link>
		<comments>http://gigaom.com/2013/05/20/how-to-make-a-less-creepy-robot-simple-just-add-big-data/#comments</comments>
		<pubDate>Mon, 20 May 2013 21:22:00 +0000</pubDate>
		<dc:creator>Stacey Higginbotham</dc:creator>
				<category><![CDATA[database]]></category>
		<category><![CDATA[Disney Research]]></category>
		<category><![CDATA[robots]]></category>
		<category><![CDATA[UI]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=647315</guid>
		<description><![CDATA[Deep inside the House of Mouse researchers are solving computer science and mechanical engineering problems -- like how to build a robot that can hand you a drink without creeping you out.]]></description>
				<content:encoded><![CDATA[<p>Disney&#8217;s research arm has <a href="http://www.disneyresearch.com/wp-content/uploads/Disney-Research-Human-to-Robot-Handoff-FINAL.pdf">solved a problem that you probably didn&#8217;t even know robots have</a> &#8212; their inability to accept objects from people in a natural way. The Disney Research team, working with funding from the International Center for Advanced Communication Technologies (interACT) at Carnegie Mellon University and the Karlsruhe Institute of Technology (KIT), believes that robots that can&#8217;t naturally accept &#8220;handoffs&#8221; of objects from people are creepy. In a <a href="http://www.disneyresearch.com/wp-content/uploads/icra13_RecMoHumanoidRobotics_final.pdf">paper</a> presented this month, Disney and its partners detailed how they used several motion-sensitive cameras, a database of gestures and some fancy algorithms to solve this handoff problem.</p>
<p>From the <a href="http://www.disneyresearch.com/project/objectreceivingrobots/">press release announcing the findings</a>:</p>
<blockquote id="quote-%e2%80%9cif-a-robot-"><p>“If a robot just sticks out its hand blindly, or uses motions that look more robotic than human, a person might feel uneasy working with that robot or might question whether it is up to the task,” Katsu Yamane, Disney Research, Pittsburgh senior research scientist explained. “We assume human-like motions are more user-friendly because they are familiar.”</p></blockquote>
<p><a href="http://gigaom2.files.wordpress.com/2013/05/recmo_image-1024x174.png"><img  alt="RecMo_image-1024x174" src="http://gigaom2.files.wordpress.com/2013/05/recmo_image-1024x174.png?w=708&#038;h=120" width="708" height="120" class="aligncenter size-large wp-image-647348" /></a></p>
<p>Although the robot pictured on the Disney page touting this research looks like the mechanical, blue-haired skeleton that haunted my childhood nightmares, its attempts to grab the purse from the person do seem reactive to the human&#8217;s gestures, as opposed to the robot just sticking its arm out there and the person having to accommodate it. And that sort of naturalism will be important as we bring more robots into our homes and workplaces.</p>
<p>For example, an MIT group used a dancer&#8217;s motions to build a <a href="http://gigaom.com/2013/05/13/just-add-robots-mit-and-coke-show-off-a-smartphone-controlled-bartender/">robotic bartender</a> in a quest for naturalism &#8212; even though that robot doesn&#8217;t interact with people.</p>
<p>Today, designers try to <a href="http://www.nytimes.com/2013/01/27/opinion/sunday/our-talking-walking-objects.html">endear robots to us</a> with quirky noises (like R2D2) and maybe light displays or LED faces &#8212; anything to help anthropomorphize them. But as robots become more human-looking they can also become more sinister &#8212; <a href="http://en.wikipedia.org/wiki/Uncanny_valley">falling into that same uncanny valley</a> that Disney and other content companies have struggled with in animation. Remember the dead-eyed stars of The Polar Express that you probably couldn&#8217;t empathize with? The jerky movements of a home health robot might engender similar feelings or, worse, scare people.</p>
<p>Building the natural gestures of the Disney robot required creating a hierarchical gesture database that the robot can access as it detects the person passing something to it. In the Disney research, the robot is not only able to reach for the handbag; when the human attempts a fake pass, the <del datetime="2013-05-20T19:18:59+00:00">blue-haired monstrosity</del> robot is also able to adapt. From the release:</p>
<blockquote id="quote-to-enable-a-robot-to2"><p>To enable a robot to access a library of human-to-human passing motions with the speed necessary for robot-human interaction, the researchers developed a hierarchical data structure. Using principal component analysis, the researchers first developed a rough estimate of the distribution of various motion samples. They then grouped samples of similar poses and organized them into a binary tree structure. With a series of “either/or” decisions, the robot can rapidly search this database, so it can recognize when the person initiates a handing motion and then refine its response as the person follows through.</p></blockquote>
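<p>For readers who want a feel for the search the release describes, here is a minimal Python sketch. It is an illustration under stated assumptions, not Disney&#8217;s implementation: the function names (<code>principal_axis</code>, <code>build_tree</code>, <code>search</code>), the <code>leaf_size</code> parameter and the tiny two-dimensional &#8220;poses&#8221; are all hypothetical, and the principal component is approximated with simple power iteration rather than whatever the researchers actually used. Each tree node splits its samples at the median along their main direction of variation, so a query pose is resolved by a short chain of either/or decisions.</p>

```python
import math

def principal_axis(samples, iterations=50):
    """Approximate the first principal component by power iteration."""
    dim = len(samples[0])
    mean = [sum(s[i] for s in samples) / len(samples) for i in range(dim)]
    centered = [[s[i] - mean[i] for i in range(dim)] for s in samples]
    axis = [1.0] * dim  # fixed starting guess (an assumption of this sketch)
    for _ in range(iterations):
        # Apply the unnormalized covariance matrix: X^T (X axis).
        proj = [sum(c[i] * axis[i] for i in range(dim)) for c in centered]
        new = [sum(p * c[i] for p, c in zip(proj, centered)) for i in range(dim)]
        norm = math.sqrt(sum(v * v for v in new))
        if norm == 0.0:
            break
        axis = [v / norm for v in new]
    return mean, axis

def project(pose, mean, axis):
    """Coordinate of a pose along a node's principal axis."""
    return sum((p - m) * a for p, m, a in zip(pose, mean, axis))

def build_tree(samples, leaf_size=2):
    """Recursively split samples at the median of their principal-axis projection."""
    if leaf_size >= len(samples):
        return {"leaf": samples}
    mean, axis = principal_axis(samples)
    ordered = sorted(samples, key=lambda s: project(s, mean, axis))
    mid = len(ordered) // 2
    return {
        "mean": mean,
        "axis": axis,
        "threshold": project(ordered[mid], mean, axis),
        "low": build_tree(ordered[:mid], leaf_size),
        "high": build_tree(ordered[mid:], leaf_size),
    }

def search(tree, pose):
    """Follow either/or decisions to a leaf, then return the closest stored pose."""
    while "leaf" not in tree:
        value = project(pose, tree["mean"], tree["axis"])
        tree = tree["high"] if value >= tree["threshold"] else tree["low"]
    return min(tree["leaf"], key=lambda s: math.dist(s, pose))

# Hypothetical 2-D "pose" vectors; real motion-capture data has far more dimensions.
poses = [(0.0, 0.1), (0.3, 0.2), (1.0, 1.2), (1.4, 1.1), (2.0, 2.3), (2.5, 2.1)]
tree = build_tree(poses)
nearest = search(tree, (1.1, 1.0))
```

<p>Because only one branch is followed at each level, the lookup touches a handful of nodes instead of every stored motion, which is the speed-up the release is getting at. The trade-off in this simplified version is that the answer is approximate: a query near a split can land in a leaf that does not hold its true nearest neighbor.</p>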
<p>Even if you don&#8217;t have an opinion on how naturally robots should move, this research brings home the awesome amount of work it takes to build computers and robots that mimic the capabilities of a person. Much like computer visualization, the science of robotic interaction takes a problem the size of a mountain and has to chip it down into grains of sand using a toothpick to find solutions. It&#8217;s a testament to human curiosity that people are willing to try.</p>
<p>Also, I expect Disney might be lured by the idea of natural-looking robots roaming its theme parks. My only question is whether they would be dressed up as characters or working the cash registers at the gift stores.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/20/how-to-make-a-less-creepy-robot-simple-just-add-big-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/disneyrobot.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/disneyrobot.jpg?w=150" medium="image">
			<media:title type="html">disneyrobot</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/aee37121e18bf76bb9fee4494bab237a?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">shigginbotham</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/recmo_image-1024x174.png?w=708" medium="image">
			<media:title type="html">RecMo_image-1024x174</media:title>
		</media:content>
	</item>
		<item>
		<title>Database startup Drawn to Scale is closing down</title>
		<link>http://gigaom.com/2013/05/17/database-startup-drawn-to-scale-is-closing-down/</link>
		<comments>http://gigaom.com/2013/05/17/database-startup-drawn-to-scale-is-closing-down/#comments</comments>
		<pubDate>Fri, 17 May 2013 21:24:03 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[Drawn to Scale]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[SQL on Hadoop]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=646718</guid>
		<description><![CDATA[Database startup Drawn to Scale, creator of the SQL-on-Hadoop technology called Spire, is closing down. The company's product, Spire, was one of the first SQL-on-Hadoop technologies.]]></description>
				<content:encoded><![CDATA[<p>Database startup Drawn to Scale, creator of the SQL-on-Hadoop technology called Spire, is closing down. Co-founder and CEO Bradford Stephens officially <a href="http://www.roadtofailure.com/?p=11">announced the closure in a blog post</a> on Friday.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/05/spirearchitecture-015-e1361407038325.png"><img  alt="spirearchitecture-015-e1361407038325" src="http://gigaom2.files.wordpress.com/2013/05/spirearchitecture-015-e1361407038325.png?w=300&#038;h=185" width="300" height="185" class="alignleft size-medium wp-image-646740" /></a>The company&#8217;s product, Spire, which provided full SQL support on top of the HBase NoSQL database, was one of the first products to <a href="http://gigaom.com/2012/07/24/how-one-startup-wants-to-inject-hadoop-into-your-sql/">try to blend Hadoop&#8217;s scalability with the robustness and familiarity of SQL</a>. That&#8217;s now <a href="http://gigaom.com/2013/03/05/the-hadoop-ecosystem-the-welcome-elephant-in-the-room-infographic/">an increasingly crowded space</a> (and has grown since that linked graphic was created). In March, Drawn to Scale <a href="http://gigaom.com/2013/03/19/drawn-to-scale-wants-to-solve-your-mongodb-scalability-problems/">expanded its support to MongoDB</a>, as well.</p>
<p>I wasn&#8217;t shocked when Stephens told me the news &#8212; questions about the four-year-old company&#8217;s financial health had been swirling for a while &#8212; but to hear of its financial woes was a bit surprising. His account in the post pretty much echoes what I had heard from others:</p>
<blockquote id="quote-it-seemed-we-had-eve"><p>&#8220;It seemed we had everything going for us — paid customers such as American Express, Orange Telecom, Flurry, and 4 others. Our technology worked brilliantly, we had a big hiring pipeline, and we had great media presence against our competitors who raised 10-100x more cash.&#8221;</p></blockquote>
<p>He added:</p>
<blockquote id="quote-yet-five-days-before2"><p>&#8220;Yet five days before we signed term sheets for a big A round or sold the company, we started getting hit by a series of black swans — and we just didn’t have what we needed to recover. I’ll leave the public detail at that level, but I will say that paying employees’ health insurance out of your meager savings is a powerful incentive to change course.&#8221;</p></blockquote>
<p>Up to this point, the company <a href="http://gigaom.com/2012/03/08/drawn-to-scale-raises-money-to-make-sql-big-data-ready/">had raised $925,000</a> from RTP Ventures, IA Ventures and SK Ventures. There&#8217;s no word yet on what will become of the company&#8217;s intellectual property.</p>
<p>As Stephens &#8212; who&#8217;s now doing an entrepreneur-in-residence gig at Ping Identity and helping out other startups (including popular wardrobe app <a href="http://www.clothapp.com/">Cloth</a>) &#8212; succinctly put it during a phone discussion, &#8220;We just don&#8217;t have the horsepower to keep running the company.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/17/database-startup-drawn-to-scale-is-closing-down/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/dtsdragon.png?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/dtsdragon.png?w=150" medium="image">
			<media:title type="html">dtsdragon</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/spirearchitecture-015-e1361407038325.png?w=300" medium="image">
			<media:title type="html">spirearchitecture-015-e1361407038325</media:title>
		</media:content>
	</item>
		<item>
		<title>How data warehousing is now a cost-effective solution for businesses</title>
		<link>http://pro.gigaom.com/report/the-new-economics-of-enterprise-data-warehousing/</link>
		<comments>http://pro.gigaom.com/report/the-new-economics-of-enterprise-data-warehousing/#comments</comments>
		<pubDate>Mon, 13 May 2013 06:55:34 +0000</pubDate>
		<dc:creator>nraden</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ADAPA]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[apache-hadoop]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[clickstream analytics]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[cloud-infrastructure]]></category>
		<category><![CDATA[columnar databases]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data analysis]]></category>
		<category><![CDATA[data management]]></category>
		<category><![CDATA[data storage]]></category>
		<category><![CDATA[data warehouse]]></category>
		<category><![CDATA[data warehousing]]></category>
		<category><![CDATA[data-analytics]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[database technology]]></category>
		<category><![CDATA[Database theory]]></category>
		<category><![CDATA[distributed processing]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[etl]]></category>
		<category><![CDATA[extraction transform load systems]]></category>
		<category><![CDATA[Ferrari]]></category>
		<category><![CDATA[file systems]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[HDFS]]></category>
		<category><![CDATA[Hewlett-Packard]]></category>
		<category><![CDATA[high-speed technologies]]></category>
		<category><![CDATA[HP]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[information technology]]></category>
		<category><![CDATA[integrated circuit]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[mobile devices]]></category>
		<category><![CDATA[Moore's Law]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[Open Source Software]]></category>
		<category><![CDATA[parallel processing]]></category>
		<category><![CDATA[relational-databases]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[social networks]]></category>
		<category><![CDATA[storage devices]]></category>
		<category><![CDATA[storage virtualization technologies]]></category>
		<category><![CDATA[System administration]]></category>
		<category><![CDATA[tco]]></category>
		<category><![CDATA[total-cost-of-ownership]]></category>
		<category><![CDATA[Transaction processing]]></category>
		<category><![CDATA[Truviso]]></category>
		<category><![CDATA[Vertica]]></category>
		<category><![CDATA[Virtualization technology]]></category>

		<guid isPermaLink="false">http://pro.gigaom.com/?post_type=go-report&#038;p=175747/</guid>
		<description><![CDATA[Data-warehouse providers are quickly adding Hadoop distributions, or even their own versions of Hadoop, into their architecture, adding further cost advantages to collections of extremely large data sets. Finding the talent to manage this newly converged environment will not be easy, but it presents tremendous opportunity for companies willing to take some risk.]]></description>
				<content:encoded><![CDATA[<p>The new economics of data warehousing provide attractive alternatives in both costs and benefits. While big data gets most of the attention, evolved data warehousing will play an important role for the foreseeable future. In order to be relevant, data-warehouse design and operation need to be simplified, taking advantage of greatly improved hardware, software, and methods.</p>
]]></content:encoded>
			<wfw:commentRss>http://pro.gigaom.com/report/the-new-economics-of-enterprise-data-warehousing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="https://gigaom-pro-files.s3.amazonaws.com/files/2012/08/datacenter1.jpg?w=150" />
		<media:content url="https://gigaom-pro-files.s3.amazonaws.com/files/2012/08/datacenter1.jpg?w=150" medium="image">
			<media:title type="html">datacenter1</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/fdbbd80432b14e9d84aa12c6fc0cce24?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">nraden</media:title>
		</media:content>
	</item>
	</channel>
</rss>
