<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>GigaOM &#187; NoSQL</title>
	<atom:link href="http://gigaom.com/tag/nosql/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com</link>
	<description></description>
	<lastBuildDate>Thu, 20 Jun 2013 02:42:17 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gigaom.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0db8f6557d022075dbbf010c54d46d93?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>GigaOM &#187; NoSQL</title>
		<link>http://gigaom.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gigaom.com/osd.xml" title="GigaOM" />
	<atom:link rel='hub' href='http://gigaom.com/?pushpress=hub'/>
		<item>
		<title>MetricaDB wants to tie data together for frustrated analysts</title>
		<link>http://gigaom.com/2013/06/12/metricadb-wants-to-tie-data-together-for-frustrated-analysts/</link>
		<comments>http://gigaom.com/2013/06/12/metricadb-wants-to-tie-data-together-for-frustrated-analysts/#comments</comments>
		<pubDate>Wed, 12 Jun 2013 13:00:37 +0000</pubDate>
		<dc:creator>David Meyer</dc:creator>
				<category><![CDATA[David Crawford]]></category>
		<category><![CDATA[launchpad]]></category>
		<category><![CDATA[Metrica]]></category>
		<category><![CDATA[MetricaDB]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[postgres]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Structure]]></category>
		<category><![CDATA[Structure Launchpad]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=656774</guid>
		<description><![CDATA[Want to correlate data from a NoSQL database with data from Salesforce, Stripe or Mailchimp? MetricaDB, a finalist in our Structure 2013 LaunchPad competition, wants to make that easy for you.]]></description>
				<content:encoded><![CDATA[<p>As a company these days, it&#8217;s way too easy to find your data spread across a wide variety of cloud services. And when it comes to tying that information together in a meaningful way, the result can be pretty confusing if you don&#8217;t have a data infrastructure team to join the dots.</p>
<p>That&#8217;s the problem that David Crawford is trying to fix with his cloud analytics startup <a href="http://www.metricadb.com/">MetricaDB</a>, one of our <a href="http://about.gigaom.com/2013/04/25/gigaom-announces-structure-2013-launchpad-finalists/">Structure 2013 LaunchPad</a> finalists. It&#8217;s a software-as-a-service (SaaS) tool for individual analysts who don&#8217;t care whether the data is held in a NoSQL database or behind a Salesforce or Google Analytics API – they just want to deal with it in one place.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/06/structure_launchpad_in-article.png"><img src="http://gigaom2.files.wordpress.com/2013/06/structure_launchpad_in-article.png?w=300&#038;h=200" alt="Structure_Launchpad" width="300" height="200"  class="alignleft size-medium wp-image-654130" /></a>Crawford&#8217;s a one-man band at the moment and the 8-month-old MetricaDB is a bootstrapped affair, but it seems pretty effective for such an early product (albeit a product that&#8217;s already evolved from simply enabling SQL queries against MongoDB data).</p>
<p>After signing up, you click on the different SaaS products you use to gain a view on your customer – Mailchimp, Sendgrid, Salesforce, Stripe and so on – and end up with a SQL console that puts all that data in straightforward tables. Then you can run SQL queries (MetricaDB is built on Postgres) or what have you and export the results to Excel.</p>
<p>So, for example, a user could look at support tickets in his or her company&#8217;s cloud CRM system and join them to data from the Stripe billing system to see how the biggest spenders feel they&#8217;re being treated.</p>
<p>&#8220;You&#8217;re not looking up the API libraries or running a script,&#8221; Crawford said. &#8220;You attach multiple services and they all just appear as tables in the same database – pulling the data together and doing something across the different sources is just a SQL join.&#8221; </p>
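<p>The kind of cross-service join described above can be sketched as follows. This is a minimal illustration, not MetricaDB's actual schema or interface: it uses Python's built-in SQLite as a stand-in for Postgres, and the table names (<code>crm_tickets</code>, <code>stripe_charges</code>), columns and rows are all hypothetical.</p>

```python
import sqlite3

# In-memory SQLite stands in for the Postgres-backed console (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical tables, one per attached service, with made-up rows.
conn.executescript("""
CREATE TABLE crm_tickets (customer_email TEXT, subject TEXT, satisfaction INTEGER);
CREATE TABLE stripe_charges (customer_email TEXT, amount_usd REAL);
INSERT INTO crm_tickets VALUES
  ('ann@example.com', 'Login fails', 2),
  ('bob@example.com', 'Invoice question', 5),
  ('ann@example.com', 'Slow export', 3);
INSERT INTO stripe_charges VALUES
  ('ann@example.com', 900.0),
  ('ann@example.com', 600.0),
  ('bob@example.com', 120.0);
""")

# Aggregate each source first, then join on the shared customer key,
# so repeated tickets do not multiply the billing totals.
query = """
SELECT s.customer_email, s.total_spent, t.avg_satisfaction
FROM (SELECT customer_email, SUM(amount_usd) AS total_spent
      FROM stripe_charges GROUP BY customer_email) AS s
JOIN (SELECT customer_email, AVG(satisfaction) AS avg_satisfaction
      FROM crm_tickets GROUP BY customer_email) AS t
  ON s.customer_email = t.customer_email
ORDER BY s.total_spent DESC
"""
rows = conn.execute(query).fetchall()
for email, spent, satisfaction in rows:
    print(email, spent, satisfaction)
```

<p>Aggregating before joining is a design choice in this sketch: joining raw tickets to raw charges would count each charge once per ticket and inflate the spending figures.</p>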
<p>Crawford is pitching MetricaDB as a product for individual analysts who are &#8220;tired of waiting for engineers to get the data they want and have them deliver an Excel spreadsheet&#8221; – down the line he wants to add enough features to make it a fully-fledged enterprise play, but right now he&#8217;s attacking the market from the bottom up.</p>
<p>&#8220;What I&#8217;ve learned about security in the SaaS world is it&#8217;s largely a brand perception issue,&#8221; Crawford said. &#8220;I need to build the brand so people will be comfortable giving me access to their proprietary data and knowing I&#8217;ll steer it well.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/06/12/metricadb-wants-to-tie-data-together-for-frustrated-analysts/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/06/metricadb-founder-david-crawford.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/06/metricadb-founder-david-crawford.jpg?w=150" medium="image">
			<media:title type="html">MetricaDB founder David Crawford</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/6599daccfd7e897e68744fe0065e5a2e?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">superglaze</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/06/structure_launchpad_in-article.png?w=300" medium="image">
			<media:title type="html">Structure_Launchpad</media:title>
		</media:content>
	</item>
		<item>
		<title>Stealth-mode 28msec wants to build a Tower of Babel for databases</title>
		<link>http://gigaom.com/2013/06/11/stealth-mode-28msec-wants-to-build-a-tower-of-babel-for-databases/</link>
		<comments>http://gigaom.com/2013/06/11/stealth-mode-28msec-wants-to-build-a-tower-of-babel-for-databases/#comments</comments>
		<pubDate>Tue, 11 Jun 2013 13:00:01 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[28msec]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[launchpad]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Structure]]></category>
		<category><![CDATA[Structure 2013]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=656262</guid>
		<description><![CDATA[28msec is about to exit stealth mode and take the covers off its database platform that lets users query data from any source in real time.]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.28msec.com/">28msec</a> is not your average database startup but, then again, neither is its mission. The company — still in stealth mode (until our <a href="http://event.gigaom.com/structure/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=656262+stealth-mode-28msec-wants-to-build-a-tower-of-babel-for-databases&amp;utm_content=dharrisstructure">Structure Launchpad event</a> on June 20) after about seven years of existence — has created a data-processing platform that it says can take and analyze data from any source, and then deliver the results in real time.</p>
<p>The company took so long to officially launch, CEO Eric Kish told me, because it took such a long time to build. The 28msec history goes like this: The early investors are database industry veterans (one was employee No. 6 at Oracle) who, at some point in 2006, envisioned an explosion in data formats and databases. Their solution was to create a platform able to extract data from any of these sources, transform it into a standard format, and then let users analyze it using a single query language that looks a lot like the SQL they already know. 28msec is based on the open source <a href="http://www.jsoniq.org/">JSONiq</a> query language and the <a href="http://www.zorba-xquery.com/">Zorba</a> query processor, and will be available as a cloud service.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/06/structure_launchpad_in-article.png"><img alt="Structure_Launchpad" src="http://gigaom2.files.wordpress.com/2013/06/structure_launchpad_in-article.png?w=708"   class="aligncenter size-full wp-image-654130"></a></p>
<p>That’s about all Kish is willing to spill right now with regard to the technology.</p>
<p>As for the company itself, it has been staffed thus far primarily by Ph.Ds. in query technologies from ETH Zurich in Switzerland, where co-founder Donald Kossmann is a professor. Every year since 28msec was founded, it has hired one of his graduates to help build the product. The company brought on Kish, a serial entrepreneur, as CEO in 2012.</p>
<p>28msec was originally based in Zurich, but is in the process of shifting its base to Palo Alto, where Kish lives. It has raised $5.5 million in capital from friends and family, and already has paying customers.</p>
<p>As for the name, 28msec, it’s a reference to the time it takes for a database to access data stored on a hard disk. After the headquarters, maybe that name will be the next thing to change given the prevalence of flash and RAM as database storage media. “Seven years later,” Kish acknowledged, “it’s not relevant anymore.”</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/06/11/stealth-mode-28msec-wants-to-build-a-tower-of-babel-for-databases/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/06/json-xml-e1370889222654.png?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/06/json-xml-e1370889222654.png?w=150" medium="image">
			<media:title type="html">json-xml</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/06/structure_launchpad_in-article.png" medium="image">
			<media:title type="html">Structure_Launchpad</media:title>
		</media:content>
	</item>
		<item>
		<title>Under the covers of the NSA&#8217;s big data effort</title>
		<link>http://gigaom.com/2013/06/07/under-the-covers-of-the-nsas-big-data-effort/</link>
		<comments>http://gigaom.com/2013/06/07/under-the-covers-of-the-nsas-big-data-effort/#comments</comments>
		<pubDate>Sat, 08 Jun 2013 02:15:19 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Accumulo]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Cybersecurity]]></category>
		<category><![CDATA[data privacy]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[intelligence]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[NSA]]></category>
		<category><![CDATA[privacy]]></category>
		<category><![CDATA[spying]]></category>
		<category><![CDATA[sqrrl]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=655599</guid>
		<description><![CDATA[There's much debate still to be had over the NSA's recently uncovered data-collection practices, but some of the technologies underlying them are out in the open. Here's what we know already.]]></description>
				<content:encoded><![CDATA[<p>The <a href="http://gigaom.com/2013/06/07/through-a-prism-darkly-tracking-the-ongoing-nsa-surveillance-story/">NSA&#8217;s data collection practices</a> have much of America &#8212; and certainly the tech community &#8212; on edge, but sources familiar with the agency&#8217;s technology are saying the situation isn&#8217;t as bad as it seems. Yes, the agency has a lot of data and can do some powerful analysis, but, the argument goes, there are strict limits in place around how the agency can use it and who has access. Whether that&#8217;s good enough is still an open debate, but here&#8217;s what we know about the technology that&#8217;s underpinning all that data.</p>
<h2 id="what-is-accumulo">What is Accumulo?</h2>
<p>The technological linchpin to everything the NSA is doing from a data-analysis perspective is <a href="http://en.wikipedia.org/wiki/Apache_Accumulo">Accumulo</a> &#8212; an open-source database the agency built in order to store and analyze huge amounts of data. Adam Fuchs knows Accumulo well because he helped build it during a nine-year stint with the NSA; he&#8217;s now co-founder and CTO of a company called <a href="http://www.sqrrl.com/">Sqrrl</a> that sells a commercial version of the database system. I spoke with him earlier this week, days before news broke of the NSA collecting data from Verizon and the country&#8217;s largest web companies.</p>
<div id="attachment_655914" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/06/fuchs.jpg"><img  alt="fuchs" src="http://gigaom2.files.wordpress.com/2013/06/fuchs.jpg?w=300&#038;h=173" width="300" height="173" class="size-medium wp-image-655914" /></a><p class="wp-caption-text">Adam Fuchs</p></div>
<p>The NSA began building Accumulo in late 2007, Fuchs said, because they were trying to do automated analysis for tracking and discovering new terrorism suspects. &#8220;We had a set of applications that we wanted to develop and we were looking for the right infrastructure to build them on,&#8221; he said.</p>
<p>The problem was those technologies weren&#8217;t available. He liked what projects like HBase were doing by using Hadoop to mimic Google&#8217;s famous BigTable data store, but it still wasn&#8217;t up to the NSA requirements around scalability, reliability or security. So, they began work on a project called CloudBase, which eventually was renamed Accumulo.</p>
<p>Now, Fuchs said, &#8220;It&#8217;s operating at thousands-of-nodes scale&#8221; within the NSA&#8217;s data centers. There are multiple instances each storing tens of petabytes (1 petabyte equals 1,000 terabytes or 1 million gigabytes) of data and it&#8217;s the backend of the agency&#8217;s most widely used analytical capabilities. Accumulo&#8217;s ability to handle data in a variety of formats (a characteristic called <a href="http://stackoverflow.com/questions/15589184/what-does-being-schema-less-mean-for-a-nosql-database">&#8220;schemaless&#8221;</a> in database jargon) means the NSA can store data from numerous sources all within the database and add new analytic capabilities in days or even hours.</p>
<p>&#8220;It&#8217;s quite critical,&#8221; he added.</p>
<h2 id="what-the-nsa-can-and-cant-do-w">What the NSA can and can&#8217;t do with all this data</h2>
<p>As I <a href="http://gigaom.com/2013/06/06/heres-how-the-nsa-analyzes-all-that-call-data/">explained on Thursday</a>, Accumulo is especially adept at analyzing trillions of data points in order to build massive graphs that can detect the connections between them and the strength of the connections. Fuchs didn&#8217;t talk about the size of the NSA&#8217;s graph, but he did say the database is designed to handle months or years worth of information and let analysts move from query to query very fast. When you&#8217;re talking about analyzing call records, it&#8217;s easy to see where this type of analysis would be valuable in determining how far a suspected terrorist&#8217;s network might spread and who might be involved.</p>
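<p>To make the idea of connections and connection strength concrete, here is a toy sketch of a call graph. It is not how Accumulo or the NSA does it: the call records are invented, strength is assumed to be a simple call count, and the whole graph sits in memory rather than in a distributed store.</p>

```python
from collections import Counter, deque

# Hypothetical call records as (caller, callee) pairs.
calls = [("A", "B"), ("B", "A"), ("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]

# Connection strength: number of calls between two parties, in either direction.
strength = Counter(frozenset((a, b)) for a, b in calls if a != b)

# Adjacency sets built from the distinct connections.
neighbors = {}
for pair in strength:
    a, b = tuple(pair)
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

def within_hops(start, max_hops):
    """Breadth-first search: map each reachable party to its hop distance."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops

network = within_hops("A", 2)
print(network)
print(strength[frozenset(("A", "B"))])
```

<p>In this made-up data, party D sits three hops from A and so falls outside a two-hop search, while E and F are never reached at all.</p>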
<p>Stewart Baker, former NSA general counsel under George W. Bush, <a href="http://www.skatingonstilts.com/skating-on-stilts/2013/06/the-fisa-court-order-flap-take-a-deep-breath.html">wrote on his blog Thursday</a> that this type of data could also be used for general pattern recognition &#8212; the kinds of stuff that targeted advertisers love to do. Only, instead of the system serving someone an ad because of what they&#8217;ve been searching for and the operating system they&#8217;re using, Baker presented the hypothetical of &#8220;[an] American who makes a call to Yemen at 11 a.m., Sanaa time, hangs up after a few seconds, and then gets a call from a different Yemeni number three hours later.&#8221;</p>
<p>The big legal question here is around probable cause and whether the government should further investigate this caller based on call patterns similar to those of known terrorists, but the big data question is around false positives. Baker&#8217;s hypothetical might appear pretty cut and dried but, data scientist <a href="http://www.linkedin.com/in/turian">Joseph Turian</a> explains, call records in general probably don&#8217;t offer too strong a signal and could lead to situations where innocent behavior patterns look a lot like nefarious ones. &#8220;But once you start connecting the dots with other pieces of information you have from other sources,&#8221; he said via email, &#8220;you can start making more predictions.&#8221;</p>
<p>This is where a program like PRISM, the NSA&#8217;s reported effort to collect data straight from the likes of Google, Facebook and Apple could come into play. If you&#8217;re able to tie a name or web account to a phone number, you can figure out all sorts of information. If you can prove that certain people are radical Islamists, for example, you can start to infer more things about the others in that social graph.</p>
<p>And if Sqrrl&#8217;s capabilities are any indicator of what Accumulo is supporting within the NSA, the agency can perform a lot of simpler functions on its data as well. In addition to graph processing, said Ely Kahn, Sqrrl&#8217;s co-founder and VP of business development, their product includes pre-packaged analytic capabilities around SQL queries and full-text search, and also supports streaming data. This means Sqrrl&#8217;s version can support any number of interesting use cases &#8212; from processing data as it hits the system to keeping a massive index that can be searched in the same way someone searches the web.</p>
<h2 id="how-much-data-is-the-nsa-colle">How much data is the NSA collecting? Follow the money</h2>
<p>We&#8217;re not quite sure how much data the two programs that came to light this week are actually collecting, but the evidence suggests it&#8217;s not that much &#8212; at least from a volume perspective. Take the PRISM program that&#8217;s gathering data from web properties including Google, Facebook, Microsoft, Apple, Yahoo and AOL. It seems the NSA would have to be selective in what it grabs.</p>
<p>Assuming it includes every cost associated with running the program, the $20 million per year allocated to PRISM, <a href="http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/">according to the slides published by the</a> <em>Washington Post</em>, wouldn&#8217;t be nearly enough to store all the raw data &#8212; much less new datasets created from analyses &#8212; from such large web properties. Yahoo alone, I&#8217;m told, was spending over $100 million a year to operate its <a href="http://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/">approximately 42,000-node Hadoop environment</a>, consisting of hundreds of petabytes, a few years ago. Facebook users <a href="http://gigaom.com/2012/08/22/facebook-is-collecting-your-data-500-terabytes-a-day/">are generating more than 500 terabytes of new data</a> every day.</p>
<p>Using about the least-expensive option around for mass storage &#8212; cloud storage provider Backblaze&#8217;s <a href="http://gigaom.com/2013/02/20/it-turns-out-a-lot-of-companies-like-building-their-own-storage-gear/">open source storage pod designs</a> &#8212; just storing 500 terabytes of Facebook data a day would cost more than $10 million in hardware alone over the course of a year. Using higher-performance hard drives or other premium gear &#8212; things Backblaze eschews because it&#8217;s concerned primarily about cost and scalability rather than performance &#8212; would cost even more.</p>
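<p>A quick back-of-the-envelope check of those figures, using only the numbers quoted above (500 terabytes a day, $10 million a year) and decimal units. The per-terabyte price it prints is implied by those two numbers, not a published Backblaze figure.</p>

```python
# Figures quoted in the article.
TB_PER_DAY = 500
DAYS_PER_YEAR = 365
HARDWARE_BUDGET_USD = 10_000_000

# One year of new data, in terabytes and petabytes (1 PB = 1,000 TB).
tb_per_year = TB_PER_DAY * DAYS_PER_YEAR
pb_per_year = tb_per_year / 1000

# Hardware price per unit of storage at which one year of data costs $10 million.
implied_usd_per_tb = HARDWARE_BUDGET_USD / tb_per_year
implied_usd_per_gb = implied_usd_per_tb / 1000

print(f"{tb_per_year:,} TB per year ({pb_per_year} PB)")
print(f"${implied_usd_per_tb:.2f} per TB, ${implied_usd_per_gb:.4f} per GB")
```

<p>In other words, a year of Facebook data alone comes to roughly 182.5 petabytes, and any hardware price above about $55 per terabyte pushes the bill past $10 million, before power, staff or replication are counted.</p>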
<p>Even at the Backblaze price point, though, which is pocket change for the NSA, the agency would easily run over $20 million trying to store too many emails, chats, Skype calls, photos, videos and other types of data from the other companies it&#8217;s working with.</p>
<p>Actually, it&#8217;s possible the intelligence community is taking advantage of the Backblaze designs. In September 2011, Backblaze CEO Gleb Budman says, he met with CIA representatives who discussed that agency&#8217;s five-year plan &#8220;to centralize data services into a large private cloud&#8221; and how Backblaze&#8217;s technology might fit into it. Its plans for analyzing this data, as illustrated in the slide below (and <a href="http://gigaom.com/2013/03/20/even-the-cia-is-struggling-to-deal-with-the-volume-of-real-time-social-data/">discussed by CIA CTO Ira &#8220;Gus&#8221; Hunt at Structure: Data</a> in March), seem to mirror what the NSA has in mind.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/06/cia-big-data.jpg"><img  alt="cia big data" src="http://gigaom2.files.wordpress.com/2013/06/cia-big-data.jpg?w=708&#038;h=531" width="708" height="531" class="aligncenter size-large wp-image-655904" /></a>Whatever type of gear the NSA is using, though, and however much it&#8217;s spending on the Verizon data or PRISM specifically, we do know the agency is spending a lot of money on its data infrastructure. There are those dozens (at least) of petabytes of overall data in Accumulo, and the agency is famously building a 1-million-square-foot, $1.5 billion data center in Utah. It <a href="http://www.datacenterknowledge.com/archives/2013/06/06/nsa-to-build-860-million-hpc-center-in-maryland/">recently began construction on a 600,000-square-foot, $860 million facility</a> in Maryland.</p>
<h2 id="policies-are-in-place">Policies are in place</h2>
<p>Sqrrl&#8217;s Kahn &#8212; who previously served as director of cybersecurity strategy at the National Security Staff in the White House &#8212; says even with all the effort it&#8217;s putting into data collection and analysis, the NSA really is concerned about privacy. Not only are there strict administrative and legal limitations in place about when the agency can actually search through collected data (something Stewart Baker <a href="http://www.skatingonstilts.com/skating-on-stilts/2013/06/stewart-baker-fisa-nsa-law.html">explains in more detail</a> in a Friday blog post), but Accumulo itself was designed with privacy in mind.</p>
<p>The system itself is designed to make sure there&#8217;s not a free-for-all on data, another individual familiar with Accumulo said.</p>
<p>It has what Kahn and Sqrrl CTO Fuchs described as &#8220;cell-level&#8221; security, meaning administrators can manage access to individual pieces of data within a table. Furthermore, Fuchs explained, those policies stick with the data as it&#8217;s transformed as part of the analysis process, so someone prohibited from seeing it won&#8217;t be able to see it just because it&#8217;s now part of a different dataset. When data would come into the NSA from the CIA, he said, there were policies in place around who could see it, and Accumulo helped enforce them.</p>
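<p>A simplified sketch of what cell-level security with &#8220;sticky&#8221; policies might look like. This is not Accumulo's API, and its real visibility rules are richer than what is shown here. In this toy model, an assumption made for illustration, each cell carries a set of labels that a reader must hold in full, and a derived cell inherits the labels of everything it was computed from.</p>

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: object
    labels: frozenset  # a reader must hold every label to see this cell

def can_read(cell, authorizations):
    """A cell is visible only if the reader holds all of its labels."""
    return cell.labels.issubset(authorizations)

def scan(cells, authorizations):
    """Return only the cells this reader is allowed to see."""
    return [c for c in cells if can_read(c, authorizations)]

def derive(row, column, value, sources):
    """Build a derived cell whose labels are the union of its sources' labels."""
    labels = frozenset().union(*(c.labels for c in sources))
    return Cell(row, column, value, labels)

# Hypothetical data: one cell from in-house collection, one shared by another agency.
cells = [
    Cell("suspect-1", "phone", "555-0100", frozenset({"nsa"})),
    Cell("suspect-1", "source_report", "redacted", frozenset({"nsa", "cia"})),
]
summary = derive("suspect-1", "summary", "phone + report", cells)

analyst = {"nsa"}          # holds only the in-house label
liaison = {"nsa", "cia"}   # cleared for both

visible_to_analyst = [c.column for c in scan(cells + [summary], analyst)]
visible_to_liaison = [c.column for c in scan(cells + [summary], liaison)]
print(visible_to_analyst)
print(visible_to_liaison)
```

<p>The analyst never sees the summary, even though it lives in a new dataset, because it was built partly from a cell the analyst could not read.</p>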
<p>Even agencies within the Department of Homeland Security are using or experimenting with Accumulo, Kahn added, because <a href="http://gigaom.com/2012/04/11/cispa-isnt-sopa-but-it-isnt-ideal-and-it-might-become-law/">proposed legislation</a> would put them in charge of ensuring privacy as cybersecurity data changes hands between the government and private corporations.</p>
<p>It&#8217;s ironic, he acknowledged, but Accumulo actually flips the presumed paradigm that stricter security and privacy regulations mean less sharing. That might be a shallow victory for citizens concerned about their civil liberties, but data collection and sharing don&#8217;t seem likely to stop any time soon. At least it&#8217;s something.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/06/07/under-the-covers-of-the-nsas-big-data-effort/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/02/shutterstock_37622056.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/02/shutterstock_37622056.jpg?w=150" medium="image">
			<media:title type="html">sql statement</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/06/fuchs.jpg?w=300" medium="image">
			<media:title type="html">fuchs</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/06/cia-big-data.jpg?w=708" medium="image">
			<media:title type="html">cia big data</media:title>
		</media:content>
	</item>
		<item>
		<title>IBM throws its weight behind MongoDB for mobile apps</title>
		<link>http://gigaom.com/2013/06/04/ibm-throws-its-weight-behind-mongodb-for-mobile-apps/</link>
		<comments>http://gigaom.com/2013/06/04/ibm-throws-its-weight-behind-mongodb-for-mobile-apps/#comments</comments>
		<pubDate>Tue, 04 Jun 2013 21:17:35 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[10Gen]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=654192</guid>
		<description><![CDATA[IBM and 10gen are collaborating on a standard that would make it easier to write applications that can access data from both MongoDB and relational systems such as IBM DB2.]]></description>
				<content:encoded><![CDATA[<p>IBM helped propel SQL, Linux and Java into the mainstream, and now it&#8217;s looking to do the same for MongoDB. The company said it&#8217;s working with MongoDB creator 10gen on a new standard that will let mobile apps built atop the NoSQL database connect with data stored in business-critical systems.</p>
<p>At its core, the new standard &#8212; which encompasses the MongoDB API, data representation (<a href="http://bsonspec.org/">BSON</a>), query language and wire protocol &#8212; appears to be all about establishing a way for mobile and other next-generation applications to connect with enterprise database systems such as IBM&#8217;s popular DB2 database and its WebSphere eXtreme Scale data grid. MongoDB <a href="http://gigaom.com/2013/04/09/mongodb-ftw-fast-growing-10gen-hires-first-cfo/">is already immensely popular among web and mobile developers</a> who must deal with semi- and unstructured data, but its lack of transactional integrity (<a href="http://www.quora.com/MongoDB/Which-companies-have-moved-away-from-MongoDB-and-why">among other things</a>) means MongoDB isn&#8217;t often deployed for &#8220;mission-critical&#8221; applications that require ACID compliance and consistent performance.</p>
<p>In theory, the new standard would let MongoDB-based applications easily and securely access mission-critical database systems. This could usher in a new wave of flexible applications that add significant value by spanning multiple data systems. According to a press release, &#8220;Customers can begin to use these new features later this summer by pairing eXtreme Scale with MongoDB, and by running their MongoDB applications on DB2 directly.&#8221;</p>
<p>The companies are also seeking participation from other parties interested in developing standard methods for interacting with MongoDB.</p>
<p>However, there&#8217;s a bigger shift at play here than the development of a new database standard, and it has everything to do with <a href="http://gigaom.com/2013/06/04/why-ibm-desperately-needed-to-buy-softlayer/">IBM&#8217;s planned acquisition of cloud provider SoftLayer</a>, also announced on Tuesday. If IBM wants to remain relevant as server sales and application platforms move to the cloud, it has to embrace the new business and application-development models that come along with cloud computing. IBM&#8217;s stable of enterprise developers might not be <a href="http://gigaom.com/2013/06/03/google-takes-on-parse-with-new-service-for-mobile-app-backends/">deploying mobile apps on Parse or Google</a> any time soon, but they will look for alternative platforms if IBM doesn&#8217;t at least try to keep up with a changing landscape.</p>
<p>Coincidentally, SoftLayer and 10gen <a href="http://gigaom.com/2012/12/04/new-slick-mongodb-managed-service-from-softlayer-and-10gen/">already have a strong partnership</a> around hosting MongoDB applications in the cloud.</p>
<p>If IBM is still an IT kingmaker, that bodes very well for MongoDB, as well as for the OpenStack cloud computing platform that IBM is also backing. If IBM&#8217;s influence in this realm is slipping, though, one could argue that IBM needs MongoDB and OpenStack more than they need it.</p>
<p>I am awaiting comment from IBM and/or 10gen for more details on the scope of their partnership and this standard, and will update when I hear more.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/06/04/ibm-throws-its-weight-behind-mongodb-for-mobile-apps/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/06/mongodb-e1370373696392.png?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/06/mongodb-e1370373696392.png?w=150" medium="image">
			<media:title type="html">mongodb</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>
	</item>
		<item>
		<title>Google&#8217;s growing cloud just got a NoSQL database</title>
		<link>http://gigaom.com/2013/05/15/googles-growing-cloud-just-got-a-nosql-database/</link>
		<comments>http://gigaom.com/2013/05/15/googles-growing-cloud-just-got-a-nosql-database/#comments</comments>
		<pubDate>Wed, 15 May 2013 23:31:16 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Amazon]]></category>
		<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[DynamoDB]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Google Cloud Datastore]]></category>
		<category><![CDATA[Google I/O 2013]]></category>
		<category><![CDATA[NoSQL]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=645949</guid>
		<description><![CDATA[Google is expanding its cloud platform with a "NoSQL-like" database called Cloud Datastore. It's a fully managed database that's replicated across data centers and built to scale.]]></description>
				<content:encoded><![CDATA[<p>It doesn&#8217;t have a cool name like Cassandra, Voldemort or MongoDB, but Google is offering up a non-relational database <a href="https://developers.google.com/datastore/">called Google Cloud Datastore</a>. Like almost everything the company has done since announcing its Compute Engine service at last year&#8217;s IO conference &#8212; including <a href="http://gigaom.com/2013/05/15/and-bam-heres-google-compute-engine/">the rest of the features it announced on Wednesday</a> &#8212; Cloud Datastore looks like a direct shot at current cloud champion Amazon Web Services.</p>
<p><a href="http://gigaom.com/2013/05/15/googles-growing-cloud-just-got-a-nosql-database/googlecloudstore/" rel="attachment wp-att-645989"><img  alt="googlecloudstore" src="http://gigaom2.files.wordpress.com/2013/05/googlecloudstore.jpg?w=708"   class="aligncenter size-full wp-image-645989" /></a>AWS <a href="http://gigaom.com/2012/01/18/amazon-launches-home-grown-nosql-database/">has a managed NoSQL database service called DynamoDB</a> that&#8217;s replicated across three availability zones to ensure it stays up. Google&#8217;s Cloud Datastore sounds eerily similar, according to the product&#8217;s website (although Google calls its product &#8220;NoSQL-like&#8221;). It&#8217;s fully managed, built for speed and scale and is replicated across data centers. For some queries, Google even promises that Cloud Datastore will support ACID transactions.</p>
<p>Although the services advertise similar features in terms of availability and scalability, they&#8217;re quite different technically. Cloud Datastore is <a href="http://en.wikipedia.org/wiki/BigTable">based on Google&#8217;s BigTable database</a> (and <a href="http://googleappengine.blogspot.com/2009/09/migration-to-better-datastore.html">a library called Megastore on top of it</a>) while DynamoDB is <a href="http://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html">based on Amazon&#8217;s Dynamo database</a>. You can get details on Datastore  and how it works <a href="https://developers.google.com/datastore/docs/concepts/overview">here</a>. Pricing information is available <a href="https://developers.google.com/cloud/pricing#cloud-datastore">here</a>.</p>
<p>If its goal is to compete with AWS, though, Google&#8217;s cloud platform still has a long way to go. Yes, it has most of the key services in place and even some seeming advantages in certain areas, but it&#8217;s lacking the incredible breadth of services AWS offers &#8212; everything from virtual server instances to a <a href="http://gigaom.com/2013/02/19/amazon-adds-opsworks-application-life-cycle-management-to-aws-cloud/">devops service</a> to a <a href="http://gigaom.com/2013/02/15/watch-out-hp-ibm-teradata-oracle-amazon-redshift-is-here/">hosted data warehouse</a>. It&#8217;s also lacking a seven-year reputation for being an all-around reliable platform and an <a href="http://gigaom.com/2013/02/21/amazon-gets-more-serious-about-the-enterprise-no-kidding/">ever-growing list of large-enterprise users</a>.</p>
<p>Of course, there&#8217;s also an argument to be made that Google doesn&#8217;t really have to compete with AWS at all when it comes to cloud computing. AWS made a name for itself by  taking all the new workloads from startups and corporate developers who wanted to build new types of applications and didn&#8217;t want to deal with the IT department; Google has the same opportunity ahead of it. <a href="http://gigaom.com/2012/09/13/will-go-be-the-new-go-to-programming-language/">New programming languages like Go</a> and the unique nature of the rest of Google&#8217;s services, Cloud Datastore included, could make it the go-to place for a class of developers that likes to push the envelope in terms of application design.</p>
<p>Oh, and Google has a little ace up its sleeve called Android. If someone is so inclined to develop mobile applications for <a href="http://gigaom.com/2013/05/15/google-io-statshot-900-million-android-devices-activated/">the most-popular mobile operating system on the planet</a>, there are worse places to host them.</p>
<p><em>This post was updated at 5:35 p.m. to clarify that DynamoDB and Cloud Datastore are based on different underlying technologies.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/15/googles-growing-cloud-just-got-a-nosql-database/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/10/shutterstock_113600470.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/10/shutterstock_113600470.jpg?w=150" medium="image">
			<media:title type="html">Shiny database</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/googlecloudstore.jpg" medium="image">
			<media:title type="html">googlecloudstore</media:title>
		</media:content>
	</item>
		<item>
		<title>MapR releases M7, its commercial HBase distro</title>
		<link>http://gigaom.com/2013/05/01/mapr-releases-m7-its-commercial-hbase-distro/</link>
		<comments>http://gigaom.com/2013/05/01/mapr-releases-m7-its-commercial-hbase-distro/#comments</comments>
		<pubDate>Wed, 01 May 2013 23:21:07 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=641425</guid>
		<description><![CDATA[MapR on Wednesday released M7, its commercial version of HBase and the first such product on the market, which the company claims is bigger, faster and better than the open source version.]]></description>
				<content:encoded><![CDATA[<p>MapR didn&#8217;t miss the memo about the key to success in the Hadoop space being the creation of a data platform that can do many things. And on Wednesday, the company released its take on HBase, <a href="http://www.mapr.com/products/mapr-editions/m7-edition">called M7.</a></p>
<p>Last week, I <a href="http://gigaom.com/2013/04/22/how-hbase-converted-myspaces-mysql-champion-and-is-driving-hadoop-mainstream/">explained how HBase is fast becoming the star of the Hadoop ecosystem</a> because it allows users to build more real-time, almost transactional applications on top of Hadoop. True to its form with its other products, MapR has taken HBase even further with M7 by promising greater availability (99.999 percent), instant recovery, faster operations and the ability to handle 1 trillion tables in a single cluster. In open source versions of HBase, MapR VP of Marketing Jack Norris told me, the accepted table limit per cluster is several hundred.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/05/m7.jpg"><img  alt="m7" src="http://gigaom2.files.wordpress.com/2013/05/m7.jpg?w=300&#038;h=265" width="300" height="265" class="alignright size-medium wp-image-641471" /></a>Additionally, M7 shares a single data layer with the Hadoop file system, meaning less performance overhead and, presumably, easier management.</p>
<p>As we&#8217;re seeing with other Hadoop vendors, including Cloudera (which <a href="http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/">released its Impala SQL query engine on Tuesday</a>), the Hadoop market is fast becoming one where each vendor is trying to set itself apart from the rest by building the best platform with the broadest set of capabilities. In furtherance of that mission, MapR also announced on Wednesday full-text search on its Hadoop distribution thanks to a partnership with Lucene specialist LucidWorks. It already has its own Hadoop distribution complete with proprietary code to bolster the file system and speed up MapReduce, as well as an <a href="http://gigaom.com/2012/08/17/for-fast-interactive-hadoop-queries-drill-may-be-the-answer/">open source SQL-on-Hadoop project called Drill</a> in the works.</p>
<p>MapR employees are probably sleeping a lot easier these days as a result of this platform push. Others in the Hadoop market used to talk about the fear of fragmentation and then point at MapR as the example of a company helping foment that outcome with its proprietary software. Now, however, even if everyone else is building open source products, they&#8217;re all still backing their own and largely dismissing the others.</p>
<p>I suspect the result is feature lock-in even if there&#8217;s no technological lock-in, kind of <a href="http://gigaom.com/2011/03/16/how-amazon-is-following-apples-lead-to-rule-cloud-computing/">like using Amazon Web Services for cloud computing</a> and then hoping to replicate its various services elsewhere. It might be easy enough to move your data, but impossible or very difficult to replicate those additional capabilities elsewhere. If MapR can build a better version of HBase and companies are willing to pay for it, then so be it.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/01/mapr-releases-m7-its-commercial-hbase-distro/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_110961494.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_110961494.jpg?w=150" medium="image">
			<media:title type="html">Database rows</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/m7.jpg?w=300" medium="image">
			<media:title type="html">m7</media:title>
		</media:content>
	</item>
		<item>
		<title>10gen introduces a backup option for MongoDB</title>
		<link>http://gigaom.com/2013/04/30/10gen-introduces-a-backup-option-for-mongodb/</link>
		<comments>http://gigaom.com/2013/04/30/10gen-introduces-a-backup-option-for-mongodb/#comments</comments>
		<pubDate>Tue, 30 Apr 2013 15:14:18 +0000</pubDate>
		<dc:creator>Jordan Novet</dc:creator>
				<category><![CDATA[10Gen]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[NoSQL]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=640880</guid>
		<description><![CDATA[10gen, the company behind the popular MongoDB NoSQL database, has come out with a way for users to back up their data, so developers can focus on building applications.]]></description>
				<content:encoded><![CDATA[<p>There&#8217;s no question that MongoDB is popular among developers. 10gen, the company behind the NoSQL database, has been building out its <a href="http://gigaom.com/2013/04/09/mongodb-ftw-fast-growing-10gen-hires-first-cfo/">executive team</a>. Now 10gen is adding a support mechanism that could give users some assurance that they won&#8217;t lose their data in the event of a disaster.</p>
<p>The MongoDB Backup Service, now in limited release with general release slated for the summer, lets customers determine how often they want to back up their databases at colocation facilities 10gen uses. If a user wants to back up every six hours, for example, then that user has many options to choose from in the way of restoring a database to a previous state. They can choose the version from six, 12, 18 or 24 hours ago. Restores require two-factor authentication and work across multiple shards. Customers pay only for the amount of backup that they use.</p>
<p>10gen, based in New York and Palo Alto, Calif., expects the service to be a hit not necessarily with big companies but with small and medium-sized businesses. &#8220;It allows them to focus on building out applications instead of worry about this operational part of the infrastructure,&#8221; said Kelly Stirman, director of product marketing at 10gen. Regardless of company size, the feature could be valuable for anyone working in Mongo with <a href="http://gigaom.com/2013/03/19/drawn-to-scale-wants-to-solve-your-mongodb-scalability-problems/">larger data sets</a>.</p>
<p>Beyond that, backing up means users can move data from a production environment into a testing environment to look for issues so their production environment won&#8217;t be affected.</p>
<p>While many MongoDB users already back up their databases, the systems are typically homemade, Stirman said. The MongoDB Backup Service, by comparison, is more reliable.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/30/10gen-introduces-a-backup-option-for-mongodb/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2010/12/datacenterimage1.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2010/12/datacenterimage1.jpg?w=150" medium="image">
			<media:title type="html">datacenterimage1</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/c00ab753df107b639e76ed4c3ab07ba7?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigajordan</media:title>
		</media:content>
	</item>
		<item>
		<title>The ex-MySQL gang is back together, pushing MariaDB as a neutral &#8216;bridge&#8217;</title>
		<link>http://gigaom.com/2013/04/24/the-ex-mysql-gang-is-back-together-pushing-mariadb-as-a-neutral-bridge/</link>
		<comments>http://gigaom.com/2013/04/24/the-ex-mysql-gang-is-back-together-pushing-mariadb-as-a-neutral-bridge/#comments</comments>
		<pubDate>Wed, 24 Apr 2013 16:54:13 +0000</pubDate>
		<dc:creator>David Meyer</dc:creator>
				<category><![CDATA[database]]></category>
		<category><![CDATA[MariaDB]]></category>
		<category><![CDATA[Monty Widenius]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[SkySQL]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=634013</guid>
		<description><![CDATA[MySQL and MariaDB services company SkySQL has brought Monty Widenius and other MariaDB players on board. The result, says CEO Patrik Sallner, will be "a new form of database platform that ties together other databases."]]></description>
				<content:encoded><![CDATA[<p>Bad news for Oracle, maybe: some of the key pre-Sun-takeover MySQL players are back together, and their MariaDB fork of MySQL looks like it&#8217;s gaining serious traction.</p>
<p>The reunion comes courtesy of a merger between open source database services firm SkySQL (which supports both MySQL and MariaDB deployments for customers ranging from Harvard to Shutterstock) and a company called Monty Program &#8212; yes, as in Monty Widenius, who named MySQL after his oldest daughter My and its fork after his younger daughter, Maria. </p>
<p>So now we have Widenius and other ex-MySQLers such as Colin Charles back together with players such as MySQL co-founder David Axmark and former MySQL sales director Magnus Stenberg. Actually, that&#8217;s underselling the magnitude of what&#8217;s happened here: out of the 70 employees of the fused operation (which is continuing under the SkySQL name), 50 used to be at the original MySQL firm. </p>
<h2 id="open-appeal">Open appeal</h2>
<p>At the same time, MariaDB seems to be capitalizing on the <a href="http://www.theregister.co.uk/2012/11/29/monty_oracle_eu_promises/">disillusionment</a> of some in the open source community with Oracle&#8217;s stewardship of MySQL &#8212; doing things like releasing extensions for the commercial version but not the free version was never going to win favor in that scene. Wikipedia <a href="http://blog.wikimedia.org/2013/04/22/wikipedia-adopts-mariadb/">migrated to MariaDB</a> in the last few days, and the Fedora and OpenSUSE Linux distros will <a href="http://www.zdnet.com/oracle-who-fedora-and-opensuse-will-replace-mysql-with-mariadb-7000010640/">both make the jump</a> in their next releases. </p>
<p>The MariaDB Foundation, which is <a href="http://blog.mariadb.org/mariadb-foundation-takes-next-steps-to-community-governance/">busy sorting out its governance structure</a> and which now claims SkySQL as an early member, also <a href="http://webmink.com/2013/04/18/taking-mariadb-foundation-forward/">took on</a> former Sun Chief Open Source Officer Simon Phipps as its CEO a week ago.</p>
<p>&#8220;It is a pleasure to have a company representing the reunited core team of our code base joining the Foundation at its inception,&#8221; Phipps said in a statement this week.</p>
<h2 id="mariadb-the-bridge">MariaDB the &#8220;bridge&#8221;</h2>
<p>The fused team has a unique NewSQL proposition: not only is MariaDB fully compatible with MySQL, but it can also interface with newer NoSQL databases such as Cassandra and LevelDB. According to SkySQL CEO Patrik Sallner, SkySQL will continue to service both MySQL and MariaDB customers and won&#8217;t be forcing anyone to jump to MariaDB &#8212; but he expects many customers to make that leap nonetheless:</p>
<blockquote id="quote-right-now-because-my"><p>&#8220;Right now, because MySQL belongs to Oracle, it&#8217;s not necessarily perceived as independent. Linux is the default operating system in most enterprise contexts. Oracle, IBM and Microsoft control the vast majority of business in databases and most companies have at least two of these, which are not compatible with each other. And, as companies deploy new applications, they use new [NoSQL] database technologies to meet their needs.</p>
<p>&#8220;We believe that MariaDB has an opportunity to become a truly independent and interoperable open source database, meaning we can provide a solution that&#8217;s a neutral ground for companies. &#8230; Our aspiration is to start building this into a new form of database platform that ties together other databases in a seamless manner. By providing a bridge, we believe we can create more innovation.&#8221;</p></blockquote>
<p>Sallner noted that there isn&#8217;t currently a great deal of difference between MySQL and MariaDB, apart from the latter&#8217;s &#8220;pluggable&#8221; approach to storage engines. &#8220;Using the SQL language allows us to be compatible with other databases, and we have a connect engine which allows us to add on-the-fly support for other data formats,&#8221; he said.</p>
<p>As a next step, Sallner said he hoped to see other database providers join the MariaDB Foundation, in order to maintain this open common ground. &#8220;We&#8217;re not competing against DB2 or Oracle or Microsoft today &#8212; we&#8217;re all serving different needs,&#8221; he said. So does he want to sign up Oracle itself? &#8220;That&#8217;ll be a stretch, but it would be a huge sign of success,&#8221; he laughed.</p>
<p>It&#8217;s not all bonhomie, though &#8212; Sallner reckons large internet companies will engage with MariaDB in a way that they haven&#8217;t with Oracle&#8217;s MySQL.</p>
<p>&#8220;We believe those companies are willing to contribute the work they&#8217;ve done back to MariaDB,&#8221; he said. &#8220;Facebook and Twitter have contributed substantial new features to MariaDB. They probably wouldn&#8217;t have contributed that to Oracle.&#8221;</p>
<p><em>UPDATE (10.55am PT): This piece originally and incorrectly stated that Widenius is the new SkySQL CTO, whereas he is in fact the CTO of the MariaDB Foundation. Widenius is on the board of SkySQL, but his role is non-operational.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/24/the-ex-mysql-gang-is-back-together-pushing-mariadb-as-a-neutral-bridge/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/04/mariadb.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/04/mariadb.jpg?w=150" medium="image">
			<media:title type="html">MariaDB</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/6599daccfd7e897e68744fe0065e5a2e?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">superglaze</media:title>
		</media:content>
	</item>
		<item>
		<title>How HBase converted MySpace&#8217;s MySQL champion and is driving Hadoop mainstream</title>
		<link>http://gigaom.com/2013/04/22/how-hbase-converted-myspaces-mysql-champion-and-is-driving-hadoop-mainstream/</link>
		<comments>http://gigaom.com/2013/04/22/how-hbase-converted-myspaces-mysql-champion-and-is-driving-hadoop-mainstream/#comments</comments>
		<pubDate>Mon, 22 Apr 2013 18:14:03 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Gravity]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[Myspace]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=632738</guid>
		<description><![CDATA[Gravity CTO Jim Benedetto knows his way around MySQL after managing a 600-instance cluster at MySpace, but he has found HBase religion as his real-time content-recommendation platform grew. And he's not alone.]]></description>
				<content:encoded><![CDATA[<p>How&#8217;s this for an understatement: Operational databases are important for many, if not the majority, of web applications. And if you&#8217;re doing big business on the web, finding one that can scale with your data volumes and still perform like you need it to is critical. MapReduce for batch data processing and analysis? Not so much, actually.</p>
<p>That&#8217;s why as Hadoop keeps <a href="http://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/">thundering toward its destination as the de facto data platform</a> for next-generation applications, companies such as Cloudera and Hortonworks that are making a killing off it might want to stop and thank <a href="http://www.searchenginecaffe.com/2007/05/hbase-powersets-bigtable.html">the guys from Powerset for building HBase</a>. Because the database &#8212; <a href="http://hbase.apache.org/">a columnar Google BigTable clone that runs on top of the Hadoop Distributed File System</a> &#8212; is so fast and scalable, it&#8217;s helping Hadoop find a home in companies and with applications that HDFS and MapReduce alone might not have been able to penetrate so easily.</p>
<p>The latest HBase user I&#8217;ve come across is <a href="http://www.gravity.com/">Gravity</a>, the <a href="http://gigaom.com/2012/03/15/the-personalized-web-is-just-an-interest-graph-away/">interest graph</a> company that powers content recommendations for some of the biggest publishers on the web.</p>
<h2 id="from-big-mysql-at-myspace-to-b">From big MySQL at MySpace to big data with HBase</h2>
<p>Its co-founders were all senior executives at MySpace, including Gravity CTO Jim Benedetto, who was SVP of technology for the social networking pioneer. He was actually MySpace&#8217;s first architect and helped build the platform&#8217;s MySQL database. Although MySpace never reached <a href="http://gigaom.com/2011/12/06/facebook-shares-some-secrets-on-making-mysql-scale/">Facebook&#8217;s scale</a>, it did have 150 million users at its peak, all able to store unlimited numbers of wall posts, messages and photos. Benedetto eventually oversaw a 600-instance cluster that required about 30 database administrators to keep it up and running.</p>
<div id="attachment_603574" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/01/1z5o2256.jpg"><img  alt="Structure Data 2012: Jim Benedetto – CTO, Gravity Ashlie Beringer – Partner, Gibson, Dunn &amp; Crutcher" src="http://gigaom2.files.wordpress.com/2013/01/1z5o2256.jpg?w=300&#038;h=200" width="300" height="200" class="size-medium wp-image-603574" /></a><p class="wp-caption-text">Benedetto (center) at Structure: Data 2012. (c) Pinar Ozger</p></div>
<p>So naturally, when it came time to build out the Gravity architecture, Benedetto opted for the MySQL he knew so well. Until about three years ago, he told me recently, that database held about 95 percent of the company&#8217;s data. At some point, though, Benedetto and his team realized they were spending way too much time keeping their MySQL environment up instead of building new things, so it was time for a change.</p>
<p>The company ultimately opted for HBase, but the decision wasn&#8217;t easy. &#8220;For us,&#8221; Benedetto said, &#8220;our data and algorithms are our company,&#8221; so making the move from a relational database to a column-based database that can serve MapReduce jobs was nerve-racking. After all, he explained, &#8220;You never want to migrate your data &#8230; and if you have to, you never want to migrate it more than once.&#8221; In fact, he added, &#8220;you&#8217;re not going back.&#8221;</p>
<p>But Benedetto says the move to HBase as Gravity&#8217;s primary data store has been &#8220;life-saving,&#8221; and it&#8217;s arguably a more important component of the company&#8217;s infrastructure than is Hadoop MapReduce. HBase handles the company&#8217;s real-time recommendation algorithms, and it does it across the entire Gravity platform rather than on a site-by-site basis. And although it&#8217;s not banking-grade when it comes to the consistency of transactions, Benedetto says it&#8217;s about 99.95 percent consistent in real time. Later on, batch MapReduce jobs swoop in and pick up whatever HBase dropped earlier, and process it all against the company&#8217;s graph algorithms.</p>
<div id="attachment_633095" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/04/canvas-copy.jpg"><img  alt="interest graph" src="http://gigaom2.files.wordpress.com/2013/04/canvas-copy.jpg?w=708&#038;h=708" width="708" height="708" class="size-large wp-image-633095" /></a><p class="wp-caption-text">An example of an interest graph from Gravity.</p></div>
<h2 id="scalable-for-sure-and-getting-">Scalable for sure, and getting easier to use</h2>
<p>And although it took some serious engineering effort to get HBase operational when Gravity began working with it three years ago, Benedetto thinks HBase is getting to the point (as is rival NoSQL database Cassandra, he acknowledged) where one could safely call it &#8220;enterprise-ready.&#8221; Right now, he noted, &#8220;You&#8217;re not gonna see HBase in a company that just buys Oracle because Oracle is the name and Oracle has been around for 20 years,&#8221; but for web startups that hope to reach a certain scale and even for existing companies that are running into the MySQL wall, he sees a shift occurring.</p>
<p>&#8220;The web farm is the easiest part of your infrastructure to scale because all it does is cost more money,&#8221; Benedetto explained. Databases, on the other hand, require a lot of thinking about how to migrate data, shard the database and otherwise make a piece of software likely designed for a handful of servers, max, spread across dozens or hundreds. HBase really eases the scaling process, as well as the subsequent management, he said. Now, Gravity&#8217;s 100-node HBase cluster has only two operations engineers dedicated to it.</p>
<p>Indeed, there are startups trying to capitalize on HBase by <a href="http://gigaom.com/2013/03/19/drawn-to-scale-wants-to-solve-your-mongodb-scalability-problems/">using it to power SQL and even MongoDB-compliant databases</a> that can scale beyond what most relational databases can do.</p>
<p>Aside from scale, HBase might soon start catching on because of the work companies like Gravity have been doing to make it more user-friendly. It might scale easily, but, as Benedetto noted, it&#8217;s not always easy to get started with &#8212; especially without some deep understanding of the intricacies of the underlying HDFS infrastructure. Last year, eBay VP of Experience, Search and Platforms Hugh Williams <a href="http://gigaom.com/2012/01/31/under-the-covers-of-ebays-big-data-operation/">told me that although HBase is one of the big data tools the company is most excited about</a>, it&#8217;s also the area where he&#8217;d like to see the most improvement.</p>
<p>To help flatten the learning curve, Gravity has <a href="http://www.gravity.com/labs/hpaste/">developed an open-source tool called HPaste</a> that lets developers access data and run jobs on HBase data using Scala rather than the more-bloated Java programming language on which Hadoop and HBase are built. One of the biggest benefits of HPaste, Benedetto said, is that it lets new HBase developers see the data in a way that makes sense to them: HBase stores everything in byte arrays, he explained, and &#8220;when a human tries to read a byte array, it looks like ancient hieroglyphics.&#8221;</p>
<div id="attachment_633093" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/04/kiji-org-architecture1.png"><img  alt="Kiji architecture" src="http://gigaom2.files.wordpress.com/2013/04/kiji-org-architecture1.png?w=300&#038;h=275" width="300" height="275" class="size-medium wp-image-633093" /></a><p class="wp-caption-text">The Kiji architecture</p></div>
<p>Elsewhere, a startup called WibiData has <a href="http://gigaom.com/2012/11/14/wibidata-open-sources-kiji-to-make-hbase-more-useful/">created an open-source framework called Kiji</a> that aims to provide a collection of high-level APIs that should make it easier to store different data types in and develop applications on HBase. The company envisions Kiji being to HBase what the Spring Framework has become to Java over the course of the past decade.</p>
<h2 id="hadoops-weapon-for-the-mainstr">Hadoop&#8217;s weapon for the mainstream?</h2>
<p>But user experience aside, a lot of companies already invested in Hadoop &#8212; aside from <a href="http://gigaom.com/2011/03/04/how-facebook-is-powering-real-time-analytics/">expert users such as Facebook</a> &#8212; are starting to see the promise of HBase and are incorporating it into their architectures.</p>
<p>WibiData co-founder Christophe Bisciglia, who also co-founded Hadoop pioneer Cloudera in 2008, gave me his take on the state of HBase while <a href="http://gigaom.com/2013/03/12/hadoops-past-present-and-future-a-gigaom-special-report/">discussing its role in the future of Hadoop</a> earlier this year. &#8220;If you talk to anyone from Cloudera or any of the platform vendors, I think they will tell you that a large percentage of their customers use HBase. It&#8217;s something that I only expect to see increasing,&#8221; he explained. &#8220;&#8230; HBase is gonna be what takes Hadoop from an ETL and BI platform into a real-time application platform.&#8221;</p>
<div id="attachment_633120" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/04/cloudera_enterprise_diagram.png"><img  alt="The Cloudera Hadoop stack (Gravity uses Cloudera's distro)." src="http://gigaom2.files.wordpress.com/2013/04/cloudera_enterprise_diagram.png?w=300&#038;h=165" width="300" height="165" class="size-medium wp-image-633120" /></a><p class="wp-caption-text">The Cloudera Hadoop stack (Gravity uses Cloudera&#8217;s distro).</p></div>
<p>Benedetto appears to agree. He considers Hadoop as a whole incredibly important, almost on par with what Amazon Web Services did for computing resources, because it lets startups use commercial-grade open source software to do data storage and processing that previously was only available to massive web companies. &#8220;More and more &#8230; the shining star in that suite is HBase,&#8221; he said. &#8220;If I were Oracle, I&#8217;d be scared.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/22/how-hbase-converted-myspaces-mysql-champion-and-is-driving-hadoop-mainstream/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/10/shutterstock_113600470.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/10/shutterstock_113600470.jpg?w=150" medium="image">
			<media:title type="html">Shiny database</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/1z5o2256.jpg?w=300" medium="image">
			<media:title type="html">Structure Data 2012: Jim Benedetto – CTO, Gravity Ashlie Beringer – Partner, Gibson, Dunn &#38; Crutcher</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/canvas-copy.jpg?w=708" medium="image">
			<media:title type="html">interest graph</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/kiji-org-architecture1.png?w=300" medium="image">
			<media:title type="html">Kiji architecture</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/cloudera_enterprise_diagram.png?w=300" medium="image">
			<media:title type="html">The Cloudera Hadoop stack (Gravity uses Cloudera&#039;s distro).</media:title>
		</media:content>
	</item>
		<item>
		<title>MarkLogic nets $25M to keep up enterprise NoSQL pitch</title>
		<link>http://gigaom.com/2013/04/10/marklogic-nets-25m-to-keep-up-enterprise-nosql-pitch/</link>
		<comments>http://gigaom.com/2013/04/10/marklogic-nets-25m-to-keep-up-enterprise-nosql-pitch/#comments</comments>
		<pubDate>Wed, 10 Apr 2013 11:30:53 +0000</pubDate>
		<dc:creator>Jordan Novet</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[marklogic]]></category>
		<category><![CDATA[NoSQL]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=629466</guid>
		<description><![CDATA[MarkLogic has raised $25 million in new venture funding to add more customers for its NoSQL database. It wants to go after companies that have looked to longtime software vendors for relational solutions.]]></description>
				<content:encoded><![CDATA[<p>When MarkLogic Founder Christopher Lindblad started working on a database for unstructured data in 2001, his efforts were prescient. Since then, the database market has seen a proliferation of non-relational, or NoSQL, startups to handle the wide variety of data types that new data sources such as web applications and digital documents generate. The space has grown so big, in fact, that it has <a href="http://gigaom.com/2013/03/21/no-not-every-database-was-created-equal-heres-how-theyre-stand-out/2/">already started to consolidate</a>. Amid all this, MarkLogic has managed to stand out by generating more revenue than pretty much any other vendor, according to <a href="http://wikibon.org/w/images/2/21/Forecast-BigDataDatabasebyVendor.png">figures</a> Wikibon released in February.</p>
<p>On Wednesday, MarkLogic&#8217;s success was validated again, as the company announced a $25 million round of venture funding, bringing the total it has raised to $71.2 million. Sequoia Capital and Tenaya Capital led the round; CEO Gary Bloom and other MarkLogic executives also contributed.</p>
<p>MarkLogic likes to tout the fact that it&#8217;s geared for enterprise use. Features such as high availability, replication, clustering and ACID compliance help differentiate the company from other NoSQL databases, Bloom told me. And although the company is taking in revenue and looks robust enough to <a href="http://gigaom.com/2011/04/05/with-a-new-ceo-marklogic-eyes-big-data-ipo/">go public</a> now, Bloom said he would rather boost revenues to the point that MarkLogic could sustain success after an IPO.</p>
<p>Rather than go after the revenues that open-source NoSQL databases generate, Bloom said he wants to take away database market share from legacy companies peddling SQL databases, including IBM, SAP and Bloom&#8217;s previous employer, Oracle. That means MarkLogic salespeople will have to convince slower-to-change enterprises that relational databases might not be the best choice if they want to take advantage of unstructured data. MarkLogic also will have to contend with fellow NoSQL players that are adding enterprise functions, such as <a href="http://gigaom.com/2013/03/19/10gen-rolls-out-new-features-to-woo-more-enterprises-to-mongodb/">MongoDB</a>.</p>
<p>But if MarkLogic&#8217;s plan turns out to be fruitful, a public offering could come within a year or two, Bloom said.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/10/marklogic-nets-25m-to-keep-up-enterprise-nosql-pitch/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/04/marklogic-ceo.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/04/marklogic-ceo.jpg?w=150" medium="image">
			<media:title type="html">MarkLogic CEO</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/c00ab753df107b639e76ed4c3ab07ba7?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigajordan</media:title>
		</media:content>
	</item>
	</channel>
</rss>