<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>GigaOM &#187; open source</title>
	<atom:link href="http://gigaom.com/tag/open-source/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com</link>
	<description></description>
	<lastBuildDate>Sun, 19 May 2013 01:01:13 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gigaom.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0db8f6557d022075dbbf010c54d46d93?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>GigaOM &#187; open source</title>
		<link>http://gigaom.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gigaom.com/osd.xml" title="GigaOM" />
	<atom:link rel='hub' href='http://gigaom.com/?pushpress=hub'/>
		<item>
		<title>We&#8217;re witnessing the rise of the graph in big data</title>
		<link>http://gigaom.com/2013/05/14/were-witnessing-the-rise-of-the-graph-in-big-data/</link>
		<comments>http://gigaom.com/2013/05/14/were-witnessing-the-rise-of-the-graph-in-big-data/#comments</comments>
		<pubDate>Tue, 14 May 2013 14:33:33 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[graph analysis]]></category>
		<category><![CDATA[graph database]]></category>
		<category><![CDATA[GraphLab]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=645059</guid>
		<description><![CDATA[Graph databases and graph-processing applications have been popping up all over the place lately, and now they're starting to go commercial. On Tuesday, popular open source project GraphLab joined the ranks of graph startups.]]></description>
				<content:encoded><![CDATA[<p>GraphLab, a popular <a href="http://graphlab.org/">open source project</a> dedicated to graph analysis and machine learning, is trying to capitalize on the excitement around graphs by spinning off a commercial entity, <a href="http://graphlab.com/">GraphLab Inc.</a> GraphLab creator &#8212; and University of Washington machine learning professor &#8212; Carlos Guestrin will lead the new Seattle-based company, which has raised $6.75 million from Madrona Venture Group and NEA.</p>
<p>Graph analysis is among the hottest techniques around for making sense of large datasets, primarily by determining how tightly different data points are related or how similar they are. The term &#8220;graph&#8221; came into the broader lexicon along with social networks, which built social graphs to <a href="http://gigaom.com/2013/03/14/facebook-tweaks-its-algorithms-to-improve-graph-search-comment-search-coming/">assess the relationships among their millions of users</a>, but the technique has much broader uses.</p>
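<p>To make "how similar they are" concrete, here is a minimal sketch of one common graph similarity measure, the Jaccard similarity of two nodes' neighbor sets. It is an illustration only, not GraphLab's algorithm; the graph, the names in it and the <code>jaccard_similarity</code> helper are all made up for this example.</p>

```python
# Hypothetical friendship graph stored as an adjacency map (node -> neighbors).
graph = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann"},
}


def jaccard_similarity(adjacency, a, b):
    """Shared neighbors divided by all neighbors of either node."""
    shared = adjacency[a].intersection(adjacency[b])
    combined = adjacency[a].union(adjacency[b])
    if not combined:
        return 0.0  # two isolated nodes have no basis for comparison
    return len(shared) / len(combined)


bob_cat = jaccard_similarity(graph, "bob", "cat")
bob_dan = jaccard_similarity(graph, "bob", "dan")
print(bob_cat, bob_dan)
```

<p>Real systems compute measures like this across millions of nodes at once, which is where distributed graph engines come in.</p>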
<div id="attachment_645089" class="wp-caption aligncenter" style="width: 677px"><a href="http://gigaom2.files.wordpress.com/2013/05/lnkdmap-1.jpg"><img  alt="My LinkedIn social graph" src="http://gigaom2.files.wordpress.com/2013/05/lnkdmap-1.jpg?w=708"   class="size-full wp-image-645089" /></a><p class="wp-caption-text">My LinkedIn social graph</p></div>
<p>Guestrin said GraphLab&#8217;s algorithms are used in a lot of recommender systems, but he also cites fraud detection in banking networks and intrusion detection in computer networks as potential applications. We&#8217;ve covered graphs as the analytical model of choice for everything <a href="http://gigaom.com/2013/04/22/how-hbase-converted-myspaces-mysql-champion-and-is-driving-hadoop-mainstream/">from content recommendation</a> to <a href="http://gigaom.com/2013/01/22/biotech-startup-syapse-wants-to-be-salesforce-com-for-our-genomes/">tracking lab work in genomics</a>. Really, though &#8212; especially when combined with machine learning &#8212; graph analysis <a href="http://gigaom.com/2013/01/16/has-ayasdi-turned-machine-learning-into-a-magic-bullet/">can be applied to anything</a> where there&#8217;s too much data for a person to possibly analyze the relationships between every point.</p>
<div id="attachment_601469" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/01/ayasdi-product-image-2-e1358295341371.jpg"><img  alt="One of Ayasdi's graph-like data maps" src="http://gigaom2.files.wordpress.com/2013/01/ayasdi-product-image-2-e1358295341371.jpg?w=708&#038;h=472" width="708" height="472" class="size-large wp-image-601469" /></a><p class="wp-caption-text">One of Ayasdi&#8217;s graph-like data maps</p></div>
<p>Google also famously uses <a href="http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html">a graph-processing system called Pregel</a> as part of PageRank. Although a number of graph databases and other projects have popped up in the past few years, Guestrin said GraphLab is actually a contemporary of Pregel. He and some colleagues at Carnegie Mellon built a small system for their lab about five years ago, then released it into the open-source world with few expectations that it would catch on. Now, he added, Pandora and WalmartLabs are among the project&#8217;s user base.</p>
<p>Among those other projects are graph databases such as <a href="http://giraph.apache.org/">Giraph</a> (an open source, Hadoop-based Pregel clone developed at Facebook) and <a href="http://www.neo4j.org/">Neo4j</a> (which also has a commercial arm, <a href="http://gigaom.com/2012/11/02/graph-startup-neo-raises-11m-as-specialized-databases-take-hold/">called Neo Technology</a>), as well as <a href="http://engineering.twitter.com/2012/03/cassovary-big-graph-processing-library.html">Twitter&#8217;s Cassovary</a> and fellow University of Washington project <a href="http://www.cs.washington.edu/node/4217/">Grappa</a>. Guestrin said GraphLab can work with most of them, particularly if they&#8217;re not designed to do machine learning at scale like GraphLab is. Some efforts, he noted, are focused on simply storing data in graph form (e.g., databases) or in providing simple graph analysis.</p>
<p>As for when we&#8217;ll actually see the results of the effort to commercialize GraphLab, Guestrin said it will be a while. Right now, he&#8217;s focused on the next open source release of GraphLab in July. However, the company will begin engaging with commercial users over the next several months to determine what types of features they would expect in commercial graph-analysis software.</p>
<p>The bigger question to come out of all this graph activity, though, is how big a market we&#8217;ll ultimately see for graph-analysis or any other specific technique. As companies get more comfortable with big data from a technical standpoint, they&#8217;re getting more interested in the different types of analysis it allows for too. This is evidenced by the <a href="http://gigaom.com/2013/03/07/5-reasons-why-the-future-of-hadoop-is-real-time-relatively-speaking/">quest to make Hadoop support myriad processing frameworks</a> aside from MapReduce.</p>
<p>We already have a handful of commercial graph products on the market &#8212; including an industrial grade one called <a href="http://www.yarcdata.com/">YarcData</a> from supercomputer maker Cray &#8212; but how many will there eventually be? And if graph analysis is all the rage right now, what comes next?</p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=645059&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=786310"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=786310" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=645059+were-witnessing-the-rise-of-the-graph-in-big-data&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=645059+were-witnessing-the-rise-of-the-graph-in-big-data&utm_content=dharrisstructure">A near-term outlook for big data</a></li><li><a href="http://pro.gigaom.com/2012/01/12-tech-leaders-resolutions-for-2012/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=645059+were-witnessing-the-rise-of-the-graph-in-big-data&utm_content=dharrisstructure">12 tech leaders’ resolutions for 2012</a></li><li><a href="http://pro.gigaom.com/2011/11/connected-world-the-consumer-technology-revolution/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=645059+were-witnessing-the-rise-of-the-graph-in-big-data&utm_content=dharrisstructure">Connected world: the consumer technology revolution</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/14/were-witnessing-the-rise-of-the-graph-in-big-data/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/graphics2-3_final_cartoon.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/graphics2-3_final_cartoon.jpg?w=150" medium="image">
			<media:title type="html">graphics2-3_final_cartoon</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/lnkdmap-1.jpg" medium="image">
			<media:title type="html">My LinkedIn social graph</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/ayasdi-product-image-2-e1358295341371.jpg?w=708" medium="image">
			<media:title type="html">One of Ayasdi&#039;s graph-like data maps</media:title>
		</media:content>
	</item>
		<item>
		<title>The promise of better data has MetLife investing $300M in new tech</title>
		<link>http://gigaom.com/2013/05/07/with-300m-earmarked-for-tech-innovation-metlife-wants-to-remake-insurance/</link>
		<comments>http://gigaom.com/2013/05/07/with-300m-earmarked-for-tech-innovation-metlife-wants-to-remake-insurance/#comments</comments>
		<pubDate>Tue, 07 May 2013 14:00:31 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[10Gen]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[insurance industry]]></category>
		<category><![CDATA[MetLife]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=642824</guid>
		<description><![CDATA[MetLife is building new products on new technologies thanks to a $300 million investment in new technology and new skills. One of the first products is a MongoDB-based app that puts all of customers' information in one place.]]></description>
				<content:encoded><![CDATA[<p>The insurance industry hasn&#8217;t always been a beacon of technological innovation. Then again, its major providers haven&#8217;t always earmarked $300 million for investments in new technology and new talent like MetLife has. The strategy has already borne its first fruit in the form of a new database system and application that lets the company see everything it knows about a customer in a single place.</p>
<p>The new application, called The Wall, is essentially a way to make the customer service experience more palatable for consumers and to lower the burden of hiring new representatives. Because it&#8217;s designed to look and function like Facebook, MetLife CIO and SVP of Regional Application Development Gary Hoberman told me, The Wall means new hires don&#8217;t have to be trained on complex enterprise call center software. For customers calling MetLife to discuss a claim or their coverage, it means fewer annoying waits as an agent accesses data from any of dozens of different places.</p>
<p>&#8220;Instead of seeing what someone had for dinner, [The Wall is] all a customer&#8217;s transactions,&#8221; Hoberman said. Claims, records, status, possible cross-sell information (e.g., if someone lives in an apartment and might need renter&#8217;s insurance) &#8212; it&#8217;s all in there. Looking forward, he said, it might even contain other publicly available information from social media and certain mobile apps that would give the company even greater visibility into its customers&#8217; lives.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/05/metlife-screen-shot_active-contract.png"><img  alt="MetLife Screen Shot_Active Contract" src="http://gigaom2.files.wordpress.com/2013/05/metlife-screen-shot_active-contract.png?w=708&#038;h=720" width="708" height="720" class="aligncenter size-large wp-image-642871" /></a></p>
<h2 id="up-and-running-in-3-months-on-">Up and running in 3 months, on MongoDB</h2>
<p>From a business perspective, though, the most-impressive part of The Wall is how quickly it was implemented and what a divergence from classic large-enterprise IT practices it represents. For Hoberman, who spent 16 years at Citi before joining MetLife in mid-2012, the process was eye-opening. If you told someone in the financial services industry that it would take just five days to get servers up and running for the prototype of such a big application, he said, &#8220;they&#8217;d look at you like you had two heads.&#8221;</p>
<p>But that&#8217;s exactly what MetLife did. In fact, it had the entire prototype built just two weeks after devising it and the production system up and running in just three months. It came together so fast because of MetLife&#8217;s new focus on cutting-edge IT and clear mission to build a useful product rather than, as Hoberman put it, &#8220;doing big data for big data&#8217;s sake.&#8221; The tech team was willingly working nights and weekends, and the leadership team was directly involved because everyone understood what a fundamental change the application could bring to the business.</p>
<p>&#8220;In insurance,&#8221; Hoberman said, &#8220;&#8230; working in months, not years, is really a startup mentality.&#8221;</p>
<p>How big an undertaking was it? Built atop MongoDB, The Wall brings together data from more than 70 legacy systems and merges it into a single record. It runs across six servers in two data centers and presently stores about 24 terabytes of data. That includes MetLife&#8217;s entire U.S. customer base (some 45 million agreements in total), although the goal is to expand it to international customers and multiple languages, as well, and maybe even create a customer-facing version. It updates in near real time, just like the Facebook wall, as new customer data is entered.</p>
<p>Building a production database system on NoSQL technology isn&#8217;t commonplace in insurance or other large industries, but it was about the only way to pull this off. Going with the relational model, Hoberman explained, would have meant figuring out a common schema across such a wide range of products (insurance products and terms vary from state to state and country to country) that it would have been nearly impossible to actually achieve that coveted 360-degree customer view. MongoDB let Hoberman&#8217;s team build a light schema to give the app order while still taking in all the data it had available.</p>
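<p>As a rough illustration of the document-model idea, and not MetLife&#8217;s actual code or data, the sketch below folds records from several legacy systems into one document per customer. Only a single key is required; every other field is stored as it arrives. The record layout, field names and the <code>build_customer_documents</code> helper are assumptions made up for this example, and plain Python dictionaries stand in for database documents.</p>

```python
from collections import defaultdict

# Hypothetical records from three legacy systems; fields differ by product.
legacy_records = [
    {"customer_id": "c1", "system": "auto", "policy": "A-100", "vehicle": "sedan"},
    {"customer_id": "c1", "system": "life", "policy": "L-200", "beneficiary": "spouse"},
    {"customer_id": "c2", "system": "dental", "policy": "D-300", "plan_tier": "basic"},
]


def build_customer_documents(records):
    """Group records by customer into one document each, keeping every field."""
    documents = defaultdict(lambda: {"agreements": []})
    for record in records:
        fields = dict(record)  # copy so the input list is not mutated
        customer_id = fields.pop("customer_id")
        # The "light schema": one required key, plus a list of agreements.
        documents[customer_id]["customer_id"] = customer_id
        # Everything else is stored as-is, whatever shape it has.
        documents[customer_id]["agreements"].append(fields)
    return dict(documents)


customer_documents = build_customer_documents(legacy_records)
print(customer_documents["c1"])
```

<p>A relational design would instead need columns agreed on up front for <code>vehicle</code>, <code>beneficiary</code>, <code>plan_tier</code> and every other product-specific field.</p>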
<h2 id="bringing-in-new-tech-and-new-b">Bringing in new tech, and new blood</h2>
<p>This is only a part of what MetLife is doing with new information technologies, though, and only a fraction of what it wants to do. With The Wall, specifically, Hoberman wants to build next-best-action models that will give agents guidance on how to best deal with customers. Elsewhere, the company has already used its new centralized MongoDB system to build models for predicting attrition, and it&#8217;s using Hadoop and HBase for some other workloads where they&#8217;re a better fit.</p>
<p>It&#8217;s all thanks to a company mandate to save $450 million from its bloated technology and operations budget and then invest two-thirds of it back into new technology. &#8220;We literally have a $300 million investment to decide what&#8217;s going to be the future of MetLife,&#8221; Hoberman said. It&#8217;s kind of like being in a startup, he added, only with the resources to make sure everything is done right (much <a href="http://gigaom.com/2012/09/16/how-disney-built-a-big-data-platform-on-a-startup-budget/">like with other large enterprises embracing open source</a>, Hoberman&#8217;s team prototyped The Wall using open source MongoDB but brought in <a href="http://gigaom.com/2013/04/09/mongodb-ftw-fast-growing-10gen-hires-first-cfo/">10gen</a> when it came time to build a production system).</p>
<p>It might be easy to mock that statement, except that Hoberman and his peers are putting their money where their mouths are by bringing in new talent, as well. MetLife is setting up a team in the Research Triangle region of North Carolina and bringing in employees with expertise in areas such as social, mobile and big data. And Hoberman is far less concerned with specific technical skills than he is with motivation.</p>
<p>It&#8217;s all about &#8220;attitude and aptitude,&#8221; he said. &#8220;They can learn anything.&#8221;</p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=642824&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=31878"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=31878" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=642824+with-300m-earmarked-for-tech-innovation-metlife-wants-to-remake-insurance&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2013/01/cloud-and-data-fourth-quarter-2012-analysis/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=642824+with-300m-earmarked-for-tech-innovation-metlife-wants-to-remake-insurance&utm_content=dharrisstructure">The fourth quarter of 2012 in cloud</a></li><li><a href="http://pro.gigaom.com/2012/11/unlocking-big-datas-potential-with-search/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=642824+with-300m-earmarked-for-tech-innovation-metlife-wants-to-remake-insurance&utm_content=dharrisstructure">How search can unlock the power of big data</a></li><li><a href="http://pro.gigaom.com/2012/07/cloud-and-data-second-quarter-2012-analysis-and-outlook-2/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=642824+with-300m-earmarked-for-tech-innovation-metlife-wants-to-remake-insurance&utm_content=dharrisstructure">Takeaways from the second quarter in cloud and data</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/07/with-300m-earmarked-for-tech-innovation-metlife-wants-to-remake-insurance/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/metlife-screen-shot_active-contract1-e1367933585875.png?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/metlife-screen-shot_active-contract1-e1367933585875.png?w=150" medium="image">
			<media:title type="html">MetLife Screen Shot_Active Contract</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/metlife-screen-shot_active-contract.png?w=708" medium="image">
			<media:title type="html">MetLife Screen Shot_Active Contract</media:title>
		</media:content>
	</item>
		<item>
		<title>MapR releases M7, its commercial HBase distro</title>
		<link>http://gigaom.com/2013/05/01/mapr-releases-m7-its-commercial-hbase-distro/</link>
		<comments>http://gigaom.com/2013/05/01/mapr-releases-m7-its-commercial-hbase-distro/#comments</comments>
		<pubDate>Wed, 01 May 2013 23:21:07 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=641425</guid>
		<description><![CDATA[MapR on Wednesday released its commercial version of HBase called M7, the first such product on the market, that the company claims is bigger, faster and better than the open source version.]]></description>
				<content:encoded><![CDATA[<p>MapR didn&#8217;t miss the memo about the key to success in the Hadoop space being the creation of a data platform that can do many things. And on Wednesday, the company released its take on HBase, <a href="http://www.mapr.com/products/mapr-editions/m7-edition">called M7.</a></p>
<p>Last week, I <a href="http://gigaom.com/2013/04/22/how-hbase-converted-myspaces-mysql-champion-and-is-driving-hadoop-mainstream/">explained how HBase is fast becoming the star of the Hadoop ecosystem</a> because it allows users to build more real-time, almost transactional applications on top of Hadoop. True to its form with its other products, MapR has taken HBase even further with M7 by promising greater availability (99.999 percent), instant recovery, faster operations and the ability to handle 1 trillion tables in a single cluster. In open source versions of HBase, MapR VP of Marketing Jack Norris told me, the accepted table limit per cluster is several hundred.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/05/m7.jpg"><img  alt="m7" src="http://gigaom2.files.wordpress.com/2013/05/m7.jpg?w=300&#038;h=265" width="300" height="265" class="alignright size-medium wp-image-641471" /></a>Additionally, M7 shares a single data layer with the Hadoop file system, meaning less performance overhead and, presumably, easier management.</p>
<p>As we&#8217;re seeing with other Hadoop vendors, including Cloudera (which <a href="http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/">released its Impala SQL query engine on Tuesday</a>), the Hadoop market is fast becoming one where each vendor is trying to set itself apart from the rest by building the best platform with the broadest set of capabilities. In furtherance of that mission, MapR also announced on Wednesday full-text search on its Hadoop distribution thanks to a partnership with Lucene specialist LucidWorks. It already has its own Hadoop distribution complete with proprietary code to bolster the file system and speed up MapReduce, as well as an <a href="http://gigaom.com/2012/08/17/for-fast-interactive-hadoop-queries-drill-may-be-the-answer/">open source SQL-on-Hadoop project called Drill</a> in the works.</p>
<p>MapR employees are probably sleeping a lot easier these days as a result of this platform push. Others in the Hadoop market used to talk about the fear of fragmentation and then point at MapR as the example of a company helping foment that outcome with its proprietary software. Now, however, even if everyone else is building open source products, they&#8217;re all still backing their own and largely dismissing the others.</p>
<p>I suspect the result is feature lock-in even if there&#8217;s no technological lock-in, kind of <a href="http://gigaom.com/2011/03/16/how-amazon-is-following-apples-lead-to-rule-cloud-computing/">like using Amazon Web Services for cloud computing</a> and then hoping to replicate its various services elsewhere. It might be easy enough to move your data, but impossible or very difficult to replicate those additional capabilities elsewhere. If MapR can build a better version of HBase and companies are willing to pay for it, then so be it.</p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=641425&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=745118"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=745118" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=641425+mapr-releases-m7-its-commercial-hbase-distro&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/04/infrastructure-q1-cloud-and-big-data-woo-the-enterprise/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=641425+mapr-releases-m7-its-commercial-hbase-distro&utm_content=dharrisstructure">Infrastructure Q1: Cloud and big data woo enterprises</a></li><li><a href="http://pro.gigaom.com/2011/03/defining-hadoop-the-players-technologies-and-challenges-of-2011/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=641425+mapr-releases-m7-its-commercial-hbase-distro&utm_content=dharrisstructure">Defining Hadoop: the Players, Technologies and Challenges of 2011</a></li><li><a href="http://pro.gigaom.com/2012/12/big-data-2013-key-trends-and-companies-to-watch/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=641425+mapr-releases-m7-its-commercial-hbase-distro&utm_content=dharrisstructure">Big data 2013: key trends and companies to watch</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/01/mapr-releases-m7-its-commercial-hbase-distro/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_110961494.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_110961494.jpg?w=150" medium="image">
			<media:title type="html">Database rows</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/m7.jpg?w=300" medium="image">
			<media:title type="html">m7</media:title>
		</media:content>
	</item>
		<item>
		<title>With Impala now GA, Cloudera&#8217;s CEO sizes up the SQL-on-Hadoop market</title>
		<link>http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/</link>
		<comments>http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/#comments</comments>
		<pubDate>Tue, 30 Apr 2013 13:00:40 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[data warehouse]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[Impala]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[SQL on Hadoop]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=640777</guid>
		<description><![CDATA[Cloudera's Impala engine for interactive SQL queries on Hadoop data is now generally available, and CEO Mike Olson gives his lay of the competitive landscape.]]></description>
				<content:encoded><![CDATA[<p>There is no shortage of confidence in the Hadoop space, and market leader Cloudera bolstered its own on Tuesday with the general availability of its Impala SQL query engine for Hadoop. And if CEO Mike Olson&#8217;s comments are any indication, we&#8217;re in for a long ride of competitive jockeying and one-upmanship as Cloudera and its peers go all Microsoft or Google and create myriad new data-processing engines to turn their Hadoop distributions into bona fide platforms.</p>
<p>Launched as a private beta in May 2012 and <a href="http://gigaom.com/2012/10/24/cloudera-makes-sql-a-first-class-citizen-in-hadoop/">made public in October</a>, Impala is Cloudera&#8217;s attempt to address the growing demand for interactive SQL analytics on Hadoop data. It&#8217;s essentially a massively parallel database designed to share the same storage platform and metadata as Hadoop MapReduce, only it is its own separate processing engine.</p>
<div id="attachment_640848" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/04/impala-arch-new.jpg"><img  alt="How Impala fits in" src="http://gigaom2.files.wordpress.com/2013/04/impala-arch-new.jpg?w=300&#038;h=257" width="300" height="257" class="size-medium wp-image-640848" /></a><p class="wp-caption-text">How Impala fits in</p></div>
<p>Impala actually uses the same &#8220;nearly ANSI&#8221; version of SQL as does current standard bearer Hive, but that technology (created by Facebook in 2009 as a data warehouse layer for Hadoop) doesn&#8217;t run nearly fast enough to sate many users&#8217; desire for interactive analytics. This is because Hive transforms SQL queries into MapReduce jobs, meaning every one is processed against the entire corpus of data in the Hadoop Distributed File System.</p>
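<p>For a feel of why that is slow, here is a toy sketch of how a simple aggregate query maps onto map and reduce phases. It is plain Python rather than Hive or Hadoop code; the table, the <code>map_phase</code> and <code>reduce_phase</code> functions and the query in the comment are all hypothetical. The point is that the map step touches every row before anything is summed.</p>

```python
from collections import defaultdict

# Hypothetical table rows standing in for files stored in HDFS.
rows = [
    {"page": "/home", "user": "a"},
    {"page": "/home", "user": "b"},
    {"page": "/about", "user": "a"},
]


def map_phase(all_rows):
    """Emit a (key, 1) pair for every row; the whole input is scanned."""
    for row in all_rows:
        yield row["page"], 1


def reduce_phase(pairs):
    """Sum the emitted values per key, as a reducer would after the shuffle."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Roughly the shape of: SELECT page, COUNT(*) FROM visits GROUP BY page
page_counts = reduce_phase(map_phase(rows))
print(page_counts)
```

<p>On a real cluster each phase is a batch job with its own startup and scheduling cost, which is a poor fit for interactive queries.</p>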
<h2 id="sizing-up-the-competition">Sizing up the competition</h2>
<p>Cloudera isn&#8217;t the first to have the idea, however, <a href="http://gigaom.com/2013/02/21/sql-is-whats-next-for-hadoop-heres-whos-doing-it/">nor is it alone in trying to sell interactive SQL on Hadoop</a>. The idea was <a href="http://gigaom.com/2011/10/21/hadapt-raises-9-5m-for-hadoop-data-warehouse/">first commercialized by Boston-based startup Hadapt</a> in 2011, and is now being pushed by numerous startups and larger Hadoop players. Among them: Pivotal (formerly EMC) Greenplum, MapR (with <a href="http://gigaom.com/2012/08/17/for-fast-interactive-hadoop-queries-drill-may-be-the-answer/">Drill</a>), Hortonworks (with <a href="http://hortonworks.com/blog/100x-faster-hive/">Stinger</a>), Drawn to Scale, Splice Machine, Jethro Data and Citus Data.</p>
<div id="attachment_640858" class="wp-caption aligncenter" style="width: 600px"><a href="http://gigaom2.files.wordpress.com/2013/04/had_graphic2-scaled.jpg"><img  alt="Hadapt's architecture" src="http://gigaom2.files.wordpress.com/2013/04/had_graphic2-scaled.jpg?w=708"   class="size-full wp-image-640858" /></a><p class="wp-caption-text">Hadapt&#8217;s architecture</p></div>
<p>But Cloudera is arguably the biggest name pushing SQL on Hadoop, and CEO Mike Olson thinks Impala stands out for several reasons &#8212; not the least of which is that it exists as a product. &#8220;Nobody else is shipping production-grade SQL query support on Hadoop,&#8221; he told me during a recent call. &#8220;At least not in open source.&#8221; He seems content to let the startups do their thing, instead focusing his attention on Cloudera&#8217;s big three Hadoop-distribution competitors in Pivotal, MapR and Hortonworks. Greenplum and Pivotal SVP Scott Yara <a href="http://gigaom.com/2013/02/25/emc-to-hadoop-competition-see-ya-wouldnt-wanna-be-ya/">was full of confidence &#8212; and R&amp;D budget</a> &#8212; when the company announced the Pivotal HD distribution and HAWQ technology in February, but Olson claims the approach requires a siloed DBMS within HDFS and is a &#8220;rearguard defensive strategy&#8221; to protect the company&#8217;s sunk costs in its database technology.</p>
<div id="attachment_615210" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/02/hawq1.jpg"><img  alt="The Pivotal HD and Hawq architecture" src="http://gigaom2.files.wordpress.com/2013/02/hawq1.jpg?w=708&#038;h=387" width="708" height="387" class="size-large wp-image-615210" /></a><p class="wp-caption-text">The Pivotal HD and Hawq architecture</p></div>
<p>As for Hortonworks, Olson questions the wisdom of its Stinger initiative to boost Hive&#8217;s speed, noting that &#8220;Hive never got good while it was running standalone on MapReduce.&#8221; Hortonworks also <a href="http://gigaom.com/2013/04/15/teradata-to-connect-hadoop-and-data-warehouses-roll-out-new-appliance/">partners with vendors such as Teradata</a> to let their platforms access Hadoop data in its native format, but those approaches still require sending data over the network. &#8220;It&#8217;s not the way you would build it if you woke up in the 2000s and were building this anew,&#8221; Olson said.</p>
<div id="attachment_640854" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/04/stingerroad.png"><img  alt="The Stinger roadmap" src="http://gigaom2.files.wordpress.com/2013/04/stingerroad.png?w=708&#038;h=558" width="708" height="558" class="size-large wp-image-640854" /></a><p class="wp-caption-text">The Stinger roadmap</p></div>
<p>Olson acknowledged that the MapR-led Apache Drill project is cut from the same cloth as Impala (that is, being a Google Dremel clone designed specifically for Hadoop), but &#8220;the difference is we&#8217;re shipping code.&#8221; Being generally available and ready for production workloads means Cloudera can lock down users and market share before many even have a chance to experiment with Drill. He all but dismissed questions over the readiness of Impala, spurred by rumblings in the Hadoop space that Cloudera rushed it into public beta in order to get on the scoreboard against more fully baked offerings.</p>
<p>&#8220;I don&#8217;t feel we&#8217;re under the gun competitively to pull it out of beta because no one else has product in the market,&#8221; Olson said. &#8220;I have no problems &#8230; calling this GA quality.&#8221; He did, however, acknowledge that Impala is shipping with a &#8220;minimum viable feature set&#8221; that the company has plans to build on in the near future. Impala Senior Product Manager Justin Erickson noted a few issues of concern, including around the number of concurrent users Impala can support, but said they have been addressed during the beta period.</p>
<h2 id="one-piece-of-a-larger-platform">One piece of a larger platform</h2>
<p>Really, though, the whole point of Impala and its competitors is to turn Hadoop from a tool for batch analytics and mass storage <a href="http://gigaom.com/2013/03/07/5-reasons-why-the-future-of-hadoop-is-real-time-relatively-speaking/">into a platform that can handle nearly all of companies&#8217; data-processing needs</a>. In that regard, it appears we&#8217;re just getting started. Cloudera, MapR, Pivotal Greenplum and Hortonworks are already pushing their own products and projects, and Olson said &#8220;it&#8217;s absolutely our intent&#8221; to enhance Cloudera&#8217;s platform with even more open-source products &#8212; perhaps even more database technologies <a href="http://gigaom.com/2013/04/22/how-hbase-converted-myspaces-mysql-champion-and-is-driving-hadoop-mainstream/">a la HBase</a> &#8212; that will let users do more stuff with more types of data. Over time, this strategy could result in Hadoop displacing the current breed of databases and data warehouses and becoming the single data store atop which users run whatever applications they so desire. For now, though, especially when it comes to Impala and the data warehouse incumbents, Olson is taking a measured approach. &#8220;The likelihood that we&#8217;re going to knock them off in the near term,&#8221; he said, &#8220;&#8230; it would be a tough fight to win.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/01/1z5o1503.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/01/1z5o1503.jpg?w=150" medium="image">
			<media:title type="html">Structure Data 2012: Michael Olson – CEO, Cloudera</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/impala-arch-new.jpg?w=300" medium="image">
			<media:title type="html">How Impala fits in</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/had_graphic2-scaled.jpg" medium="image">
			<media:title type="html">Hadapt&#039;s architecture</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/hawq1.jpg?w=708" medium="image">
			<media:title type="html">The Pivotal HD and Hawq architecture</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/stingerroad.png?w=708" medium="image">
			<media:title type="html">The Stinger roadmap</media:title>
		</media:content>
	</item>
		<item>
		<title>Welcome to Berkeley: Where Hadoop isn&#8217;t nearly fast enough</title>
		<link>http://gigaom.com/2013/04/17/welcome-to-berkeley-where-hadoop-isnt-nearly-fast-enough/</link>
		<comments>http://gigaom.com/2013/04/17/welcome-to-berkeley-where-hadoop-isnt-nearly-fast-enough/#comments</comments>
		<pubDate>Wed, 17 Apr 2013 23:19:13 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[AMPLab]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[in-memory]]></category>
		<category><![CDATA[Mesos]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[Shark]]></category>
		<category><![CDATA[Spark]]></category>
		<category><![CDATA[Tachyon]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=632061</guid>
		<description><![CDATA[Hadoop not fast enough for you? Then you might want to get to know AMPLab, a University of California, Berkeley team developing faster versions of many core Hadoop components.]]></description>
				<content:encoded><![CDATA[<p>Tucked within the computer science department at the University of California, Berkeley, there&#8217;s an institution called <a href="http://amplab.cs.berkeley.edu/">AMPLab</a> that&#8217;s making a name for itself by &#8212; among other things &#8212; essentially rebuilding the Hadoop platform, only faster.</p>
<div id="attachment_632077" class="wp-caption alignright" style="width: 283px"><a href="http://gigaom2.files.wordpress.com/2013/04/spark-lr.png"><img  alt="Results for linear regression test" src="http://gigaom2.files.wordpress.com/2013/04/spark-lr.png?w=708"   class="size-full wp-image-632077" /></a><p class="wp-caption-text">Results for linear regression test</p></div>
<p>AMPLab&#8217;s best-known product in the big data space, called <a href="http://spark-project.org/">Spark</a>, is an in-memory parallel processing framework that&#8217;s comparable to Hadoop MapReduce except, its creators claim, it is up to 100 times faster. Because it runs in-memory, Spark might be comparable with something like <a href="http://gigaom.com/2012/10/24/metamarkets-open-sources-druid-its-in-memory-database/">Druid</a> or SAP&#8217;s HANA system, too. Spark is the processing engine that powers <a href="http://gigaom.com/2012/12/05/clearstory-data-raises-9m-and-might-actually-make-data-your-friend/">ClearStory&#8217;s next-generation analytics and visualization service</a>.</p>
<p>Like Hive as a data warehouse for Hadoop? Then you&#8217;ll love <a href="http://shark.cs.berkeley.edu/">Shark</a>, which is short for &#8220;Hive on Spark.&#8221;</p>
<p>Even as Hadoop gets more flexible thanks to new features such as YARN, which would technically allow it to run an alternative framework like Spark, AMPLab has its own cluster-management project called <a href="https://amplab.cs.berkeley.edu/projects/mesos-dynamic-resource-sharing-for-clusters/">Mesos</a>. Twitter <a href="http://gigaom.com/2012/04/19/twitter-backs-fave-big-data-projects-with-apache-sponsorship/">is a big fan of Mesos</a>, which is <a href="http://incubator.apache.org/mesos/">also an Apache Incubator project</a>.</p>
<p>AMPLab&#8217;s latest target is the Hadoop Distributed File System, or HDFS. HDFS has long been criticized for availability and speed, so AMPLab created an alternative called Tachyon (<a href="http://highscalability.com/blog/2013/4/17/tachyon-fault-tolerant-distributed-file-system-with-300-time.html">hat tip to High Scalability</a> for calling my attention to it). According to the <a href="http://tachyon-project.org/">Tachyon homepage</a>, &#8220;it offers up to 300 times higher throughput than HDFS, by leveraging lineage information and using memory aggressively. Tachyon caches working set files in memory, and enables different jobs/queries and frameworks to access cached files at memory speed.&#8221;</p>
<p>AMPLab isn&#8217;t the first to question the cult of HDFS, though. There are <a href="http://gigaom.com/cloud/because-hadoop-isnt-perfect-8-ways-to-replace-hdfs/">numerous commercial options available</a>, and Quantcast <a href="http://gigaom.com/2012/09/27/quantcast-releases-bigger-faster-stronger-hadoop-file-system/">built its own open source file system</a> that it claims is faster and more efficient when running at massive scale.</p>
<p>But it&#8217;s probably unfair to call AMPLab&#8217;s projects competitors to Hadoop. They&#8217;re certainly alternatives, but they&#8217;re also complementary, as Twitter&#8217;s heavy use of Hadoop and Mesos demonstrates. And Spark, Shark, Mesos and Tachyon are all compatible with their peer projects from the Apache Hadoop project.</p>
<p>Really, AMPLab is doing what any research institution does by pushing the limits of the current commercially available software. If it happens to disrupt the status quo, then so be it. For users, though, it&#8217;s just providing some new options to play around with as they try to find the right tool for their particular jobs. Its partners and sponsors, including Google, Facebook, Microsoft and Amazon Web Services, certainly have an interest in finding the best-possible technology, or creating it if necessary.</p>
<div id="attachment_632076" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/04/arch_mlbase-300x297.jpg"><img  alt="The MLBase architecture." src="http://gigaom2.files.wordpress.com/2013/04/arch_mlbase-300x297.jpg?w=708"   class="size-full wp-image-632076" /></a><p class="wp-caption-text">The MLBase architecture.</p></div>
<p>Other related AMPLab projects include <a href="https://amplab.cs.berkeley.edu/projects/piql-scale-independent-query-processing/">PIQL</a>, a SQL-like query language that sits atop a key-value store; <a href="https://amplab.cs.berkeley.edu/projects/mlbase/">MLBase</a>, a system for doing machine learning on distributed systems; <a href="https://amplab.cs.berkeley.edu/projects/akaros-%c2%a0an-operating-system-for-many-core-architectures-and-large-scale-smp-systems/">Akaros</a>, an operating system for manycore and large SMP systems; and <a href="https://amplab.cs.berkeley.edu/projects/sparrow-low-latency-scheduling-for-interactive-cluster-services/">Sparrow</a>, a cluster-scheduling system designed for low-latency computing.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/17/welcome-to-berkeley-where-hadoop-isnt-nearly-fast-enough/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/04/amplab.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/04/amplab.jpg?w=150" medium="image">
			<media:title type="html">amplab</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/spark-lr.png" medium="image">
			<media:title type="html">Results for linear regression test</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/arch_mlbase-300x297.jpg" medium="image">
			<media:title type="html">The MLBase architecture.</media:title>
		</media:content>
	</item>
		<item>
		<title>&#8216;Linux of online learning&#8217; gets stronger: edX and Stanford team up to build open source platform</title>
		<link>http://gigaom.com/2013/04/02/linux-of-online-learning-gets-stronger-edx-and-stanford-team-up-to-build-open-source-platform/</link>
		<comments>http://gigaom.com/2013/04/02/linux-of-online-learning-gets-stronger-edx-and-stanford-team-up-to-build-open-source-platform/#comments</comments>
		<pubDate>Wed, 03 Apr 2013 04:00:34 +0000</pubDate>
		<dc:creator>Ki Mae Heussner</dc:creator>
				<category><![CDATA[education technology]]></category>
		<category><![CDATA[online education]]></category>
		<category><![CDATA[online learning]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=626861</guid>
		<description><![CDATA[Despite its initial efforts at building its own open-source online learning platform, Stanford said it will fold that platform into the edX platform launched by Harvard and MIT.]]></description>
				<content:encoded><![CDATA[<p>In its mission to become the <a href="http://gigaom.com/2013/03/14/the-linux-of-online-learning-edx-takes-big-step-toward-open-source-goal/">&#8220;Linux of online learning,&#8221;</a> <a href="http://www.edx.org">edX</a> just got a powerful new partner. On Wednesday, the Harvard and MIT-backed non-profit is set to announce that it&#8217;s teaming up with Stanford to collaboratively develop the open-source edX platform.</p>
<p>Last fall, Stanford launched its own open-source online learning platform <a href="http://class.stanford.edu/">Class2Go</a>, which it released to the public in January. Developed by a team of Stanford engineers, the platform was designed to support the university’s online classes and research. In addition to being open, the platform was intended to be interoperable with other services and portable (meaning that the course content isn’t tied to one platform). But as part of the new collaboration, Stanford will cease development on that platform and focus its efforts on edX.</p>
<p>&#8220;[We'll] fold in the key features of the Class2Go platform in the open-source edX and, together, we&#8217;ll be working on a single platform going ahead,&#8221; Anant Agarwal, president of edX, said on a call with reporters. &#8220;By putting all the wood behind one arrow, so to speak, we thought we could have a bigger impact.&#8221;</p>
<p>Since its launch, other schools around the world have started using Class2Go. While the platform will continue to be available to other users, John Mitchell, Stanford&#8217;s vice provost for online learning, said the university will work with those schools to migrate to edX while it transitions its own courses.</p>
<p>The two organizations gave few details on how the collaboration would actually work. But they said that Class2Go’s analytics tools, which can track how long students watch a given video, which sections they repeat and other kinds of student activity on the site, are an example of the kinds of features that will be integrated with edX.</p>
<p>Despite Stanford’s collaboration on the edX platform, Mitchell said the university was not joining the “X University Consortium” of institutions that offer courses on the edX site &#8212; which is not entirely surprising given its affiliation with for-profit rival Coursera. The startup was launched by two Stanford professors and the university was one of its launch partners.</p>
<p>But even as Stanford and other top universities partner with for-profit online course providers, like Coursera and Udacity, the growing support for an open source platform shows that schools want to experiment with multiple approaches and be able to control and customize online educational courses and learning tools. The open-source approach means developers anywhere can add new tools to the platform, that professors can create online experiences that best suit their needs and that schools can learn from the innovation of others.</p>
<p>In addition to the Stanford partnership, edX also announced that on June 1, it will release the entire source code for the online learning platform. That development follows its announcement last month that it would <a href="http://gigaom.com/2013/03/14/the-linux-of-online-learning-edx-takes-big-step-toward-open-source-goal/">release its XBlock SDK</a>, the underlying architecture supporting edX course content.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/02/linux-of-online-learning-gets-stronger-edx-and-stanford-team-up-to-build-open-source-platform/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/02/edx.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/02/edx.jpg?w=150" medium="image">
			<media:title type="html">edx</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/7467db695203dccb9119d2430d0c5246?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">kimaeheussner</media:title>
		</media:content>
	</item>
		<item>
		<title>Facebook builds a database benchmark for a graph-powered world</title>
		<link>http://gigaom.com/2013/04/01/facebook-builds-a-database-benchmark-for-a-graph-powered-world/</link>
		<comments>http://gigaom.com/2013/04/01/facebook-builds-a-database-benchmark-for-a-graph-powered-world/#comments</comments>
		<pubDate>Mon, 01 Apr 2013 22:28:39 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Benchmarks]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[graph databases]]></category>
		<category><![CDATA[LinkBench]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=626218</guid>
		<description><![CDATA[Facebook has built a new open source tool for benchmarking graph databases, called LinkBench. And although the chances are your infrastructure and workloads look nothing like Facebook's, the good news is LinkBench was built with configurability in mind.]]></description>
				<content:encoded><![CDATA[<p>If you&#8217;re doing any sort of social-media application, you might want to take note of what Facebook just built. The company has <a href="http://www.facebook.com/notes/facebook-engineering/linkbench-a-database-benchmark-for-the-social-graph/10151391496443920">created a benchmarking tool called LinkBench</a> that measures the performance of databases tasked with serving graph-structured data, which, presumably, is the lifeblood of every startup around that&#8217;s concerned with who&#8217;s connected to whom.</p>
<p>Of all LinkBench&#8217;s features &#8212; and you can read all about them in a Facebook Engineering wall post from Monday morning &#8212; probably the biggest is <a href="https://github.com/facebook/linkbench">that it&#8217;s open source</a> and built to be extensible. One of the biggest problems with benchmarks overall is that they rarely align with actual production workloads inside the companies that are supposed to care about them. In this case, for example, a benchmark for measuring the performance of <a href="http://gigaom.com/2011/12/06/facebook-shares-some-secrets-on-making-mysql-scale/">Facebook&#8217;s massive MySQL</a>+memcached+<a href="http://www.facebook.com/note.php?note_id=388112370932">Flashcache</a> database architecture against its massive social graph and transaction activity would be all but worthless unless someone were just planning to rebuild Facebook.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/04/linkbench-copy.jpg"><img  alt="linkbench copy" src="http://gigaom2.files.wordpress.com/2013/04/linkbench-copy.jpg?w=708&#038;h=610" width="708" height="610" class="aligncenter size-large wp-image-626252" /></a></p>
<p>I&#8217;ve written in the past that <a href="http://gigaom.com/2012/07/26/why-crowdsourced-computing-benchmarks-are-the-future/">perhaps crowdsourced benchmarks are the wave of the future</a>: essentially a compiled set of statistics and best practices as more companies test different database (or Hadoop) technologies on different hardware setups against different workloads and publish the results. Everything will of course vary by the exact details within any given environment, but it would be a good way to get a sense of how a particular stack might, or perhaps should, fare.</p>
<p>But an open source benchmark tuned for a specific use case &#8212; social graphs &#8212; by probably the world&#8217;s foremost expert on that use case is interesting, too. Anyone else trying to serve data from their own social graphs can benefit from some of LinkBench&#8217;s more-prominent features, such as its ability to generate &#8220;large synthetic social graphs,&#8221; while tuning it to the specifics of their own infrastructure. After all, it might be that your app has different requirements around reading versus writing data, and <a href="http://gigaom.com/2011/07/21/is-stonebraker-right-why-sql-isnt-the-choice-du-jour-for-many-apps/">it&#8217;s very possible you&#8217;re not using MySQL</a>, either.</p>
<p>Or maybe you are using MySQL and want to see how a newer database technology might handle your graph workload. That, by the way, is one of the reasons Facebook built LinkBench, according to the company&#8217;s post.</p>
<p>At any rate, the social web is all about graphs, and database performance really matters for anyone trying to build a service that stays online and provides a pleasant user experience. Say what you want about Facebook, but its services perform, so the bar is set high for anyone trying to dethrone it or at least to build something that can attract an equally large and devout following.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/01/facebook-builds-a-database-benchmark-for-a-graph-powered-world/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/04/graph-copy-e1364854754198.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/04/graph-copy-e1364854754198.jpg?w=150" medium="image">
			<media:title type="html">graph copy</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/linkbench-copy.jpg?w=708" medium="image">
			<media:title type="html">linkbench copy</media:title>
		</media:content>
	</item>
		<item>
		<title>Big, open data: MapR on Github and Yelp&#8217;s dataset challenge</title>
		<link>http://gigaom.com/2013/03/28/big-open-data-mapr-on-github-and-yelps-dataset-challenge/</link>
		<comments>http://gigaom.com/2013/03/28/big-open-data-mapr-on-github-and-yelps-dataset-challenge/#comments</comments>
		<pubDate>Thu, 28 Mar 2013 16:57:42 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[github]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[Open Data]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[yelp]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=625286</guid>
		<description><![CDATA[MapR is releasing open source code and partnering with Canonical on Ubuntu, while Yelp is releasing some data for developers to play with. Sounds like a good day for openness.]]></description>
				<content:encoded><![CDATA[<p>If you&#8217;re into open source, or at least open data, today is a good day. Hadoop vendor MapR has open sourced a portion of its source code <a href="https://github.com/mapr/">on Github</a> and <a href="http://repository.mapr.com/maven/">Maven</a>, while Yelp has released a sample of its data as <a href="http://www.yelp.com/dataset_challenge/">part of a $5,000 challenge</a> to find the most-innovative use for it.</p>
<p>MapR&#8217;s decision to open source parts of its code is significant, but not groundbreaking. The company is only releasing its improvements to a handful of Hadoop-related Apache projects that are included in the MapR distribution of Hadoop, but not the proprietary code that&#8217;s MapR&#8217;s real competitive advantage in the contentious Hadoop market. While it&#8217;s still not flying the all-open-source banner like Hortonworks is, the code release puts MapR more on par with competitor Cloudera, which bolsters its open source aspects with some proprietary software for managing Hadoop clusters.</p>
<p>MapR also took another step in the open source direction on Thursday, announcing a partnership with Canonical that integrates MapR&#8217;s M3 distribution with the Ubuntu Linux operating system. The two also have plans to ease the installation of MapR&#8217;s Hadoop software on OpenStack-based cloud infrastructure.</p>
<p>I wrote recently <a href="http://gigaom.com/2013/03/18/in-battle-for-hadoop-mapr-raises-30m/">in relation to MapR&#8217;s $30 million VC investment</a> that the company is in a tricky position when it comes to open source. The Hadoop ecosystem was <a href="http://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/">built on open source and still values it immensely</a>, but some customers are definitely willing to pay money for products that deliver the features they want, open source or not.</p>
<p>As for Yelp, well, it&#8217;s just following in the footsteps of many companies &#8212; <a href="http://gigaom.com/2009/07/27/why-the-netflix-prize-is-a-kind-of-a-big-deal/">Netflix</a> and everyone doing something on Kaggle (<a href="https://www.kaggle.com/c/predict-wordpress-likes/forums/t/2738/splunk-innovation-prize-results/14720">including GigaOM</a>) &#8212; in trying to find new ways to use its data. The data set it&#8217;s releasing is from the Phoenix, Ariz., area and includes 11,537 businesses, 8,282 checkin sets, 43,873 users and 229,907 reviews. The deadline for entries is May 20, and they can be submitted in pretty much any form you can imagine.</p>
<p>Hopefully, for Yelp&#8217;s sake, it doesn&#8217;t step in it the way other companies &#8212; <a href="http://gigaom.com/2010/03/12/netflix-cancels-recommendation-engine-contest-settles-privacy-lawsuit/">including Netflix</a> and AOL &#8212; have when they released supposedly anonymous data sets that were later de-anonymized. Releasing data sets gives clear benefits to both the source companies <a href="http://gigaom.com/2012/05/24/in-social-data-a-fight-between-science-and-privacy/">and institutions or individuals accessing the data</a>, but privacy snafus have a way of sneaking up and undermining some of the goodwill.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-249574p1.html">Shutterstock user Jakub Krechowicz</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/28/big-open-data-mapr-on-github-and-yelps-dataset-challenge/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/shutterstock_88662181.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/shutterstock_88662181.jpg?w=150" medium="image">
			<media:title type="html">giving hands</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>
	</item>
		<item>
		<title>Concurrent gets $4M for higher-level Hadoop</title>
		<link>http://gigaom.com/2013/03/20/concurrent-gets-4m-for-higher-level-hadoop/</link>
		<comments>http://gigaom.com/2013/03/20/concurrent-gets-4m-for-higher-level-hadoop/#comments</comments>
		<pubDate>Wed, 20 Mar 2013 16:33:29 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cascading]]></category>
		<category><![CDATA[Concurrent]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=622319</guid>
		<description><![CDATA[Cascading proprietor Concurrent has secured $4 million in venture capital in order to advance its efforts toward easing the development of big data applications.]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.concurrentinc.com/">Concurrent</a>, proprietor of the open-source <a href="http://www.cascading.org/">Cascading framework</a> for developing big data workflows, has closed a $4 million Series A investment round from True Ventures <em>(see disclosure)</em> and Rembrandt Partners. Cascading has been around for a few years, actually, but Concurrent only <a href="http://gigaom.com/2011/07/26/concurrent-raises-900k-to-make-hadoop-easier/">raised seed funding in 2011</a> and has been riding the wave of interest in making big data easier to do.</p>
<p>In practice, Cascading is generally used as a higher-level method than MapReduce for writing Hadoop jobs, although it&#8217;s technically a framework that could support any number of distributed-processing engines. It&#8217;s <a href="http://gigaom.com/2012/08/15/meet-the-combo-behind-etsy-airbnb-and-climate-corp-hadoop-jobs/">used by a number of notable companies</a>, including Etsy, Airbnb and Climate Corporation. In February, the Cascading project expanded its scope to address the growing SQL-on-Hadoop trend with a project called Lingual.</p>
<p>Software veteran Gary Nakamura is taking on the role of Concurrent CEO, replacing Cascading creator Chris Wensel, who&#8217;ll stay on as the company&#8217;s CTO.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/03/api-diagram-1.png"><img  alt="api-diagram (1)" src="http://gigaom2.files.wordpress.com/2013/03/api-diagram-1.png?w=708"   class="aligncenter size-full wp-image-622350" /></a></p>
<p><em><strong>Disclosure</strong>: Concurrent is backed by True Ventures, a venture capital firm that is an investor in the parent company of this blog, Giga Omni Media. Om Malik, the founder of Giga Omni Media, is also a venture partner at True.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/20/concurrent-gets-4m-for-higher-level-hadoop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/wensel1-e1363797088502.jpeg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/wensel1-e1363797088502.jpeg?w=150" medium="image">
			<media:title type="html">wensel</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/api-diagram-1.png" medium="image">
			<media:title type="html">api-diagram (1)</media:title>
		</media:content>
	</item>
		<item>
		<title>Storage player Basho open sources Riak CS</title>
		<link>http://gigaom.com/2013/03/20/storage-player-basho-open-sources-riak-cs/</link>
		<comments>http://gigaom.com/2013/03/20/storage-player-basho-open-sources-riak-cs/#comments</comments>
		<pubDate>Wed, 20 Mar 2013 13:00:00 +0000</pubDate>
		<dc:creator>Barb Darrow</dc:creator>
				<category><![CDATA[Basho]]></category>
		<category><![CDATA[Cloud]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[Structure Data 2013]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=621979</guid>
		<description><![CDATA[Riak CS distributed cloud storage technology has always been sort of open-sourcey but not really open sourced. That's changing now with Basho putting it under the Apache 2 license.]]></description>
				<content:encoded><![CDATA[<p><a href="http://basho.com/">Basho</a> has done pretty well positioning its Riak CS distributed storage technology in the open source cloud world. It’s not really part of either the <a href="http://www.openstack.org/foundation/">OpenStack</a> or the <a href="http://incubator.apache.org/cloudstack/">CloudStack</a> organizations, but it will run in both environments. This was a tad unusual because, strictly speaking, Riak CS itself was not open source.</p>
<p>At least until now. As of Wednesday, Riak CS will be available under the Apache 2 license.</p>
<p><a href="http://gigaom.com/2013/03/20/storage-player-basho-open-sources-riak-cs/riakcslogo/" rel="attachment wp-att-621980"><img alt="RiakCSLogo" src="http://gigaom2.files.wordpress.com/2013/03/riakcslogo.jpg?w=708"   class="alignleft size-full wp-image-621980"></a>This is part of an evolution, said Basho CTO Justin Sheehy (pictured above). Last fall, Basho said <a href="http://gigaom.com/2012/09/05/basho-joins-apache-cloudstack-effort/">it would work with Citrix on the Apache CloudStack effort</a> to integrate Riak CS into CloudStack.</p>
<p>“We are not part of the OpenStack Foundation but we have deployed Riak CS in OpenStack environments with OpenStack components,” Sheehy said. OpenStack also includes its own storage subsystem as an option. The work with CloudStack is a little more explicit since CloudStack is mostly a compute cloud with networking elements but does not itself offer a real storage subsystem analogous to Amazon’s S3 storage, he said.</p>
<p>One of Riak CS’s key draws is that customers can use it to build private or public cloud storage that is API-compatible with Amazon S3, the 800-lb gorilla in public cloud storage.</p>
<p>Along with its licensing news, Basho also unveiled new features for the open source version, including multipart uploads to ease the movement of very large files in pieces into the store; per-bucket policies to restrict access to some data based on its source IP; and Riak Control, a standalone web administration tool for managing users (see screenshot below).</p>
<p><a href="http://gigaom.com/2013/03/20/storage-player-basho-open-sources-riak-cs/riakcsscreen/" rel="attachment wp-att-621996"><img alt="riakcsscreen" src="http://gigaom2.files.wordpress.com/2013/03/riakcsscreen.jpg?w=708&#038;h=386" width="708" height="386" class="aligncenter size-full wp-image-621996"></a></p>
<p>Basho continues to offer <a href="http://gigaom.com/2013/02/21/basho-technologies-takes-aim-at-more-enterprises-with-upgrades/">Riak CS Enterprise</a> as a commercial product with around-the-clock support. To be clear, <a href="http://basho.com/riak/">Basho’s Riak</a>, the distributed database underlying Riak CS, was already an open source product.</p>
<p>Sheehy will be talking big data and other hot topics at <a href="http://event.gigaom.com/structuredata/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=621979+storage-player-basho-open-sources-riak-cs&amp;utm_content=gigabarb">GigaOM’s Structure: Data</a> event in New York on Wednesday.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/20/storage-player-basho-open-sources-riak-cs/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/09/sheehy-3-e1346846401253.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/09/sheehy-3-e1346846401253.jpg?w=150" medium="image">
			<media:title type="html">Basho CTO Justin Sheehy</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4af03439988d64f816da72496325cb73?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigabarb</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/riakcslogo.jpg" medium="image">
			<media:title type="html">RiakCSLogo</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/riakcsscreen.jpg" medium="image">
			<media:title type="html">riakcsscreen</media:title>
		</media:content>
	</item>
	</channel>
</rss>
