<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Parallel Programming in the Age of Big Data</title>
	<atom:link href="http://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallel-programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallel-programming/</link>
	<description>Trusted Insights and Conversations on the Next Wave of Technology</description>
	<lastBuildDate>Thu, 26 Nov 2009 19:44:08 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: woorung</title>
		<link>http://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallel-programming/#comment-963121</link>
		<dc:creator>woorung</dc:creator>
		<pubDate>Thu, 30 Jul 2009 04:42:40 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=27102#comment-963121</guid>
		<description>&lt;p&gt;I think that the final winner between anti-RDBMS and parallel RDBMS will be a hybrid system which aims at integrating MapReduce into RDBMS.
Actually, Greenplum and HadoopDB are doing that.
Both of them are RDBMS advocates since they have lots of knowledge and experience through RDBMS research.
In particular, the hybrid system needs SQL-like query analysis &amp; optimization to manipulate distributed DBMSs with MapReduce.
On this point, I think that RDBMS advocates will inevitably defeat anti-RDBMS advocates, unfortunately.
Nevertheless, they do not lead the IT industry &amp; market because of their somewhat high cost.
To reduce cost, most people want to take advantage of open source.
Currently, I am a creator of the open source project &quot;coord&quot; (http://www.coordguru.com), which provides a C++ MapReduce framework and a distributed key-value store.
In the near future, I believe that such a plan will be achieved in the coord project.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I think that the final winner between anti-RDBMS and parallel RDBMS will be a hybrid system which aims at integrating MapReduce into RDBMS.
Actually, Greenplum and HadoopDB are doing that.
Both of them are RDBMS advocates since they have lots of knowledge and experience through RDBMS research.
In particular, the hybrid system needs SQL-like query analysis &amp; optimization to manipulate distributed DBMSs with MapReduce.
On this point, I think that RDBMS advocates will inevitably defeat anti-RDBMS advocates, unfortunately.
Nevertheless, they do not lead the IT industry &amp; market because of their somewhat high cost.
To reduce cost, most people want to take advantage of open source.
Currently, I am a creator of the open source project &#8220;coord&#8221; (http://www.coordguru.com), which provides a C++ MapReduce framework and a distributed key-value store.
In the near future, I believe that such a plan will be achieved in the coord project.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Cloudera CEO: Hadoop Will Go Beyond Web Apps</title>
		<link>http://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallel-programming/#comment-949225</link>
		<dc:creator>Cloudera CEO: Hadoop Will Go Beyond Web Apps</dc:creator>
		<pubDate>Tue, 02 Jun 2009 00:05:18 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=27102#comment-949225</guid>
		<description>&lt;p&gt;[...] Parallel Programming in the Age of Big Data [...]&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>[...] Parallel Programming in the Age of Big Data [...]</p>]]></content:encoded>
	</item>
	<item>
		<title>By: MapReduce vs. SQL: It&#8217;s Not One or the Other</title>
		<link>http://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallel-programming/#comment-937708</link>
		<dc:creator>MapReduce vs. SQL: It&#8217;s Not One or the Other</dc:creator>
		<pubDate>Tue, 14 Apr 2009 22:53:36 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=27102#comment-937708</guid>
		<description>&lt;p&gt;[...] than does cloud golden child MapReduce. But how shocked should we be, really? After all, choosing a parallel data strategy is not an all-or-nothing [...]&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>[...] than does cloud golden child MapReduce. But how shocked should we be, really? After all, choosing a parallel data strategy is not an all-or-nothing [...]</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Big (linked?) data</title>
		<link>http://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallel-programming/#comment-925382</link>
		<dc:creator>Big (linked?) data</dc:creator>
		<pubDate>Sun, 08 Feb 2009 17:50:16 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=27102#comment-925382</guid>
		<description>&lt;p&gt;[...] in the larger universe of data that these organizations inhabit. Big Data unleashed by the “Industrial Revolution of Data”, whether from public agencies, non-profit institutes, or forward-thinking private [...]&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>[...] in the larger universe of data that these organizations inhabit. Big Data unleashed by the “Industrial Revolution of Data”, whether from public agencies, non-profit institutes, or forward-thinking private [...]</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Big Money for Big Database Company</title>
		<link>http://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallel-programming/#comment-921592</link>
		<dc:creator>Big Money for Big Database Company</dc:creator>
		<pubDate>Tue, 13 Jan 2009 17:45:46 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=27102#comment-921592</guid>
		<description>&lt;p&gt;[...] Parallel programming in the age of big data. 2. Programming a parallel future. 3. Terracotta doesn&#8217;t want to kill your database, just [...]&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>[...] Parallel programming in the age of big data. 2. Programming a parallel future. 3. Terracotta doesn&#8217;t want to kill your database, just [...]</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Is Big Data near a tipping point? : Data Evolution</title>
		<link>http://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallel-programming/#comment-921077</link>
		<dc:creator>Is Big Data near a tipping point? : Data Evolution</dc:creator>
		<pubDate>Fri, 09 Jan 2009 07:01:08 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=27102#comment-921077</guid>
		<description>&lt;p&gt;[...] in the larger universe of data that these organizations inhabit.  Big Data unleashed by the “Industrial Revolution of Data”, whether from public agencies, non-profit institutes, or forward-thinking private [...]&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>[...] in the larger universe of data that these organizations inhabit.  Big Data unleashed by the “Industrial Revolution of Data”, whether from public agencies, non-profit institutes, or forward-thinking private [...]</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Parallel Programming in the Age of Big Data</title>
		<link>http://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallel-programming/#comment-914614</link>
		<dc:creator>Parallel Programming in the Age of Big Data</dc:creator>
		<pubDate>Tue, 25 Nov 2008 12:15:43 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=27102#comment-914614</guid>
		<description>&lt;p&gt;[...] Full Story [...]&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>[...] Full Story [...]</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Mapping and reducing MD trajectories with HiMach : business&#124;bytes&#124;genes&#124;molecules</title>
		<link>http://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallel-programming/#comment-914586</link>
		<dc:creator>Mapping and reducing MD trajectories with HiMach : business&#124;bytes&#124;genes&#124;molecules</dc:creator>
		<pubDate>Tue, 25 Nov 2008 04:47:24 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=27102#comment-914586</guid>
		<description>&lt;p&gt;[...] Parallel Programming in the Age of Big Data [...]&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>[...] Parallel Programming in the Age of Big Data [...]</p>]]></content:encoded>
	</item>
	<item>
		<title>By: &#8220;Parallel Programming in the Age of Big Data&#8221; &#124; insideHPC</title>
		<link>http://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallel-programming/#comment-914576</link>
		<dc:creator>&#8220;Parallel Programming in the Age of Big Data&#8221; &#124; insideHPC</dc:creator>
		<pubDate>Tue, 25 Nov 2008 02:15:59 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=27102#comment-914576</guid>
		<description>&lt;p&gt;[...] has gotten a lot of attention on the interwebs over the past couple weeks, so I hereby present you with a link to said article, plus an excerpt to keep you warm while you sink into your pre-Thanksgiving [...]&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>[...] has gotten a lot of attention on the interwebs over the past couple weeks, so I hereby present you with a link to said article, plus an excerpt to keep you warm while you sink into your pre-Thanksgiving [...]</p>]]></content:encoded>
	</item>
	<item>
		<title>By: jmh</title>
		<link>http://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallel-programming/#comment-912647</link>
		<dc:creator>jmh</dc:creator>
		<pubDate>Fri, 14 Nov 2008 04:15:14 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=27102#comment-912647</guid>
		<description>&lt;p&gt;Nitin:
You make a couple interesting points.  Google&#039;s MapReduce definition results in a batch (&quot;offline&quot;) processing system as you say -- because they defined Reduce to produce a sorted list of Reduce groups.  Hadoop followed suit.  Note though that if you change the assumptions a bit, you could allow pipelined (&quot;online&quot;) Reduce outputs.  We did a bunch of research on Online Aggregation http://control.cs.berkeley.edu back in the 90&#039;s in the SQL context, where you could get &quot;early returns&quot; from aggregation (reduce) tasks.   That work has been extended by Jermaine, Dobra and their students in recent years.   I agree that it&#039;s time to apply it to MapReduce, to bring that programming model online.  Another direction to pursue there is the extension of the MapReduce programming API to handle continuous data streams, akin to the work on TelegraphCQ http://telegraph.cs.berkeley.edu and related research projects.&lt;/p&gt;

&lt;p&gt;Your other point is about the fact that some workloads don&#039;t partition well, and you need to do replication.  You&#039;re right -- that&#039;s a trickier nut to crack. It arises in scientific computation a lot.  I&#039;m not fully convinced that social nets are clearly in that bin.  I worked with LinkedIn to parallelize a number of their analytic jobs on top of Greenplum using both SQL and MapReduce, and they partitioned quite smoothly.&lt;/p&gt;

&lt;p&gt;There&#039;s historically been a tendency to think in the following terms: &quot;I need to parallelize Algorithm X, and it has a lot of intrinsic data sharing, so I need an innovative/expensive parallel architecture since data-parallelism won&#039;t work&quot;.  In many cases, it can be more fruitful to think in these terms: &quot;I have a cost-effective data-parallel infrastructure, so I need an innovative Algorithm X&#039; that approximates Algorithm X.&quot;  This mindset is taking hold in a number of quarters, and there are quite a lot of sophisticated things you can do cheaply if you embrace that approach.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Nitin:
You make a couple interesting points.  Google&#8217;s MapReduce definition results in a batch (&#8220;offline&#8221;) processing system as you say &#8212; because they defined Reduce to produce a sorted list of Reduce groups.  Hadoop followed suit.  Note though that if you change the assumptions a bit, you could allow pipelined (&#8220;online&#8221;) Reduce outputs.  We did a bunch of research on Online Aggregation <a href="http://control.cs.berkeley.edu" rel="nofollow">http://control.cs.berkeley.edu</a> back in the 90&#8217;s in the SQL context, where you could get &#8220;early returns&#8221; from aggregation (reduce) tasks.   That work has been extended by Jermaine, Dobra and their students in recent years.   I agree that it&#8217;s time to apply it to MapReduce, to bring that programming model online.  Another direction to pursue there is the extension of the MapReduce programming API to handle continuous data streams, akin to the work on TelegraphCQ <a href="http://telegraph.cs.berkeley.edu" rel="nofollow">http://telegraph.cs.berkeley.edu</a> and related research projects.</p>

<p>Your other point is about the fact that some workloads don&#8217;t partition well, and you need to do replication.  You&#8217;re right &#8212; that&#8217;s a trickier nut to crack. It arises in scientific computation a lot.  I&#8217;m not fully convinced that social nets are clearly in that bin.  I worked with LinkedIn to parallelize a number of their analytic jobs on top of Greenplum using both SQL and MapReduce, and they partitioned quite smoothly.</p>

<p>There&#8217;s historically been a tendency to think in the following terms: &#8220;I need to parallelize Algorithm X, and it has a lot of intrinsic data sharing, so I need an innovative/expensive parallel architecture since data-parallelism won&#8217;t work&#8221;.  In many cases, it can be more fruitful to think in these terms: &#8220;I have a cost-effective data-parallel infrastructure, so I need an innovative Algorithm X&#8217; that approximates Algorithm X.&#8221;  This mindset is taking hold in a number of quarters, and there are quite a lot of sophisticated things you can do cheaply if you embrace that approach.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Cloud Dataflow</title>
		<link>http://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallel-programming/#comment-912186</link>
		<dc:creator>Cloud Dataflow</dc:creator>
		<pubDate>Wed, 12 Nov 2008 02:51:31 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=27102#comment-912186</guid>
		<description>&lt;p&gt;[...] Hellerstein of Berkeley has been blogging about parallelism and big data recently. As a database guy, and an advisor to Greenplum, there is [...]&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>[...] Hellerstein of Berkeley has been blogging about parallelism and big data recently. As a database guy, and an advisor to Greenplum, there is [...]</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Nitin Borwankar</title>
		<link>http://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallel-programming/#comment-912124</link>
		<dc:creator>Nitin Borwankar</dc:creator>
		<pubDate>Tue, 11 Nov 2008 19:44:21 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=27102#comment-912124</guid>
		<description>&lt;p&gt;Hi Joe,&lt;/p&gt;

&lt;p&gt;It&#039;s true that Map-Reduce does wonders for &lt;em&gt;offline&lt;/em&gt; processing of data generated as a byproduct of activities on the web, phone etc.  But it needs to be pointed out that social network data models create &lt;em&gt;online&lt;/em&gt; data management problems that are not parallelizable. This is simply because user-friend-shareditem relationships generate networks of relationships that can&#039;t be partitioned as easily as the hierarchical trees we were accustomed to tackling in the business world.  There is a lot of talk of sharding/partitioning etc. Yet it remains true that the underlying data models in social networks combined with the ~100 million users and their data represent a whole different class of &lt;em&gt;online&lt;/em&gt; data management problem not addressed by map-reduce, column stores ... and all the &lt;em&gt;offline&lt;/em&gt; data management technologies that are correctly getting a lot of attention.
In the meantime, Twitter, Facebook, etc. have to face massive &lt;em&gt;online&lt;/em&gt; data management problems not addressed by the vendors or the research community.&lt;/p&gt;

&lt;p&gt;There are surprising implications when one does look deeper at the problem. Simple back-of-the-envelope calculations re: scaling data items belonging to ~100 million network-meshed users suggest that the only natural way to partition the data is for users to have their own client-side data stores and run queries locally when those queries mostly involve their small subset of data. Also the ratio of (CPU+disk) to data quantity is much more favorable.  Offloading subsetted queries is the only way for &lt;em&gt;online&lt;/em&gt; social network data management to scale.&lt;/p&gt;

&lt;p&gt;This creates the next generation of client-server architecture where ~100 million client-side databases (possibly on SQLite), each of ~100MB-10G, need to be synchronized in real time with, say, the Facebook backend.   Realistic scalable solutions in this area, i.e. web-driven online data management for ~100 million users, are being completely ignored by the Oracles/MySQLs of the world and also by the database research community, and yet these problems can be very lucrative ....  Of course there is a lot of money in managing offline Big Data, but who will help solve the pain of exploding online data management - what I&#039;ve called Data 2.0 since 2005?&lt;/p&gt;

&lt;p&gt;The current advances in memory with 2TB RAM in 2U box for ~30K may provide temporary relief but the underlying meshed-network data models need serious research and engineering and for now the&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Hi Joe,</p>

<p>It&#8217;s true that Map-Reduce does wonders for <em>offline</em> processing of data generated as a byproduct of activities on the web, phone etc.  But it needs to be pointed out that social network data models create <em>online</em> data management problems that are not parallelizable. This is simply because user-friend-shareditem relationships generate networks of relationships that can&#8217;t be partitioned as easily as the hierarchical trees we were accustomed to tackling in the business world.  There is a lot of talk of sharding/partitioning etc. Yet it remains true that the underlying data models in social networks combined with the ~100 million users and their data represent a whole different class of <em>online</em> data management problem not addressed by map-reduce, column stores &#8230; and all the <em>offline</em> data management technologies that are correctly getting a lot of attention.
In the meantime, Twitter, Facebook, etc. have to face massive <em>online</em> data management problems not addressed by the vendors or the research community.</p>

<p>There are surprising implications when one does look deeper at the problem. Simple back-of-the-envelope calculations re: scaling data items belonging to ~100 million network-meshed users suggest that the only natural way to partition the data is for users to have their own client-side data stores and run queries locally when those queries mostly involve their small subset of data. Also the ratio of (CPU+disk) to data quantity is much more favorable.  Offloading subsetted queries is the only way for <em>online</em> social network data management to scale.</p>

<p>This creates the next generation of client-server architecture where ~100 million client-side databases (possibly on SQLite), each of ~100MB-10G, need to be synchronized in real time with, say, the Facebook backend.   Realistic scalable solutions in this area, i.e. web-driven online data management for ~100 million users, are being completely ignored by the Oracles/MySQLs of the world and also by the database research community, and yet these problems can be very lucrative &#8230;.  Of course there is a lot of money in managing offline Big Data, but who will help solve the pain of exploding online data management &#8211; what I&#8217;ve called Data 2.0 since 2005?</p>

<p>The current advances in memory with 2TB RAM in 2U box for ~30K may provide temporary relief but the underlying meshed-network data models need serious research and engineering and for now the</p>]]></content:encoded>
	</item>
	<item>
		<title>By: second GigaOM post is up &#171; Data Beta</title>
		<link>http://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallel-programming/#comment-911876</link>
		<dc:creator>second GigaOM post is up &#171; Data Beta</dc:creator>
		<pubDate>Mon, 10 Nov 2008 20:47:24 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=27102#comment-911876</guid>
		<description>&lt;p&gt;[...] in Uncategorized    here &#8230; and it smells like the Internet out there in the comment thread.  Go figure.     &#171; [...]&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>[...] in Uncategorized    here &#8230; and it smells like the Internet out there in the comment thread.  Go figure.     &laquo; [...]</p>]]></content:encoded>
	</item>
	<item>
		<title>By: jmh</title>
		<link>http://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallel-programming/#comment-911874</link>
		<dc:creator>jmh</dc:creator>
		<pubDate>Mon, 10 Nov 2008 20:40:39 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=27102#comment-911874</guid>
		<description>&lt;p&gt;Some additional thoughts in response to comments:&lt;/p&gt;

&lt;p&gt;1) When it first appeared, this post had a different title, which had an unpleasant whiff of MapReduce marketing. I honestly don&#039;t know how that title got there, but it wasn&#039;t what I put down when I submitted the text.  Hopefully the fix to the title now gets the flavor a little closer to where it should be.  (It also makes R. MacCloy&#039;s post a little confusing, since it refers to that earlier title which is now gone...)&lt;/p&gt;

&lt;p&gt;2) For me, the most interesting thing about MapReduce (by far!) is that people are so interested in it.  We&#039;re only going to make progress in parallel computing with paradigms that programmers embrace.  SQL and MapReduce are two successful examples (arguably the only two) in the parallel programming space.  MapReduce is the new kid on that block, relatively speaking. So we should learn from that, channel the energy to more general directions, and make more progress.  You have to bring people along with you on this stuff.&lt;/p&gt;

&lt;p&gt;3)  On other languages/models:  Josh is right:  the MS work on LINQ and co. -- and the MSR work on C-Omega -- are some of the most interesting things in the space of embedding declarative nuggets into traditional code, and moving toward a more parallelizable world.  Definitely worth watching.  In the comments on the first post, somebody mentioned Erlang, which is another very interesting and somewhat different design point in the parallel computing space.  Also well worth looking at -- I recommend the Erlang book, it&#039;s a pleasure to work through, and eye-opening.  Finally, I&#039;m pretty fired up about our research work on Overlog, and the work we&#039;re beginning on Lincoln.  These languages push the data-centric style of SQL and MapReduce into a much richer programming model.  The Overlog work is documented to some degree at https://trac.declarativity.net.  The work on Lincoln is still in the pipe, and has an eye on attracting programmers, not just researchers. Watch that space.&lt;/p&gt;

&lt;p&gt;4)  Yuvamani: Yep, Google&#039;s MapReduce is a batch system, because that&#039;s what Google wanted at the time.  Hadoop copied that.  But it&#039;s all relatively easy to change, with little impact on the programming model.   BigTable and company are indeed very interesting.  I expect an integration of low-consistency storage with other query processing languages in the next year or so.  We&#039;ll see which of the DBMS vendors gets there first, or if it happens by gluing a query processor over a system like that.  Facebook&#039;s Hive is one open-source step in that direction (though it&#039;s got a pretty thin query processor at this stage).  The question of how much query optimization to do under the covers is kind of an eternal tradeoff between complexity and control.  We&#039;ll see how it plays out in the emerging application domains.&lt;/p&gt;

&lt;p&gt;5) Ronald: I have to agree.  Mother Nature has been amazing people for millennia.  But it&#039;s been hard for us to systematize computational insights from that -- so far.  There are some pretty intriguing tidbits in the research about DNA computing and the like, but it will be quite some time before you can, say, implement Spore that way :-).  In general, we need to place bets on both short-term and long-term research, and keep a clear eye on the potential of each.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Some additional thoughts in response to comments:</p>

<p>1) When it first appeared, this post had a different title, which had an unpleasant whiff of MapReduce marketing. I honestly don&#8217;t know how that title got there, but it wasn&#8217;t what I put down when I submitted the text.  Hopefully the fix to the title now gets the flavor a little closer to where it should be.  (It also makes R. MacCloy&#8217;s post a little confusing, since it refers to that earlier title which is now gone&#8230;)</p>

<p>2) For me, the most interesting thing about MapReduce (by far!) is that people are so interested in it.  We&#8217;re only going to make progress in parallel computing with paradigms that programmers embrace.  SQL and MapReduce are two successful examples (arguably the only two) in the parallel programming space.  MapReduce is the new kid on that block, relatively speaking. So we should learn from that, channel the energy to more general directions, and make more progress.  You have to bring people along with you on this stuff.</p>

<p>3)  On other languages/models:  Josh is right:  the MS work on LINQ and co. &#8212; and the MSR work on C-Omega &#8212; are some of the most interesting things in the space of embedding declarative nuggets into traditional code, and moving toward a more parallelizable world.  Definitely worth watching.  In the comments on the first post, somebody mentioned Erlang, which is another very interesting and somewhat different design point in the parallel computing space.  Also well worth looking at &#8212; I recommend the Erlang book, it&#8217;s a pleasure to work through, and eye-opening.  Finally, I&#8217;m pretty fired up about our research work on Overlog, and the work we&#8217;re beginning on Lincoln.  These languages push the data-centric style of SQL and MapReduce into a much richer programming model.  The Overlog work is documented to some degree at <a href="https://trac.declarativity.net" rel="nofollow">https://trac.declarativity.net</a>.  The work on Lincoln is still in the pipe, and has an eye on attracting programmers, not just researchers. Watch that space.</p>

<p>4)  Yuvamani: Yep, Google&#8217;s MapReduce is a batch system, because that&#8217;s what Google wanted at the time.  Hadoop copied that.  But it&#8217;s all relatively easy to change, with little impact on the programming model.   BigTable and company are indeed very interesting.  I expect an integration of low-consistency storage with other query processing languages in the next year or so.  We&#8217;ll see which of the DBMS vendors gets there first, or if it happens by gluing a query processor over a system like that.  Facebook&#8217;s Hive is one open-source step in that direction (though it&#8217;s got a pretty thin query processor at this stage).  The question of how much query optimization to do under the covers is kind of an eternal tradeoff between complexity and control.  We&#8217;ll see how it plays out in the emerging application domains.</p>

<p>5) Ronald: I have to agree.  Mother Nature has been amazing people for millennia.  But it&#8217;s been hard for us to systematize computational insights from that &#8212; so far.  There are some pretty intriguing tidbits in the research about DNA computing and the like, but it will be quite some time before you can, say, implement Spore that way :-).  In general, we need to place bets on both short-term and long-term research, and keep a clear eye on the potential of each.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Yuvamani</title>
		<link>http://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallel-programming/#comment-911697</link>
		<dc:creator>Yuvamani</dc:creator>
		<pubDate>Mon, 10 Nov 2008 10:55:30 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=27102#comment-911697</guid>
		<description>&lt;p&gt;I have found it funny that people concentrate on Google&#039;s Map Reduce when its other product is something people would kill for. I am talking about BigTable, obviously.
Map Reduce is good, but its use (until now) has been restricted to indexing operations, with search and data mining being examples. These applications have the unique advantage of NOT requiring realtime answers.&lt;/p&gt;

&lt;p&gt;Most webapps like Flickr / YouTube / Amazon / blogs / Twitter have found that the main problems they face in scaling are caused by the database. SQL, while easy to learn, is also opaque in the sense that it is not really known what will happen behind the scenes: query plans change, causing massive website slowdowns, etc. Also, parallel / distributed databases are not really out there - MySQL and Oracle are not really parallel on commodity hardware.&lt;/p&gt;

&lt;p&gt;Thus Facebook / Amazon / Google are producing simple parallel databases for webapps; I am talking of Cassandra / Dynamo and BigTable, obviously. There is also CouchDB, etc.&lt;/p&gt;

&lt;p&gt;Why is this important? You assume that MapReduce will interact with parallel databases through SQL. The problems are:
1) MapReduce is not built for interactive apps. RDBMSs are built for this class of apps.
2) There are no good (meaning industry standard) distributed RDBMSs out there to power the parallel future you envision.&lt;/p&gt;

&lt;p&gt;SQL or no SQL is a smaller problem. Any language used here could be SQL-like to take advantage of the ease of use and the number of people out there who know SQL. However, the question of what the parallel DB will look like is still open.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I have found it funny that people concentrate on Google&#8217;s Map Reduce when its other product is something people would kill for. I am talking about BigTable, obviously.
Map Reduce is good, but its use (until now) has been restricted to indexing operations, with search and data mining being examples. These applications have the unique advantage of NOT requiring realtime answers.</p>

<p>Most webapps like Flickr / YouTube / Amazon / blogs / Twitter have found that the main problems they face in scaling are caused by the database. SQL, while easy to learn, is also opaque in the sense that it is not really known what will happen behind the scenes: query plans change, causing massive website slowdowns, etc. Also, parallel / distributed databases are not really out there &#8211; MySQL and Oracle are not really parallel on commodity hardware.</p>

<p>Thus Facebook / Amazon / Google are producing simple parallel databases for webapps; I am talking of Cassandra / Dynamo and BigTable, obviously. There is also CouchDB, etc.</p>

<p>Why is this important? You assume that MapReduce will interact with parallel databases through SQL. The problems are:
1) MapReduce is not built for interactive apps. RDBMSs are built for this class of apps.
2) There are no good (meaning industry standard) distributed RDBMSs out there to power the parallel future you envision.</p>

<p>SQL or no SQL is a smaller problem. Any language used here could be SQL-like, to take advantage of its ease of use and the number of people out there who know SQL. However, what the parallel DB will look like is still an open question.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: R MacCloy</title>
		<link>http://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallel-programming/#comment-911648</link>
		<dc:creator>R MacCloy</dc:creator>
		<pubDate>Mon, 10 Nov 2008 05:22:13 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=27102#comment-911648</guid>
		<description>&lt;p&gt;A couple thoughts:
* SQL isn&#039;t inherently connected to megadata processing, of course. It would be better to speak of specialized data warehousing servers, of which Greenplum and Aster are two (Teradata, the recently-acquired ParAccel, and Vertica being some others). Curt Monash covers these pretty extensively at http://www.dbms2.com
* None of the above systems, as far as I am aware, handle the data volumes currently being routinely processed by M/R systems (at Google, Yahoo, and several other places). Which isn&#039;t to say that they&#039;re not extremely useful; it&#039;s just that the scope is different. They might get better, but I&#039;d question whether they&#039;re more optimizable in general.
* In my experience, SQL (and similar query languages) are more familiar to the vast majority of commercial programmers (although they may not have an in-depth grasp of either normalization or dimensional modeling) than MapReduce-style systems, although perhaps this is changing. People with functional programming experience seem to ramp up quicker with the model, for obvious reasons.&lt;/p&gt;

&lt;p&gt;Finally, I think the title&#039;s a bit misleading: while you can certainly credit MapReduce with attacking the problems of parallel &lt;em&gt;data processing&lt;/em&gt;, there are plenty of endeavours involving parallel systems where it doesn&#039;t apply. The resurgence of interest in actor systems is probably helping more widely when it comes to tackling concurrency in general.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>A couple thoughts:
* SQL isn&#8217;t inherently connected to megadata processing, of course. It would be better to speak of specialized data warehousing servers, of which Greenplum and Aster are two (Teradata, the recently-acquired ParAccel, and Vertica being some others). Curt Monash covers these pretty extensively at <a href="http://www.dbms2.com" rel="nofollow">http://www.dbms2.com</a>
* None of the above systems, as far as I am aware, handle the data volumes currently being routinely processed by M/R systems (at Google, Yahoo, and several other places). Which isn&#8217;t to say that they&#8217;re not extremely useful; it&#8217;s just that the scope is different. They might get better, but I&#8217;d question whether they&#8217;re more optimizable in general.
* In my experience, SQL (and similar query languages) are more familiar to the vast majority of commercial programmers (although they may not have an in-depth grasp of either normalization or dimensional modeling) than MapReduce-style systems, although perhaps this is changing. People with functional programming experience seem to ramp up quicker with the model, for obvious reasons.</p>

<p>Finally, I think the title&#8217;s a bit misleading: while you can certainly credit MapReduce with attacking the problems of parallel <em>data processing</em>, there are plenty of endeavours involving parallel systems where it doesn&#8217;t apply. The resurgence of interest in actor systems is probably helping more widely when it comes to tackling concurrency in general.</p>]]></content:encoded>
	</item>
</channel>
</rss>
