<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>GigaOM &#187; high-performance computing</title>
	<atom:link href="http://gigaom.com/tag/high-performance-computing/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com</link>
	<description></description>
	<lastBuildDate>Thu, 20 Jun 2013 04:17:29 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gigaom.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0db8f6557d022075dbbf010c54d46d93?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>GigaOM &#187; high-performance computing</title>
		<link>http://gigaom.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gigaom.com/osd.xml" title="GigaOM" />
	<atom:link rel='hub' href='http://gigaom.com/?pushpress=hub'/>
		<item>
		<title>DARPA puts $3M into startup pushing big data in Python</title>
		<link>http://gigaom.com/2013/02/05/darpa-puts-3m-into-startup-pushing-big-data-in-python/</link>
		<comments>http://gigaom.com/2013/02/05/darpa-puts-3m-into-startup-pushing-big-data-in-python/#comments</comments>
		<pubDate>Tue, 05 Feb 2013 16:27:34 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Continuum Analytics]]></category>
		<category><![CDATA[high-performance computing]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[scientific computing]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=607432</guid>
		<description><![CDATA[As part of its new big-data-focused XDATA initiative, DARPA has invested $3 million in a startup called Continuum Analytics. The company's aim is to extend Python's prowess in scientific computing into the world of big data and analytics.]]></description>
				<content:encoded><![CDATA[<p>The Defense Advanced Research Projects Agency has invested $3 million in <a href="http://continuum.io/">Continuum Analytics</a>, an Austin, Texas-based company that’s commercializing some popular methods for doing big data using the Python programming language. The investment comes out of DARPA’s XDATA fund, <a href="http://gigaom.com/2012/03/29/obamas-big-data-plans-lots-of-cash-and-lots-of-open-data/">a $100 million program announced in 2012</a> that aims to develop techniques and software for analyzing large volumes of semi-structured and unstructured data.</p>
<p>Python is already incredibly popular among programmers of all levels in all industries — not just with web programmers (including those at Google), but also with the scientific community. Continuum’s stated mission is “developing the next generation of tools to make Python as powerful and successful for big data and business data analytics as it has been for science, engineering, and scalable computing.”</p>
<p>Continuum’s flagship product, Anaconda, uses <a href="http://discoproject.org/">Disco</a>, a Python-based (and Nokia-developed) take on the Java-based Hadoop platform, and supports popular scientific Python libraries such as <a href="http://www.numpy.org/">NumPy</a> and <a href="http://www.scipy.org/">SciPy</a>. The company also offers Wakari, a browser-based analytics environment that it describes as “WordPress, Github, and Youtube for science, engineering, and business data analytics.”</p>
<p>DARPA appears particularly interested in some of the open source efforts that <a href="http://continuum.io/developer-resources.html">Continuum is developing and sponsoring itself</a>. These include Blaze, a technology for writing Python code that can run analytic jobs across distributed systems and different environments; Bokeh, an HTML5 data visualization library designed for big, multidimensional data; and Numba, a compiler for turning Python code into native machine code to improve computing speed.</p>
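<p>To make the Numba idea concrete, here is a minimal sketch of a plain Python loop marked for compilation. It assumes the present-day <code>njit</code> decorator from the <code>numba</code> package, which may differ from the interface available when this was written; the function name is hypothetical, and the fallback decorator exists only so the sketch still runs (uncompiled) where Numba is not installed.</p>

```python
try:
    from numba import njit  # real Numba decorator, used when installed
except ImportError:
    def njit(func):  # assumption: no-op fallback so the sketch runs as plain Python
        return func


@njit
def sum_of_squares(n):
    """Sum i*i for i in range(n) with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(10))
```

<p>With Numba present, the first call compiles the function to machine code and later calls reuse the compiled version; the Python source itself does not change.</p>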
<p>Considering <a href="http://gigaom.com/2012/03/29/how-federal-money-will-change-the-face-of-big-data/">the Defense Department’s goals with the XDATA program</a> (“seeking the equivalent of radar and overhead imagery for big data” so it can locate a single byte among an ocean of data), putting money behind a company like Continuum makes sense. Its core technologies are designed for heavy computation, data processing and visualization, but the Python foundation and commercialization efforts make them accessible to a broader array of users.</p>
<p>And anyone interested in how the defense and intelligence agencies are using big data technologies to further their missions really should check out our <a href="http://event.gigaom.com/structuredata/schedule/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=607432+darpa-puts-3m-into-startup-pushing-big-data-in-python&amp;utm_content=dharrisstructure">Structure: Data conference</a> in New York next month. Not only will we be discussing the cutting edge in data analysis, but we’ll have keynote presentations by Central Intelligence Agency CTO Ira “Gus” Hunt and Samantha Ravich, co-chair of the National Commission for Review of R&amp;D Programs for the United States Intelligence Community.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-98072p1.html">Shutterstock user argus</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/02/05/darpa-puts-3m-into-startup-pushing-big-data-in-python/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/02/shutterstock_119621998-e1360081354123.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/02/shutterstock_119621998-e1360081354123.jpg?w=150" medium="image">
			<media:title type="html">radar</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>
	</item>
		<item>
		<title>AWS beefs up cloud for super-fast data processing</title>
		<link>http://gigaom.com/2013/01/22/aws-beefs-up-cloud-for-super-fast-data-processing/</link>
		<comments>http://gigaom.com/2013/01/22/aws-beefs-up-cloud-for-super-fast-data-processing/#comments</comments>
		<pubDate>Tue, 22 Jan 2013 12:57:04 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Flash storage]]></category>
		<category><![CDATA[high-performance computing]]></category>
		<category><![CDATA[iaas]]></category>
		<category><![CDATA[in-memory]]></category>
		<category><![CDATA[in-memory database]]></category>
		<category><![CDATA[solid-state drives]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=603066</guid>
		<description><![CDATA[Amazon Web Services has introduced its latest instance: a behemoth with 88 EC2 Compute Units, 240 GB of SSD storage, 244 GB of RAM and 10 GbE networking, designed for real-time analytics with software like SAP HANA as well as demanding scientific workloads.]]></description>
				<content:encoded><![CDATA[<p>Latching onto the trend toward in-memory storage for real-time computing, Amazon Web Services has added a new type of virtual server. The new option, the 10th such available on the EC2 offering, is called the High-Memory Cluster Instance and includes 88 EC2 Compute Units of compute capacity (running on two Intel Xeon E5-2670 processors), two 120 GB solid-state drives of instance storage and 244 GB of RAM.</p>
<p>It’s designed with speed in mind for uses such as in-memory analytics (including on <a href="http://gigaom.com/2013/01/11/sap-marries-transaction-processing-with-analytics-by-putting-business-suite-on-hana/">SAP’s popular HANA platform</a>) and certain scientific workloads that require data delivery to keep up with processing speed. The faster applications can read and write data, the sooner processors can work on it, and reading from an in-memory cache or solid-state drives is much faster than reading from hard drives.</p>
<p>And because the new instance is part of AWS’s Cluster Compute family, multiple instances are connected via a 10 GbE network for speedy server-to-server data transfer. In benchmark tests <a href="http://blog.cloudharmony.com/2010/09/benchmarking-of-ec2s-new-cluster.html">from a site called CloudHarmony in 2010</a>, Cluster Compute instances far <a href="http://pro.gigaom.com/blog/benchmarking-the-cloud-your-mileage-may-vary/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=603066+aws-beefs-up-cloud-for-super-fast-data-processing&amp;utm_content=dharrisstructure">outperformed anything else on the market</a> at the time (<em>GigaOM Pro subscription req’d</em>). They’ve also been used to spin up clusters that can compete with traditional supercomputers in terms of sheer performance, <a href="http://www.top500.org/list/2012/11/?page=2">reaching No. 102 on the latest Top500 list</a> with a peak speed of 354.1 teraflops.</p>
<p>That said, AWS isn’t the only game in town for users who want this type of beefy server to handle their real-time data processing needs. Liquid Web’s Storm cloud service, for example, <a href="http://www.stormondemand.com/servers/ssd.html">offers some high-memory, SSD-powered servers</a> of its own at nearly $1.50 per hour less than what AWS charges (albeit with fewer cores and without the 10 GbE backbone and list of features that come along with the AWS platform).</p>
<p>Whatever the cloud, though, ever-higher-performing instances mean new classes of workloads and more business for cloud providers that offer them. Especially as big data and analytics applications pick up steam and <a href="http://gigaom.com/2012/10/24/metamarkets-open-sources-druid-its-in-memory-database/">move from batch to real-time</a>, clouds that can handle demanding users are in a good position.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-287794p1.html">Shutterstock user ssguy</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/01/22/aws-beefs-up-cloud-for-super-fast-data-processing/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/01/shutterstock_54772192.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/01/shutterstock_54772192.jpg?w=150" medium="image">
			<media:title type="html">fast train</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>
	</item>
		<item>
		<title>Why Amazon thinks big data was made for the cloud</title>
		<link>http://gigaom.com/2012/11/30/why-amazon-thinks-big-data-was-made-for-the-cloud/</link>
		<comments>http://gigaom.com/2012/11/30/why-amazon-thinks-big-data-was-made-for-the-cloud/#comments</comments>
		<pubDate>Fri, 30 Nov 2012 19:53:12 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[elastic-mapreduce]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[high-performance computing]]></category>
		<category><![CDATA[supercomputers]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=589797</guid>
		<description><![CDATA[According to Amazon Web Services Chief Data Scientist Matt Wood, big data and cloud computing are nearly a match made in heaven. Limitless, on-demand and inexpensive resources open up new worlds of possibility, and a central platform makes it easy for communities to share huge datasets.]]></description>
				<content:encoded><![CDATA[<p>For Amazon Web Services Chief Data Scientist Matt Wood, the day isn&#8217;t filled with performing data alchemy on behalf of his employer; he&#8217;s entertaining its customers. Wood helps AWS users build big data architectures that use the company&#8217;s cloud computing resources, and then takes what he learns about those users&#8217; needs and turns it into products such as the Data Pipeline Service and <a href="http://gigaom.com/cloud/amazons-new-data-warehousing-service-takes-aim-at-old-guard-it-giants/">Redshift data warehouse</a> that AWS announced this week.</p>
<div id="attachment_589879" class="wp-caption alignleft" style="width: 150px"><a href="http://gigaom2.files.wordpress.com/2012/11/20120820170634_matt-wood.jpg"><img  alt="Matt Wood" src="http://gigaom2.files.wordpress.com/2012/11/20120820170634_matt-wood.jpg?w=708"   class="size-full wp-image-589879" /></a><p class="wp-caption-text">Matt Wood</p></div>
<p>He and I sat down this week at AWS&#8217;s inaugural re:Invent conference and talked about many things, including what he&#8217;s seen in the field and where cloud-based big data efforts are headed. Here are the highlights.</p>
<h2>The end of constraint-based thinking</h2>
<p>Not so long ago, computer scientists understood many of the concepts that we now call data science, but limited resources meant they were hamstrung in the types of analysis they could attempt to do. &#8220;That can be very limiting, very constraining when you&#8217;re working with data,&#8221; Wood said.</p>
<p>Now, however, data storage and processing resources are relatively inexpensive and abundant &#8212; so much so that they&#8217;ve actually made the concept of big data possible. Cloud computing has only made those resources cheaper and more abundant. The result, Wood said, is that people working with data are undergoing a shift from that mindset of limiting their data analysis to the resources they have available to one where they think about business needs first.</p>
<p>If they&#8217;re able to get past traditional notions of sampling and days-long processing times, he added, individuals can focus their attention on what they <em>can</em> do because they have so many resources available. He noted how Yelp gave developers relatively free rein early on in their use of Elastic MapReduce, saving them from having to formally request resources just &#8220;to see if the crazy idea [someone] had over coffee is going to play out.&#8221; Yelp was able to spot a shift in mobile traffic volume years ago and get a head start on its mobile efforts because of that, Wood added.</p>
<h2>Data problems aren&#8217;t just about scale</h2>
<p>Generally speaking, Wood said, solving customers&#8217; data problems isn&#8217;t just about figuring out how to store ever greater volumes for ever cheaper prices. &#8220;You don&#8217;t have to be at a petabyte scale in order to get some insight on who&#8217;s using your social game,&#8221; he said.</p>
<p>In fact, access to limitless storage and processing is a solution to one problem that actually creates another. Companies want to keep <em>all</em> the data they generate, and that creates complexity, Wood explained. As that data piles up in various repositories (perhaps in Amazon&#8217;s S3 and DynamoDB services, as well as on some physical machines within a company&#8217;s data center), moving it from place to place in order to reuse it becomes a difficult process.</p>
<p>Wood said AWS built its <a href="http://gigaom.com/cloud/amazon-preps-data-pipeline-service-to-automate-and-orchstrate-big-data-workflows/">new Data Pipeline Service</a> in order to address this problem. Pipelines can be &#8220;arbitrarily complex,&#8221; he explained &#8212; from running a simple piece of business logic against data to running whole batches through Elastic MapReduce &#8212; but the idea is to automate the movement and processing so users don&#8217;t have to build these flows themselves and then manually run them.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/11/aws_data_pipeline_console_1-copy.jpg"><img  alt="aws_data_pipeline_console_1 copy" src="http://gigaom2.files.wordpress.com/2012/11/aws_data_pipeline_console_1-copy.jpg?w=708"   class="aligncenter size-full wp-image-589908" /></a></p>
<h2>The cloud isn&#8217;t just for storing tweets</h2>
<p>People sometimes question the relevance of cloud computing for big data workloads, if only because any data generated on in-house systems has to make its way to the cloud over inherently slow connections. The bigger the dataset, the longer the upload time.</p>
<p>Wood said AWS is trying hard to alleviate these problems. For example, <a href="http://gigaom.com/cloud/is-consumer-content-up-next-for-aspera/">partners such as Aspera</a> and even some open source projects enable customers to move large files at fast speeds over the internet (Wood said he&#8217;s seen consistent speeds of 700 megabits per second). This is also why AWS has eliminated data-transfer fees for inbound data, has turned on parallel uploads for large files and <a href="http://gigaom.com/cloud/amazon-gives-users-dedicated-links-to-its-cloud/">created its Direct Connect program</a> with data center operators that provide dedicated connections to AWS facilities.</p>
<p>And if datasets are too large for all those methods, customers <a href="http://gigaom.com/2010/06/10/when-amazon-resorts-to-snail-mail-theres-a-business-opportunity/">can just send AWS their physical disks</a>. &#8220;We definitely receive hard drives,&#8221; Wood said.</p>
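<p>A rough back-of-the-envelope sketch, in Python, of why shipping disks can beat the network. It pairs two figures quoted in this article, the 700 megabits per second Wood mentioned and the 200TB size of the 1000 Genomes dataset, purely as an illustration. Pairing them, using decimal units (1TB = 10^12 bytes) and treating the link as fully saturated with no protocol overhead are all assumptions, and the function name is hypothetical.</p>

```python
def transfer_days(dataset_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to push dataset_bytes over a fully saturated link."""
    seconds = dataset_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day


# Assumed inputs: 200 TB (decimal) at a sustained 700 megabits per second.
DATASET_BYTES = 200 * 10**12
LINK_BPS = 700 * 10**6

days = transfer_days(DATASET_BYTES, LINK_BPS)
print(f"{days:.1f} days")
```

<p>Under those assumptions the upload takes roughly 26 days of uninterrupted transfer, which is the kind of wait that makes a box of hard drives look attractive.</p>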
<h2>Collaboration is the future</h2>
<p>Once data makes its way to the cloud, it opens up entirely new methods of collaboration where researchers or even entire industries can access and work together on shared datasets too big to move around. &#8220;This sort of data space is something that&#8217;s becoming common in fields where there are very large datasets,&#8221; Wood said, citing as an example the <a href="http://www.1000genomes.org/">1000 Genomes project</a> dataset that AWS houses.</p>
<div id="attachment_419764" class="wp-caption aligncenter" style="width: 614px"><a href="http://gigaom2.files.wordpress.com/2011/10/dnanexus.jpg"><img  alt="DNAnexus's cloud-based architecture" src="http://gigaom2.files.wordpress.com/2011/10/dnanexus.jpg?w=604&#038;h=517" height="517" width="604" class="size-large wp-image-419764" /></a><p class="wp-caption-text">DNAnexus&#8217;s cloud-based architecture</p></div>
<p>As we&#8217;ve covered recently, <a href="http://gigaom.com/data/why-data-is-the-key-to-better-medicine-and-maybe-a-cure-for-cancer/">the genetics space is drooling over the promise of cloud computing</a>. The 1000 Genomes database is only 200TB, Wood explained, but very few project leads could get the budget to store that much data and make it accessible to their peers, much less the computation power required to process it. And even in fields such as pharmaceuticals, Amazon CTO Werner Vogels <a href="http://gigaom.com/cloud/amazons-vogels-on-21st-century-apps-and-it-life-events/">told me during an earlier interview</a>, companies are using the cloud to collaborate on certain datasets so companies don&#8217;t have to spend time and money reinventing the wheel.</p>
<h2>No more supercomputers?</h2>
<p>Wood seemed very impressed with the work that AWS&#8217;s high-performance computing customers have been doing on the platform &#8212; work that previously would have been done on supercomputers or other physical systems. Thanks to AWS partner Cycle Computing, he noted, the Morgridge Institute at the University of Wisconsin <a href="http://gigaom.com/cloud/gene-research-in-the-cloud-could-help-cure-diseases-in-the-lab/">was able to perform 116 years worth of computing in just one week</a>. In the past, access to that kind of power would have required waiting in line until resources opened up on a supercomputer somewhere.</p>
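<p>To get a feel for the scale involved, the sketch below estimates how many workers would have to run side by side to squeeze 116 years of computing into one week. It assumes that the figure means 116 years on a single processing unit, which the article does not specify, and it uses an average calendar year of 365.25 days; the function and constant names are hypothetical.</p>

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: average calendar year
HOURS_PER_WEEK = 7 * 24


def equivalent_parallel_units(compute_years: float, wall_clock_weeks: float) -> float:
    """Average number of single-unit workers that must run concurrently."""
    total_hours = compute_years * HOURS_PER_YEAR
    return total_hours / (wall_clock_weeks * HOURS_PER_WEEK)


units = equivalent_parallel_units(116, 1)
print(round(units))
```

<p>On those assumptions the answer is on the order of 6,000 units running flat out for the whole week.</p>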
<p>The collaborative efforts Wood discussed certainly facilitate this type of extreme computation, as do AWS&#8217;s continuous efforts to beef up its instances with more and more power. Whatever users might need, from the new 250GB RAM on-demand instances to <a href="http://gigaom.com/cloud/amazon-gets-graphic-with-cloud-gpu-instances/">GPU-powered Cluster Compute Instances</a>, Wood said AWS will try to provide it. Because cost sometimes matters, AWS has opened Cluster Compute Instances and Elastic MapReduce to its spot market for buying capacity on the cheap.</p>
<p>But whatever data-intensive workloads organizations want to run, many will always look to the cloud now. Because cloud computing and big data &#8212; Hadoop, especially &#8212; have come of age roughly in parallel with each other, Wood hypothesized, they often go hand-in-hand in people&#8217;s minds.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-641209p1.html">Shutterstock user winui</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/11/30/why-amazon-thinks-big-data-was-made-for-the-cloud/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/11/shutterstock_94487455-e1354305591139.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/11/shutterstock_94487455-e1354305591139.jpg?w=150" medium="image">
			<media:title type="html">cloud data</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/11/20120820170634_matt-wood.jpg" medium="image">
			<media:title type="html">Matt Wood</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/11/aws_data_pipeline_console_1-copy.jpg" medium="image">
			<media:title type="html">aws_data_pipeline_console_1 copy</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/10/dnanexus.jpg?w=604" medium="image">
			<media:title type="html">DNAnexus&#039;s cloud-based architecture</media:title>
		</media:content>
	</item>
		<item>
		<title>Now you can simulate your world for (relatively) cheap in the cloud</title>
		<link>http://gigaom.com/2012/09/11/now-you-can-simulate-your-world-for-relatively-cheap-in-the-cloud/</link>
		<comments>http://gigaom.com/2012/09/11/now-you-can-simulate-your-world-for-relatively-cheap-in-the-cloud/#comments</comments>
		<pubDate>Tue, 11 Sep 2012 18:09:02 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Autodesk]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Computer-aided design]]></category>
		<category><![CDATA[green building]]></category>
		<category><![CDATA[Green IT]]></category>
		<category><![CDATA[high-performance computing]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=561450</guid>
		<description><![CDATA[Autodesk now offers a cloud-based version of its simulation software for a fraction of the cost of most similar on-premise options. While large enterprises might not be willing to make the jump yet, some innovative startups are already on board and testing the future.]]></description>
				<content:encoded><![CDATA[<p>What if high-end simulation software that used to cost up to $100,000 per year now cost a fraction of that and was actually more functional? What types of new products or techniques might arise from the ability to simulate, on the cheap, the flow of water through a system or stress-test new machines? Computer-aided design specialist Autodesk wants to find out, and has taken its simulation software to the cloud in a new offering called <a href="http://usa.autodesk.com/adsk/servlet/pc/index?siteID=123112&amp;id=19730839">Simulation 360</a>.</p>
<p>Autodesk has actually been moving various products and services to the cloud in some form for a few years, but the new product is different, said Grant Rochelle, the company&#8217;s senior director of manufacturing industry marketing. Most importantly, it&#8217;s cheap. Rochelle said traditional simulation software can cost between $20,000 and $100,000 per user per year, and that&#8217;s not to mention the high-performance systems necessary to run it. Many employees who need it don&#8217;t get it, and many companies are shut out from purchasing it altogether.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/09/sim_360_lightbox_900x544_mechanical.jpeg"><img  title="SIM_360_Lightbox_900x544_Mechanical" src="http://gigaom2.files.wordpress.com/2012/09/sim_360_lightbox_900x544_mechanical.jpeg?w=300&#038;h=181" alt="" width="300" height="181" class="alignleft size-medium wp-image-561557" /></a>Simulation 360, on the other hand, costs mere thousands. For $3,200 a year, users can run a total of 120 jobs. For $7,200, they get unlimited access. For a limited time, actually, unlimited use is free. And rather than needing separate products for mechanical, fluid and thermal simulation, they&#8217;re all included in the new cloud offering.</p>
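<p>The two price points invite a quick comparison. The sketch below works out the effective per-job cost of the $3,200 plan and the yearly job count at which the $7,200 unlimited plan would cost the same as paying that per-job rate. Extrapolating the metered plan linearly beyond its 120 jobs is an assumption made only for illustration, not something Autodesk describes, and the variable names are hypothetical.</p>

```python
METERED_PRICE = 3_200    # dollars per year, from the article
METERED_JOBS = 120       # jobs included at that price
UNLIMITED_PRICE = 7_200  # dollars per year, from the article

cost_per_job = METERED_PRICE / METERED_JOBS
# Hypothetical break-even: the yearly job count at which the unlimited plan's
# flat fee equals what those jobs would cost at the metered per-job rate.
break_even_jobs = UNLIMITED_PRICE / cost_per_job

print(f"${cost_per_job:.2f} per job, break-even at {break_even_jobs:.0f} jobs")
```

<p>That puts the metered plan at a little under $27 per job, with unlimited access paying for itself at about 270 jobs a year under the linear assumption.</p>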
<p>Rochelle said the goal of Simulation 360 isn&#8217;t to move big-money customers such as large aerospace companies or defense contractors to the cloud service (&#8220;they will be the last bastion of adopting cloud for most things,&#8221; he said), but to attract entirely new users building innovative new products or doing interesting work. One customer, for example, is working to make hospital operating rooms as clean as the rooms in which microprocessors are fabricated. It&#8217;s simulating the spread of airborne germs through a hospital to prevent certain illnesses that patients contract while receiving treatment for something else.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/09/scheme_about_technology1b.jpeg"><img  title="scheme_about_technology1b" src="http://gigaom2.files.wordpress.com/2012/09/scheme_about_technology1b.jpeg?w=300&#038;h=139" alt="" width="300" height="139" class="alignright size-medium wp-image-561556" /></a>Another customer, <a href="http://biolitestove.com/">BioLite</a>, has <a href="http://sustainabilityworkshop.autodesk.com/project-gallery/efficient-and-responsible-stove-design">developed a product</a> that turns the excess heat from a fire into electricity so, for example, campers or individuals in third-world countries can ensure their portable electronics are always charged. Rochelle noted that green building and heating and cooling, generally, are areas where the new cloud offering looks to open up a lot of new doors.</p>
<p>However, while offering this technology as a cloud service might be new, the value story is as old as cloud computing itself. Give innovators without huge IT budgets access to resources at a price point never before possible and see what happens. Maybe <a href="http://gigaom.com/cloud/why-instagram-is-likely-moving-on-from-amazons-cloud/">it&#8217;s Instagram</a>, maybe it&#8217;s <a href="http://gigaom.com/cloud/how-climate-corp-is-pitting-big-data-against-mother-nature/">a new approach to climate modeling</a>, maybe it&#8217;s a revolution in energy-efficient HVAC systems, but chances are we&#8217;re better off because of it.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/09/11/now-you-can-simulate-your-world-for-relatively-cheap-in-the-cloud/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/09/sim_360_lightbox_900x544_mechanical.jpeg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/09/sim_360_lightbox_900x544_mechanical.jpeg?w=150" medium="image">
			<media:title type="html">SIM_360_Lightbox_900x544_Mechanical</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/09/sim_360_lightbox_900x544_mechanical.jpeg?w=300" medium="image">
			<media:title type="html">SIM_360_Lightbox_900x544_Mechanical</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/09/scheme_about_technology1b.jpeg?w=300" medium="image">
			<media:title type="html">scheme_about_technology1b</media:title>
		</media:content>
	</item>
		<item>
		<title>How researchers are letting us uncover secrets in social data</title>
		<link>http://gigaom.com/2012/09/07/as-social-data-grows-researchers-want-to-uncover-its-secrets/</link>
		<comments>http://gigaom.com/2012/09/07/as-social-data-grows-researchers-want-to-uncover-its-secrets/#comments</comments>
		<pubDate>Fri, 07 Sep 2012 16:37:07 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[graph database]]></category>
		<category><![CDATA[high-performance computing]]></category>
		<category><![CDATA[Klout]]></category>
		<category><![CDATA[social media]]></category>
		<category><![CDATA[social networking]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=560115</guid>
		<description><![CDATA[Thanks to the popularity of everything from social media sites such as Twitter to email to mobile phones, it's easier than ever to get data about who's connected to whom. With the right tools, we can apply it to solve certain problems faster and more easily than ever.]]></description>
				<content:encoded><![CDATA[<p>It&#8217;s not easy work turning the Mayberry Police Department into the team from <em>C.S.I.</em>, or turning an idea for a new type of social network analysis into something like Klout on steroids, but those types of transformations are becoming increasingly possible. The world&#8217;s universities and research institutions are hard at work figuring out ways to make the mountains of social data generated every day more useful and, hopefully, to make us realize there&#8217;s more to social data than <a href="http://gigaom.com/cloud/why-klout-really-matters-money-money-money/">just figuring out whose digital voice is the loudest</a>.</p>
<p>Aspiring heirs to the Klout throne, for example, might look to a project called <a href="http://www.cc.gatech.edu/stinger/index.php">STINGER</a> currently under development at Georgia Tech. STINGER, which stands for Spatio-Temporal Interaction Networks and Graphs Extensible Representation, is a graph-processing engine that project lead David Bader says is bigger, faster and more flexible than anything currently in use for analyzing social media connections. You provide a shared-memory computing system, and it provides an open-source tool that can help detect relationships between billions of people, places and things as those relationships change over time &#8212; even in real time.</p>
<p>Someone using Facebook data, for example, might write an algorithm in which people or pages would be the graph&#8217;s vertices and actions (likes, shares, wall posts, etc.) would be its edges. One relatively easy application, Bader explained, would be to analyze how activity around particular people is increasing, decreasing or changing, therefore indicating changes in their importance or the growth of new communities.</p>
<h2>We&#8217;ll do the hard work</h2>
<p>Writing an algorithm to perform that kind of analysis isn&#8217;t really the problem, though &#8212; it&#8217;s writing one that can scale into the billions of vertices and edges and <a href="http://highscalability.com/blog/2010/3/30/running-large-graph-algorithms-evaluation-of-current-state-o.html">still perform quickly enough to be useful</a>. An algorithm that generates one false positive in a million isn&#8217;t so bad when you&#8217;re dealing with tens of thousands of items, Bader explained, but it gets to be a big problem when you&#8217;re talking about billions of items against which it&#8217;s running.</p>
<p>There are <a href="http://en.wikipedia.org/wiki/Graph_database">dozens of open source graph databases available</a>, including popular offerings <a href="http://gigaom.com/cloud/springsource-links-up-with-neo-technology-on-nosql/">such as Neo4j</a> and <a href="http://gigaom.com/cloud/twitters-success-pulls-23-year-old-objectivity-into-nosql/">InfiniteGraph</a>, but he said, &#8220;Our lab focuses on algorithms that run fast on massive data sets and that are more accurate than what is traditionally done in social media.&#8221;</p>
<div id="attachment_560493" class="wp-caption alignleft" style="width: 196px"><a href="http://gigaom2.files.wordpress.com/2012/09/dbader2007-small.jpg"><img  title="dbader2007-small" src="http://gigaom2.files.wordpress.com/2012/09/dbader2007-small.jpg?w=708" alt=""   class="size-full wp-image-560493" /></a><p class="wp-caption-text">David Bader</p></div>
<p>Bader&#8217;s team recently presented a paper detailing a social media algorithm running atop STINGER that ran 100 times faster than some previous approaches because the system stores the graph&#8217;s previous state and only performs the minimal amount of processing necessary as new edges are inserted. This is in contrast to traditional approaches that re-process the entire graph every time there&#8217;s a change.</p>
<p>That being said, Georgia Tech isn&#8217;t entirely alone in analyzing massive amounts of social data with graph databases. Google&#8217;s Pregel had <a href="http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html">already scaled to billions of vertices and edges</a> as of 2009, and Facebook is currently <a href="http://www.slideshare.net/Hadoop_Summit/processing-edges-on-apache-giraph">analyzing more than a billion edges</a> using <a href="http://incubator.apache.org/giraph/">Apache Giraph</a> (an open source, Hadoop-based Pregel implementation). But those cases &#8212; both companies are loaded with smart engineers, data scientists and powerful infrastructure &#8212; just underscore the importance of what researchers like Bader are building and releasing as open source.</p>
<h2>Forget social media, solve real problems</h2>
<p>But social data isn&#8217;t just useful for figuring out who&#8217;s influential on Twitter or Facebook &#8212; it also can be used to solve some real problems. Bader said he&#8217;s already used graph processing with Twitter data to determine who was leading resistance units during Egypt&#8217;s recent revolution. &#8220;Anywhere I can look at connections between entities,&#8221; he said, &#8220;these approaches are available.&#8221;</p>
<p>Indeed. On Wednesday afternoon, for example, a group of researchers from the University of Alberta, University of Connecticut and University of California-Merced unveiled a new data-based method that could make it faster, easier and less expensive to root out culprits in fraud cases.</p>
<p>The technique uses a method called the Steiner tree to analyze the connections (social, business, familial, etc.) between the people involved in a given case of fraud. The algorithm is able to determine the shortest path between two objects, which the researchers posit is especially applicable to fraud investigations &#8212; the person with the shortest path between himself and the crime is probably the culprit (or at least a solid suspect).</p>
<p>The fraud researchers&#8217; paper follows the publication in August of a <a href="http://gigaom.com/data/an-algorithm-for-tracking-viruses-and-twitter-rumors-to-their-source/">method for determining the source of everything</a> from a disease outbreak to a Twitter rumor by tracking its spread across a complex network over time. Their algorithm, the paper&#8217;s authors claim, could be particularly effective for combating cybercrime by tracking computer viruses back to their sources. The more connections (in the case of social data), or observers, a particular point has, the fewer that are needed to track down the source point.</p>
<p>However, all the algorithms and data frameworks in the world probably won&#8217;t make too big a difference until they&#8217;re turned into products that actually work in real-world situations. As the University of Alberta&#8217;s Ray Patterson pointed out in a <a href="http://www.news.ualberta.ca/article.aspx?id=598C1DAC742446ED84B477CB8FA05324">press release detailing the fraudster-detection algorithm</a>, &#8220;It might take several years or many years before anyone picks it up. But it&#8217;s a good thing if we can point people towards what&#8217;s useful.&#8221;</p>
<p>Georgia Tech&#8217;s Bader said DARPA, Intel, Sandia National Laboratories and other research institutions have already used STINGER to tackle some complex data sets, and he suspects a strong commercial interest, as well. If a company is willing to take STINGER from a project into a product, it could bring the project&#8217;s scale and speed to everything from analyzing customer interactions to monitoring the changing nature of criminal networks, Bader said. Considering the desire from companies of all types to extract some meaning from social data, I have to think someone will give it a shot.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-219685p1.html">Shutterstock user 3DProfi</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/09/07/as-social-data-grows-researchers-want-to-uncover-its-secrets/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_99933086.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_99933086.jpg?w=150" medium="image">
			<media:title type="html">Network model</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/09/dbader2007-small.jpg" medium="image">
			<media:title type="html">dbader2007-small</media:title>
		</media:content>
	</item>
		<item>
		<title>Head to head: Amazon cloud beats Google on video benchmark</title>
		<link>http://gigaom.com/2012/07/24/for-some-workloads-googles-cloud-cant-yet-hang-with-aws/</link>
		<comments>http://gigaom.com/2012/07/24/for-some-workloads-googles-cloud-cant-yet-hang-with-aws/#comments</comments>
		<pubDate>Tue, 24 Jul 2012 19:45:25 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[high-performance computing]]></category>
		<category><![CDATA[iaas]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[video transcoding]]></category>
		<category><![CDATA[Zencoder]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=545923</guid>
		<description><![CDATA[Benchmarking results from Zencoder show that Amazon Web Services beats out Google's Compute Engine in a test of a specific CPU-intensive workload. Compute Engine's performance was hindered by a lack of HPC instances, which Google could one day add. But it's nice to see real-world comparisons.]]></description>
				<content:encoded><![CDATA[<p>According to benchmark tests by video-transcoding startup <a href="http://zencoder.com">Zencoder</a>, Google&#8217;s new Compute Engine infrastructure-as-a-service offering has some work to do if it wants to catch up with Amazon Web Services on the performance front. But the offering, still in &#8220;limited preview&#8221; mode and far from fully baked, should be able to make the necessary adjustments rather easily.</p>
<p>The results, detailed in <a href="http://blog.zencoder.com/2012/07/23/first-look-at-google-compute-engine-for-video-transcoding/">a blog post on Tuesday</a>, suggest that Google Compute Engine&#8217;s real problem right now might just be a lack of high-performance instances. Its current workhorse &#8212; an 8-core Intel Sandy Bridge instance with 30GB of memory and 22 compute units &#8212; can&#8217;t hang with the Amazon Cluster Compute Instances that Zencoder <a href="http://gigaom.com/cloud/zencoder-raises-2m-for-cloud-based-video-encoding/">uses for its transcoding workloads</a>. The largest of those is a 16-core dual-CPU Intel Xeon instance providing 60.5GB of memory and 88 compute units running atop a 10 Gigabit Ethernet platform.</p>
<p>As Zencoder ramped up the workloads, the performance differences became clear:</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/07/gce-vs-ec2-copy.jpg"><img  title="GCE-vs-EC2 copy" src="http://gigaom2.files.wordpress.com/2012/07/gce-vs-ec2-copy.jpg?w=708" alt=""   class="aligncenter size-full wp-image-545947" /></a></p>
<p>Compute Engine didn&#8217;t fare any better when Zencoder tested transfer speeds between the cloud storage platform and the cloud computing platform. Whereas rates between Amazon S3 and Amazon EC2 topped out at 1,458.32 Mbps, the rate between Google Cloud Storage and Google Compute Engine peaked at 202.6 Mbps. In fact, the post&#8217;s author writes, &#8220;it appears that GCS is slower than S3, and GCE transfer is slower than EC2, such that even if you’re using Google for compute, you may be better off using S3 for storage.&#8221;</p>
<p>While the results are interesting because they&#8217;re the first real apples-to-apples comparison I&#8217;ve seen between Compute Engine and EC2 (BuildFax cloud architect Joe Emison&#8217;s pre-release benchmarks were pulled from <a href="http://www.informationweek.com/news/cloud-computing/infrastructure/240002899?pgno=1">his Compute Engine review on InformationWeek</a>), they need to be taken as what they are. They are, as Zencoder points out, tests of a specific CPU-bound workload &#8212; the performance of which Google could improve by adding higher-powered instances &#8212; and don&#8217;t take into account the difficulties of running at massive scale &#8212; a capability Google touted <a href="http://gigaom.com/cloud/taking-on-amazon-google-launches-compute-on-demand-rival-to-ec2/">when it launched Compute Engine in June</a>.</p>
<p>And, the author notes, Compute Engine is generally a quality platform, &#8220;especially [with regard to] disk I/O, boot times, and consistency, which historically haven’t been EC2’s strong suit.&#8221;</p>
<p>This might actually be the more-important measure for most potential Compute Engine users. As <a href="http://gigaom.com/cloud/why-google-compute-engine-may-be-attractive-to-amazon-web-services-users/">GigaOM contributor James Urquhart wrote recently</a>, &#8220;If Google can deliver a service that eliminates most of the I/O and network performance inconsistencies that AWS customers currently experience, I can guarantee you there are many major compute customers of AWS that will want to give Compute Engine a test run.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/07/24/for-some-workloads-googles-cloud-cant-yet-hang-with-aws/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/07/gce-vs-ec2-copy.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/07/gce-vs-ec2-copy.jpg?w=150" medium="image">
			<media:title type="html">GCE-vs-EC2 copy</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/07/gce-vs-ec2-copy.jpg" medium="image">
			<media:title type="html">GCE-vs-EC2 copy</media:title>
		</media:content>
	</item>
		<item>
		<title>Straight outta Stanford, Bina wants to remake genome analysis</title>
		<link>http://gigaom.com/2012/04/30/straight-outta-stanford-bina-wants-to-remake-genome-analysis/</link>
		<comments>http://gigaom.com/2012/04/30/straight-outta-stanford-bina-wants-to-remake-genome-analysis/#comments</comments>
		<pubDate>Tue, 01 May 2012 00:45:44 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[appistry]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Bina Technologies]]></category>
		<category><![CDATA[DNAnexus]]></category>
		<category><![CDATA[Genomics]]></category>
		<category><![CDATA[high-performance computing]]></category>
		<category><![CDATA[human genome]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=515950</guid>
		<description><![CDATA[Bina Technologies emerged from stealth mode last week and is bringing an Apple-like business model to genomics. The company relies on its Bina Box to make genome analysis faster than ever before possible without the benefit of having a supercomputer and a research network on hand.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2012/04/dna-sculpture.jpg"><img  title="dna sculpture" src="http://gigaom2.files.wordpress.com/2012/04/dna-sculpture.jpg?w=300&#038;h=269" alt="" width="300" height="269" class="alignleft size-medium wp-image-516125" /></a>The advent of the $1,000 genome is bound to revolutionize researchers&#8217; understanding of human health, but ever-lower prices on DNA sequencing are only half the battle. Researchers <a href="http://gigaom.com/cloud/as-genomics-pushes-big-data-limits-cloud-could-save-the-day/">also need to analyze the raw data that comes off sequencing machines</a>, which can range from many gigabytes to terabytes and can cost far more to analyze than the sequencing itself. That&#8217;s why a collection of startups are trying to stake their claims as essential parts of the genomics ecosystem by ensuring that analysis doesn&#8217;t become the bottleneck that slows progress.</p>
<h2>Domain expertise, statistics and HPC, unite!</h2>
<p>The latest is <a href="http://www.binatechnologies.com/">Bina Technologies</a>, which just emerged from stealth mode last week and is bringing an Apple-like business model to genomics. The company, which grew out of a research project at Stanford University, relies on its Bina Box appliance to make genome analysis faster than typically possible <a href="http://gigaom.com/cloud/fighting-cancer-at-100-gigabits-per-second/">without the benefit of having a supercomputer and a research network on hand</a>.</p>
<p>According to Bina CEO Narges Bani Asadi, who co-founded the company while completing her Ph.D. at Stanford, the appliance came about as part of a mission to solve a disconnect among the stakeholders in cancer research. Improving the analysis of cancer data required input from medical researchers, statisticians and high-performance computing experts, &#8220;but people are not speaking even the same language,&#8221; she said. While they&#8217;re all headed in the same direction, their paths rarely converge to harness peak velocity.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/04/bina_box_01.jpg"><img  title="bina_box_01" src="http://gigaom2.files.wordpress.com/2012/04/bina_box_01.jpg?w=300&#038;h=170" alt="" width="300" height="170" class="alignright size-medium wp-image-516123" /></a>Bani Asadi and her team solved that problem by developing a system that merged the three areas into one. With Bina, researchers can develop analysis pipelines that are optimized at both the algorithmic and silicon levels to run optimally across a mix of CPUs, GPUs and FPGAs, all of which are present within the purpose-built box. Applications are getting what they need in order to perform their best, and Bina says results can be processed 10 to 100 times faster (hours instead of days) than running jobs on the Amazon Web Services cloud, which has proven very popular for genomics workloads <a href="http://gigaom.com/cloud/amazon-gets-graphic-with-cloud-gpu-instances/">thanks to its supercomputer-like performance</a>.</p>
<p>That being said, a chart Bina uses to illustrate the performance difference compares the Bina Box to a single eight-core AWS instance rather than a cluster of those high-performance instances.</p>
<h2>The Apple analogy</h2>
<p>Bani Asadi answers the inevitable question of whether research centers will want special appliances instead of using the cloud or generic servers by pointing to Apple. That company&#8217;s devices and computers can be a little more expensive and more difficult to tinker with than alternatives, but they&#8217;re also designed specifically with Apple&#8217;s operating system and applications in mind. It&#8217;s an analogy other companies, <a href="http://gigaom.com/cloud/ex-nasa-cto-builds-cloud-dream-team-launches-nebula/">such as cloud computing startup Nebula</a>, also use to justify their appliance-based businesses.</p>
<p>Not that Bina dismisses the cloud. If Bina&#8217;s software is the Mac OS to the Bina Box&#8217;s iMac, the Bina Cloud is the company&#8217;s iCloud. Once the box processes the raw sequencing data and compresses it into a smaller volume (up to 1,000 times smaller), the data is shipped to the Bina Cloud where it&#8217;s stored and can be easily accessed and shared. Actually, Bani Asadi said that&#8217;s where the most-innovative research likely will take place. While Bina&#8217;s appliance handles the necessary first steps of  genome analysis (e.g., determining how it&#8217;s unique), it&#8217;s the resulting data sets that are accessible by doctors, specialists and others to really make sense of it all.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/04/bina_process_01.jpg"><img  title="bina_process_01" src="http://gigaom2.files.wordpress.com/2012/04/bina_process_01.jpg?w=708" alt=""   class="aligncenter size-full wp-image-516124" /></a></p>
<p>Presumably, Bina is referring to companies such as DNAnexus when it compares its solution to entirely cloud-based approaches. DNAnexus is another Silicon Valley startup trying to democratize genome analysis, <a href="http://gigaom.com/cloud/dnanexus-cloudant-biotech-deals/">relying on the processing power and centralized nature of the cloud</a> to serve as a platform for analyzing and collaborating on DNA data. Another startup, St. Louis-based Appistry, has taken a somewhat different approach, <a href="http://gigaom.com/2012/03/22/appistry-structure-data-2012/">building its own high-powered cloud service</a> and developing its own algorithms specially designed for genome analysis.</p>
<h2>In the end, it&#8217;s all about the data</h2>
<p>Regardless of which approach a researcher takes to solving the problem of sequenced genome data (they all have unique benefits), the underlying trend driving innovation is the deluge of genome data itself. Bani Asadi said the biggest difference now compared with past efforts to analyze health data is that we have so much available. There are 30,000 fully sequenced genomes available right now, and some predict there will be 10 million in five years.</p>
<p>That means researchers can study DNA at a much more granular level than previously possible, Bani Asadi said, and they can analyze findings across huge data sets to identify previously undetectable patterns. Especially with cancer, she said, each case is relatively unique, shaped by many conditions and factors. If we&#8217;re going to make significant progress on treating it, we&#8217;ll need to know exactly what&#8217;s going on in any given case and how similar cases have played out. The more data, the more accurate the diagnosis and treatment.</p>
<p><em>Feature image <a href="http://www.geograph.org.uk/photo/2848513">courtesy of Keith Edkins</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/04/30/straight-outta-stanford-bina-wants-to-remake-genome-analysis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/04/dna-sculpture.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/04/dna-sculpture.jpg?w=150" medium="image">
			<media:title type="html">dna sculpture</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/04/dna-sculpture.jpg?w=300" medium="image">
			<media:title type="html">dna sculpture</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/04/bina_box_01.jpg?w=300" medium="image">
			<media:title type="html">bina_box_01</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/04/bina_process_01.jpg" medium="image">
			<media:title type="html">bina_process_01</media:title>
		</media:content>
	</item>
		<item>
		<title>Metamarkets, DataPop and more! Investors show big data some love</title>
		<link>http://gigaom.com/2012/04/26/metamarkets-data-pop-and-more-investors-show-big-data-some-love/</link>
		<comments>http://gigaom.com/2012/04/26/metamarkets-data-pop-and-more-investors-show-big-data-some-love/#comments</comments>
		<pubDate>Thu, 26 Apr 2012 14:28:56 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[DataPop]]></category>
		<category><![CDATA[high-performance computing]]></category>
		<category><![CDATA[Metamarkets]]></category>
		<category><![CDATA[paraccel]]></category>
		<category><![CDATA[startup funding]]></category>
		<category><![CDATA[storage]]></category>
		<category><![CDATA[Terascala]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=514610</guid>
		<description><![CDATA[If you don't think venture capitalists and other investors love all things big data, think again. In the past three days alone, companies claiming some connection to big data -- either analyzing and/or storing large volumes of data -- have announced at least $56 million in new funding.<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=514610&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2012/04/hummingbird.jpg"><img  title="hummingbird" src="http://gigaom2.files.wordpress.com/2012/04/hummingbird.jpg?w=300&#038;h=218" alt="" width="300" height="218" class="alignleft size-medium wp-image-514688" /></a>If you don&#8217;t think venture capitalists and other investors love all things <em>big data</em>, think again. In the past three days alone, companies claiming some connection to big data &#8212; either analyzing and/or storing large volumes of data &#8212; have announced at least $56 million in new funding. On Tuesday, it was <a href="http://gigaom.com/cloud/datapop-scores-7m-for-custom-built-ads/">online advertising specialist DataPop with $7 million</a> and <a href="http://www.businesswire.com/news/home/20120424005653/en/Terascala-Announces-14M-Series-Funding-Led-Strategic">big-data storage engine Terascala with $14 million</a>; Wednesday <a href="http://www.paraccel.com/news/press-releases.php?acc=250412a#.T5lEhMRYv_4">brought $20 million more for analytic database ParAccel</a>; and on Thursday morning, Metamarkets announced a $15 million round led by Khosla Ventures.</p>
<p>For the San Francisco-based Metamarkets (see disclosure), the Series B round brings its total to $23.5 million, and represents some serious confidence in <a href="http://gigaom.com/cloud/metamarkets-takes-its-big-data-in-the-cloud-message-to-the-masses/">the company&#8217;s cloud-based analytics platform</a>. Investors should be confident: the company has a handful of customers, but they&#8217;re rather large, and it has already spurned acquisition offers from some household names in the IT world, including Twitter.</p>
<p>ParAccel is in another league altogether, having now raised nearly $100 million over the past several years, and having just completed a first quarter that saw 500 percent year-over-year revenue growth. It&#8217;s one of the best-known independent analytic database providers around after Greenplum, Netezza and Vertica all got snatched up by large vendors in the past few years. Last summer, Amazon <a href="http://gigaom.com/cloud/amazon-invests-big-in-big-data-startup/">came on board as a strategic investor</a>, a move rife with possibility considering persistent rumors of an Amazon Web Services analytic offering.</p>
<p>However, big data isn&#8217;t constrained to companies building the technology. Often, as with DataPop, it&#8217;s the consumers of big data technologies that are the most interesting. At just under $9 million, it&#8217;s hardly the highest-backed user of big data, even in the marketing space, but the story is the same across the board. In theory, big data means more accurate and dynamic ad targeting, <a href="http://gigaom.com/cloud/5-companies-turning-your-data-into-dollars/">which means more money for everyone</a>.</p>
<p>Terascala plays in an entirely different field &#8212; the high-performance computing field &#8212; where the speed of big data systems becomes critical. Terascala and companies of its ilk don&#8217;t analyze anything; rather, they feed enormous scientific (and other) data sets to high-performance processors without creating a bottleneck. That means their research, government, media and financial services users can do existing analyses much faster, and can even do entirely new types of analysis that used to be slowed by a lack of performance and a lack of analytic tools (e.g., Hadoop).</p>
<p>When buzzwords reach a certain level of ubiquity, they start to mean both everything and nothing, which is arguably the case for <em>big data</em> right now. But I don&#8217;t see it as too big a problem. I see it as a new understanding of the power of data, which &#8212; harnessed correctly &#8212; is immense. For investors, it&#8217;s not a question of whether to put money behind big data, but of figuring out which of the dozens pitching themselves as big data companies are actually doing it right.</p>
<p><em>Image courtesy of <a href="http://www.flickr.com/photos/34745138@N00/3642127084">Flickr user kaibara87</a>.</em></p>
<p><em><strong>Disclosure:</strong> Metamarkets is a portfolio company of True Ventures, which is also an investor in GigaOM. Om Malik is also a venture partner at True.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/04/26/metamarkets-data-pop-and-more-investors-show-big-data-some-love/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/04/hummingbird.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/04/hummingbird.jpg?w=150" medium="image">
			<media:title type="html">hummingbird</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/04/hummingbird.jpg?w=300" medium="image">
			<media:title type="html">hummingbird</media:title>
		</media:content>
	</item>
		<item>
		<title>Cycle Computing spins up 50K core Amazon cluster</title>
		<link>http://gigaom.com/2012/04/19/cycle-computing-spins-up-50k-core-amazon-cluster/</link>
		<comments>http://gigaom.com/2012/04/19/cycle-computing-spins-up-50k-core-amazon-cluster/#comments</comments>
		<pubDate>Thu, 19 Apr 2012 13:00:16 +0000</pubDate>
		<dc:creator>Barb Darrow</dc:creator>
				<category><![CDATA[Amazon]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Cycle Computing]]></category>
		<category><![CDATA[high-performance computing]]></category>
		<category><![CDATA[hpc]]></category>
		<category><![CDATA[Ramy Farid]]></category>
		<category><![CDATA[Schrodinger]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=512542</guid>
		<description><![CDATA[Working with Schrödinger, which specializes in computational drug design, Cycle Computing built a 50k-core AWS cluster that screened 21 million compounds in less than three hours. The cluster enabled the company to use a much more accurate screening process than other technology. <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=512542&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2012/04/nagascreen-shot-2012-04-19-at-7-06-07-am.jpg"><br />
<img  title="nagaScreen Shot 2012-04-19 at 7.06.07 AM" src="http://gigaom2.files.wordpress.com/2012/04/nagascreen-shot-2012-04-19-at-7-06-07-am.jpg?w=300&#038;h=224" alt="" width="300" height="224" class="alignleft size-medium wp-image-512545" /></a> For those who doubt that public cloud infrastructure can handle the toughest high-performance computing (HPC) jobs, <a href="http://cyclecomputing.com/">Cycle Computing</a> and <a href="http://www.schrodinger.com/">Schrödinger</a> have some news for you. The two companies used a 50,000-core Amazon cluster to run a complex screening process to locate compounds that could pay off in new cancer drugs.</p>
<p>The problem for computational chemists and biologists is that there&#8217;s a trade-off between accuracy and speed. This &#8220;Naga&#8221; compute environment built by Cycle atop the Amazon cloud eased that trade-off, said Ramy Farid, president of New York-based Schrödinger, which specializes in computational drug design.</p>
<p>&#8220;We&#8217;ve got these really accurate methods but they would take months on a normal cluster. The problem is we want to do the best possible science fast,&#8221; Farid said in an interview this week.</p>
<p>That&#8217;s where Cycle Computing comes in. The company has made its name building <a href="http://gigaom.com/cloud/meet-the-new-breed-of-hpc-vendor/">high-performance computing</a> atop AWS infrastructure and has previously deployed 10,000- and 30,000-core clusters on the cloud. For Schrödinger, it upped the ante to 50,000 cores. The alternative in this case would be for Schrödinger to build its own 50,000-core cluster or log time on a supercomputer, said Cycle Computing CEO Jason Stowe.</p>
<h2>HPC for rent</h2>
<p>&#8220;Practically speaking, the latter is impractical for a for-profit company and is generally restrictive. If you&#8217;re an academic wanting time on the San Diego Super Computer, for example, you&#8217;ll have months of wait time to get approved and even then you get a limited-time window &#8212; so if something with your software is not working at that time, you&#8217;re out of luck.&#8221; And, big supercomputers don&#8217;t tend to run the kinds of software these companies want to run. &#8220;The beauty of the cloud is it runs your flavor of Linux and other software,&#8221; Farid said.</p>
<p>On the other hand, building a 50,000-core cluster could easily cost $20 million to $30 million, he said. The Schrödinger project, by contrast, cost about $4,850 per hour to run.</p>
<p>For this trial, the cluster had access to all regions of AWS and used all of them in some capacity. The application used the various EC2 APIs to provision the resources.</p>
<p>&#8220;All the compound data for analysis was uploaded into S3 [Amazon's Simple Storage Service]. The cluster was provisioned alongside it and grabbed data from S3 to run the calculations and then pushed it back into S3,&#8221; said Stowe, who will be talking about the implementation Thursday at an Amazon event in New York. The application also took advantage of some Amazon IP address and DNS capabilities.</p>
<h2><a href="http://gigaom2.files.wordpress.com/2012/04/naga2.jpg"><img  title="naga2" src="http://gigaom2.files.wordpress.com/2012/04/naga2.jpg?w=300&#038;h=224" alt="" width="300" height="224" class="alignleft size-medium wp-image-512568" /></a>Finding a needle in a very big haystack</h2>
<p>Farid could not talk much about the specific research purpose other than to say the goal was to find compounds that could be developed into drugs that fight a type of cancer. The use of the huge cluster enabled Schrödinger to use the more accurate version of its Glide software &#8212; in the past it would have had to use the less accurate screen and perhaps miss some compounds that could be extremely useful.</p>
<p>&#8220;The problem they&#8217;re solving is amazing. It&#8217;s like the target is a lock and the conformation is a key &#8212; what Ramy&#8217;s software does is let you simulate 21 million keys potentially matching that lock. There are typically a number of false negatives and the reason is you can&#8217;t sample all the orientations [of key to lock] properly so it will not show a match that could actually be a match. You could miss amazing drugs that could have an impact,&#8221; Stowe said.</p>
<p>The use of this huge cluster enabled Schrödinger to run the much more accurate, but much more compute-intensive, version of its screening software and find compound candidates the other software may have missed.</p>
<p>The result of the three-hour run? &#8220;We identified a number of compounds that we will purchase and test,&#8221; Farid said.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/04/19/cycle-computing-spins-up-50k-core-amazon-cluster/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/04/naga3-e1334851409223.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/04/naga3-e1334851409223.jpg?w=150" medium="image">
			<media:title type="html">naga3</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4af03439988d64f816da72496325cb73?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigabarb</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/04/nagascreen-shot-2012-04-19-at-7-06-07-am.jpg?w=300" medium="image">
			<media:title type="html">nagaScreen Shot 2012-04-19 at 7.06.07 AM</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/04/naga2.jpg?w=300" medium="image">
			<media:title type="html">naga2</media:title>
		</media:content>
	</item>
		<item>
		<title>How federal money will spur a new breed of big data</title>
		<link>http://gigaom.com/2012/03/29/how-federal-money-will-change-the-face-of-big-data/</link>
		<comments>http://gigaom.com/2012/03/29/how-federal-money-will-change-the-face-of-big-data/#comments</comments>
		<pubDate>Thu, 29 Mar 2012 22:16:55 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Academia]]></category>
		<category><![CDATA[appistry]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[DARPA]]></category>
		<category><![CDATA[Department of Defense]]></category>
		<category><![CDATA[DoE]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[federal government]]></category>
		<category><![CDATA[genetics]]></category>
		<category><![CDATA[Genome]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[high-performance computing]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[supercomputers]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=505263</guid>
		<description><![CDATA[By pumping hundreds of millions of dollars into big data research and development, the Obama administration thinks it can push the current state of the art well beyond what's possible today, and into entirely new research areas. It's a noble goal, but also a necessary one. <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=505263&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2012/03/istock_000001007494xsmall1.jpg"><img  title="istock_000001007494xsmall" src="http://gigaom2.files.wordpress.com/2012/03/istock_000001007494xsmall1.jpg?w=708" alt=""   class="alignleft size-full wp-image-505339" /></a>If you think Hadoop and the current ecosystem of big data tools are great, &#8220;you ain&#8217;t seen nothing yet,&#8221; to quote Bachman-Turner Overdrive. By <a href="http://gigaom.com/cloud/obamas-big-data-plans-lots-of-cash-and-lots-of-open-data/">pumping hundreds of millions of dollars a year into big data research and development</a>, the Obama administration thinks it can push the current state of the art well beyond what&#8217;s possible today, and into entirely new research areas.</p>
<p>It&#8217;s a noble goal, but also a necessary one. Big data does have the potential to change our lives, but to get there it&#8217;s going to take more than <a href="http://gigaom.com/cloud/heres-another-big-data-startup-from-team-yahoo/">startups created to feed us better advertisements</a>.</p>
<h2>Consumer data is easy to get, and profitable</h2>
<p>It&#8217;s not fair to call the current state of big data problematic, but it is largely focused on profit-centric technologies and techniques. That&#8217;s because as companies &#8212; especially those in the web world &#8212; realized the value they could derive from advanced data analytics, they began investing huge amounts of money in developing cutting-edge techniques for doing so. For the first time in a long time, <a href="http://gigaom.com/cloud/how-business-taught-scientists-about-big-data/">industry is now leading the academic and scientific research communities</a> when it comes to technological advances.</p>
<p>As Brenda Dietrich, IBM Fellow and vice president for business analytics for IBM Software (and former VP of IBM&#8217;s mathematical sciences division), explained to me, universities are still doing good research, but students are leaving to work at companies like Google and Facebook as soon as their graduate or Ph.D. studies are complete, and often beforehand. Research begun in universities is <a href="http://googleresearch.blogspot.com/2012/03/excellent-papers-for-2011.html">continued in commercial settings</a>, generally with commercial interests guiding its direction.</p>
<p>And this commercial focus isn&#8217;t ideal for everyone. For example, Sultan Meghji, vice president of product strategy at Appistry, told me that many of his company&#8217;s government- and intelligence-sector customers aren&#8217;t getting what they expected out of Hadoop, and they&#8217;re looking for alternative platforms. Hadoop might well be the platform of choice for large web and commercial applications &#8212; indeed, it&#8217;s where most of those companies&#8217; big data investments are going &#8212; but it has its limitations.</p>
<h2>Enter federal dollars for big data</h2>
<p>However, as John Holdren, assistant to the president and director of the White House Office of Science and Technology Policy, noted <a href="http://live.science360.gov/bigdata/">during a White House press conference</a> on Thursday afternoon, the Obama administration realized several months ago that it was seriously under-investing in big data as a strategic differentiator for the United States. He was followed by leaders from six government agencies explaining how they intend to invest their considerable resources to remedy this under-investment. That means everything from the Department of Defense, DARPA and the Department of Energy developing new techniques for storage and management, to the U.S. Geological Survey and the National Science Foundation using big data to change the way we research everything from climate science to educational techniques.</p>
<p>How&#8217;s it going to do all this, apart from agencies simply ramping up their own efforts? Doling out money to researchers. As Zach Lemnios, Assistant Secretary of Defense for Research &amp; Engineering for the Department of Defense, put it, &#8220;We need your ideas.&#8221;</p>
<p>IBM&#8217;s Dietrich thinks increased availability of government grants can play a major role in keeping researchers in academic and scientific settings rather than bolting for big companies and big paychecks. Grants can help steer research away from targeted advertising and toward areas that will &#8220;be good … for mankind at large,&#8221; she said.</p>
<div id="attachment_505340" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2012/03/genomes.jpg"><img  title="genomes" src="http://gigaom2.files.wordpress.com/2012/03/genomes.jpg?w=300&#038;h=199" alt="" width="300" height="199" class="size-medium wp-image-505340" /></a><p class="wp-caption-text">The 1,000 Genomes Project data is now freely available to researchers on Amazon's cloud.</p></div>
<p>Additionally, she said, academic researchers have been somewhat limited in what they can do because they haven&#8217;t always had easy access to meaningful data sets. With the government now pushing to open its own data sets, as well as for collaborative research among different scientific disciplines, she thinks there&#8217;s a real opportunity for researchers to conduct better experiments.</p>
<p>During the press conference, Department of Energy Office of Science Director William Brinkman expressed his agency&#8217;s need for better personnel to program its fleet of supercomputers. &#8220;Our challenge is not high-performance computing,&#8221; he said, &#8220;it&#8217;s high-performance people.&#8221; As my colleague Stacey Higginbotham has noted in the past, the ranks of Silicon Valley companies are deep with people <a href="http://gigaom.com/cloud/supercomputings-problem-isnt-power-its-software/">who might be able to bring their parallel-programming prowess to supercomputing centers</a> if the right incentives were in place.</p>
<h2>Self-learning systems, a storage revolution and a cure for cancer?</h2>
<p>As anyone who follows the history of technology knows, government agencies have been responsible for a large percentage of innovation over the past half century, taking credit for no less than the Internet itself. &#8220;You can track every interesting technology in the last 25 years to government spending over the past 50 years,&#8221; Appistry&#8217;s Meghji said.</p>
<p>Now, the government wants to turn its brainpower and money to big data. As part of its new, roughly $100-million XDATA program, DARPA Deputy Director Kaigham &#8220;Ken&#8221; Gabriel said his agency &#8220;seek[s] the equivalent of radar and overhead imagery for big data&#8221; so it can locate a single byte among an ocean of data. The DOE&#8217;s Brinkman talked about the importance of being able to store and visualize the staggering amounts of data generated daily by supercomputers, or by the second from CERN&#8217;s Large Hadron Collider.</p>
<p>IBM&#8217;s Dietrich also has an idea for how DARPA and the DOE might spend their big data allocations. &#8220;When one is doing certain types of analytics,&#8221; she explained, &#8220;you&#8217;re not looking at single threads of data, you tend to be pulling in multiple threads.&#8221; This makes previous storage technologies designed to make the most-accessed data the easiest to access somewhat obsolete. Instead, she said, researchers should be looking into how to store data in a manner that takes into account the other data sets typically accessed and analyzed along with any given set. &#8220;To my knowledge,&#8221; she said, &#8220;no one is looking seriously at that.&#8221;</p>
<p>Not surprisingly, given his company&#8217;s large focus on genetic analysis, Appistry&#8217;s Meghji is particularly excited about the government promising more money and resources in that field. For one, he said, the Chinese government&#8217;s <a href="http://gigaom.com/cloud/supercomputings-problem-isnt-power-its-software/">Beijing Genomics Institute</a> probably accounts for anywhere between 25 and 50 percent of the genetics innovation right now, and &#8220;to see the U.S. compete directly with the Chinese government is very gratifying.&#8221;</p>
<p>But he&#8217;s also excited about the possibility of seeing big data turned to areas in genetics other than cancer research &#8212; which <a href="http://gigaom.com/cloud/fighting-cancer-at-100-gigabits-per-second/">is presently a very popular pastime</a> &#8212; and generally toward advances in real-time data processing. He said the DoD and intelligence agencies are typically two to four years ahead of the rest of the world in terms of big data, and increased spending across government and science will help everyone else catch up. &#8220;It&#8217;s all about not just reacting to things you see,&#8221; he said, &#8220;but being proactive.&#8221;</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/03/obama.jpg"><img  title="obama" src="http://gigaom2.files.wordpress.com/2012/03/obama.jpg?w=300&#038;h=200" alt="" width="300" height="200" class="size-medium wp-image-505336 alignright" /></a>Indeed, the DoD has some seriously ambitious plans in place. Assistant Secretary Lemnios explained during the press conference how previous defense research has led to technologies such as IBM&#8217;s Watson system and Apple&#8217;s Siri that are becoming part of our everyday lives. Its latest quest: utilize big data techniques to create autonomous systems that can adapt to and act on new data inputs in real time, but that know enough to know when they need to invite human input on decision-making. Scary, but cool.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/03/29/how-federal-money-will-change-the-face-of-big-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/03/obama.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/03/obama.jpg?w=150" medium="image">
			<media:title type="html">obama</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/03/istock_000001007494xsmall1.jpg" medium="image">
			<media:title type="html">istock_000001007494xsmall</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/03/genomes.jpg?w=300" medium="image">
			<media:title type="html">genomes</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/03/obama.jpg?w=300" medium="image">
			<media:title type="html">obama</media:title>
		</media:content>
	</item>
	</channel>
</rss>
