Summary:

In the world of science, cloud computing provides an ideal platform for crowdsourcing scientific problems across the whole world of researchers, giving them access to data sets and the computing resources to analyze them. If big data is any indicator, scientific crowdsourcing should catch on.

If you’re tired of hearing about cloud computing and big data, you might want to wear earplugs for the next year or so. These two trends are only going to get hotter, in large part because they’re also becoming ideal bedfellows. This is especially true in the world of science, where the cloud provides an ideal platform for crowdsourcing scientific problems across the whole world of researchers, giving them access to data sets and the computing resources to analyze them.

Generally speaking, we’ve already seen how crowdsourcing can be an effective method for solving big data problems. The Netflix Prize challenge, which concluded in 2009, attracted more than 50,000 participants trying to improve Netflix’s Cinematch algorithm, and today we have Kaggle, an entire company dedicated to hosting competitions for companies trying to crowdsource their own analytical challenges. And it’s the cloud, with its centralized nature and its virtually unlimited, on-demand resources, that makes it possible for so many people to access and work with the same data sets at the same time.

It’s true, of course, that big data doesn’t necessarily connote scientific workloads, but scientific workloads do increasingly rely on big data techniques. Some refer to data as the fourth paradigm of science because the sheer amount of data available and the new technologies and techniques for working with it are fundamentally changing how scientists go about their research. This has been going on for quite a while, actually, hence the massive research networks connecting supercomputers and research centers across the world. Researchers needed a way to transfer massive data sets to their peers to run on their systems, so they built networks such as the National LambdaRail, XSEDE and CERN’s Large Hadron Collider network.

However, while this arrangement might work fine for researchers working on projects for national labs or universities, who also happen to have time reserved on supercomputing systems, it’s not entirely democratic. Enter cloud computing. Now, anyone can have access to supercomputer-like processing power and, equally important, centralized data sets that don’t require a 40 Gbps connection to download. Companies such as DNAnexus rely on the cloud to host massive genomic data sets on which scientists can collaborate, and also to power those scientists’ computations on the data.

And although companies such as DNAnexus focus more on collaboration than on crowdsourcing, the tools for crowdsourcing are in place. Today, for example, I read about a company, Life Technologies, which makes semiconductor chips that carry out a variety of genome-sequencing workloads. Life is hosting a competition within its online community to improve the speed, scalability and accuracy of its chips. Contestants will have access to the raw data as well as cloud-based resources for running computations.

Critics can call cloud computing overblown until they’re blue in the face — they might even be right when it comes to certain business applications — but there’s no denying the effects it could have in the scientific world. By giving virtually anybody access to relevant scientific data sets and the resources necessary to analyze them in a timely manner, cloud computing could result in real answers to some previously perplexing questions.

Feature image courtesy of Flickr user Kennisland.

  1. Robert Cathey Tuesday, January 3, 2012

    Yep. And this only becomes more tangible as prices of storage, compute and network fall. The disruption rolls on.

  2. Oh yeah right, you’re going to simply “crowdsource” busy scientists who have spent years in specialized education (probably many with Ph.D.s) and who are busy working for their employers. They’re just going to come out of the woodwork and hold hands and sing “free to be you and me” in one big crowdsourcing party. Ha.

    1. I would agree with your pessimism, if only most competitions weren’t attached to a cash prize. That’s what gets people involved. Ask Kaggle about the breadth of its participants for big data competitions, or Netflix (granted, that was a very big prize).

  3. Well, I’m not sure how crowdsourcing will play out, but I guess we will have to wait and see. I don’t know that it will be any more useful to science than it has been, and will be, to the financial industry, for things like liquidity risk and data management.

    1. Derrick, you present here a significant advancement made possible by the cloud. Big data can be generated as well as analyzed thanks to the cloud. I think it is relevant to mention some other tech trends that are becoming a reality due to the cloud and big data; here I will mention the famous IPA (intelligent personal assistant), Mr. Siri. You are welcome to read more of my thoughts on the same subject here: “The Cloud is Alive” http://www.iamondemand.com/post/13107626795/the-cloud-is-alive-integration-collaboration-and