CERN, home of the Large Hadron Collider, has some minor problems to solve, like how the universe works and what it’s made of.
And now it’s building out more infrastructure to attack those problems, adding a new 3-MW data center in Budapest to augment the current 3.5-MW facility outside Geneva. And it’s rewriting the tools it will use to crunch and store petabytes of data created in its experiments, said Tim Bell, infrastructure manager at CERN, speaking at Structure:Europe on Wednesday morning.
CERN collects nearly unbelievable amounts of data. Its 100-megapixel cameras take 40 million pictures a second of proton collisions — creating 1 petabyte of data per second that needs analysis.
“Our big data challenge is that 35 petabytes a year need to be recorded and the Large Hadron Collider upgrade will double that. And physicists want to keep that data for 20 years,” Bell said. Right now the effort has 45,000 tape drives for archiving. “If that isn’t big data, I’m not sure what is.”
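For a sense of scale, the figures quoted above can be put through some back-of-envelope arithmetic. The sketch below is illustrative only. It assumes decimal units (1 petabyte = 10^15 bytes) and a 365-day year, and, purely for comparison, that the detectors produce data continuously all year, which is an assumption of this sketch rather than something Bell said. The function names are hypothetical.

```python
# Back-of-envelope arithmetic on the figures quoted in the article.
# Assumptions (not from the talk): decimal units and a 365-day year.

PB = 10**15  # bytes in a petabyte (decimal)
MB = 10**6   # bytes in a megabyte (decimal)

RAW_RATE_BYTES_PER_S = 1 * PB        # "1 petabyte of data per second"
FRAMES_PER_S = 40_000_000            # "40 million pictures a second"
RECORDED_PB_PER_YEAR = 35            # "35 petabytes a year need to be recorded"
RETENTION_YEARS = 20                 # "keep that data for 20 years"
SECONDS_PER_YEAR = 365 * 24 * 3600


def bytes_per_frame():
    """Average raw data per picture implied by the quoted rates."""
    return RAW_RATE_BYTES_PER_S / FRAMES_PER_S


def recorded_fraction():
    """Share of raw output that is recorded, assuming (hypothetically)
    that the detectors produce data nonstop for a full year."""
    raw_bytes_per_year = RAW_RATE_BYTES_PER_S * SECONDS_PER_YEAR
    return RECORDED_PB_PER_YEAR * PB / raw_bytes_per_year


def archive_bounds_pb():
    """Rough range for a 20-year archive, in petabytes: the low end uses
    today's rate throughout, the high end uses the doubled rate throughout."""
    low = RECORDED_PB_PER_YEAR * RETENTION_YEARS
    high = 2 * low
    return low, high


if __name__ == "__main__":
    print(f"Per picture: {bytes_per_frame() / MB:.0f} MB")
    print(f"Recorded share of raw output: {recorded_fraction():.2e}")
    low, high = archive_bounds_pb()
    print(f"20-year archive: between {low} and {high} PB")
```

Under these assumptions each picture works out to about 25 MB, roughly one part in a million of the raw output is kept, and a 20-year archive lands somewhere between 700 and 1,400 petabytes.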
To cut costs and boost efficiency, CERN had to let physicists get cloud services in the time it takes to get coffee rather than waiting weeks, he said.
“We conceded that our challenge is not special — Google is way ahead of us in scale. We need to build on what they’ve done,” he said.
So CERN is rewriting its 10-year-old toolchain — using Puppet (see disclosure) for configuration and OpenStack for orchestration, he said.
As for the status of the new data center, it stands at about 50,000 cores now with a target of 300,000 cores by 2014.
Check out the rest of our Structure:Europe 2013 coverage here, and a video embed of the session follows below:
Disclosure: Puppet Labs is backed by True Ventures, a venture capital firm that is an investor in the parent company of this blog, Giga Omni Media. Om Malik, founder of Giga Omni Media, is also a venture partner at True.