An effort to build a radio telescope that can see back 13 billion years to the creation of the universe is prompting a five-year, €32 million ($42.7 million) effort to create a low-power supercomputer and the networks to handle the data the new telescope will generate. The DOME project, named for a mountain in Switzerland and the covering of a telescope, is a joint effort between IBM and ASTRON, the Netherlands Institute for Radio Astronomy, to build such a network and computer.

There are three problems with building a telescope capable of reading radio waves from that far out in deep space (actually there's a real estate problem too, because the array will require millions of antennas spread over an area the width of the continental U.S., but we'll stick to computing problems). The first problem is the data that this Square Kilometre Array (SKA) will generate. IBM estimates it will produce:

… a few Exabytes of data per day for a single beam per one square kilometer. After processing this data the expectation is that per year between 300 and 1,500 Petabytes of data need to be stored. In comparison, the approximately 15 Petabytes produced by the Large Hadron Collider at CERN per year of operation is approximately 10 to 100 times less than the envisioned capacity of SKA.
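Those figures are easier to grasp with a little arithmetic. Here's a quick sanity check using only the numbers from IBM's estimate above (the "few exabytes" is taken as 2 EB purely for illustration):

```python
# Rough sanity check of IBM's SKA data estimates (figures from the quote above).
EB = 10**18  # bytes in an exabyte
PB = 10**15  # bytes in a petabyte

raw_per_day = 2 * EB              # "a few exabytes" per day; 2 EB assumed for illustration
stored_per_year_low = 300 * PB    # low end of yearly stored data
stored_per_year_high = 1500 * PB  # high end of yearly stored data
lhc_per_year = 15 * PB            # LHC's yearly output, per the quote

# SKA's stored data vs. the LHC's yearly output
print(stored_per_year_low / lhc_per_year)   # 20.0  -> the "10 times" end
print(stored_per_year_high / lhc_per_year)  # 100.0 -> the "100 times" end

# Fraction of the raw stream that survives processing (low end)
raw_per_year = raw_per_day * 365
print(stored_per_year_low / raw_per_year)   # ~0.0004, i.e. well under 0.1%
```

In other words, even after throwing away more than 99.9 percent of what the antennas collect, the SKA would still out-store the LHC by one to two orders of magnitude.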

And guys, the LHC is in the midst of getting its own cloud computing infrastructure in order to handle its data. So this IBM/ASTRON project may be just the beginning for SKA. As I say in the headline, in many ways, projects like the LHC and the SKA are ambitious investigations into the origins and composition of the universe. Our investigations into dark matter will require a computing effort that could rival the engineering effort it took to get men on the moon, which makes big data our Sputnik and our Apollo 11.

Now, back to the problems associated with the telescope. It will generate data like a corpse breeds maggots, so the second problem is building a computer big enough to process it all without requiring a power plant or two. Additionally, that data has to travel from the antenna arrays to the computer, which makes the network the third problem. I've covered the need for compute and networks to handle our scientific data before, in a story on Johns Hopkins' new 100-gigabit on-campus network, but the scale of the DOME project dwarfs anything Johns Hopkins is currently working on. From that story:

[Dr. Alex Szalay of Johns Hopkins] ascribes this massive amount of data to the emergence of cheap compute, better imaging and more information, and calls it a new way of doing science. “In every area of science we are generating a petabyte of data, and unless we have the equivalent of the 21st-century microscope, with faster networks and the corresponding computing, we are stuck,” Szalay said.

In his mind, the new way of using massive processing power to filter through petabytes of data is an entirely new type of computing which will lead to new advances in astronomy and physics, much like the microscope’s creation in the 17th century led to advances in biology and chemistry.

So we need the computing and networking equivalent of a microscope to enable us to deal with a telescope planned for 2024, and the time to start building it is now. That gives us a lot longer than the time frame we had to land on the moon. IBM views the problem as one worthy of the following infographic:

As the infographic shows, we’re going to need massively multicore, low-power computers, better interconnection using photonics and new ways of building our networks. Hopefully, the search for dark matter is worth it.
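One reason a 2024 target is plausible rather than fanciful: if storage and transmission capacity keep doubling roughly every two years (the classic Moore's Law cadence — an assumption, not a guarantee), the twelve years between now and then allow for about six doublings. A back-of-the-envelope sketch:

```python
# Back-of-the-envelope projection of hardware capacity growth, 2012 -> 2024.
# Assumes one doubling every two years (Moore's Law cadence); purely illustrative.
start_year, target_year = 2012, 2024
doubling_period = 2  # years per doubling (assumed)

doublings = (target_year - start_year) / doubling_period
growth_factor = 2 ** doublings
print(doublings)      # 6.0
print(growth_factor)  # 64.0 -- today's petabyte-scale systems grow toward exabyte scale
```

A 64-fold improvement would take today's largest tens-of-petabytes services into exabyte territory, which is roughly the scale the SKA demands.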

SKA image courtesy of the Square Kilometre Array.

  1. By 2024 it's reasonable to think that an exabyte of storage, and the related network capacity to transmit it, will be within reason and economical. While it's difficult to say now whether it will be trivial, Moore's Law and the related exponential growth in transmission and storage capacity have been substantial. Today we have services that store and transmit dozens if not hundreds of petabytes; simply look at where we were 10 years ago. 2024 is a long way off, and my sense is that we will more than make the progress needed to support goals like this.

  2. “It will generate data like a corpse breeds maggots.”

    Nice comparison, Stacey.

    1. Stacey Higginbotham Monday, April 2, 2012

      It was late. I had no editor :)

      1. “It will generate data like a corpse breeds maggots …”
        This really should have read “It will generate noise like a corpse breeds maggots …”. This is what happens when you approach an infrastructure vendor with a problem that may benefit from their infrastructure – they go over the top! These systems do NOT need so much data processed centrally, because the vast majority of it is noise, or simply the same signal re-sampled countless times. Requiring this much data capacity to, in effect, take a high-resolution picture indicates the system has not been designed carefully from the ground up. Surely there are ways around it, like performing more computation at each individual node (think a massive distributed cluster), rather than this brute-force approach?

  3. Christopher Glenn Wednesday, April 4, 2012

    Good fodder. Thanks for the post. http://bit.ly/HeYEJq

Comments have been disabled for this post