As datasets get fatter and more cumbersome, it’s becoming increasingly hard to move them around. Even the fattest multi-gigabit pipes look like cocktail straws when you’re talking about petabyte databases. At a panel discussion at GigaOM’s Structure conference, cloud computing executives pointed out that it’s going to become more and more difficult to move these massive troves of data to the applications that use them – or vice versa.
One option is to simply move the application closer to the data. NYSE Euronext has built out its own data centers in New Jersey and London in order to be close to its principal exchanges and customers, said Ken Barnes, SVP and global head of platforms at NYSE Technologies. At first, that proximity was necessary for latency reasons – in the securities trading business, milliseconds count – but NYSE finds that bandwidth is now becoming the bigger concern as its customers move massive amounts of information in and out of its data centers.
Aspera co-founder and VP of engineering Serban Simu pointed out that this kind of co-location might work well for financial services, where both the data and its users are concentrated in a few centers, but it doesn’t work for other industries, such as healthcare, where hospitals, research institutions, millions of doctors and billions of patients are distributed around the world. A medical researcher collecting or analyzing data overseas for a university located in the U.S. faces a bandwidth problem.
Even if we are able to move applications closer to datasets or move databases closer to the cloud computing resources that use them, any information collected or analyses performed in one location will always be useful somewhere else, said Haseeb Budhani, product VP at Infineta.
We’re generating data far faster than we can move it, and the more we generate the more immobile it will become, said Lew Tucker, VP and CTO at Cisco Systems. “Data does have inertia,” he said. “It tends to stay where it’s originally put.” He proposed that data analysis will eventually adopt a distributed computing model. Fields that deal with huge quantities of data, such as genomic research, will collect and pre-process their data locally and then pass more refined datasets to other distributed data centers. The video industry solved its bandwidth distribution problem by introducing the content delivery network (CDN), he said, so why can’t other kinds of data analysis do the same?