We managed to create 800,000 petabytes of digital information last year, according to a study released today by IDC and EMC Corp. The annual survey forecasts that the creation of digital data will increase 50 percent to 1.2 million petabytes, or 1.2 zettabytes, by the end of this year. (A petabyte is a million gigabytes, the equivalent of roughly 213,000 single-layer DVDs.) And by 2020, we’ll have created 35 trillion gigabytes of data, about 44 times last year’s total, much of that in the “cloud.”
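For readers who want to check the units, here is a minimal sketch that takes the three figures quoted above at face value and converts them to a common scale. It assumes decimal (SI) prefixes, and the variable and function names are illustrative only, not anything from the IDC study.

```python
# Decimal (SI) storage units, expressed in bytes. Assumption: the study
# uses powers of ten rather than binary (1024-based) prefixes.
GB = 10**9
PB = 10**15
ZB = 10**21

# Figures as quoted in the article.
last_year = 800_000 * PB      # data created last year
this_year = 1_200_000 * PB    # forecast for the end of this year
by_2020 = 35 * 10**12 * GB    # "35 trillion gigabytes"


def in_zettabytes(n_bytes):
    """Convert a byte count to zettabytes."""
    return n_bytes / ZB


for label, amount in [("last year", last_year),
                      ("this year", this_year),
                      ("2020", by_2020)]:
    print(f"{label}: {in_zettabytes(amount):.1f} ZB")

# Growth relative to last year's total.
growth_this_year = this_year / last_year   # 1.5, i.e. a 50 percent increase
growth_by_2020 = by_2020 / last_year       # 43.75, i.e. roughly 44 times
print(f"this year vs. last year: {growth_this_year:.2f}x")
print(f"2020 vs. last year: {growth_by_2020:.2f}x")
```

On these numbers, the roughly 44-fold jump is the gap between last year and 2020, while this year's forecast is about one and a half times last year's total.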
Even though EMC is putting out this data and likes to explain how many “containers” it would take to hold it all, what this enormous flood of data really requires is robust broadband networks. The value of all this data will lie not in storing it but in moving it around the world to people and companies who can use it to make new products, draw new conclusions or analyze it for profit.
Data stores will be essential for creating a home for the data (GigaOM Pro sub req’d), but data marketplaces, such as Microsoft’s Project Dallas, the World Bank’s data stores or startups like Infochimps, will also have a place. Linked into those marketplaces will be cloud computing services or platforms as a service, such as Amazon’s EC2 or Microsoft Azure, that will be able to process the data on demand.
But for maximum flexibility, we’re going to need fat pipes linking data centers and data providers and reaching all the way down to end users, who will also be sending data from home networks, wireless phones and even their vehicles. The only way we’re going to handle this data tsunami is to create the infrastructure to route, process and connect it at gigabyte-per-second or even terabyte-per-second speeds. We’ll talk more about managing big data at our Structure 10 event in June.
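To put those speeds in perspective, here is a rough back-of-the-envelope sketch of how long it would take to move a single petabyte over one link. It assumes decimal units and a perfectly sustained transfer rate with no protocol overhead, which real networks will not deliver, and the function name is illustrative only.

```python
# Decimal (SI) units, expressed in bytes.
GB = 10**9
TB = 10**12
PB = 10**15

SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 86_400


def transfer_seconds(size_bytes, bytes_per_second):
    """Idealized transfer time: size divided by a constant throughput."""
    return size_bytes / bytes_per_second


# One petabyte at one gigabyte per second.
at_gb_per_s = transfer_seconds(PB, GB)
print(f"1 PB at 1 GB/s: {at_gb_per_s / SECONDS_PER_DAY:.1f} days")

# One petabyte at one terabyte per second.
at_tb_per_s = transfer_seconds(PB, TB)
print(f"1 PB at 1 TB/s: {at_tb_per_s / SECONDS_PER_MINUTE:.1f} minutes")
```

Under these idealized assumptions, a petabyte takes more than 11 days to move at a gigabyte per second but under 17 minutes at a terabyte per second, which is why the speed of the pipes matters as much as the size of the stores.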