We Can't Squeeze the Data Tsunami Through Tiny Pipes

We managed to create 800,000 petabytes of digital information last year, according to a study released today by IDC and EMC Corp. (s emc). The annual survey forecasts that digital data creation will reach 1.2 million petabytes — or 1.2 zettabytes — by the end of this year, and that by 2020 we’ll have created 35 trillion gigabytes of data, a 44-fold increase over last year — much of it in the “cloud.” (A zettabyte is roughly the equivalent of a stack of DVDs stretching from here to the moon.)
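The study's figures hang together once you keep the unit prefixes straight. A quick sanity check (my own arithmetic, not from the study) using 1 ZB = 1 million PB = 1 trillion GB:

```python
# Sanity-check the IDC figures quoted above.
# Unit conversions: 1 ZB = 1e6 PB = 1e12 GB.
zb_2009 = 800_000 / 1e6    # 800,000 PB created in 2009  -> 0.8 ZB
zb_2010 = 1_200_000 / 1e6  # 1.2 million PB forecast for 2010 -> 1.2 ZB
zb_2020 = 35e12 / 1e12     # 35 trillion GB forecast for 2020 -> 35 ZB

growth = zb_2020 / zb_2009  # ~43.75, i.e. the study's "44 times"
print(zb_2009, zb_2010, zb_2020, round(growth, 1))
```

So the "44 times" headline number is the 2020 forecast measured against last year's 0.8 zettabytes, not a single-year jump.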

Even though EMC is putting out this data, and likes to explain how many “containers” it would take to hold it all, what this enormous flood of data really requires is robust broadband networks. The value of all this data will lie not in storing it but in moving it around the world to the people and companies who can use it to build new products, draw new conclusions or analyze it for profit.

Data stores will be essential for creating a home for the data (GigaOM Pro sub req’d), but data marketplaces, such as Microsoft’s (s msft) Project Dallas, the World Bank’s data stores or startups like Infochimps, will also have a place. Linked into those marketplaces will be cloud computing services and platforms such as Amazon’s (s amzn) EC2 or Microsoft’s Azure that can process the data on demand.

But for maximum flexibility we’re going to need fat pipes linking data centers and data providers, and running all the way down to end users, who will also be sending data from home networks, wireless phones and even their vehicles. The only way we’re going to handle this data tsunami is to build the infrastructure to route, process and connect it at gigabyte-per-second or even terabyte-per-second speeds. We’ll talk more about managing big data at our Structure 10 event in June.
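To see why those speeds matter, here is a back-of-the-envelope sketch (my illustrative figures, not from the post) of how long a single petabyte takes to move at different sustained throughputs:

```python
# How long does one petabyte take to move over a link of a given
# sustained throughput? (Illustrative figures only.)
PB = 1e15  # bytes in a petabyte (decimal)

def transfer_days(size_bytes, bytes_per_second):
    """Return the transfer time in days at a sustained throughput."""
    return size_bytes / bytes_per_second / 86_400  # 86,400 s per day

print(transfer_days(PB, 100e6 / 8))  # 100 Mbit/s broadband: ~926 days
print(transfer_days(PB, 1e9))        # 1 GByte/s link: ~11.6 days
print(transfer_days(PB, 1e12))       # 1 TByte/s link: ~17 minutes
```

At today's typical broadband rates a petabyte is effectively immovable; only at gigabyte- and terabyte-per-second speeds does shipping data over the network become practical.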

11 Responses to “We Can't Squeeze the Data Tsunami Through Tiny Pipes”

  1. Stacey,

Applause. Yes, the data tsunami has been coming at us for as long as disk capacity has been growing and price per gigabyte dropping. Some pretend the data is not needed; others don’t know what to do with it. As the internet and cloud computing expand, fat pipes inside data centers and between them are mandatory for data in motion. When the data slows down, it needs to land somewhere. More and more companies are turning to enterprise data warehouses (EDWs) to store, manage and analyze data in a way that is meaningful. Teradata provides one key IT infrastructure component — the data warehouse — to create a comprehensive view of customers, supply chain and financial and performance management by bridging gaps between data and ensuring it is utilized, not just stored. So imagine a network of massively parallel data hubs — aka warehouses — consuming oceans of data and emitting facts and insights on demand, all linked by the fat pipes you mention. See you there!

    • I expect a Marketing Director for Teradata to argue that “Some pretend the data is not needed, others don’t know what to do with it” and to set out a grand vision of a world made better by his product. I remain skeptical and curmudgeonly about the benefits of much of the “big data” IT infrastructure to the consumers and taxpayers, who ultimately pay for this stuff.

      Transaction data required by law and for business continuity? Yes.

      Media shared by consumers who are paying for the service? Yes.

      The rest of it – YAGNI!

  2. Sundeep Singh Basra

Some of the data is useless spam, and some of it is redundant. I think we need better ways to compress it, and ways to recognize data that is repeated so many times. Of course we need bigger pipes, given the increased use of video and other uses of the Internet beyond just web pages.

    • znmeb

      I think we need to be more pro-active before creating the data in the first place. If there isn’t a paying customer, why create it? If it’s a “solution in search of a problem”, why waste the effort?

      I’m all in favor of video and picture and music sharing, and yes, there’s a lot of duplication that could potentially be eliminated, at least with media that’s “published”, as opposed to “shared socially” by individuals.

But within many businesses, there’s this whole “data driven / business intelligence” mindset that collects, stores and analyzes huge data sets under the name “business intelligence” with little regard for the cost, the environmental impact or the difficulty of mining real value from the petabytes. I think it’s wasteful of resources and people’s time. There’s a saying in the agile community: “YAGNI — You Ain’t Gonna Need It!”

      Seriously – you ain’t gonna need those petabytes! Why are you collecting them? Why are you storing them? Why are you building huge data centers to analyze them? Why are you asking for huge pipes to shuffle them back and forth inside your enterprise? You have to keep certain records by law, you need a CRM system, and you have to have databases for eCommerce. But that’s it – the rest of it is pure waste as far as I’m concerned.

      So – if you’re a publisher, or if your business model involves helping people share media, by all means, yes, build the infrastructure, lobby for the big pipes to customers, etc. But just to shuffle data back and forth inside an enterprise with no benefit to your end users? I call bullshit! ;-)

  3. Yes, I know we are creating massive amounts of “data”. But how does all of that data make people’s lives better? You say we need bigger pipes. I say we need to take a huge step back and ask why we are creating so much data. How much of it is noise? How much of it is relevant to people other than “data geeks”?