10 Comments

Summary:

We managed to create 800,000 petabytes of digital information last year, according to a study released today by IDC and EMC. The creation of digital data will increase to 1.2 million petabytes by the end of this year, which means we need fatter pipes.

We managed to create 800,000 petabytes of digital information last year, according to a study released today by IDC and EMC Corp. The annual survey forecasts that the creation of digital data will grow to 1.2 million petabytes, or 1.2 zettabytes, by the end of this year, and that by 2020 the total will be 44 times the 2009 figure. (A petabyte is the equivalent of a stack of DVDs stretching from here to the moon.) In other words, by 2020 we’ll have created 35 trillion gigabytes of data, much of it in the “cloud.”
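To keep the units straight, here is a quick back-of-envelope check of those figures in Python. It is only a sketch of the arithmetic implied above (the 44x factor compares the 2020 forecast against the 2009 total), not anything taken from the study itself.

```python
# Back-of-envelope check of the IDC/EMC figures quoted above.
# Units: 1 zettabyte (ZB) = 1 million petabytes (PB) = 1 trillion gigabytes (GB).

PB_PER_ZB = 1_000_000
GB_PER_ZB = 1_000_000_000_000

created_2009_zb = 800_000 / PB_PER_ZB              # 800,000 PB created in 2009
created_2010_zb = 1_200_000 / PB_PER_ZB            # 1.2 million PB forecast for 2010
created_2020_zb = 35_000_000_000_000 / GB_PER_ZB   # 35 trillion GB forecast for 2020

print(created_2010_zb)                    # 1.2 ZB by the end of this year
print(created_2020_zb)                    # 35.0 ZB by 2020
print(created_2020_zb / created_2009_zb)  # ~44x the 2009 total
```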

Even though EMC is putting out this data and likes to explain how many “containers” it would take to hold it all, what this enormous flood of information really requires is robust broadband networks. The value of all this data will lie not in storing it but in moving it around the world to the people and companies who can use it to make new products, draw new conclusions or analyze it for profit.

Data stores will be essential for giving the data a home (GigaOM Pro sub req’d), but data marketplaces, such as Microsoft’s Project Dallas, the World Bank’s data stores or startups like Infochimps, will also have a place. Linked into those marketplaces will be cloud computing services or platform-as-a-service offerings such as Amazon’s EC2 or Microsoft Azure that can process the data on demand.

But for maximum flexibility, we’re going to need fat pipes linking data centers and data providers, and reaching all the way down to end users, who will also be sending data from home networks, wireless phones and even their vehicles. The only way we’re going to handle this data tsunami is to build the infrastructure to route, process and connect it at gigabyte-per-second or even terabyte-per-second speeds. We’ll talk more about managing big data at our Structure 10 event in June.
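To put those line rates in perspective, here is a rough transfer-time sketch using round decimal units; the rates are illustrative, not figures from the study.

```python
# Rough illustration: how long it takes to move one petabyte of data
# at the speeds mentioned above (decimal units, no protocol overhead).

PETABYTE_IN_GB = 1_000_000  # 1 PB = 1,000,000 GB

for label, gb_per_second in [("1 gigabyte/sec", 1), ("1 terabyte/sec", 1_000)]:
    hours = PETABYTE_IN_GB / gb_per_second / 3600
    print(f"At {label}: {hours:,.1f} hours to move a single petabyte")
```

Even at a full gigabyte per second, a single petabyte ties up the link for more than 11 days, which is why the pipes between data centers matter as much as the disks inside them.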

  1. gregorylent Tuesday, May 4, 2010

    obvious for years and rarely talked about in the tech press

  2. M. Edward (Ed) Borasky Tuesday, May 4, 2010

    Yes, I know we are creating massive amounts of “data”. But how does all of that data make people’s lives better? You say we need bigger pipes. I say we need to take a huge step back and ask why we are creating so much data in the first place. How much of it is noise? How much of it is relevant to people other than “data geeks”?

  3. Sundeep Singh Basra Tuesday, May 4, 2010

    Some of the data is useless spam, and some of it is redundant. I think we need ways to compress it, and ways to recognize data that is repeated many times over. Of course we need bigger pipes due to the increased use of video and other uses of the Internet beyond just web pages.

    1. I think we need to be more pro-active before creating the data in the first place. If there isn’t a paying customer, why create it? If it’s a “solution in search of a problem”, why waste the effort?

      I’m all in favor of video and picture and music sharing, and yes, there’s a lot of duplication that could potentially be eliminated, at least with media that’s “published”, as opposed to “shared socially” by individuals.

      But within many businesses, there’s this whole “data driven / business intelligence” mindset that collects, stores and analyzes huge data sets with little regard for the cost, environmental impact or the difficulty of mining real value from the petabytes. I think it’s wasteful of resources and people’s time. There’s a saying in the agile community: “YAGNI – You Ain’t Gonna Need It!”

      Seriously – you ain’t gonna need those petabytes! Why are you collecting them? Why are you storing them? Why are you building huge data centers to analyze them? Why are you asking for huge pipes to shuffle them back and forth inside your enterprise? You have to keep certain records by law, you need a CRM system, and you have to have databases for eCommerce. But that’s it – the rest of it is pure waste as far as I’m concerned.

      So – if you’re a publisher, or if your business model involves helping people share media, by all means, yes, build the infrastructure, lobby for the big pipes to customers, etc. But just to shuffle data back and forth inside an enterprise with no benefit to your end users? I call bullshit! ;-)

  4. Microsoft Wants to Build Its Business With Data Thursday, May 13, 2010

    [...] to cloud computing and the efforts of several companies, however, the ability to access and make sense of huge chunks of information is here. The question is whether there’s a business in providing intelligible data sets to [...]

  5. Dan Graham, Marketing Director of Active Data Warehousing, Teradata Corporation Tuesday, May 18, 2010

    Stacey,

    Applause. Yes, the data tsunami has been coming at us for as long as disk capacity has been growing and the price per gigabyte dropping. Some pretend the data is not needed; others don’t know what to do with it. As the internet and cloud computing expand, fat pipes inside data centers and between them are mandatory for data in motion. When the data slows down, it needs to land somewhere. More and more companies are turning to enterprise data warehouses (EDWs) to store, manage and analyze data in a way that is meaningful. Teradata provides one key IT infrastructure component, the data warehouse, to create a comprehensive view of customers, the supply chain, and financial and performance management by bridging gaps between data and ensuring it is utilized, not just stored. So imagine a network of massively parallel data hubs (aka warehouses) consuming oceans of data and emitting facts and insights on demand, all linked by the fat pipes you mention. See you there!

    1. I expect a Marketing Director for Teradata to argue that “Some pretend the data is not needed, others don’t know what to do with it” and to set out a grand vision of a world made better by his product. I remain skeptical and curmudgeonly about the benefits of much of the “big data” IT infrastructure to the consumers and taxpayers, who ultimately pay for this stuff.

      Transaction data required by law and for business continuity? Yes.

      Media shared by consumers who are paying for the service? Yes.

      The rest of it – YAGNI!

  6. Unified Computing Growth Drives Netronome’s $23M Funding Tuesday, August 10, 2010

    [...] speeds get faster and cloud computing grows, the need for speedy processors that can route bits despite the tsunami of information [...]

  7. Hot Trend — Tools To Find Relevant Web Information « Tuesday, September 7, 2010

    [...] the global population starts drowning in data — 1.2 million petabytes of digital data are expected to be created this year alone — and consumers can access such information nearly anywhere on a handheld, management of that [...]

  8. Sensor Networks Top Social Networks for Big Data: Cloud « Tuesday, September 21, 2010

    [...] For example, a Boeing jet generates 10 terabytes of information per engine every 30 minutes of flight, according to Stephen Brobst, the CTO of Teradata. So for a single six-hour, cross-country flight from New York to Los Angeles on a twin-engine Boeing 737 — the plane used by many carriers on this route — the total amount of data generated would be a massive 240 terabytes of data. There are about 28,537 commercial flights in the sky in the United States on any given day. Using only commercial flights, a day’s worth of sensor data quickly climbs into the petabyte scale — for a single day. Multiply that by weeks, months and years, and the scale of sensor data gets massive. [...]
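The arithmetic in that last excerpt is easy to verify; a minimal sketch, assuming the quoted rate of 10 TB per engine per 30 minutes of flight:

```python
# Check of the sensor-data arithmetic quoted above, assuming
# 10 TB generated per engine for every 30 minutes of flight.

tb_per_engine_per_half_hour = 10
engines = 2           # twin-engine Boeing 737
flight_hours = 6      # New York to Los Angeles

total_tb = tb_per_engine_per_half_hour * engines * flight_hours * 2
print(total_tb)  # 240 TB for one cross-country flight
```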
