
How to handle petabyte-scale growth in enterprises

Table of Contents

  1. Summary
  2. The problem with traditional data storage
  3. Understanding the scalability problem
  4. The scalability solution
  5. Why the scalability issue matters
  6. An example of a multi-petabyte customer
  7. Final notes
  8. About Enrico Signoretti

1. Summary

The unprecedented data growth many enterprises experience today is difficult to predict. Unstructured data is doubling every 18 to 24 months, and the number of objects a storage system must eventually hold varies by the application or device producing the data. A mobile phone that records HD video at 1080p today, for example, will soon record at 4K (2160p) and 8K UHD (4320p). Beyond video, the number of sensors is expected to grow into the billions, surpassing the number of mobile devices.
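To make the doubling rate concrete, the sketch below projects capacity growth under the 18-to-24-month doubling cited above. The starting capacity of 1 PB and the five-year horizon are illustrative assumptions, not figures from the report.

```python
# Hypothetical projection: unstructured data doubling every 18 to 24 months.
# The 1 PB starting point and 60-month horizon are assumptions for illustration.

def projected_capacity_pb(start_pb: float, months: float, doubling_months: float) -> float:
    """Capacity after `months`, given one doubling every `doubling_months` months."""
    return start_pb * 2 ** (months / doubling_months)

# Over five years, an assumed 1 PB grows to roughly:
fast = projected_capacity_pb(1.0, 60, 18)  # ~10 PB at an 18-month doubling
slow = projected_capacity_pb(1.0, 60, 24)  # ~5.7 PB at a 24-month doubling
```

Even at the slower doubling rate, capacity grows more than fivefold in five years, which is why a design sized for terabytes cannot simply be scaled up to petabytes.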

Rethinking storage infrastructure so that it can sustain this growth over time is now crucial for enterprises. Beyond sheer volume, access patterns are changing: because data is saved from many different applications and sources, several workloads will hit the storage system at the same time. Regardless of how quickly the data becomes inactive, the storage system must manage the ingestion (and redistribution) of huge amounts of data from individual sources while simultaneously supporting fast, concurrent access from a multitude of globally dispersed mobile clients.

Most organizations are finally realizing the potential and value of their data, and that the data stored today will build a competitive advantage tomorrow. By embracing a “do-not-delete-anything” strategy, they lay the groundwork for analytics on stored data. In fact, organizations sometimes believe they are producing new data when they are simply unaware that it already exists, thereby creating redundancy. At other times, digging into the organization’s entire knowledge base can save time and money.

Moving from multi-terabyte to multi-petabyte storage requires a radically different approach to infrastructure design: a true no-compromise scale-out architecture with an object-store back end capable of offering multiple access protocols and APIs. Economically, acquisition costs should be pennies per GB instead of dollars, with automation, resiliency, and availability techniques driving down the total cost of ownership.
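The pennies-versus-dollars distinction matters because a per-GB difference compounds enormously at petabyte scale. The back-of-the-envelope sketch below makes that visible; the $2/GB and $0.03/GB price points are illustrative assumptions, not vendor figures.

```python
# Hypothetical acquisition-cost comparison at multi-petabyte scale.
# Price points ($2.00/GB vs. $0.03/GB) are illustrative assumptions only.

GB_PER_PB = 1_000_000  # decimal convention: 1 PB = 1,000,000 GB

def acquisition_cost(capacity_pb: float, dollars_per_gb: float) -> float:
    """Total acquisition cost in dollars for a given capacity and unit price."""
    return capacity_pb * GB_PER_PB * dollars_per_gb

# For an assumed 10 PB deployment:
traditional = acquisition_cost(10, 2.00)  # $2/GB  -> $20,000,000
object_store = acquisition_cost(10, 0.03) # 3c/GB  ->    $300,000
```

At 10 PB the gap between the two price points is tens of millions of dollars, which is why per-GB economics, not just raw capability, drives the move to object-store back ends.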

Thumbnail image courtesy of Henrik5000/iStock.