Market intelligence firm IDC recently released its fifth annual Digital Universe study, which uses text and video to combine hard data and projections with analysis, highlighting issues and opportunities. It also raises a question: in a world where the volume of data more than doubles every two years, should we just keep buying more storage, or do we need smarter approaches to data description, selection and retention, such as using metadata?
In 2010, the report claims, more than 1 trillion gigabytes (1 zettabyte) were stored on iPods, laptops, desktops, and servers worldwide. By the end of this year that figure will have exceeded 1.8 zettabytes, and there’s no sign of the pace slowing.
Meanwhile, since 2005, enterprise investment in IT has increased around 50 percent, reaching $4 trillion. This is despite the plummeting cost of an IT system’s building blocks, such as storage, which has dropped from almost $20 per gigabyte to less than $3 in the same period. IDC’s projections suggest that storage will cost mere cents per gigabyte in the next few years. And still the amount of data grows while the cost of IT to the enterprise rises.
Despite fresh enthusiasm for big data, there remains a real danger that enterprises (and individuals) simply keep data because doing so is easier than actively deciding what can be discarded. Sensor logs, email inboxes, customer transaction data and more continue to fill enterprise storage arrays, and without robust data management policies in place the only realistic solution is often to keep buying additional storage. IDC appears to recognize this problem, suggesting that the “ultimate value of a big data implementation will be judged” on three criteria:
- Does it provide more useful information [than the enterprise had access to previously]?
- Does it improve the fidelity of the information?
- Does it improve the timeliness of the response?
While there is clear value in collecting and analyzing more data than ever before, simply storing everything by default is a strategy that is increasingly hard to defend. Storage may be getting cheaper, but it’s not free, and valuable data could disappear under an unquantifiable mass of poorly structured dross.
First among a set of calls to action that includes “master virtualization” and “move what you can to the cloud,” IDC exhorts CIOs to “investigate the new tools for creating metadata.” The report continues, “Big data will be a fountain of big value only if it can speak to you through metadata.” Metadata, or data about data, records the context within which data was captured. It describes the characteristics of the process that generates data (for example, the resolution of a camera photographing queues outside a store, and the interval between pictures). Metadata defines the structures in which data is stored, and lends meaning to cryptic codes and measurements.
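To make the camera example concrete, here is a minimal sketch in Python of what such a metadata record might look like. Every name in it (`CaptureMetadata`, the field names, the queue codes) is a hypothetical illustration rather than anything defined by IDC or by a particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Tuple


@dataclass
class CaptureMetadata:
    """Context for a stream of images. All field names are hypothetical."""
    source: str                       # the process that generated the data
    resolution_px: Tuple[int, int]    # camera resolution (width, height)
    capture_interval: timedelta       # time between pictures
    started_at: datetime              # when capture began
    code_meanings: Dict[str, str]     # lends meaning to cryptic codes


def describe(code: str, meta: CaptureMetadata) -> str:
    """Translate a cryptic code into its documented meaning."""
    return meta.code_meanings.get(code, "unknown code")


def images_per_hour(meta: CaptureMetadata) -> float:
    """Derive the expected data rate from the capture interval alone."""
    return timedelta(hours=1) / meta.capture_interval


# Illustrative values only; none of these come from the report.
queue_camera = CaptureMetadata(
    source="store-entrance-camera",
    resolution_px=(1920, 1080),
    capture_interval=timedelta(seconds=30),
    started_at=datetime(2011, 7, 1, 9, 0, tzinfo=timezone.utc),
    code_meanings={"Q0": "no queue", "Q1": "short queue", "Q2": "long queue"},
)
```

With a record like this stored alongside the images, a reader of the data can interpret a code such as `Q2` or estimate how many pictures to expect per hour without opening a single image file.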
The use of metadata remains less prevalent than we might expect. But where it can be applied cheaply and automatically as part of the process of data creation, it offers information architects and storage managers the means to effectively curate the data for which they are responsible. In many industries, the hardware and software already in use are generating this information automatically; even cheap consumer cameras record data about themselves and the environment around them as they take pictures. Information managers simply need to keep the data and start using it as part of their management workflow.
As data volumes grow ever larger, and as truly valuable data starts to represent a smaller proportion of the whole, metadata is going to become increasingly important. It will help companies understand the data they need, and it will help them to more systematically discard the data that they do not. We are already seeing acquisitions that bring data analysis solutions and storage hardware together. EMC’s acquisition of Greenplum last year is just one example of this trend. It points toward solutions that enable complex data analysis as well as intelligent management of all the data flowing through an enterprise. Metadata will be the key that enables those intelligent decisions to be made, finally slowing the rate at which enterprise data centers fill with ever more storage arrays.
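One way to picture metadata driving retention decisions is a simple rule that looks only at descriptive fields, never at the data itself. The sketch below is an illustration built on assumptions: the `DatasetInfo` fields, the one-year idle threshold, and the three possible outcomes are all hypothetical and are not drawn from the IDC report.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class DatasetInfo:
    """Descriptive metadata about a stored dataset (hypothetical fields)."""
    name: str
    last_accessed: date
    has_schema: bool                # is the structure documented?
    business_owner: Optional[str]   # who can vouch for its value?


def retention_decision(
    info: DatasetInfo,
    today: date,
    max_idle: timedelta = timedelta(days=365),  # assumed threshold
) -> str:
    """Return 'keep', 'discard', or 'review' using metadata alone."""
    # Data that nobody has described or claimed cannot be judged
    # automatically, so it is flagged for a human to look at.
    if not info.has_schema or info.business_owner is None:
        return "review"
    # Well-described data that has sat untouched past the threshold
    # becomes a candidate for deletion.
    if today - info.last_accessed > max_idle:
        return "discard"
    return "keep"
```

The point of the design is that poorly described data is the expensive case: it cannot be kept or discarded with confidence, which is exactly the situation richer metadata is meant to avoid.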