The Future of Unstructured Data Management

And Why It Is Important

One of the most interesting and successful research projects I’ve worked on lately was the one about unstructured data management. Our clients loved the Key Criteria and Radar reports, and I had many fascinating conversations with vendors and users about what is coming next in this space.

Why Unstructured Data Management

First, let’s be clear: explosive data growth is not something you can bargain with or avoid. You can’t stop it. Human-generated data has been joined by a growing host of sensors, cameras, and countless other devices capable of producing overwhelming amounts of data for an incredibly diverse range of use cases. We keep most of this data in our on-premises and cloud storage systems. Some of it is analyzed almost immediately and then lies dormant for a long time, sometimes forever. There are plenty of reasons to keep data around for long periods: internal policy, compliance, regulations, you name it.

Traditional storage systems are not designed to cope with this. The capacity required by all this data gets out of hand, which is why scale-out architectures are being broadly adopted by organizations of every type and size. The complexity of scale-out infrastructure is no longer an issue, and any system administrator can manage it. But at least three challenges remain:

  1. Correct data placement: Not all data is created equal, and its value may change over time. The concept of primary and secondary storage is well worn, but it offers insight into where your spending should go. Primary, secondary, and tertiary data provide clear targets for budgeting spend based on $/GB, with attendant impacts on latency, throughput, and scalability in the storage system. Modern solutions can take advantage of object and cloud storage to further optimize data placement, leaving the user with several options to expand system capacity at a low cost.
  2. Data silos: These have always been a challenge, even with new technology like automated tiering to address them. The rise of hybrid and multi-cloud infrastructures has organizations working to get data closer to users and applications, and that is creating a growing number of data silos. They are difficult to manage and can turn even the most basic operations into a nightmare.
  3. Understanding value: Dispersing data across multiple environments makes the data harder to find and analyze, and its real value harder to understand. Without that understanding, it is very hard to manage cost or data placement strategies.
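To make the first challenge concrete, here is a minimal sketch of age-based data placement. Everything in it is an assumption for illustration: the tier names follow the primary/secondary/tertiary split above, but the $/GB prices, the 30-day and 365-day thresholds, and the file paths are hypothetical and not taken from any vendor or from my report.

```python
from dataclasses import dataclass

# Hypothetical monthly prices in $/GB, for illustration only.
TIER_COST_PER_GB = {"primary": 0.10, "secondary": 0.03, "tertiary": 0.005}


@dataclass
class FileRecord:
    path: str
    size_gb: float
    days_since_access: int


def choose_tier(record: FileRecord) -> str:
    """Pick a tier from the time since last access (assumed thresholds)."""
    if record.days_since_access <= 30:
        return "primary"
    if record.days_since_access <= 365:
        return "secondary"
    return "tertiary"


def monthly_cost(records: list[FileRecord]) -> float:
    """Total monthly cost when every file sits on its chosen tier."""
    return sum(r.size_gb * TIER_COST_PER_GB[choose_tier(r)] for r in records)


# Made-up sample files: one hot, one cooling, one cold.
files = [
    FileRecord("/projects/active/model.bin", 100.0, 2),
    FileRecord("/projects/2023/render.mov", 200.0, 120),
    FileRecord("/archive/sensor-dump.tar", 1000.0, 900),
]

# Baseline: everything kept on primary storage.
all_primary = sum(f.size_gb for f in files) * TIER_COST_PER_GB["primary"]
tiered = monthly_cost(files)

print(f"All on primary: ${all_primary:.2f}/month")
print(f"Tiered by age:  ${tiered:.2f}/month")
```

A real policy would weigh more than access age (compliance holds, retrieval fees, latency requirements), but even this toy version shows why knowing what your data is and how it is used comes before any placement decision.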

Figure 1. Traditional Storage Systems

This is only the beginning. In my report I talked about infrastructure-focused and business-oriented data management: the former is aimed at improving infrastructure TCO, but it is the latter that can bring the biggest return on investment (ROI) and amplify the value of your data. At the end of the day, you need to adopt the right data management practices and tools to respond correctly to the demanding requirements imposed by business and regulations.

Why the Future of Data Management is in the Cloud

The trend is clear: The future of IT infrastructures is hybrid, with data distributed across on-premises systems and multiple clouds no matter the size of your organization. If you think about this scenario, you realize that it requires a different and modern approach to data management. Some of the key criteria I analyzed in my GigaOm report will become even more important and provide the foundation for the next generation of products and services to manage your unstructured data estate. These include:

  • Virtual global data lakes: You can’t fight data growth or its inevitable sprawl of data silos. The larger the organization, the more data sources and repositories you must manage. What you can do is consolidate these repositories into a single, virtual domain. This approach lets you implement global indexing and establish the foundation for everything you need on top of it: search, analytics, reporting, security, auditing, and so on.
  • SaaS-based tools: Data management should be global, flexible, and adaptable. You don’t want to court risk with a data management solution, SaaS-based or otherwise, that is limited in reach (i.e., the use cases it supports) or in available resources, or that cannot scale quickly in capacity and functionality when needed. Additionally, concentrating on a single SaaS platform dramatically simplifies the infrastructure and virtually eliminates data silos. Users and system administrators can take advantage of a global view of all the data to manage multiple applications and workflows while saving time and getting results faster. Moreover, a SaaS-based data management solution is more approachable for mid-size IT organizations, making data management affordable to a broader spectrum of organizations.
  • App marketplaces: App marketplaces are still a new thing, but extending the basic functionality of the product to address new use cases has several benefits. First, a SaaS deployment model eliminates the need for physical infrastructure to accommodate the application and its necessary copies of the data. And second, it enables organizations to extend the data management platform with additional applications, improving reach across the org and taking advantage of the virtual global data lake to improve several processes.

Figure 2. Hybrid Storage Systems

The connection between virtual global data lakes, SaaS, and marketplaces is important. It creates a universal data domain that enables complete visibility and reuse of data.

Virtual data lakes are easy to create with the right technology. For example, as shown in the following flow chart, the backup process can be instrumental in the creation of a virtual data lake. All data is automatically collected and indexed as it is ingested, becoming immediately available for data management tasks.

Figure 3. Unstructured Data Management Flow Chart
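The index-on-ingest idea can be sketched in a few lines. This is a toy model under stated assumptions, not how any particular product works: the class name `VirtualDataLake`, its methods, and the source names (`nas-01`, `s3-eu`, `nas-02`) are all hypothetical, and a real system would index metadata and content at far greater scale and with access controls.

```python
import re
from collections import defaultdict


class VirtualDataLake:
    """Toy global index: metadata and keywords from every source in one place."""

    def __init__(self) -> None:
        # object id -> metadata
        self.catalog: dict[str, dict] = {}
        # token -> ids of the objects containing it
        self.keyword_index: dict[str, set[str]] = defaultdict(set)

    def ingest(self, source: str, path: str, content: str) -> str:
        """Index an object while it is being backed up; return its global id."""
        object_id = f"{source}:{path}"
        self.catalog[object_id] = {
            "source": source,
            "path": path,
            "size": len(content),
        }
        for token in set(re.findall(r"[a-z0-9]+", content.lower())):
            self.keyword_index[token].add(object_id)
        return object_id

    def search(self, query: str) -> list[str]:
        """Return ids of objects that contain every word in the query."""
        tokens = re.findall(r"[a-z0-9]+", query.lower())
        if not tokens:
            return []
        hits = set.intersection(
            *(self.keyword_index.get(t, set()) for t in tokens)
        )
        return sorted(hits)


# Three separate silos feed one index as their backups are ingested.
lake = VirtualDataLake()
lake.ingest("nas-01", "/hr/contract.txt", "Employment contract for Jane Doe")
lake.ingest("s3-eu", "reports/q3.txt", "Q3 revenue report and contract renewals")
lake.ingest("nas-02", "/eng/notes.txt", "Design notes for the ingest pipeline")

results = lake.search("contract")
print(results)
```

The point of the sketch is the shape of the design: the data stays in its original repositories, while a single catalog and index sit on top, so one query spans every silo.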

At this point, the potential use cases are limited only by the applications offered in the marketplace. Think about it:

  • Global search: Imagine a private Google-like experience with all the necessary access control mechanisms. You could create legal holds on specific queries, or build data sets for compliance checks and other applications.
  • Security analytics: AI-based analytics tools can scan access patterns across your data to detect security threats such as ransomware, data leaks, and data breaches.
  • Compliance: Scan for specific patterns inside your files to prevent data privacy issues. Generate “right to be forgotten” reports for authorities, run document checks, and so on.
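As an illustration of the compliance use case, the sketch below scans a handful of documents for sensitive patterns and lists the files that mention a given person. It rests on assumptions: the two regular expressions are simplified examples, the function names `scan_document` and `subject_report` are hypothetical, and the documents are made up. Real compliance rules are broader and depend on jurisdiction.

```python
import re

# Illustrative patterns only; real rules are broader and locale-specific.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_document(text: str) -> dict[str, list[str]]:
    """Return every match for each pattern found in the text."""
    findings = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            findings[name] = matches
    return findings


def subject_report(documents: dict[str, str], identifier: str) -> list[str]:
    """List the documents mentioning an identifier, e.g. for an erasure request."""
    needle = identifier.lower()
    return sorted(
        path for path, text in documents.items() if needle in text.lower()
    )


# Made-up documents standing in for files from different repositories.
documents = {
    "/hr/onboarding.txt": "Contact jane.doe@example.com, SSN 123-45-6789.",
    "/eng/readme.txt": "Build instructions for the ingest service.",
    "/sales/leads.txt": "Lead: Jane.Doe@example.com requested a demo.",
}

findings = {path: scan_document(text) for path, text in documents.items()}
report = subject_report(documents, "jane.doe@example.com")

for path, found in findings.items():
    print(path, found)
print("Files mentioning the data subject:", report)
```

Run against a global index instead of a small dictionary, the same two operations become the basis for privacy checks and “right to be forgotten” reports across the whole data estate.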

These are only a few examples. Having access to all of your data from a single and extensible platform will open a world of possibilities and cost savings. What’s more, the pay-as-you-go model of the cloud enables enterprises of all sizes to use only the apps they need when they need them, simplifying adoption and testing of new use cases while, again, keeping costs under control.

At the end of the day, by adopting a SaaS-based global data management solution, the user can quickly optimize costs and improve overall infrastructure TCO. This is only the low-hanging fruit, though. Business owners will also have powerful tools to increase the value of the data stored in their systems while responding adequately to the threats posed by poor data management.

Figure 4. All Risks Associated With Poor Unstructured Data Management Are Connected

Closing the Circle

Data management is becoming the pillar that makes storage infrastructures sustainable over time, and it is the only way to plan investments properly.

Users are starting to adopt multi-cloud infrastructures and need to respond to a growing number of challenges, some of which concern infrastructure efficiency and cost reduction. But with data dispersed across several repositories, the focus is shifting toward more complex and business-focused requirements.

Cloud-based data management solutions can be the answer if implemented correctly, by creating global virtual data lakes that can be accessed by many applications and users simultaneously. In this context, the ease of use and the consistent user experience provided by a SaaS solution and a good app marketplace will be key to attracting different personas in the organization. What is more, with this kind of approach (SaaS and app marketplace), data management is democratized and available to a broader audience, no matter how small your IT staff or your organization is.