This GigaOm Research Reprint Expires Jul 17, 2024

GigaOm Radar for Unstructured Data Management: Infrastructure-Focused Solutions v3.0

1. Summary

Scale-out storage systems for files and objects have made it easier and less expensive to manage storage capacity efficiently. At the same time, the cloud expands the options available in terms of performance, capacity, and cold data archiving. The resulting proliferation of data silos is an issue, though, and new multicloud IT strategies and edge computing are accelerating this trend alarmingly.

The flexibility and scalability provided by public clouds also come at a price. In this period of financial turmoil and uncertainty, some organizations are actively seeking cost reduction opportunities: cloud-first initiatives are being reevaluated, and data repatriation projects are becoming commonplace across multiple verticals. Those projects require careful planning and execution and can become very costly without a prior analysis of the existing and anticipated data footprint.

Complexity also impacts internal policy and regulatory compliance; strict regulations akin to GDPR, CCPA, HIPAA, and Payment Card Industry Data Security Standard (PCI DSS) are being adopted worldwide, making analysis and classification more difficult without the help of an unstructured data management solution. Furthermore, data sovereignty regulations impose restrictions on physical data location and data flows, requiring organizations to adequately segment access to resources by location and identify and geo-fence impacted datasets. Solutions that support these regulatory frameworks and are capable of handling data privacy requests–like Data Subject Access Requests (DSARs), identifying and classifying personally identifiable information (PII), or even taking further action on right to be forgotten (RtbF) and right of erasure (RoE) requests–can radically simplify compliance operations.

Those two business imperatives—repatriation projects and regulatory compliance—increase the need for solutions that can seamlessly handle data movement at scale automatically, with minimal oversight, ideally based on a policy engine.

We are coming to a point where storing data safely and for a long period of time does not actually bring any benefit to an organization, and it can quickly become a liability. However, with the right processes and tools, it’s now possible to take control of data and exploit its hidden value, transforming it from a liability to an asset.

With the right unstructured data management solutions, it’s possible to:

  • Understand what data is stored in the storage systems, no matter how complex and dispersed.
  • Build a strategy to intervene on costs while increasing the return on investment (ROI) for data storage.

Depending on the approach chosen by the user, there are several potential benefits to building and developing a data management strategy for unstructured data, including better security and compliance, improved end-user services, reduced costs, and data reusability. The right data management strategy enables organizations to mitigate risk and make the most of opportunities.

This GigaOm Radar report highlights key unstructured data management vendors and equips IT decision-makers with the information needed to select the best fit for their business and use case requirements. In the corresponding GigaOm report “Key Criteria for Evaluating Unstructured Data Management Solutions,” we describe in more detail the key features and metrics that are used to evaluate vendors in this market.

How to Read this Report

This GigaOm report is one of a series of documents that helps IT organizations assess competing solutions in the context of well-defined features and criteria. For a fuller understanding, consider reviewing the following reports:

Key Criteria report: A detailed market sector analysis that assesses the impact that key product features and criteria have on top-line solution characteristics—such as scalability, performance, and TCO—that drive purchase decisions.

GigaOm Radar report: A forward-looking analysis that plots the relative value and progression of vendor solutions along multiple axes based on strategy and execution. The Radar report includes a breakdown of each vendor’s offering in the sector.

Solution Profile: An in-depth vendor analysis that builds on the framework developed in the Key Criteria and Radar reports to assess a company’s engagement within a technology sector. This analysis includes forward-looking guidance around both strategy and product.

2. Market Categories and Deployment Types

To better understand the market and vendor positioning (Table 1), we assess how well unstructured data management solutions are positioned to serve specific market segments and deployment models.

This Radar report covers infrastructure-focused solutions and provides insights into whether evaluated solutions can also meet business-focused solution requirements. Business-focused solutions will be covered in a separate Radar report; however, some solutions overlap and may appear in both Radars, although with different placements and evaluations. Here’s how we define and distinguish the two categories:

  • Infrastructure focus: Solutions designed to target data management at the infrastructure and metadata level, including automatic tiering and basic information lifecycle management, data copy management, analytics, index, and search.
  • Business focus: Solutions designed to solve business-related problems, including compliance, security, data governance, big data analytics, and e-discovery.

In addition, we recognize two deployment models for solutions in this report:

  • User managed: Usually installed and run on-premises, these products often can work well in hybrid cloud environments.
  • Software as a service (SaaS): Based on a cloud back end and usually provided as a service, solutions deployed this way work in a manner quite distinct from that of the products in the on-premises category. Traditionally, this type of solution is optimized more for hybrid, multicloud, and mobile/edge use cases.

Table 1. Vendor Positioning

Market Segment

Deployment Model

Infrastructure Focus Business Focus User Managed SaaS
Arcitecta
Atempo
Cohesity
CTERA
Datadobi
Data Dynamics
Dell Technologies
Druva
Hitachi Vantara
Komprise
NetApp
Panzura
3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
0 Not applicable or absent

3. Key Criteria Comparison

Building on the findings from the GigaOm report “Key Criteria for Evaluating Unstructured Data Management Solutions,” Table 2 summarizes how each vendor included in this research performs in the areas that we consider differentiating and critical in this sector. Table 3 follows this summary with insight into each product’s evaluation metrics—the top-line characteristics that define the impact on the organization.

The objective is to give the reader a snapshot of the technical capabilities of available solutions, define the perimeter of the market landscape, and gauge the potential impact on the business.

Table 2. Key Criteria Comparison

Key Criteria

Vendor            | Metadata Analytics | Global Content & Search | Big Data Analytics | Compliance | Security | Orchestration | AI/ML
Arcitecta         | 3                  | 3                       | 3                  | 2          | 2        | 2             | 0
Atempo            | 2                  | 2                       | 2                  | 1          | 1        | 3             | 1
Cohesity          | 3                  | 3                       | 3                  | 3          | 3        | 2             | 3
CTERA             | 3                  | 2                       | 1                  | 2          | 3        | 2             | 3
Datadobi          | 3                  | 3                       | 1                  | 1          | 1        | 3             | 1
Data Dynamics     | 3                  | 3                       | 1                  | 2          | 1        | 3             | 2
Dell Technologies | 3                  | 3                       | 2                  | 1          | 2        | 2             | 1
Druva             | 3                  | 3                       | 0                  | 2          | 3        | 2             | 3
Hitachi Vantara   | 3                  | 3                       | 3                  | 3          | 3        | 2             | 2
Komprise          | 3                  | 3                       | 3                  | 1          | 2        | 3             | 2
NetApp            | 3                  | 3                       | 2                  | 3          | 3        | 2             | 3
Panzura           | 2                  | 3                       | 1                  | 2          | 1        | 2             | 1
3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
0 Not applicable or absent

Table 3. Evaluation Metrics Comparison

Evaluation Metrics

Vendor            | Architecture | Scalability | Flexibility | Performance & Efficiency | Manageability & Ease of Use | Ecosystem
Arcitecta         | 3            | 3           | 3           | 3                        | 3                           | 3
Atempo            | 2            | 3           | 3           | 3                        | 3                           | 3
Cohesity          | 3            | 3           | 3           | 3                        | 3                           | 3
CTERA             | 3            | 3           | 3           | 3                        | 3                           | 2
Datadobi          | 3            | 3           | 3           | 3                        | 3                           | 3
Data Dynamics     | 3            | 2           | 3           | 2                        | 1                           | 2
Dell Technologies | 3            | 3           | 3           | 2                        | 3                           | 2
Druva             | 3            | 3           | 2           | 3                        | 3                           | 3
Hitachi Vantara   | 3            | 3           | 3           | 3                        | 2                           | 3
Komprise          | 3            | 3           | 3           | 3                        | 3                           | 3
NetApp            | 3            | 3           | 3           | 3                        | 3                           | 3
Panzura           | 3            | 3           | 3           | 3                        | 3                           | 2
3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
0 Not applicable or absent

By combining the information provided in the tables above, the reader can develop a clear understanding of the technical solutions available in the market.
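One way a reader might combine the ratings for a specific use case is to weight the criteria that matter most and rank vendors by the weighted sum. The sketch below uses the Table 2 ratings as listed and assumes they are on a 0 to 3 scale; the weights and the `rank` helper are illustrative assumptions and are not part of the GigaOm scoring methodology.

```python
# Table 2 ratings in column order (assumed scale: 0 = absent ... 3 = exceptional).
CRITERIA = ["metadata_analytics", "global_content_search", "big_data_analytics",
            "compliance", "security", "orchestration", "ai_ml"]

TABLE_2 = {
    "Arcitecta":         [3, 3, 3, 2, 2, 2, 0],
    "Atempo":            [2, 2, 2, 1, 1, 3, 1],
    "Cohesity":          [3, 3, 3, 3, 3, 2, 3],
    "CTERA":             [3, 2, 1, 2, 3, 2, 3],
    "Datadobi":          [3, 3, 1, 1, 1, 3, 1],
    "Data Dynamics":     [3, 3, 1, 2, 1, 3, 2],
    "Dell Technologies": [3, 3, 2, 1, 2, 2, 1],
    "Druva":             [3, 3, 0, 2, 3, 2, 3],
    "Hitachi Vantara":   [3, 3, 3, 3, 3, 2, 2],
    "Komprise":          [3, 3, 3, 1, 2, 3, 2],
    "NetApp":            [3, 3, 2, 3, 3, 2, 3],
    "Panzura":           [2, 3, 1, 2, 1, 2, 1],
}


def rank(weights):
    """Return (vendor, score) pairs sorted by weighted score, highest first.

    Criteria missing from `weights` count as zero; ties are broken by name.
    """
    scores = {
        vendor: sum(weights.get(c, 0) * r for c, r in zip(CRITERIA, ratings))
        for vendor, ratings in TABLE_2.items()
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical priorities: a compliance- and security-driven shortlist.
shortlist = rank({"compliance": 2, "security": 2, "orchestration": 1})
for vendor, score in shortlist[:3]:
    print(vendor, score)
```

A weighted sum of this kind is only a screening aid; it ignores the evaluation metrics in Table 3 and the qualitative notes in the vendor insights.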

4. GigaOm Radar

This report synthesizes the analysis of key criteria and their impact on evaluation metrics to inform the GigaOm Radar graphic in Figure 1. The resulting chart is a forward-looking perspective on all the vendors in this report based on their products’ technical capabilities and feature sets.

The GigaOm Radar plots vendor solutions across a series of concentric rings, with those set closer to the center judged to be of higher overall value. The chart characterizes each vendor on two axes—balancing Maturity versus Innovation and Feature Play versus Platform Play—while providing an arrow that projects each solution’s evolution over the coming 12 to 18 months.

Figure 1. GigaOm Radar for Infrastructure-Focused Unstructured Data Management Solutions

As you can see in the Radar chart in Figure 1, vendors are spread across an arc that lies primarily in the lower half of the Radar, denoting a market that is particularly innovation-driven.

Five vendors sit in the Innovation/Platform Play area: Arcitecta, Cohesity, Druva, Komprise, and NetApp.

  • Arcitecta is a new entrant in this Radar. The solution is very interesting and proposes a unique, holistic approach that combines a massively scalable global file system with unstructured data management capabilities such as full content indexing, ransomware resiliency, support for compliance use cases, and comprehensive auditing capabilities.
  • Already offering comprehensive end-to-end data management capability, Cohesity further raises the bar with DataHawk, a comprehensive suite that combines its threat intelligence solution with advanced data classification, further adding to an already very complete solution.
  • Druva offers an interesting approach characterized by providing data compliance, search, and analytics capabilities on top of its SaaS-based data protection platform. It includes a broad set of features such as AI/ML-based anomaly and ransomware detection, and continues to focus strongly on security improvements.
  • Komprise delivers a compelling SaaS platform with a strong emphasis on metadata analysis, automation, and orchestration capabilities. The solution combines ease of use with data classification capabilities, data placement recommendations with actionable insights, and a dynamic development pace.
  • NetApp has integrated Cloud Data Sense into its BlueXP SaaS unified data management plane. In addition to improved usability, the solution, branded BlueXP classification, includes Cloud Data Sense’s enviable business-oriented capabilities, such as data classification and compliance, and extends them with advanced ransomware protection and data mobility features.

Three vendors—CTERA, Datadobi, and Hitachi Vantara—are positioned just outside of the Innovation/Platform Play quadrant but are on a trajectory to enter this area soon.

  • CTERA’s cloud-based SaaS distributed file storage solution implements an intuitive and interactive data insight visualization platform that, combined with geo-zoning and traffic routing features, enables its customers to define granular, regulatory compliant access policies. The solution now includes native ransomware protection capabilities.
  • Datadobi’s new StorageMAP solution (replacing DobiMigrate and DobiProtect) includes excellent data orchestration capabilities. Metadata analytics and global index and search features are very good; the solution also includes orphaned and dark data detection features.
  • Hitachi offers a broad ecosystem of solutions, with Hitachi Content Intelligence being the most focused on the key criteria described in the companion to this report. The solution is mature and proven, with a strong emphasis on business-oriented features such as policies and data workflows, making it best suited for large enterprises. Still, it can also work for infrastructure use cases.

Two companies are in the Innovation/Feature Play quadrant: Atempo and Panzura. Atempo is a new entrant in this Radar with Miria, a holistic platform that covers a broad spectrum of services, including analytics, migration, archiving, and backup. In data management, its strongest focus is in the orchestration area. The solution presents a promising roadmap. Panzura Data Services is an easy-to-use and effective solution providing data analytics with classification criteria, data insights, and growth patterns, plus various auditing features (like those that check for regulatory compliance violations) and anomaly detection mechanisms, including ransomware protection.

Finally, two vendors are listed in the Maturity/Platform Play quadrant: Data Dynamics and Dell Technologies. Data Dynamics excels in multiple areas with a comprehensive and unified solution that combines broad vendor support, enterprise-scale data management, and policy-based data copy and migration scenarios with strong data analytics, security, and compliance capabilities. Dell Technologies’ DataIQ solution provides a unified system view across Dell, third-party, and cloud storage; it provides solid reporting capabilities with policy-based data management and migration options and offers an open plug-in development framework.

Inside the GigaOm Radar

The GigaOm Radar weighs each vendor’s execution, roadmap, and ability to innovate to plot solutions along two axes, each set as opposing pairs. On the Y axis, Maturity recognizes solution stability, strength of ecosystem, and a conservative stance, while Innovation highlights technical innovation and a more aggressive approach. On the X axis, Feature Play connotes a narrow focus on niche or cutting-edge functionality, while Platform Play displays a broader platform focus and commitment to a comprehensive feature set.

The closer to center a solution sits, the better its execution and value, with top performers occupying the inner Leaders circle. The centermost circle is almost always empty, reserved for highly mature and consolidated markets that lack space for further innovation.

The GigaOm Radar offers a forward-looking assessment, plotting the current and projected position of each solution over a 12- to 18-month window. Arrows indicate travel based on strategy and pace of innovation, with vendors designated as Forward Movers, Fast Movers, or Outperformers based on their rate of progression.

Note that the Radar excludes vendor market share as a metric. The focus is on forward-looking analysis that emphasizes the value of innovation and differentiation over incumbent market position.

5. Vendor Insights

Arcitecta

Arcitecta implements a holistic approach to data management based on its Mediaflux platform. Mediaflux can be seen as an operating system for data and metadata; the solution consists of cluster controllers and I/O and compute nodes and offers a single global namespace with multiprotocol support (network file system/NFS, server message block/SMB, AWS S3, API over HTTPS, sFTP, and DICOM). The file system is also versioned, supporting Arcitecta’s Point In Time backup feature, which allows organizations to seamlessly view and recover data from any particular point in time, obviating the need for separate backup products. The solution is highly scalable and supports up to trillions of files. To efficiently handle indexing and searching, Arcitecta developed its own proprietary database solution, XODB, based on a NoSQL-like approach. XODB powers the Mediaflux operating system and allows organizations to search for any data nearly instantly.

Arcitecta supports rich metadata and includes provenance information as well; metadata is added by automated extraction of technical metadata from the data through analytical processing pipelines, by manual addition, or automatically from within the context of a specific activity. The solution also performs full-content indexing of digital assets, enabling efficient search based on keywords, file type, date range, and other search criteria. Once indexing is complete, the solution can display search results in real time with the ability to filter and sort data.
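To make the search criteria concrete, here is a minimal sketch of filtering an indexed catalog by keyword, file type, and date range, then sorting the results. The `Asset` record, the `search` function, and the sample paths are all hypothetical; they do not represent the Mediaflux API or XODB.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Asset:
    """Hypothetical index entry: path plus a few metadata fields."""
    path: str
    file_type: str
    modified: date
    keywords: set = field(default_factory=set)


def search(assets, keyword=None, file_type=None, start=None, end=None):
    """Return assets matching every criterion that was supplied."""
    results = []
    for a in assets:
        if keyword is not None and keyword.lower() not in a.keywords:
            continue
        if file_type is not None and a.file_type != file_type:
            continue
        if start is not None and a.modified < start:
            continue
        if end is not None and a.modified > end:
            continue
        results.append(a)
    # Newest first, standing in for the "filter and sort" step.
    return sorted(results, key=lambda a: a.modified, reverse=True)


catalog = [
    Asset("/scans/ct-0001.dcm", "dicom", date(2023, 1, 12), {"ct", "thorax"}),
    Asset("/docs/protocol.pdf", "pdf", date(2022, 11, 3), {"protocol", "ct"}),
    Asset("/scans/mr-0042.dcm", "dicom", date(2023, 3, 8), {"mr", "brain"}),
]

hits = search(catalog, keyword="ct", start=date(2023, 1, 1))
print([a.path for a in hits])
```

A production index would use inverted indexes rather than a linear scan; the linear scan is kept here only to show how the criteria combine.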

The file system is synthetic and can be reconfigured on the fly. This allows Mediaflux to create synthetic views of data without having to move or copy any of the underlying data, providing the ability to create virtual views for big data analytics. Those views can be projected through API and file systems, and via any of the protocols supported by the system.

The solution includes support for data security, governance, and data protection requirements. Mediaflux creates and enforces metadata schemas that ensure data is consistently described and organized to comply with regulatory requirements, such as GDPR or HIPAA, that mandate the use of specific metadata standards. The solution comes with an extensive auditing framework that allows forensic reconstruction of access patterns and vectors, providing enhanced compliance and tracking. It also maintains information about data owners, a prerequisite for handling DSARs.

From a security perspective, the solution does not implement any anomaly detection engine or ransomware protection system. Instead, it relies on its write once, read many (WORM)-based file system, versioning capabilities, and the point-in-time data protection solution to allow users to immediately revert to a healthy state in case of data loss caused by a malicious actor or ransomware. Nevertheless, it comes with robust role-based access controls (RBACs) and attribute-based access controls (ABACs), and it supports in-flight and at-rest encryption. The solution also provides multifactor authentication (MFA) and authorization for every protocol and vector of access to prevent anomalies from occurring in the first place.
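The recovery model described here, where nothing is overwritten and any earlier state can be read back, can be illustrated with a small append-only store. This is a conceptual sketch only: the `VersionedStore` class, its methods, and the integer timestamps are assumptions for illustration and say nothing about how Mediaflux stores versions internally.

```python
import bisect


class VersionedStore:
    """Append-only store: each write adds a version, nothing is overwritten."""

    def __init__(self):
        self._versions = {}  # path -> list of (timestamp, content), ascending

    def write(self, path, timestamp, content):
        history = self._versions.setdefault(path, [])
        if history and timestamp <= history[-1][0]:
            raise ValueError("timestamps must increase per path")
        history.append((timestamp, content))

    def read_as_of(self, path, timestamp):
        """Return the content that was current at `timestamp`, or None."""
        history = self._versions.get(path, [])
        times = [t for t, _ in history]
        i = bisect.bisect_right(times, timestamp)
        return history[i - 1][1] if i else None


store = VersionedStore()
store.write("/data/report.docx", 100, "clean draft")
store.write("/data/report.docx", 200, "clean final")
store.write("/data/report.docx", 300, "ENCRYPTED-BY-RANSOMWARE")

# Read the file as it existed just before the malicious write at t=300.
recovered = store.read_as_of("/data/report.docx", 250)
print(recovered)
```

Because the malicious write only appends a new version, recovery is a read at an earlier timestamp rather than a restore from a separate backup copy.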

Orchestration capabilities are handled by a workflow engine that automates complex data processing and management tasks. Workflows can include a variety of data processing and analysis steps, such as file transformations, metadata extraction, data tiering, migration, and quality control checks. Mediaflux provides a simple API to programmatically access and interact with data assets stored in the system. This API can be used to automate repetitive data management tasks, such as data ingestion, metadata extraction, and file format conversion. It can also orchestrate the execution of HPC workflows with integrations for batch scheduling systems such as Portable Batch System (PBS) and Simple Linux Utility for Resource Management (SLURM).

The solution does not currently use AI/ML in its products. Instead, the company sees Mediaflux as a system that can feed data to AI/ML environments.

Strengths: With Mediaflux, Arcitecta offers comprehensive in-band data management capabilities with a clever implementation of multiple features such as metadata management and full-content indexing and search, but also extensive auditing and seamless ransomware recovery features. The solution is built for scale and is capable of handling hundreds of billions of files.

Challenges: Despite Mediaflux’s inherent resilience capabilities, more proactive detection and prevention of ransomware attacks or malicious activities is an area that’s up for improvement.

Atempo

Atempo delivers unstructured data management capabilities through its Miria solution. Miria consists of five complementary yet independent data services: Analytics (available regardless of the data service used), Mobility, Archiving, Migration, and Backup (these four are individually licensed through a volume-based subscription). In addition to data protection, use cases for Atempo Miria include data/storage relocation, storage lifecycle, consolidation initiatives, on-premises-to-cloud migration, cloud-to-cloud migration, and data repatriation. The solution is currently deployable either as a virtual appliance (compatible with a broad set of virtualization platforms) or as software installed on physical or virtual servers, with support for Microsoft Windows, macOS, and Linux. In the future, it should also become available natively on public cloud marketplaces.

Miria includes an analytics component that explores, identifies, and classifies data, with the ability to order or sort files based on system metadata or extensions. It also includes reporting capabilities. The analytics layer is highly configurable and offers advanced filtering capabilities that can be saved in custom views to be reused later. These views also can provide reusable file lists for data movement operations. In 2023, the solution will include support for cloud storage as well as growth trend projections.

Compliance is currently supported through third-party integrations. Full indexing is not yet available, but Atempo plans to add this feature in a future release. The company’s research department is also evaluating ML techniques to detect personal information and assist with automated data classification.

From a security perspective, Miria includes audit trails that can be provided to security information and event management (SIEM) platforms via API calls. In addition, Miria can copy object lock configurations between source and destination object buckets to ensure data remains adequately protected. Finally, the management system offers a granular permission set to allow access for a variety of roles within an organization. Future security improvements may include anomaly detection features in its Analytics service, an area that Miria is actively working on.

Regarding workflow management, Miria allows policy-based data movements and creation of activities that can be either executed on demand or scheduled. Activities can be launched from the command-line interface or through Miria’s fully public REST API. This flexibility allows the solution to be easily integrated in workflows, a frequent use case for media and entertainment customers.

Miria is well-known for data migration use cases, as it supports data migration and replication activities for file-to-file, file-to-object, and object-to-object actions. It supports any object storage or cloud provider solution offering S3-based or Swift-based access, and supports Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). The FastScan capability helps users quickly identify changed files to reduce migration times after an initial full migration cycle has completed. Atempo plans to introduce FastScan support for object stores in a next product release. Worth noting, Miria supports cold object tiers and is capable of natively writing to offline, long-term storage media systems, such as tape libraries, based on different technologies.
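The general idea behind incremental passes after a full migration is to compare two scans and transfer only what differs. The sketch below shows that comparison on scan results keyed by path; the source does not describe FastScan's internals, so the `changed_since` function and the (size, mtime) signature are assumptions for illustration, not Atempo's implementation.

```python
def changed_since(previous, current):
    """Compare two scans that map path -> (size_bytes, mtime).

    Returns the paths that were added, modified, or removed between scans.
    """
    added = sorted(p for p in current if p not in previous)
    removed = sorted(p for p in previous if p not in current)
    modified = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    return {"added": added, "modified": modified, "removed": removed}


# Hypothetical scan results from before and after the initial full migration.
first_scan = {"/a.mov": (10, 1.0), "/b.mov": (20, 2.0), "/c.mov": (5, 3.0)}
second_scan = {"/a.mov": (10, 1.0), "/b.mov": (25, 9.0), "/d.mov": (7, 4.0)}

delta = changed_since(first_scan, second_scan)
print(delta)
```

Only the paths in `delta` need to be processed in the next cycle, which is where the reduction in migration time comes from.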

Strengths: Atempo Miria offers a holistic platform that covers a broad spectrum of use cases and services, including data protection, data archiving, user-driven data movement, data analytics, and automatic data management. From a data management perspective, its strongest focus is in the orchestration area. The solution can be extended, thanks to third-party integrations, and presents a promising roadmap.

Challenges: The solution presents multiple areas for improvement. If the company delivers on its roadmap, full content indexing and the use of ML can significantly improve its handling of compliance and security use cases.

Cohesity

Cohesity offers the Cohesity Data Cloud Platform, an end-to-end solution designed to tackle data and apps challenges in modern enterprises. It is available both as a software-defined, scale-out solution deployed on physical or virtual servers and as a service from major cloud providers—AWS, Azure, and GCP.

Users can consolidate disparate workloads, including backup, archiving, file shares, object stores, test/dev, and analytics, onto a single software-defined platform. This approach simplifies the storage, protection, and management of massive volumes of data. On top of the efficient, web-scale distributed file system and integrated data protection, Cohesity offers a growing list of capabilities that address both infrastructure- and business-focused applications.

The Helios management interface provides a unified view of objects and files stored across locations and offers a set of actionable insights such as backup and restore, cloning for test and development use cases, and reporting. Helios also supports the deployment of applications such as Insight and Data Classification. These go far beyond standard metadata search and support real content and context-based search and discovery, all within the unified Helios management interface. When combined with another native application branded Spotlight, organizations can use Insight to analyze user activity and search for unstructured data.

Data management remains one of the key differentiators for Cohesity. In fact, it has implemented an extensive series of features in this area aimed at simplifying data mobility, protection, security, and governance. SmartFiles includes remote data replication capabilities, automated tiering between different storage systems and the cloud, transparent archiving functionality, data migration, and sophisticated ransomware protection, which benefits from Cohesity platform-level advanced security features.

The solution offers a rich set of security-related capabilities: ransomware protection leverages ML-based early detection of attacks by monitoring data changes against normal patterns (using several metrics) and measuring abnormal activity against the usual activity baseline. Built upon immutable snapshots, ransomware protection is extended with Fort Knox, a strongly secured, isolated, cloud air-gapped immutable storage solution that is delivered as a service. Another differentiator is a strong zero-trust MFA module with quorum-based approval for sensitive actions in the environment, such as changing protection policies. Finally, the user behavior analytics (UBA) capability detects risky user behaviors by identifying indicators of data exfiltration, tampering, deletion, and more. It also audits user file activities with interactive log search.
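Measuring activity against a usual baseline can be illustrated with a simple statistical check. The sketch below flags a daily change rate that deviates from the historical mean by more than a chosen number of standard deviations. It is a deliberately simplified stand-in, not Cohesity's ML model; the `is_anomalous` function, the single metric, the threshold, and the sample values are all assumptions.

```python
from statistics import mean, stdev


def is_anomalous(baseline, observed, threshold=3.0):
    """Flag `observed` if it lies more than `threshold` standard deviations
    from the mean of `baseline` (which needs at least two values)."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Hypothetical history: percentage of protected data changed per day.
daily_change_pct = [1.8, 2.1, 2.4, 1.9, 2.2, 2.0, 2.3]

print(is_anomalous(daily_change_pct, 2.5))   # ordinary day
print(is_anomalous(daily_change_pct, 45.0))  # mass rewrite, e.g. encryption
```

A real detector would combine several metrics (change rate, entropy, file-type churn) and learn the baseline per workload; a single z-score is used here only to show the baseline comparison itself.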

This approach is further enhanced by Cohesity DataHawk, an add-on to SmartFiles that provides automated threat intelligence by simplifying threat detection through a deep learning-based engine. DataHawk is highly curated and managed by Cohesity, includes indicator of compromise (IoC) threat feeds, and is extensible to third-party integrations with SIEM or security orchestration, automation, and response (SOAR) platforms. DataHawk also addresses compliance requirements by providing one-click access to finding, identifying, and classifying regulated data such as PII, HIPAA, and PCI, and includes over 200 classifiers and more than 50 predefined policies with ML-based pattern matching and recognition. Furthermore, it is capable of identifying impacted backup snapshots, servers, virtual machines (VMs), and files.

Strengths: Cohesity offers a complete end-to-end solution for data protection, consolidation, and management, with a centralized user interface (UI), great overall efficiency, and total cost of ownership (TCO). The latest security and compliance improvements in Cohesity DataHawk further raise the bar and increase the overall value of the solution.

Challenges: The solution, designed for large and distributed enterprise deployments, has a good ROI, but the initial investment may be steep for smaller organizations.

CTERA

CTERA proposes a cloud and on-premises distributed file storage solution incorporating unstructured data management and analytics capabilities. These are delivered through CTERA Insight, an add-on data visualization service delivered as SaaS that analyzes file assets by type, size, and usage trends, and presents the information through a well-organized, customizable UI. Users can drill down to understand which tenants and locations are experiencing data growth patterns and pinpoint the related groups, individuals, and data types.

Besides data insights, this interface also provides real-time usage, health, and monitoring capabilities, encompassing central components and edge appliances. CTERA also implements a comprehensive RBAC system that supports folder and user-based tagging to grant dynamic data access, including geographic or department-based access.

The solution allows enterprises to design their global file system in compliance with data sovereignty regulations through CTERA Zones. With Zones, the global file system can be segmented into multiple data units to prevent data leakage between zones. Users are prevented from accessing any share in the global namespace that does not belong to their defined zone, although a share can belong to multiple zones. Administrators can define zones based on the content required by each department and associate a department edge filer with each zone, ensuring that users have access only to relevant data while restricting access to sensitive data across the organization. The solution can also be deployed across multiple cloud providers and perform transparent policy-based data movement between clouds for data locality or financial reasons without impacting front-end access to the data. Compliance adherence has been enhanced with Cloud Storage Routing (CSR), a feature that allows organizations to further enforce data sovereignty laws by routing traffic to the right area and network.
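The zone model amounts to a membership check: a user can reach a share only if that share belongs to the user's zone, and one share may appear in several zones. The sketch below illustrates that rule with plain dictionaries; the zone names, share paths, users, and the `can_access` function are hypothetical and do not reflect CTERA's configuration format or API.

```python
# Hypothetical zone definitions; a share may be listed in more than one zone.
ZONES = {
    "emea-finance": {"/finance/emea", "/shared/policies"},
    "us-engineering": {"/eng/us", "/shared/policies"},
}

# Hypothetical user-to-zone assignment.
USER_ZONE = {"alice": "emea-finance", "bob": "us-engineering"}


def can_access(user, share):
    """Allow access only to shares that belong to the user's zone."""
    zone = USER_ZONE.get(user)
    if zone is None:
        return False  # users without a zone see nothing
    return share in ZONES[zone]


print(can_access("alice", "/finance/emea"))   # in alice's zone
print(can_access("alice", "/eng/us"))         # belongs to another zone
print(can_access("bob", "/shared/policies"))  # share present in both zones
```

Defaulting to "deny" for unknown users keeps the check fail-closed, which matches the goal of preventing data leakage between zones.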

The solution includes a broad set of security features, including audit trails, authentication mechanisms (including two-factor authentication), antivirus scanning, granular versioning, and immutability. In 2023, the company will release CTERA Ransom Protect, an AI-based ransomware detection and prevention system that detects ransomware within 30 seconds based on behavioral analysis, blocks the offending users, and sends relevant warnings with audit trail information. CTERA continues to offer integrations with Varonis to deliver capabilities in multiple areas, including data classification (regulated, sensitive, and critical data), security analytics, deep data context and audit trails, and security recommendations.

Besides the ability to perform transparent policy-based data movement described above, administrators can use CTERA Migrate—a built-in migration engine—to discover, assess, and automatically import file shares from network-attached storage (NAS) systems. Existing file systems are supported via native file synchronization and share capabilities on Windows, Linux, macOS, and mobile devices. When customers use AWS S3 as the back-end object storage for CTERA, the solution can use AWS S3 intelligent tiering, which allows data to be moved between the S3 frequent- and S3 infrequent-access tiers, helping organizations achieve further cost savings. CTERA also can be deployed entirely on-premises in a fully private architecture that meets the stringent security requirements of several homeland security sector customers.

Strengths: CTERA combines proprietary data insights, advanced geo-compliance features, and a strong focus on security and cyber resiliency, including AI/ML-based anomaly detection capabilities. It has a strong roadmap of security features.

Challenges: Some compliance capabilities such as full content indexing and data classification are missing.

Datadobi

In 2022, Datadobi released StorageMAP, an unstructured data management platform that integrates the data analytics and data migration capabilities of its former standalone solutions, DobiMigrate and DobiReplicate.

StorageMAP is an agentless solution that provides a broad set of infrastructure-grade data analytics and data management capabilities such as data discovery, reporting, tagging, data migration, data protection, archival functions, movement, and deletion. The solution scans data sources, collects metadata, and aggregates collected information in dashboards, where it can be analyzed further by individuals who can take action.

Metadata analytics is an important component of the Datadobi StorageMAP solution. The solution can analyze system metadata tags to provide data insights, help with classification, and subsequently trigger actions on the selected datasets. In addition, StorageMAP now supports manual tagging up to the folder level. These tags can be used for classification and action. Datadobi’s policy engine can pick up action tags and programmatically execute actions based on previously defined policies. The policy engine allows the definition of multiple parameters, including frequencies of copies, schedules, and exclusions.

Datadobi also provides advanced filtering capabilities with the Datadobi Query Language (DQL), an SQL-like scalable query language capable of performing very granular and specific queries based on any criteria a customer may prescribe. DQL is a building block for task automation and allows users to query data sets and plan subsequent activities. A future addition to StorageMAP (in the short-term roadmap) will add actionable insights based on search results.

Although StorageMAP is not a governance product, the solution includes mechanisms to identify dark data and orphaned data, allowing administrators to isolate those data sets and look for potential owners or make decisions about further actions, potentially reducing the organization’s risk footprint caused by a lack of data ownership. The solution can be extended with third-party solutions (via API integration) for full content indexing and data classification.

One of Datadobi’s greatest strengths lies in its orchestration capabilities. The solution includes support for policy-based data movement as well as replication and migration capabilities. StorageMAP fully supports file-to-file, file-to-object, and object-to-object migrations and provides support for a broad ecosystem of on-premises and cloud platforms. Besides cross-vendor NAS migrations, the solution also supports concurrent multiprotocol access as well as specific use cases for which WORM data attributes must be maintained. Data replication is also supported and allows organizations to create failover copies or golden copies of their data protected by air-gapped network connections. The solution enforces data integrity through file-level verification, advanced integrity protection, and chain-of-custody reports, providing evidence that the target data is identical to the source.
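To make the idea of file-level verification concrete, the following is a minimal sketch of how a migration tool might prove that a target copy is identical to its source. It is not Datadobi's implementation; the function names, the report layout, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_copy(source_root: Path, target_root: Path) -> list[dict]:
    """Compare every file under source_root with its counterpart on the target.

    Returns one record per source file (hypothetical report layout), which
    could serve as the raw material for a chain-of-custody style report.
    """
    report = []
    for src in sorted(p for p in source_root.rglob("*") if p.is_file()):
        rel = src.relative_to(source_root)
        dst = target_root / rel
        src_hash = file_digest(src)
        # A file missing on the target is recorded with no target digest.
        dst_hash = file_digest(dst) if dst.is_file() else None
        report.append({
            "path": rel.as_posix(),
            "source_sha256": src_hash,
            "target_sha256": dst_hash,
            "match": src_hash == dst_hash,
        })
    return report
```

Any record whose "match" field is false flags a file that is missing or differs on the target. A production tool would also need to compare metadata such as permissions and timestamps, which this sketch leaves out.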

The administration UI offers an intuitive experience by providing insights into migration activities, including operations per second and bandwidth utilization, while also reporting on performance issues and errors. The UI includes several FinOps features that can track costs at the dataset level and measure their respective carbon footprint (CO2 emissions). Currently, those figures have to be provided manually, but Datadobi provides guidelines to its customers on how to calculate those costs. Financial and predictive cost simulations are on the roadmap.

Security capabilities are currently limited; there are no particular features to support anomaly detection or ransomware protection (although the solution can be used to replicate or migrate data to immutable storage tiers). Similarly, Datadobi is not using AI/ML currently to assist with event trends, real-time recommendations, or content analysis/metadata augmentation. Although the company does not share its roadmap, it is highly probable that Datadobi is privately exploring AI/ML enhancements to its solution.

Strengths: StorageMAP is a significant evolution for Datadobi and includes excellent data orchestration capabilities. Metadata analytics and global index and search features are very good and include DQL, a unique query language that can be used for automation activities. It has noteworthy features for orphaned and dark data detection that decrease an organization’s risk surface.

Challenges: File analysis capabilities could be further developed to provide better support in the context of compliance, governance, and adherence to regulatory requirements. Although not a focus area for Datadobi, there are no particular security capabilities.

Data Dynamics

Data Dynamics offers a complete unstructured data management solution built around three products: StorageX (data location optimization and enterprise data migration), Insight AnalytiX (data privacy risk classification), and ControlX (data exposure risks, compliance, and remediation). StorageX allows organizations to manage unstructured data at petabyte scale across storage systems and locations, including cloud-based storage, with features such as data discovery, classification, tagging, and augmentation; it supports a broad set of data movement options and policy-based management capabilities.

The solution gathers all metadata and stores it in a highly scalable database. Then, using custom tagging, the customer can merge or split data analysis as needed.

StorageX analyzes data across storage systems and performs automated metadata tagging and metadata augmentation. Tags can be added automatically based on criteria such as file type, file content, or file name and folder expressions; alternatively, administrators can define and apply custom policies.
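As an illustration of automated tagging driven by file name and folder expressions, here is a small sketch. The rule format, tag names, and paths are hypothetical and do not reflect StorageX's actual policy syntax; content-based tagging is out of scope for this example.

```python
import fnmatch
from dataclasses import dataclass


@dataclass(frozen=True)
class TagRule:
    tag: str      # tag to apply when the rule matches
    pattern: str  # shell-style glob matched against the full path


# Hypothetical rules: one by file type, one by folder, one by extension.
RULES = [
    TagRule("media", "*.mp4"),
    TagRule("finance", "*/finance/*"),
    TagRule("logs", "*.log"),
]


def tags_for(path: str, rules=RULES) -> set[str]:
    """Return every tag whose pattern matches the given path."""
    return {r.tag for r in rules if fnmatch.fnmatchcase(path, r.pattern)}
```

For example, tags_for("/shares/finance/app.log") yields both the "finance" and "logs" tags, while a path that matches no rule gets an empty set and could be routed to a custom policy instead.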

StorageX is complemented by Insight AnalytiX, a data privacy risk classification solution that recognizes files containing PII across more than 200 known file types. Privacy Risk Classifier currently recognizes over 80 different types of PII; the solution combines pattern recognition technology, keyword recognition, and AI. It works in coordination with StorageX and fetches dataset information from StorageX by building advanced multilevel logical expressions and a combination of logical operators, then proceeds to stream and analyze data to identify both PII and potentially risky content.

Once analysis is complete, the solution offers templates to view the analyzed data and allows users to download reports in various formats. The report is powered by deep analytics (both descriptive and diagnostic) to help enterprises get a clear understanding of the risk that exists and an easy means of quantifying it. Both StorageX and Insight AnalytiX support RBAC, boast an intuitive UI, and support full-text search functionality.

ControlX, integrated with Insight AnalytiX, gives enterprises the ability to proactively mitigate risk and adhere to compliance regulations. It provides scalable security remediation, enabling users to quarantine at-risk datasets, re-permission files intelligently, and create an immutable audit trail backed by blockchain technology. ControlX’s file control operations can be integrated into an enterprise’s existing environment service management, data management, and governance workflow automation via RESTful APIs.

The solution is policy-based and supports multiple data copy and data movement scenarios. Data sets can be used to create data lakes for big data analytics applications; age and last-accessed criteria can also be used as the basis for data tiering policies, which can automate data placement into cheaper storage tiers.

Strengths: Data Dynamics offers a robust, policy-based, unstructured data management platform that embeds outstanding metadata augmentation capabilities, broad storage solution coverage, outstanding data movement/tiering options, and a solid data analytics privacy risk classification and compliance solution. ControlX provides a quarantine option for moving files to a specific location and isolating them. The air gap provided by Quarantine helps prevent ransomware attacks on critical files while providing immediate protection.

Challenges: The Data Dynamics platform offers interesting capabilities for data management. Data Dynamics has an opportunity to extend its feature set further and include actionable insights on the data as well as the ability to manage structured data.

Dell Technologies

Dell Technologies offers unstructured data management capabilities through its Dell DataIQ storage monitoring and dataset management software. The solution offers a unified file system view of PowerScale, ECS, third-party storage platforms, and cloud storage, with insights into data usage and storage infrastructure health. DataIQ is software-based and can be deployed either on Linux servers or as a VM. For large deployments involving frequent, sizable data transfers, organizations can offload data traffic and optimize transfer flows with additional components (such as the DataIQ Data Mover or external workers).

Dell DataIQ’s capabilities for analyzing and classifying large data sets across platforms and locations are optimized for high-speed scanning and indexing, with the ability to get search results within seconds, regardless of where the data resides. DataIQ supports metadata tagging of files and datasets. Tagging can be automated or manual, with automated tags applied during regular scan activities based on policies previously configured by administrators. Tags can contain size limits and/or expiration dates for tagged data (with the ability to alert data owners when one of the criteria is met).

DataIQ also provides solid reporting capabilities (including the ability to identify redundant, unused, and dark data) and can provide reports on storage usage by project, teams, or individuals, as well as cost-based reports for chargeback/showback purposes. The solution visually presents some of the data through “data bins,” each of which contains a view of data sets classified by their last-modified and last-accessed attributes, presenting pools of hot, warm, cold, or frozen data. The time ranges for each bin are customizable and provide a clear view of data categories and data placement optimization opportunities. In addition, DataIQ can be used to provide advanced monitoring capabilities to Dell PowerScale scale-out file system storage.
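The data bin concept can be sketched in a few lines. The example below is not DataIQ code: the threshold values, the function names, and the decision to use the more recent of the modified and accessed timestamps are all assumptions chosen for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical, customizable upper age limits for each bin, checked in order.
BIN_LIMITS = [
    ("hot", timedelta(days=30)),
    ("warm", timedelta(days=180)),
    ("cold", timedelta(days=365)),
]


def data_bin(last_modified, last_accessed, now, limits=BIN_LIMITS):
    """Assign a file to a bin based on its most recent activity (assumption)."""
    age = now - max(last_modified, last_accessed)
    for name, limit in limits:
        if age <= limit:
            return name
    # Anything older than the last limit falls into the final bin.
    return "frozen"


def bytes_per_bin(files, now, limits=BIN_LIMITS):
    """Sum file sizes per bin; files is an iterable of (size, mtime, atime)."""
    totals = Counter()
    for size, mtime, atime in files:
        totals[data_bin(mtime, atime, now, limits)] += size
    return dict(totals)
```

Summing capacity per bin in this way is what exposes placement opportunities: a large "frozen" total on a high-performance tier is a natural candidate for archiving.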

The DataIQ platform is extensible through plug-ins, including data movement capabilities with the data mover plug-in. It allows the transfer of specific files and data sets across locations and storage systems, and through different source and target protocols to feed relevant data to appropriate applications. Other plug-ins enable the identification of duplicate data (only for file-based repositories), auditing of deleted files, and previewing of files.

From a security perspective, the solution supports RBAC and Active Directory-based authentication, and it implements industry-standard traffic encryption protocols. However, it doesn’t yet have any anomaly detection mechanism to help with early identification of abnormal user behavior or a potential ransomware attack. Dell CloudIQ usually handles ransomware detection, but it is a separate product and may not always cover the same data and system scope as DataIQ does.

DataIQ offers an API (branded ClarityNow!) that can be accessed directly through Python. In addition, DataIQ’s modular architecture allows the creation of third-party plug-ins that tap into the API. Dell provides a comprehensive developer guide for front-end and back-end plug-in development, along with sample code on Dell Technologies’ GitHub repository.

Strengths: Dell DataIQ seamlessly integrates with the Dell Technologies storage portfolio, complementing CloudIQ and allowing the monitoring of data across Dell and third-party products as well as cloud-based storage. The solution offers solid reporting capabilities and an excellent open architecture that allows third-party plug-ins.

Challenges: The solution currently lacks anomaly and ransomware detection capabilities. It also doesn’t include any regulatory or compliance features that could facilitate data classification and/or actionable insights related to data privacy laws, e-discovery, or data sovereignty.

Druva

The Druva Data Resiliency Cloud provides centralized protection and management across end-user data sources and is offered as SaaS. By unifying distributed data across endpoints, data center workloads, AWS workloads, and SaaS applications, organizations have a single place to manage backup and recovery, disaster recovery, archiving, cyber resilience, legal hold, and compliance monitoring. This unification minimizes data risks and ensures continuity for employee productivity.

Druva provides advanced metadata analytics based on unstructured data by analyzing data pipelines consisting of hundreds of millions of events per hour and more than 400,000 queries per hour. Data is collected from backup events and then run through big data analytics pipelines to make it queryable. Currently, Druva provides dashboards that give users summary-level information and federated search capabilities (including e-discovery and legal hold queries), as well as storage insights and recommendations.

The solution offers an easy-to-use and feature-rich management console that provides useful metrics and statistics. Druva implements federated search, a powerful search engine that empowers administrative, security, legal, and forensic teams with enhanced capabilities to conduct global metadata searches across workloads, including Microsoft 365, Salesforce, Google Workspace, and endpoint devices. Various attributes can be used to search, including email-related information.

Druva doesn’t offer big data analytics capabilities currently (in the sense of allowing data copy actions to create data lakes); however, the company uses big data analytics internally with extract, transform, load (ETL) pipelines to build datasets for its AI/ML solutions and for monitoring its own cloud services.

Druva’s SaaS platform offers a broad set of compliance and security features. As mentioned, the solution supports compliance queries related to e-discovery and legal hold. In addition, Druva monitors unusual data activity to detect potential ransomware attacks. It implements an accelerated ransomware recovery feature that performs quarantine and orchestrated recovery based on curated snapshots. This is a unique way of automatically selecting files in their last known good state to ensure they are not encrypted or infected when recovered. Security-related features include RBAC, strong user authentication, MFA, and multiple security certifications. It can provide access insights on data usage, inform of potential anomalies, and integrate with a rich ecosystem of security, monitoring, and logging solutions. Additional security capabilities include a one-week retention period on deleted backups and the ability to implement 100% immutability on backups, meaning there’s no possibility to delete them even if the retention policies are deleted or altered.

While Druva has no marketplace of its own, the solution provides a full REST API, enabling integration with industry-acclaimed third-party solutions in multiple areas, such as authentication and ITSM (Okta, Splunk, ServiceNow, ADFS, GitHub), e-discovery (Disco, Access Data, OpenText, Exterro), and security (Palo Alto Networks, FireEye, Splunk).

Druva considers AI and ML to be essential capabilities for improving its solutions and differentiating them from competitors. AI/ML are currently used to enhance customer experience with unusual behavior detection and indicator of compromise (IoC) scans, provide content-based recommendations such as file-level storage insight and advanced privacy services, and enhance the underlying metadata. The product capabilities enhanced by AI/ML include ransomware anomaly detection, storage consumption forecasting, and data privacy and compliance features.

Strengths: Data governance and management tools are integrated in a modern SaaS-based data protection solution. It’s easy to deploy and manage at scale, with a simple licensing model, good TCO, and quick ROI. The company has a strong security-focused roadmap and will continue to deliver on these capabilities.

Challenges: The solution may be less appealing to organizations looking for a standalone unstructured data management solution and not considering adoption of a new data protection platform.

Hitachi Vantara

Hitachi Vantara has a comprehensive data management strategy for internet of things (IoT), big data, and unstructured data. When it comes to unstructured data management, Hitachi Vantara offers a broad solution portfolio, including Hitachi Ops Center Protector, aimed at data protection and copy management, Hitachi Content Platform (HCP) object store, and Hitachi Content Intelligence.

Hitachi Content Intelligence offers the necessary features to optimize and augment data and metadata, making it more accessible for further processing through tools like Pentaho (data analytics suite) and Lumada Data Catalog. One of the key features of Content Intelligence is the ability to define policies and actions based on standard and custom object metadata: policies can be related to a variety of actions such as data placement (protection, replication, cost-based tiering, and delivery to processing location), data transformation (anonymization, format conversion, data processing), security, and data classification.

Content Intelligence supports the creation of simple or complex end-to-end workflows that work on-premises or in the cloud. A new object or file can be augmented automatically with application-supplied metadata, scanned for a variety of criteria (for example, identifying PII), and subsequently augmented with classification and compliance-related metadata. It also offers multiple capabilities related to compliance and governance. Besides detecting PII, Content Intelligence can be used for retention management and legal hold purposes; it supports geo-fencing, GDPR, HIPAA, and other regulatory frameworks. These are supported by data disposal workflows, including a built-in system to process RtbF requests, the ability to automatically delete data after retention periods have elapsed, and custom audit logging of disposition activities.

Strengths: This solution framework can be optimized for several use cases, including indexing and search, data governance and compliance, auditing, e-discovery, ransomware, and detection of other security threats. Hitachi Ops Center Protector can be used with a wide variety of sources, including non-Hitachi, virtualized storage systems, while HCP and Pentaho are designed for high scalability and can be deployed in hybrid cloud environments.

Challenges: Hitachi’s ecosystem is designed for large organizations and can be expensive and complicated for smaller ones.

Komprise

Komprise is a compelling data management platform that boasts easy deployment and management to enable rapid ROI. The solution provides data analytics, search, and tagging, the ability to build virtual data lakes, and comprehensive orchestration capabilities across any file and object storage. Komprise is a SaaS-based solution compatible with any NFS and SMB network share, on-premises or in the public cloud, as well as with S3-compatible object stores. Komprise moves data without any changes to user access and preserves file-object duality across multiple vendors.

Recent innovations include Komprise Hypertransfer, a secure proprietary protocol acceleration channel that significantly increases throughput and reduces migration times by minimizing wide area network (WAN) roundtrips and mitigating SMB protocol chattiness. In addition, the solution implements “zones” based on one or more proxies: data is transferred from one proxy to another, and all the activities are performed locally on the given zone proxy to avoid unnecessary communication with the source data and increase security by not accessing cloud file storage over the network during migrations.

Komprise Deep Analytics, an ElasticSearch-powered feature, is capable of indexing metadata and tags across heterogeneous storage systems, whether on-premises or in the cloud, and automatically creates a global file index. Deep Analytics allows the creation of queries to identify specific data sets and create reports and dashboards that let users drill down into data. It also provides actionable insights, such as prompts to initiate data movements from queries, offers data retention capabilities, and can identify dark or orphaned data.

Komprise leverages its global index to find and then securely share data relevant for compliance, legal hold, legal discovery, and retention purposes. The solution comes with rich orchestration capabilities, thanks to its policy engine that takes advantage of both Deep Analytics and Komprise’s Transparent Move Technology (TMT). TMT moves data seamlessly based on customer-defined policies and datasets from deep analytics queries, allowing either data tiering or migration, without disruption to users. This feature is very relevant for big data use cases and allows organizations to copy or move assets related to these analytics queries into a data lake through TMT.

Another feature, Komprise Smart Data Workflows, allows IT teams to automate the process of tagging and discovering relevant file and object data across hybrid storage silos and feeding the right data to cloud services. Data migration and repatriation capabilities take advantage of an analysis-driven migration approach that performs comprehensive pre-checks before a migration activity is started, highlighting potential issues that can impact the migration tasks. While Smart Data Workflows does not include native data classification capabilities, the solution seamlessly integrates with third-party content indexers such as AWS Macie to tag sensitive data and move it accordingly.

Komprise Deep Analytics is capable of identifying anomalous activities and can help provide additional protection against ransomware attacks for unstructured data by tiering cold data into immutable object storage buckets in the cloud. It’s also capable of maintaining secure access by preserving access control and security posture across platforms.

Finally, Komprise leverages adaptive automation (which incorporates ML techniques) to deliver alerts, detect anomalies, and provide reports on data usage, data growth, data costs, and other key metrics.

Strengths: Komprise offers a compelling SaaS-based data management platform that combines simplicity of use with powerful data insights and automation and orchestration capabilities. It has a good roadmap focused on potential automation, reporting, and performance improvements.

Challenges: Content-based indexing, a key capability for compliance and data classification use cases, currently relies on integration with third-party solutions.

NetApp

NetApp offers BlueXP classification, a comprehensive, predominantly business-oriented unstructured data management solution that covers infrastructure-based needs. It performs several types of analysis on storage systems (NetApp and non-NetApp) and their content (including files, objects, and databases), providing insightful dashboards, reports, and guidance for several roles in the organization. Still powered by NetApp Cloud Data Sense, the solution was fully integrated with BlueXP when it was launched in Q4 2022, hence the new solution name.

Based on ElasticSearch, it centrally manages all storage repositories and can scale to hundreds of petabytes. The solution is implemented on the three major cloud hyperscalers (AWS, Azure, GCP) and is also available for on-premises customers as a local setup because of significant demand from NetApp’s installed base. Data can reside on a single server or a cluster of servers, either in the cloud (customer-operated servers) or on-premises, putting organizations fully in control of their data.

Metadata analytics and compliance and classification features include full data mapping, data insights and control over redundant and stale data, the ability to perform advanced data investigation through comprehensive search options, and the possibility of mapping PII across storage systems. Similarly, the solution can be used to search for sensitive data through specific patterns (for example, social security numbers). Organizations can generate legal-ready compliance reports in minutes, with automatically classified data, and can generate reports for privacy risk assessments as well as reports meeting the requirements of HIPAA and PCI DSS.
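Pattern-based searches of this kind are commonly built on regular expressions. The sketch below shows one way to flag candidate US social security numbers in text; it is an illustration only, not BlueXP classification's detection logic, and the function names are hypothetical. The pattern rejects number ranges that are never issued (area 000, 666, or 900 to 999; group 00; serial 0000).

```python
import re

# Candidate SSNs in the common NNN-NN-NNNN layout, excluding unissued ranges.
SSN_RE = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")


def find_ssn_candidates(text: str) -> list[str]:
    """Return every substring that looks like a valid-format SSN."""
    return SSN_RE.findall(text)


def mask(ssn: str) -> str:
    """Mask all but the last four digits for safe display in reports."""
    return "***-**-" + ssn[-4:]
```

A format match is only a candidate, not proof of PII. Real classifiers typically combine such patterns with contextual keywords or validation steps to reduce false positives.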

The solution supports DSARs (usually related to, but not limited to, GDPR and CCPA regulations) to locate human data profiles and related PII. Those capabilities are accessible through a comprehensive yet intuitive UI that seamlessly integrates with BlueXP. In addition, alerts can be created that inform administrators automatically whenever sensitive data is created (for example, when files contain credit card information) or to identify dark data sources (such as large email address lists), helping to achieve better compliance within organizations. The solution also natively supports Azure Information Protection labels, allowing organizations to view and modify these directly in BlueXP classification.

Big data analytics are supported by the solution’s data source consolidation capabilities. Users can create queries to find specific data sets across storage systems, then copy those files to a designated target location, effectively creating a new data subset. Users can take other actions such as deleting or labeling the files, assigning them to others for further investigation and action, and/or creating alerts and policies to automate actions.

An adjacent capability called BlueXP Ransomware Protection leverages the findings from BlueXP classification, including top data repositories by sensitivity and open permissions, to provide a dashboard showing potential areas of vulnerability. The dashboard includes a ransomware protection score along with real-time recommended actions. BlueXP also addresses compliance and security by providing data encryption with Cloud Volumes ONTAP (open networks technology for appliance products).

Orchestration capabilities are natively present in BlueXP and include seamless data movements across clouds and locations as well as policy-based data movement. BlueXP classification leverages AI and ML for automated data classification, data categorization, and contextual deep data analysis.

Strengths: NetApp BlueXP classification provides a compelling set of business-oriented capabilities and comprehensive data source support. The integration of Cloud Data Sense in BlueXP expands the solution’s use case and allows it to better serve infrastructure-driven unstructured data management initiatives.

Challenges: The solution will be less appealing to organizations seeking a standalone unstructured data management solution.

Panzura

Panzura Data Services is a SaaS-based analytics suite compatible with Panzura CloudFS and any NFS or SMB compatible file repository (including NetApp and Dell PowerScale/Isilon), both on-premises and in the cloud. It offers a complete view of storage infrastructures, including resource utilization, file auditing, and global search, while enabling enterprises to analyze trends and get complete reports about the file systems and data stored in them.

The solution takes a snapshot of all data every 60 seconds and incorporates it into a metadata catalog that provides comprehensive information about files, owners, access frequency, and data growth. Panzura Data Services provides a simple and easy-to-use management interface that includes free-text search and a broad set of filters; searches can be saved for future use. Search includes file recovery capabilities, soft user quotas, and data analytics.

Data analytics presents information about hot, warm, and cold data; filters it by age, size, storage distribution, and file type; and provides insights about how data is distributed. In addition, it shows how data grows daily, helping organizations to understand growth patterns and identify potential spikes. The solution includes monitoring capabilities and can report latency issues or spikes in CPU usage. Currently, metadata tagging and augmentation are unavailable, but the capability should be implemented in 2023.

Panzura Data Services provides comprehensive auditable information for data stored on CloudFS about several user activities such as data copy, file and folder creation, file system operations (lock, write, move, read, deletion, rename), and changes in attributes and permissions. This information is accessible through the same search mechanisms highlighted previously, using filters to refine a search by audit action and date range or user, and the solution can return millions of results in under a second. Auditing capabilities can be used to identify violations of regulatory compliance mandates such as data sovereignty legislation (for example, when end users copy or move files to or from geo-restricted storage systems). Search capabilities can also be used to rapidly identify and retrieve data impacted by legal hold notices.

Besides comprehensive auditing capabilities, the solution implements various anomaly detection mechanisms used for ransomware detection and protection. When a suspicious activity that follows ransomware patterns is detected, Panzura Data Services can identify, alert, and shut down access to data repositories to prevent further damage.

Taken together, the features contribute to improving overall storage infrastructure TCO.

Strengths: The solution is straightforward and effective; it can be deployed in minutes and strips hours from time-consuming IT tasks such as legal holds. In addition, support for ElasticSearch and Kibana increases the number of use cases and possibilities the platform offers.

Challenges: An area of improvement for Panzura is developing additional capabilities to serve big data analytics use cases better.

6. Analysts’ Take

Data ubiquity makes it nearly impossible to manually curate large sets of unstructured data scattered across both an organization’s premises and public clouds. A growing number of enterprises are looking at management solutions to minimize costs and increase control over critical security and compliance functions.

Compared to 2022, the demand for security and cyber resiliency capabilities has risen sharply, leading unstructured data management vendors to implement better detection and protection mechanisms against malicious actors, insider threats, and ransomware attacks. These vendors are increasingly adopting AI/ML to bolster their anomaly detection algorithms and provide better security options to users.

Also showing heightened demand in 2023 are compliance solutions. Once dominated by GDPR, CCPA, and HIPAA, regulatory compliance is now a complex and foggy landscape in which organizations can rapidly get lost without tools to automate adherence to regulations. While full-content indexing is essential (another area where AI and ML can significantly improve results), solutions that go above and beyond by implementing automated classification and tagging, policy-based data movement, and processes for handling DSARs will deliver significant value to organizations.

Another area worth mentioning is data orchestration. Besides policy-based data movement and the more classical data migration use cases, data repatriation is gaining momentum as organizations reassess their IT budget spend and perform cost-benefit analysis of their cloud strategies. Even if data repatriation could fall under the scope of data migration activities, there is additional complexity that needs to be addressed, including the costs of repatriation (egress transfer fees, data retrieval fees, and so on).
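A rough estimate of these repatriation costs can be sketched as simple arithmetic. In the example below, the function name and parameters are hypothetical, and any per-gigabyte rates passed in are placeholders to be replaced with the actual figures from a provider's price list; the sketch ignores tiered pricing, request fees, and network or labor costs.

```python
from dataclasses import dataclass


@dataclass
class RepatriationEstimate:
    egress_cost: float     # cost of transferring all data out of the cloud
    retrieval_cost: float  # cost of recalling data from archive tiers

    @property
    def total(self) -> float:
        return self.egress_cost + self.retrieval_cost


def estimate_repatriation(total_gb: float, archived_gb: float,
                          egress_per_gb: float,
                          retrieval_per_gb: float) -> RepatriationEstimate:
    """Estimate one-off repatriation fees from a known data footprint.

    archived_gb is the portion of total_gb held in archive tiers, which
    incurs a retrieval fee in addition to the egress fee.
    """
    if archived_gb > total_gb:
        raise ValueError("archived_gb cannot exceed total_gb")
    return RepatriationEstimate(
        egress_cost=total_gb * egress_per_gb,
        retrieval_cost=archived_gb * retrieval_per_gb,
    )
```

Even this simplified model shows why a prior analysis of the data footprint matters: both inputs that drive the result, total capacity and the share held in archive tiers, are exactly what an unstructured data management solution can report.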

Unstructured data management solutions are varied in nature because of natural overlaps between disciplines such as security, data protection, and storage. Nevertheless, we will continue to witness more interactions and cross-pollination among these ecosystems due to the obviously data-centric approach taken by unstructured data management solutions, an evolution that will ultimately benefit organizations and users.

7. Methodology

For more information about our research process for Key Criteria and Radar reports, please visit our Methodology.

8. About GigaOm

GigaOm provides technical, operational, and business advice for IT’s strategic digital enterprise and business initiatives. Enterprise business leaders, CIOs, and technology organizations partner with GigaOm for practical, actionable, strategic, and visionary advice for modernizing and transforming their business. GigaOm’s advice empowers enterprises to successfully compete in an increasingly complicated business atmosphere that requires a solid understanding of constantly changing customer demands.

GigaOm works directly with enterprises both inside and outside of the IT organization to apply proven research and methodologies designed to avoid pitfalls and roadblocks while balancing risk and innovation. Research methodologies include but are not limited to adoption and benchmarking surveys, use cases, interviews, ROI/TCO, market landscapes, strategic trends, and technical benchmarks. Our analysts possess 20+ years of experience advising a spectrum of clients from early adopters to mainstream enterprises.

GigaOm’s perspective is that of the unbiased enterprise practitioner. Through this perspective, GigaOm connects with engaged and loyal subscribers on a deep and meaningful level.

9. Copyright

© Knowingly, Inc. 2023 "GigaOm Radar for Unstructured Data Management: Infrastructure-Focused Solutions" is a trademark of Knowingly, Inc. For permission to reproduce this report, please contact sales@gigaom.com.