File-Based Storage Systems for the Cloud v1.0

A Critical Component for Multi-Cloud Data Storage Strategy

Table of Contents

  1. Summary
  2. Market Framework
  3. Maturity of Categories
  4. Considerations for Using Cloud File Storage Solutions
  5. The Vendors to Watch
  6. Near-Term Outlook
  7. Key Takeaways

1. Summary

With more and more enterprises embracing the cloud and moving their data and workloads between private and public infrastructures, ISVs and service providers have worked to increase the number of storage options for their clients. In the beginning, most providers started by offering block storage, usually attached to virtual machines for primary storage needs, followed by object storage for other workloads. File systems, however, are still the most common way to store large amounts of unstructured data that need to be accessed frequently and quickly, and the number of solutions designed to provide file-based cloud storage is steadily growing.

Most organizations want to replicate in the cloud the same logical infrastructure they have on their premises. The reasons for doing so are quite simple. On one hand, it eases the migration process; on the other, it helps bridge the gap between legacy and next-generation applications.

Even though object storage is rising in popularity, file systems, often accessed via network protocols like NFS and SMB, are still the data storage system of choice for a large number of workloads including big data analytics, AI/ML, HPC, and more. Furthermore, legacy applications are usually written to work with a POSIX-compliant file system, and it is quite common for multiple applications to access the same data sets. Rewriting these old applications to take advantage of object stores is not always a viable option, so many end users prefer to move their applications to the cloud as-is.

If we look at the types of workloads that can effectively take advantage of a file system in the cloud, we can split them into two major categories with very different profiles in terms of both resource and performance needs. Common use cases for file systems in the cloud include:

High performance and large capacity: usually associated with big data analytics but now also HPC, IoT, and AI/ML workloads. All these workloads differ in data set size, access patterns, individual file size, and so on, but they also have common characteristics, such as continuous capacity growth, high throughput, and low latency needs. Scalability is key and scale-out file systems are the most obvious choice.

Limited capacity: a very common use case that is usually associated with legacy applications that do not actually need large data repositories. Most of these applications, or data sets, are migrated or kept in sync with the cloud primarily for two reasons:

  1. The application has been migrated to the cloud because of a strategic choice, but can’t be refactored to work in a fully cloud-native environment;
  2. Data needs to be accessed concurrently by multiple applications, on-premises and in the cloud.

Scalability is not a big concern for these applications, but they are still likely to need a highly available storage system underneath. In fact, most legacy applications, designed before the advent of cloud computing, rely on highly available and resilient infrastructures to provide the required level of service.

There are other use cases for file systems in the cloud, of course, but most of them are characterized by low performance requirements and large capacities, as in the case of archiving. For this type of use case, a NAS gateway with an object storage backend is usually more than enough and provides the best $/GB alongside great scalability. Most service providers have this option in their service catalog, and some solutions also offer file/object parity, helping to simplify the migration path from legacy to modern applications.

This report gives an overview of the market landscape and analyzes the most important aspects of the cloud-based file storage architectures available in the market, while giving indications to help the end user design a modern storage infrastructure that can be the foundation for a solid multi-cloud strategy.

In this report we will analyze several aspects of next-gen file services including:

  • Differences between traditional and cloud file systems
  • Pros and cons of FS-as-a-Service versus self-managed solutions
  • Important characteristics of cloud-based file storage
  • Applications and use cases
  • File system integration with object storage to lower the $/GB
  • Common functionalities, front-end interfaces, and data services
  • Data migrations and synchronization across clouds

Key findings (and considerations for adopting the solution):

  • File storage systems are a good compromise between cost and capacity for migrating legacy applications to the cloud. They allow users to avoid or postpone application refactoring while keeping costs under control.
  • Some cloud file systems are designed for high performance, allowing end users to get results faster than with traditional file systems or object storage for high-performance workloads in the cloud, while improving overall infrastructure efficiency and total cost of ownership (TCO).
  • Cloud file systems allow implementation of a two-tier storage strategy, enabling the end user to take advantage of object storage by seamlessly moving cold data to a cheaper repository while improving data protection and data mobility.
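
As an illustration of the two-tier strategy mentioned in the last finding, the following is a minimal sketch of how cold data might be identified for movement to a cheaper repository. It is not taken from any vendor's product: the function name find_cold_files, the idle-days threshold, and the use of last-access time as the "coldness" signal are all assumptions made for this example. Real cloud file systems typically apply their own tiering policies internally, and access times can be approximate on file systems mounted with options such as relatime or noatime.

```python
import os
import time
from pathlib import Path


def find_cold_files(root, max_idle_days, now=None):
    """Return a list of (path, size_in_bytes) for files under `root`
    whose last access time is older than `max_idle_days`.

    Hypothetical helper for illustration only: it selects tiering
    candidates and does not move any data.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds per day

    cold = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        info = path.stat()
        # Assumption: last access time is a usable proxy for "cold".
        if info.st_atime < cutoff:
            cold.append((path, info.st_size))
    return cold


def reclaimable_bytes(cold_files):
    """Total capacity that would leave the primary tier."""
    return sum(size for _, size in cold_files)
```

A caller could pass the mount point of a file share and a threshold such as 30 days, then use reclaimable_bytes to estimate how much capacity would move to the object storage tier before committing to a policy.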