What are the Benefits of Cloud File Storage?
In recent years, file systems have become more cloud-friendly, offering better integration with cloud technologies such as object storage. This brings several advantages:
- Better scalability. Policy-driven tiering mechanisms allow cold data to be moved to S3-compatible storage, saving precious resources in the high-performance tier.
- Best combination of speed and $/GB. File storage gateways specifically designed to work with an object storage back-end provide a good balance of performance and cost.
- Simplified data migrations and synchronization. Many file storage systems can replicate data to remote file or object stores in the cloud or on-premises. This makes it possible to synchronize and serve data sets across different infrastructures to optimize compute-data proximity and reduce latency.
- Disaster Recovery (DR). Syncing data to a remote object store enables users to leverage a cheaper storage repository in the cloud and populate a file system only if necessary.
These capabilities are particularly important now that vendors are optimizing their file storage for flash memory and access speed, enabling users to build a multi-tier infrastructure to optimize $/GB across on-premises and cloud-based systems.
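As an illustration of the policy-driven tiering described above, the following is a minimal sketch of how a policy might pick cold files to move to an object tier. It is not any vendor's implementation: the `FileRecord` type, the `select_cold_files` and `reclaimable_bytes` helpers, and the 90-day idle threshold are all hypothetical, and a real product would apply its own policy engine and metadata.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical inventory entry for one file in the fast tier."""
    path: str
    size_bytes: int
    last_access: float  # Unix timestamp of the most recent access


def select_cold_files(files, now, max_idle_days=90):
    """Return files idle for longer than max_idle_days, largest first.

    The 90-day default is an assumed policy value, not a recommendation.
    """
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    cold = [f for f in files if f.last_access < cutoff]
    return sorted(cold, key=lambda f: f.size_bytes, reverse=True)


def reclaimable_bytes(cold_files):
    """Capacity freed in the fast tier if every cold file is moved."""
    return sum(f.size_bytes for f in cold_files)
```

Sorting the candidates by size is one possible design choice: it frees the most high-performance capacity with the fewest move operations.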
What are the Scenarios of Use?
Historically, cloud providers neglected to add file services to their product portfolios, concentrating instead on block and object storage. While these cover many use cases and new applications can be developed to render file storage unnecessary, in many circumstances, files remain preferable:
- Lift and Shift (NAS Replacement): For enterprises opting for the public cloud as their primary IT infrastructure, it is common to see “lift and shift” migrations, particularly of existing NAS systems. In this scenario, users want to replicate their on-premises services in the cloud, including POSIX-compliant file systems, data services, and other familiar enterprise features.
- High-Performance Workloads: Though object storage performance is improving quickly, file systems still provide the best combination of performance, usability, and scalability for many workloads. The file system is still the primary interface for most big data, artificial intelligence/machine learning (AI/ML), and high-performance computing (HPC) applications. File systems also usually offer data services, such as snapshots, that improve data management operations.
- Collaboration: For distributed organizations, the ability to access data seamlessly from anywhere simplifies teamwork while keeping data under control. COVID-19 has dramatically boosted the remote work use case. With a sizable user base working in a geographically distributed fashion, distributed cloud file storage addresses the scalability and performance challenges that on-premises storage struggles to meet.
What are the Alternatives?
Alternatives to file storage are block and object storage, but these are not necessarily viable for demanding production environments:
- Block storage can be mounted by a cloud server and used as a local storage area. Once a file system is created, it can be shared on the network with standard tools available in the operating system. This solution is cost-effective but inadequate in scalability, performance, data protection, and management.
- Object storage can be a better alternative, but performance is not always aligned with user needs, and the application must be written to use a RESTful API instead of a file system interface.
What are the Costs and Risks?
Shifting file services to the cloud needs to take into account existing file use and access, both to minimize user downtime and to ensure nothing is lost in migration. It is, of course, imperative to ensure all backups are up to date before commencing any data migration activity. While a migration should be non-destructive to the data, applications will need to relink to cloud-based data stores, which should take place carefully and incrementally.
Note that existing cloud-based capabilities are still maturing. Cloud File Storage solutions will continue to develop new capabilities, especially those related to advanced data management such as data classification, compliance, and sovereignty. Other features that are already available but require further improvement are security (notably ransomware protection) and analytics.
30/60/90 Plan
30 days: evaluate use cases and classify them according to priority, scalability, and performance. The most important step is to define which use cases fit collaboration and distributed storage and which belong to the traditional file storage area. Scalability and performance are key aspects of the latter.
60 days: evaluate solutions with research and PoCs. Identify operational costs of the solutions. Group applications and data that need to stay close to each other, for example due to application dependencies or according to users.
90 days: define a migration plan that takes into account network limitations and potential egress fees of the cloud provider.
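To make the network and egress considerations concrete, here is a rough back-of-the-envelope sketch. The function names, the 70% link utilization default, and any per-GB price passed in are assumptions for illustration only; actual throughput and provider pricing vary and should be taken from measurements and the provider's current price list.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_tb, link_mbps, utilization=0.7):
    """Estimate days needed to move data_tb terabytes over a network link.

    Uses decimal units (1 TB = 1e12 bytes, 1 Mbps = 1e6 bits/s).
    utilization is the assumed usable share of the link, between 0 and 1.
    """
    if data_tb < 0 or link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("invalid data size, link speed, or utilization")
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / SECONDS_PER_DAY


def egress_cost(data_tb, price_per_gb):
    """Estimate the fee for moving data_tb terabytes out of a cloud.

    price_per_gb is a placeholder; it is not a real provider rate.
    """
    if data_tb < 0 or price_per_gb < 0:
        raise ValueError("data size and price must be non-negative")
    return data_tb * 1000 * price_per_gb
```

For example, `transfer_days(100, 1000, utilization=1.0)` comes to a little over nine days for 100 TB on a fully used 1 Gbps link, which shows why link capacity belongs in the migration plan alongside fees.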