I recently started working on new research about object storage, and I’ll be writing a report about it soon. As you may have noticed, object storage has finally found its place in enterprise IT. There are several reasons for that, but the most important is the success of Amazon AWS S3. In fact, the popularity of this public cloud storage service influenced the development of both standard access methods and applications that can take advantage of it.
Thank You AWS
S3 is not 100% “standard,” meaning that it is not regulated by organizations like SNIA, and Amazon is not very open when it comes to sharing information with others – especially if “others” are potential competitors or private cloud players. In fact, “private cloud” is practically absent from Amazon’s vocabulary. This didn’t change the end result: many small and large players implemented their solutions starting from the APIs, trying to match AWS S3 behavior as closely as possible, and most of them eventually succeeded.
AWS S3 was launched in 2006. It was one of Amazon’s first web services and, still today, it is one of its most used and, probably, most profitable. In one way or another, directly or indirectly, practically every one of Amazon’s customers uses it. At the same time, with more and more organizations looking at public and hybrid cloud, many hardware and storage vendors added S3 interfaces to their products, and S3 clients are now available from practically every solution that needs a secondary storage option (in some cases, S3 is considered the only storage option).
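Part of what made the S3 interface so easy to adopt is that switching between AWS and an S3-compatible store is, for most applications, just a matter of pointing the client at a different endpoint. The sketch below illustrates this with path-style object URLs, which most S3-compatible stores accept; the endpoint names and bucket/key values are hypothetical, and a real client would also handle request signing.

```python
# Minimal sketch: the same GetObject request shape works against AWS S3
# or any S3-compatible endpoint; only the base URL changes.
# Endpoint, bucket, and key names here are hypothetical examples.
from urllib.parse import quote

def object_url(endpoint: str, bucket: str, key: str) -> str:
    """Build a path-style URL for an object (https://endpoint/bucket/key)."""
    return f"{endpoint}/{bucket}/{quote(key)}"

# The same application code can target the public cloud...
print(object_url("https://s3.amazonaws.com", "backups", "db/dump.tar.gz"))
# ...or an on-premises S3-compatible object store:
print(object_url("https://objects.internal.example", "backups", "db/dump.tar.gz"))
```

In practice, SDKs such as boto3 expose this as a configurable endpoint URL, which is why so many private-cloud object stores could slot in underneath existing S3 applications.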
Private Clouds and S3-Compatible Object Stores
Cloudian was probably the first vendor claiming 100% compatibility with AWS S3 and, still today, it probably remains the best and most complete implementation. That said, S3 compatibility is no longer a differentiator, and there is a subset of the S3 API that is now very stable and implemented by all vendors. Yes, some APIs, such as lifecycle management, are still more complicated to implement than others, but this is mostly due to dependencies on other Amazon services and to how object store vendors decided to map these S3 APIs and functionalities to their product architectures.
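To give a sense of why lifecycle management is harder to replicate than basic object operations, here is the shape of a standard S3 lifecycle rule (the JSON format used by the AWS CLI and SDKs; bucket contents and day counts are illustrative). A compatible store has to honor the rule semantics, and the transition targets reference storage classes tied to other Amazon services, which each vendor must map onto its own architecture:

```json
{
  "Rules": [
    {
      "ID": "archive-then-expire-logs",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
```

An on-premises object store that accepts this rule still has to decide what “GLACIER” means in its own world – a colder tier, tape, erasure-coded capacity nodes – which is exactly the kind of architectural mapping described above.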
Following AWS S3’s success in the public cloud, most object storage vendors fought to win in the private cloud. They all worked to build general-purpose object stores with the idea of creating an on-premises replacement for S3. The reality is that only a few of them really saw success, while all the others struggled. Why did this happen?
There are some factors to take into consideration if we want to understand why on-premises object stores didn’t storm the enterprise IT landscape in the past:
- Hard to adopt: Object stores are all scale-out systems. $/GB is key for this type of storage, and in smaller configurations the cost of CPU, RAM, and even low-capacity hard disks drove up the price to the point where the cost outweighed the benefit.
- File interface: While S3 was the final goal, most enterprise legacy applications are written with files in mind. Most file interfaces on top of object stores were poorly implemented, lacking the features and performance, compared to enterprise NAS, needed to make them usable.
- Lack of flexibility: An object store should be a storage infrastructure on which many different applications and clients can store data. Unfortunately, the reality is that many vendors didn’t implement the multi-tenancy and security features necessary to make this happen. Additionally, in many cases, these object stores had a very rigid architecture and were not able to support multiple types of workloads and file sizes concurrently and efficiently.
Diversification and Differentiation
In the last couple of years, many vendors finally learned something from their mistakes and decided to take a different approach. In fact, they started to differentiate their products from each other and add functionality tailored to specific markets and users.
Practically all object stores now share similar basic characteristics, and all of them are able to work as a low-cost storage repository for high-capacity, low-performance, secondary workloads. The most successful ones do more, though, and the difference comes from specialization. You can identify at least three categories to describe them:
- Enterprise object stores: This type of object store implements all the multi-tenancy and security features necessary to build a private cloud storage infrastructure, and they work pretty well in hybrid cloud scenarios. They have a large solution ecosystem, strong enterprise-grade support, and a good file interface as well. This category also includes the products that are usually selected by tier-2 cloud providers for their infrastructures.
- High-performance object stores: More and more organizations and developers use S3 APIs in their applications. Although not all of them need huge capacities, performance is becoming an important requirement. This category of object store is optimized for workloads that require throughput and/or many operations per second; these products also take advantage of every hardware resource at their disposal, including NVMe flash memory, for example.
- Specialized object stores: These object stores implement features designed for specific industries and use cases, making them highly optimized for particular workloads. These products come with specific APIs, plug-ins, or other mechanisms to build a seamless and efficient connection between the object store and the applications that use it.
If you look at the market landscape in this way, you’ll find that most object stores available on the market are a good fit for one of these categories, and many of them can also be partially placed in another, but none covers all three.
MinIO and Caringo are two good examples of this categorization. The former is really optimized for performance in big data analytics and ML workloads: it can take advantage of NVMe to speed up IO operations, offers strong consistency, and is actively working on integrations with common analytics and AI tools. The latter announced a new version of its product yesterday, aimed at improving some aspects of video archiving (like, for example, the ability to fetch only a portion of a video stored as a single object), making this solution very efficient for workflows that are pretty common in the media & entertainment industry.
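Fetching only a portion of an object is possible through the standard HTTP Range header, which S3’s GetObject API (and S3-compatible stores) support. The sketch below shows how a client would request just one slice of a large video object rather than downloading the whole thing; the object names are hypothetical:

```python
# Sketch: partial object reads via an HTTP Range request, as supported
# by S3 GetObject and most S3-compatible stores. Object/bucket names
# are hypothetical examples.

def byte_range_header(start: int, end: int) -> str:
    """Build the HTTP Range header value for an inclusive byte range."""
    return f"bytes={start}-{end}"

# A media workflow that needs only the second mebibyte of a clip
# could send a GET for /videos/clip.mp4 with this header:
headers = {"Range": byte_range_header(1_048_576, 2_097_151)}
print(headers["Range"])  # bytes=1048576-2097151
# The server answers 206 Partial Content with only the requested bytes,
# instead of transferring the entire object.
```

This is the mechanism that makes partial-video retrieval efficient: the store only has to read and ship the requested byte range, which matters a great deal when single objects are tens of gigabytes.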
Object storage is finally getting the attention it deserves in enterprise IT. It’s not only about the $/GB, but also because many applications are now able to take advantage of it.
General-purpose object stores are not easy to build, except when all you need is low performance at high capacity. The majority of object storage vendors finally understood this and are now focusing on one of the categories described above.
Do you want to know more? How the vendors are categorized, and how to evaluate an object store for your organization? My next GigaOm report, “Key Criteria for Evaluating Object Stores,” is set to publish (Date) and will cover all these topics and more. Stay tuned!