Augmented Hardware and the Future of Software-Defined Storage

How Hardware Acceleration Can Dramatically Improve Efficiency and Performance

One of the most exciting announcements at VMworld this year was Project Monterey. Long story short, VMware wants to offload some high-demand activities to smart network interface cards (SmartNICs) and other types of accelerators. By doing so, the virtualization stack will become more efficient and feature-rich, while enabling resource disaggregation and composability. This description might seem a bit vague, but there are a number of real-world benefits for the user, including increased security and efficiency, as well as better resource management. And all of these happen while also improving overall cluster performance!

The idea is quite simple and similar approaches have been used several times in the past. General-purpose CPUs are very sophisticated now and can do several things in parallel, but like it or not, there are limitations, and some resources are shared. The more varied the tasks a CPU is asked to do, the more it must switch context, producing internal cache misses that delay processing and impair system performance.

From a storage perspective, encryption, compaction, protocol conversions, and all the math around data protection put stress on the CPU, reducing overall system efficiency and performance. Project Monterey will make it possible to offload these tasks to accelerators and specialized hardware, freeing the system CPU to focus on running applications.

Unfortunately, Project Monterey is not here yet, and we will need to wait several months to see a productized version of it. The underlying technology, however, is already available, and several vendors in different fields are working on the exact same model as Project Monterey.

Software + Hardware-Defined Storage

Software is great, but software that can take full advantage of hardware is better. For many years we had hardware-based storage systems powered by purpose-built CPUs and ASICs, and this was the only way to provide the necessary power to make everything fast enough to work properly. The operating system of the storage array was specifically designed to work with the hardware and squeeze every single bit out of it. Over time, thanks to the increased power of CPUs and networking components, ASICs (and other esoteric accelerators) became practically useless, and more and more system architectures started to focus on standard hardware. This brought an increasing number of “software-defined” solutions to the market.

Everything worked pretty well with hard disk drives until the rise of, in order of appearance, flash memory, NVMe, and storage-class memory. It didn’t happen quickly. In fact, flash adoption was slow at the beginning because of its price, but things have changed radically over the last couple of years.

Now we have 100Gb/s Ethernet (if not more!), NVMe and NVMe-oF (shorter data path and massive parallelization), and faster-than-ever storage that can be configured as a RAM extension (Intel Optane). The amount of data that these devices can manage is massive. To keep the storage system balanced and efficient, we need to ensure that every single component can receive the heightened flow of data without bottlenecks. It’s a classic example of history repeating, some would say.

Standard (that is, non-accelerated) software-defined storage systems were able to use general purpose CPUs because:

  • Hard drives were slow (hundreds of IOPS)
  • Flash was faster but still manageable (up to tens of thousands of IOPS)
  • Ethernet was relatively slow (10Gb/s)
  • Protocols were designed to deal with hard drives in a serial fashion (SCSI, SATA, etc.)

The day we unleashed the power of next-generation memory options thanks to NVMe and faster networks (100Gb/s or more), general purpose CPUs became the bottleneck. At the same time, expanding storage systems needed more powerful and expensive CPUs to operate. At the end of the day everybody wants to go faster, but nobody wants to give up on data protection, data services, security, and data footprint optimization. Would you do that?
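To put rough numbers on that shift, here is a back-of-the-envelope sketch of how many CPU cycles a single core can spend on each I/O as devices and networks get faster. The 3 GHz clock, the 4 KiB I/O size, and the sample IOPS figures picked inside the ranges listed above are illustrative assumptions, not measurements, and the link calculation ignores protocol overhead.

```python
# Illustrative assumptions (not from any vendor data): one 3 GHz core, 4 KiB I/Os.
CORE_HZ = 3_000_000_000
IO_BYTES = 4096


def cycles_per_io(iops: float, core_hz: int = CORE_HZ) -> float:
    """CPU cycles one core can spend on each I/O at a given IOPS rate."""
    return core_hz / iops


def iops_to_fill_link(link_gbps: float, io_bytes: int = IO_BYTES) -> float:
    """I/Os per second needed to saturate a link, ignoring protocol overhead."""
    return link_gbps * 1e9 / 8 / io_bytes


# Sample points chosen inside the ranges mentioned in the list above.
scenarios = {
    "hard drive (hundreds of IOPS)": 200,
    "early flash (tens of thousands of IOPS)": 50_000,
    "10Gb/s Ethernet, saturated": iops_to_fill_link(10),
    "100Gb/s Ethernet, saturated": iops_to_fill_link(100),
}

for name, iops in scenarios.items():
    budget = cycles_per_io(iops)
    print(f"{name:42s} {iops:>12,.0f} IOPS {budget:>14,.0f} cycles per I/O")
```

Under these assumptions, the per-I/O budget of a single core drops from millions of cycles with a hard drive to roughly a thousand cycles when a 100Gb/s link is kept full, and that budget has to cover encryption, data protection, and every other data service. Real systems spread the work across many cores, so treat this as an order-of-magnitude illustration only.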

Some SDS vendors understood this quickly and started to build a next generation of systems that take advantage of accelerators to do more (and better) with less (power and space).

Software + Hardware Optimized

I want to single out the example of Lightbits Labs here. Lightbits’ LightOS is an innovative NVMe-based scale-out software-defined solution that aggregates NVMe devices on storage nodes and exposes NVMe/TCP as its front-end protocol. It combines the low latency and high performance of NVMe-oF (NVMe over Fabrics) storage with data services on standard TCP/IP Ethernet networks.

The company recently announced a partnership with Intel to take advantage of most of the latest hardware technology from the giant chip-maker:

  • Intel Optane. For fast, non-volatile write buffer and metadata handling
  • Intel Ethernet 800 Series NICs. For low-latency NVMe/TCP optimization
  • Intel QLC 3D NAND SSDs. For better $/GB
  • And more to come…

Lightbits already demonstrated incredible performance on general-purpose hardware. By adding these technologies to its solution, it is able to further optimize performance while also improving overall system efficiency and cost. Lightbits offloads a series of tasks to the Intel SmartNIC (with specific optimizations made possible by ADQ technology) while taking advantage of the latest memory options for performance, capacity, and better cost compared to other solutions. For the user, this means better performance, more capacity, and higher overall efficiency, alongside a reduced data center footprint that translates readily to better TCO.

What’s noteworthy is that these accelerators can be considered specialized hardware, but they are not custom hardware: they are off-the-shelf components, not ASICs designed by Lightbits. This is particularly important and gives Lightbits a huge advantage in the long run, because it can focus on software development instead of managing an ASIC design. It will benefit Lightbits’ customers as well, because they will have additional options: they can choose between software-defined (fast, efficient, cost effective) and software-defined with hardware acceleration (faster, more efficient, TCO-focused).

Closing the Circle

If I’ve said it once, I’ve said it a million times. Modern data centers are no longer x86-only. More and more, the most advanced infrastructures rely on specialized hardware and accelerators like GPUs, TPUs, FPGAs, SmartNICs, and more.

Software can already take advantage of these components and Lightbits is a great example of this. Its solution can work on general purpose hardware, but it can work even better with these components and provide better TCO and quicker return on investment at the end of the day.

From the user perspective, hardware augmentation (for lack of a better term) is just software-defined on steroids, and it provides additional options to design solutions that better respond to business needs.

Disclaimer: Lightbits Labs is a client of Gigaom