Why bringing processing to storage devices could be the answer to the data dilemma


As the number of connected devices and IoT sensors multiplies, the amount of data generated each year will continue to skyrocket. For many organizations, the question has become: how can we most effectively store and harness it?

One school of thought is that we need new technologies capable of combining data storage and processing (known as computational storage), to eliminate performance inefficiencies and thereby unlock new opportunities for businesses.

To find out more, we spoke to Richard New, VP of Research at Western Digital, who briefed us on the opportunities around computational storage, in what situations it should be applied and the kinds of challenges standing in the way of its widespread adoption.

Tell us about the kinds of challenges created by the rise in the production of data

Virtually everything we do is driven by intelligent devices, edge architectures and high-speed networks connected to the cloud. The Internet of Things (IoT) is a leading driver of data and has expanded rapidly in recent years, and the amount of data produced has grown with it. It is predicted that by 2025, 55.7 billion connected IoT devices will be generating almost 80 zettabytes (ZB) of data.

With the proliferation of data comes the challenge of storing, managing and protecting it at scale with the lowest TCO, while keeping it accessible so that it can be transformed into valuable insights. As data becomes more abundant and workloads grow in complexity, the need for better storage efficiencies becomes paramount. Characteristics such as density, access time, TCO, sustainability, reliability and capacity all matter, as does moving data between compute and storage more efficiently. However, there’s no one-size-fits-all approach. What matters most to customers varies, and depends on their application, workload, environment and economics.

How are businesses dealing with increased data storage costs and do you expect them to continue to rise?

Businesses large and small need to consider total cost of ownership (TCO). This can include the characteristics above as well as management and maintenance costs.

In addition, when it comes to optimizing TCO, you have a choice. You can add more servers and infrastructure using traditional approaches, or you can modernize your infrastructure to get more capacity or performance out of your storage. That means embracing new architectures like composable infrastructure or computational storage, or new technologies like zoned storage using ZNS SSDs or SMR HDDs.

As an organization optimizes its data center for today and into the future, taking a multi-faceted approach to create more efficient and effective data infrastructure with the lowest possible TCO is critical. 

Tell us about computational storage and the kinds of opportunities it will create

Moving data between storage and compute is inefficient and can limit system performance. This fundamental data movement bottleneck is spurring new approaches to data storage and management. Computational storage – bringing compute closer to traditional storage devices – is not for general purpose usage, or for every application or use case. It’s about taking a specific problem and creating a purpose-built architecture or platform to address it more efficiently.

While many of the use cases for computational storage are still being formulated, there are some general characteristics which make certain classes of problems amenable to this approach.

For instance, applications that are more IO-bound than compute-bound, like data scanning, searching and filtering, could benefit from moving a simple compute operation closer to the storage to reduce the amount of IO required to solve a problem.
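
To make the IO saving concrete, here is a minimal Python sketch that compares how many bytes cross to the host when a filter runs on the host versus on the device. It is only a simulation: `SimulatedDrive`, its `scan` method, the 64-byte record size and the 1-in-100 match rate are all assumptions made for illustration, not a real device interface.

```python
from dataclasses import dataclass

RECORD_SIZE = 64  # bytes per fixed-size record (assumed for illustration)


@dataclass
class SimulatedDrive:
    """Toy stand-in for a storage device; not a real device API."""
    records: list
    bytes_sent_to_host: int = 0

    def read_all(self):
        """Conventional path: every record is shipped to the host."""
        for rec in self.records:
            self.bytes_sent_to_host += len(rec)
            yield rec

    def scan(self, predicate):
        """Hypothetical computational-storage path: filter on the device,
        ship only the matching records."""
        for rec in self.records:
            if predicate(rec):
                self.bytes_sent_to_host += len(rec)
                yield rec


def make_records(n):
    """Build n log-like records; every 100th one is tagged ERROR."""
    out = []
    for i in range(n):
        tag = b"ERROR" if i % 100 == 0 else b"INFO "
        out.append((tag + b"|" + str(i).encode()).ljust(RECORD_SIZE, b"."))
    return out


def is_error(rec):
    return rec.startswith(b"ERROR")


# Host-side filtering: read everything, then discard most of it.
host_drive = SimulatedDrive(make_records(10_000))
host_hits = [r for r in host_drive.read_all() if is_error(r)]

# Device-side filtering: the same predicate runs next to the data.
cs_drive = SimulatedDrive(make_records(10_000))
cs_hits = list(cs_drive.scan(is_error))

print(f"host-side filter:   {host_drive.bytes_sent_to_host} bytes moved")
print(f"device-side filter: {cs_drive.bytes_sent_to_host} bytes moved")
```

Both paths return the same records; the difference is how much data has to move to find them. In this toy setup the saving is simply the inverse of the match rate, so the benefit shrinks as a larger share of the data matches.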

In addition, there are certain classes of streaming problems, where some operation needs to be performed on every byte that is written to or read from a storage device. This class includes applications like encryption and compression that could be performed efficiently at the storage device level during normal IO operations.
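
A streaming operation of this kind can be sketched in a few lines of Python using the standard `zlib` module. The `CompressingWritePath` class below is a hypothetical model of a device that compresses each chunk as it is written; the bytearray standing in for the media and the sample payload are assumptions for illustration only.

```python
import zlib


class CompressingWritePath:
    """Toy model of a device that compresses data inline during writes."""

    def __init__(self):
        self._compressor = zlib.compressobj()
        self.media = bytearray()   # stands in for the storage media
        self.bytes_received = 0    # what the host actually sent

    def write(self, chunk: bytes) -> None:
        # Every byte passes through the compressor on its way to the media.
        self.bytes_received += len(chunk)
        self.media += self._compressor.compress(chunk)

    def close(self) -> None:
        # Flush whatever the compressor is still buffering.
        self.media += self._compressor.flush()

    def read_all(self) -> bytes:
        # The read path reverses the transformation.
        return zlib.decompress(bytes(self.media))


path = CompressingWritePath()
for _ in range(1000):
    path.write(b"sensor=42 status=ok\n")
path.close()

print(f"received {path.bytes_received} bytes, stored {len(path.media)} bytes")
```

The host issues ordinary writes and reads and never sees the compressed form, which is what makes this class of operation a natural fit for the device's normal IO path.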

What are some additional use cases for computational storage?

The broadest definition of computational storage includes both compute operations that are performed on the storage device itself and compute operations that are performed by a storage accelerator located close to storage, for example within the same PCIe domain.

This second model – offload of compute to an accelerator that sits close to storage – is of course already in wide use today, although not always under the name of computational storage.

There is clearly a broad class of applications that benefit from offloading compute functions from a main CPU to a more efficient processing engine that is better suited to the specific problem of interest. In the context of storage, we can think of applications like video transcoding, compression and database acceleration as falling into this category. A video transcoding device closely paired with a storage device can allow a video server to more efficiently stream content at many different quality levels while minimizing unnecessary IO and data transfers throughout the system.

What are the main challenges and how long will it take for computational storage to become mainstream?

There are some significant challenges with the computational storage model, which will need to be overcome and addressed as part of future architectures:

Lack of file system context
Most storage devices are block devices, with no file system, and so the device does not necessarily know which blocks are associated with which files. That context needs to be passed down to the device in order for some computational storage operation to take place.

Encryption
In some cases, data stored on the device may already be encrypted, which means that the device needs to be able to perform decryption of the data and needs to be part of the overall system security domain.

Compression and deduplication
Likewise, data on the storage device may already be compressed before being stored, so computations would require an initial decompression step.

Error correction
In many systems, higher-level erasure codes are applied across multiple devices. If errors occur during the reading of the data, there must be some way to invoke these higher-level codes in order to retrieve that data to perform the desired computation. 

Data striping
Any computational storage architecture must reckon with the fact that data is often striped across devices, so any one device may not have all of the data it needs to perform a calculation.
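
The striping problem, and the related lack of wider context on each device, can be illustrated with a small Python sketch. It assumes simple round-robin (RAID-0 style) striping with an 8-byte stripe unit; the function names and sample data are hypothetical and chosen only to make the effect visible.

```python
CHUNK = 8  # stripe unit in bytes (tiny, for illustration)


def stripe(data: bytes, n_devices: int, chunk: int = CHUNK):
    """Distribute data round-robin across devices, RAID-0 style."""
    devices = [[] for _ in range(n_devices)]
    for idx, start in enumerate(range(0, len(data), chunk)):
        devices[idx % n_devices].append(data[start:start + chunk])
    return devices


def device_local_count(devices, needle: bytes) -> int:
    """Each device searches only the chunks it holds, one at a time."""
    return sum(chunk.count(needle) for dev in devices for chunk in dev)


def reassemble(devices) -> bytes:
    """Host-side view: interleave the chunks back into logical order."""
    out = []
    for row in range(max(len(dev) for dev in devices)):
        for dev in devices:
            if row < len(dev):
                out.append(dev[row])
    return b"".join(out)


# The first ALERT sits inside one chunk; the second straddles two chunks
# that end up on different devices.
data = b"ALERT..." + b".....ALE" + b"RT......"
devices = stripe(data, n_devices=2)

print("matches seen by the host:   ", reassemble(devices).count(b"ALERT"))
print("matches seen by the devices:", device_local_count(devices, b"ALERT"))
```

The devices miss the match that crosses a stripe boundary, because neither one holds the whole pattern. A real design would need either layout information passed down from the host or a way for devices to cooperate, which is the kind of system-level context the challenges above describe.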

This set of problems is relevant not only to computational storage, but to distributed compute architectures in general, including the now-classical accelerator offload model that powers much of our AI and machine-learning infrastructure. There is some hope that these problems will be solved as compute architectures evolve to support more disaggregated forms of computation, thereby opening the door to allow disaggregation of compute down to the storage device level.

What needs to happen to make computational storage a mainstream reality?

There are a few key building blocks required to make computational storage successful.

The first step is further defining and narrowing the most relevant set of use cases. Many use cases are still at an early stage, and new ones may yet emerge. There must be a widely accepted canonical set of use cases, and a set of compelling proof-of-concept demonstrations that will drive industry adoption.

Second, the industry needs a well-defined set of standards for computational storage devices and accelerators, as well as a mature software stack. Standardization efforts are well underway, but much more work is required at the software level in order to define a set of libraries of computational primitives that make sense at the device level.

Finally, computational storage could benefit from improved PCIe (Peripheral Component Interconnect Express) peer-to-peer enablement, to allow accelerators to exchange data rapidly and efficiently with nearby storage devices. Here, the emerging CXL standard may play a significant enabling role.

What are Western Digital’s plans when it comes to computational storage? What does the roadmap look like?

Our view is that system architectures will continue to evolve to address these fundamental problems of compute offload and reduction of unnecessary data movement. Computational storage will likely be part of the solution, but will probably be adopted only for the classes of problems where it makes sense. We believe this evolution will take time to develop, and our plans are focused on enabling the computational storage ecosystem through standardization and software support.

Western Digital is an active member of both the NVMe standards group and the Storage Networking Industry Association (SNIA), where computational storage standards are being defined. Most of the effort around standardization of computational storage has now moved to NVMe, where Western Digital is a leading participant in the NVMe Technical Working Group on computational storage.

Ecosystem support
Western Digital is working with the open-source community to create the appropriate level of software support for computational storage. This includes software mechanisms to offload certain types of computation down to a storage device, as well as software libraries to enable basic compute primitives that will make sense to transfer to a device. 

Zoned storage
Computational storage is closely related to what Western Digital is already doing with Zoned Namespaces (ZNS): connecting applications to storage in a more intelligent fashion to drive greater efficiency and better performance. Some of the applications for computational storage are made easier by ZNS, which essentially moves some part of the FTL (Flash Translation Layer) up into the host and combines it with the file system or application layer. There is some advantage, if you are doing computational storage, to having more information about the data location on the host side as opposed to on the device side. One example of this is compression, where the ZNS architecture can enable more efficient management of compressed data by moving the FTL up to the host.

Storage fabrics
Storage fabrics such as NVMe-oF enable independent scaling of storage and compute, starting with foundational constructs such as blocks, as well as ZNS. Improving the degree of scale further expands the use cases for computational storage, as more complex problems can be addressed.