As the number of connected devices and IoT sensors multiplies, the amount of data generated each year will continue to soar. For many organizations, the question has become: how can we store and operate on it most efficiently?
One school of thought holds that we need new technologies that combine storage and data processing (known as computational storage) to eliminate performance inefficiencies and thereby open up new opportunities for businesses.
To learn more, we spoke to Richard New, VP of Research at Western Digital, who briefed us on the opportunities in computational storage, the situations in which it should be applied, and the challenges that stand in the way of its widespread adoption.
Tell us about the types of challenges created by increasing data production
Virtually everything we do is driven by smart devices, edge architectures, and high-speed, cloud-connected networks. The Internet of Things (IoT) is one of the main drivers of data growth and has expanded rapidly in recent years, and the amount of data produced naturally correlates with that growth. It is predicted that by 2025, 55.7 billion connected IoT devices will generate nearly 80 zettabytes (ZB) of data.
With the proliferation of data comes the challenge of storing, managing and protecting it at scale with the lowest total cost of ownership, as well as making data accessible so it can be turned into valuable information. As data becomes more abundant and workloads become more complex, the need to improve storage efficiency becomes paramount. Characteristics such as density, access time, total cost of ownership, durability, reliability, capacity, and even the efficiency of moving data between compute and storage all matter. However, there is no one-size-fits-all approach. What matters most to customers varies and depends on their application, workload, environment and economics.
How are companies dealing with rising data storage costs and do you think they will continue to rise?
Businesses large and small need to consider total cost of ownership (TCO). This may include the characteristics above as well as management and maintenance costs.
Additionally, when it comes to optimizing TCO, you have a choice: you can add more servers and infrastructure using traditional approaches, or you can modernize your infrastructure to increase storage capacity or performance more efficiently by adopting new architectures, like composable infrastructure or computational storage, or new technologies, like zoned storage using ZNS SSDs or SMR hard drives.
As an organization optimizes its data center for today and the future, it is essential to take a multi-faceted approach to creating a more effective and efficient data infrastructure with the lowest possible total cost of ownership.
Tell us about computational storage and the kinds of opportunities it will create
Moving data between storage and compute is inefficient and can limit system performance. This fundamental bottleneck of data movement is driving new approaches to data storage and management. Computational storage—bringing compute closer to traditional storage devices—is not intended for general purpose use, or for every application or use case. It’s about taking a specific problem and creating a purpose-built architecture or platform to solve it more effectively.
Although many computational storage use cases are still being formulated, certain general characteristics suggest which classes of problems are likely to lend themselves to this approach.
For example, applications that are more I/O-bound than compute-bound, such as analyzing, searching, and filtering data, could benefit from moving a simple compute operation closer to storage to reduce the amount of I/O needed to solve a problem.
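The I/O savings in the filtering case can be sketched with a toy model. This is a minimal illustration, not a real device API: `device_blocks` stands in for records resident on a storage device, and the byte counts simply compare how much data crosses the host interface under each approach.

```python
# Hypothetical sketch: why filtering near storage reduces I/O.
# Names and sizes are illustrative, not a real API.

RECORD_SIZE = 64  # bytes per record (assumed)

def host_side_filter(device_blocks, predicate):
    """Traditional path: ship every record to the host, filter there."""
    transferred = sum(len(rec) for rec in device_blocks)
    matches = [rec for rec in device_blocks if predicate(rec)]
    return matches, transferred

def near_storage_filter(device_blocks, predicate):
    """Computational-storage path: the device applies the predicate
    and only transfers matching records to the host."""
    matches = [rec for rec in device_blocks if predicate(rec)]
    transferred = sum(len(rec) for rec in matches)
    return matches, transferred

records = [f"user={i:05d}".encode().ljust(RECORD_SIZE) for i in range(1000)]
wants = lambda rec: rec.startswith(b"user=0000")  # matches 10 of 1000

host_hits, host_bytes = host_side_filter(records, wants)
dev_hits, device_bytes = near_storage_filter(records, wants)
# Same answer either way, but the near-storage path moves 100x less data.
```

The win scales with selectivity: the rarer the matches, the more I/O the near-storage path avoids.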
There are also certain classes of streaming problems, where an operation must be performed on every byte written to or read from a storage device. This class includes applications such as encryption and compression, which could be performed efficiently at the storage device during normal I/O operations.
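The streaming case can be modeled as a device that compresses every write and decompresses every read, transparently to the host. This is a toy software model (real devices would do this in hardware on the data path); the class name and block-addressing scheme are invented for illustration.

```python
import zlib

class InlineCompressingDevice:
    """Toy model of a device that compresses every stream on write and
    decompresses on read, transparently to the host. Illustrative only."""

    def __init__(self):
        self._blocks = {}  # logical block address -> compressed bytes

    def write(self, lba, data: bytes):
        # The host writes plain data; the device stores it compressed.
        self._blocks[lba] = zlib.compress(data)

    def read(self, lba) -> bytes:
        # The host reads back exactly what it wrote.
        return zlib.decompress(self._blocks[lba])

    def physical_bytes(self) -> int:
        # Media actually consumed after inline compression.
        return sum(len(b) for b in self._blocks.values())

dev = InlineCompressingDevice()
payload = b"log line: status=OK\n" * 512
dev.write(0, payload)
assert dev.read(0) == payload           # host is unaware of compression
assert dev.physical_bytes() < len(payload)  # media usage shrinks
```

Because the operation touches every byte anyway, doing it inline during normal I/O costs no extra data movement.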
What are other use cases for computational storage?
The broadest definition of computational storage includes both compute operations performed on the storage device itself, as well as compute operations performed by a storage accelerator located near the storage, such as in the same PCIe domain.
This second model – offloading the computation to an accelerator located near the storage – is of course already widely used today, though not always under the name of computational storage.
There is clearly a large class of applications that benefit from offloading computational functions from a main processor to a more efficient processing engine suited to the specific problem of interest. In the context of storage, applications such as video transcoding, compression, and database acceleration fall into this category. A video transcoding engine tightly coupled with a storage device can allow a video server to deliver content at many different quality levels more efficiently, while minimizing unnecessary I/O and data transfers throughout the system.
What are the main challenges, and how long will it take for computational storage to become mainstream?
There are significant challenges with the computational storage model that will need to be addressed as part of future architectures:
Lack of file system context
Most storage devices are block devices with no file system, so the device doesn’t necessarily know which blocks are associated with which files. This context must be passed to the device for a computational storage operation to take place.
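The missing context can be pictured as a translation step: before issuing a compute request, the host must flatten its file-to-extent mapping into the raw logical block addresses (LBAs) a block device understands. The structures below (`extent_map`, the request dict) are hypothetical, for illustration only.

```python
# Sketch: a block device sees only LBAs, so the host translates
# "file" into block extents before asking the device to compute on it.
# All names and structures here are invented for illustration.

BLOCK_SIZE = 4096

# Host-side file system knows which blocks belong to which file:
extent_map = {
    "data.csv": [(100, 3), (500, 2)],  # (start_lba, block_count)
}

def extents_to_lbas(filename):
    """Flatten a file's extents into the LBA list a device command needs."""
    lbas = []
    for start, count in extent_map[filename]:
        lbas.extend(range(start, start + count))
    return lbas

# The compute request sent to the device carries explicit LBAs,
# not a file name -- that is the "file system context" being passed down:
request = {"op": "filter", "lbas": extents_to_lbas("data.csv")}
```

Note the fragmentation: the file's blocks need not be contiguous, so the device cannot infer them on its own.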
Encryption
In some cases, the data stored on the device may already be encrypted, which means the device must be able to decrypt the data and must be part of the overall system security domain.
Compression and deduplication
Similarly, data on the storage device may already be compressed before being stored, so any computation would require an initial decompression step.
Error correction
In many systems, higher-level erasure codes are applied across multiple devices. If errors occur while reading the data, there must be a way to invoke these higher-level codes to recover the data before performing the desired computation.
Data spread across devices
Any computational storage architecture must account for the fact that data is often spread across multiple devices, so a single device may not hold all the data it needs to perform a computation.
This set of problems concerns not only computational storage, but also distributed computing architectures in general, including the now-classic accelerator offload model that powers much of our AI and machine learning infrastructure. The hope is that these issues will be addressed as compute architectures evolve to support more disaggregated forms of computing, opening the door to disaggregating compute down to the storage device level.
What needs to happen to make computational storage a mainstream reality?
A few key building blocks are needed for computational storage to succeed.
The first step is to further define and narrow down the set of most relevant use cases. Many use cases are still being formulated and are in the early stages, and new ones are likely to emerge. There needs to be a canonical set of widely accepted use cases and a set of compelling proof-of-concept demonstrations to drive industry adoption.
Second, the industry needs a well-defined set of standards for computational storage devices and accelerators, as well as a mature software stack. Standardization efforts are well underway, but much more work is needed at the software level to define a set of computational primitive libraries that make sense at the device level.
Finally, computational storage could benefit from better PCIe (peripheral component interconnect express) peer-to-peer support, allowing accelerators to exchange data quickly and efficiently with nearby storage devices. Here, the emerging CXL standard can play an important facilitating role.
What are Western Digital’s plans for computational storage? What does the roadmap look like?
Our view is that system architectures will continue to evolve to address these fundamental issues of offloading computation and reducing unnecessary data movement. Computational storage will likely be part of the solution, but will likely only be adopted for the categories of problems where it makes sense. We believe this evolution will take time, and our plans are focused on enabling the computational storage ecosystem through standardization and software support.
Western Digital is an active member of the NVMe standards group and the Storage Networking Industry Association (SNIA), where computational storage standards are being defined. Much of the effort around computational storage standardization has now shifted to NVMe, where Western Digital is a key participant in the NVMe Computational Storage task group.
Western Digital works with the open source community to create the appropriate level of software support for computational storage. This includes software mechanisms to offload certain types of computation to a storage device, as well as software libraries for the basic computational primitives that make sense to offload to a device.
Computational storage is closely tied to what Western Digital is already doing with zoned namespaces (ZNS) – connecting applications to storage in smarter ways to drive more efficient operation and improved performance. Some computational storage applications are facilitated by ZNS, which essentially moves part of the FTL (Flash Translation Layer) into the host and combines it with the file system or application layer. If you are doing computational storage, there is an advantage to having more information about the location of data on the host side rather than on the device side. An example is compression, where the ZNS architecture can enable more efficient handling of compressed data by moving the FTL to the host.
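The host-side bookkeeping ZNS enables can be sketched with a minimal zone model. In zoned storage, writes within a zone are sequential and the host tracks each zone's write pointer, so it knows exactly where data (compressed or otherwise) landed without consulting a device-side FTL. The class and method names below are invented for illustration, not a real ZNS API.

```python
# Minimal sketch of the zoned-storage idea: the host tracks a per-zone
# write pointer (part of the FTL moves up to the host), so it records
# exactly where each object landed. Names are illustrative only.

class Zone:
    def __init__(self, size_blocks):
        self.size = size_blocks
        self.write_pointer = 0   # next writable block offset in the zone
        self.blocks = []

    def append(self, blocks):
        """Sequential-only write: data lands at the write pointer, and
        the host records the returned offset for later lookup."""
        if self.write_pointer + len(blocks) > self.size:
            raise IOError("zone full")
        start = self.write_pointer
        self.blocks.extend(blocks)
        self.write_pointer += len(blocks)
        return start

zone = Zone(size_blocks=8)
offset = zone.append(["hdr", "payload"])
# Host-side map entry: object -> (this zone, offset), no device FTL lookup.
assert offset == 0 and zone.write_pointer == 2
```

With that location knowledge on the host, a computational storage request can target the exact blocks holding, say, a compressed object, rather than asking the device to resolve logical-to-physical mappings first.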
Storage fabrics like NVMe-oF enable independent scaling of storage and compute, starting with foundational constructs like blocks, as well as ZNS. Increasing that degree of scale further expands the use cases for computational storage, because more complex problems can be solved.