Compute Densification Is Driving Storage Innovation for NVIDIA GPU Workloads
Barbara Murphy. November 18, 2019
Barbara Murphy, Vice President of Marketing at WekaIO, addresses how compute densification is driving storage innovation for NVIDIA GPU workloads, highlighted by the announcement that WekaIO is incorporating NVIDIA Magnum IO technology into Weka products. She shares her insight on this key industry trend in the blog below.
Today we are delighted to announce that we are incorporating NVIDIA Magnum IO technology into Weka products to deliver innovative, high-performance storage for NVIDIA GPU-accelerated workloads. A key element of this new software stack is GPUDirect Storage, designed specifically for server platforms built on NVIDIA GPU technology. GPUDirect Storage is one element of NVIDIA Magnum IO, a suite of software that delivers up to 20x faster data throughput for multi-server, multi-GPU computing.
You may ask why this technology announcement is so important. The answer lies in how NVIDIA has changed the compute landscape with its GPU-enabled systems. The NVIDIA V100 Tensor Core GPU is a massively powerful data center accelerator, and as part of a compute system, it delivers the same compute power as many racks of CPU-based servers. This densification of compute resources into a single system has delivered incredible acceleration to science and research, but it has also exposed the challenge of feeding these data-hungry machines with enough data.
Recent years have seen an explosion in the generation of data from a variety of sources: connected devices, IoT, analytics, healthcare, smartphones, and much more. In fact, as of 2016, 90% of all data ever created had been created in just the two preceding years. Gaining insights from all of this data presents a tremendous opportunity for organizations to further their businesses and expand more quickly into new markets. NVIDIA GPUs are ideal for delivering incredible processing power to computationally intensive applications in areas such as machine learning, finance, medical research, atmospheric modeling, and other core scientific research. However, the value of the insight is tightly coupled to the source data that the GPUs need to process, and those data sets are monstrous in size. Individual systems have limited storage capacity, so when applications call for large data sets, they require a network-attached shared storage system to support the demanding workloads.
This has put the underlying IT infrastructure under tremendous stress across organizations, because NVIDIA GPUs consume data much faster than CPU-based systems and have incredible IO bandwidth demands. Legacy storage solutions were designed to address the demands of CPU-based systems and cannot handle the IO workloads of a supercomputer: they can neither feed data into compute resources fast enough, resulting in an inability to complete work on time, nor scale to petabytes of capacity while sustaining performance. NVIDIA Magnum IO provides a suite of software that is optimized for network and storage IO performance, enabling data to bypass CPUs and travel on the "open highways" offered by NVIDIA.
Figure 1: Courtesy of NVIDIA. The standard path between GPU memory and NVMe drives uses a bounce buffer in system memory attached to the CPU. The direct data path from storage achieves higher bandwidth by skipping the CPU altogether.
A key feature of Magnum IO is NVIDIA GPUDirect Storage, a technology that removes many of the bottlenecks that slow down IO between persistent storage and the GPU by eliminating the need for CPU involvement in the communication loop. GPUDirect Storage, outlined in Figure 1, avoids extra data copies into CPU memory and enables direct access to the external storage system without burdening either the CPU or the GPU. By eliminating this IO overhead, it increases overall system performance and efficiency.
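For a rough sense of what this direct path looks like to an application, the read flow can be sketched with NVIDIA's cuFile API, the user-space library that ships with GPUDirect Storage. This is a minimal sketch, not WekaIO's implementation: the file path and buffer size are illustrative placeholders, error checking is trimmed for brevity, and running it requires a CUDA-capable GPU plus the GPUDirect Storage (libcufile) driver stack.

```c
/* Hedged sketch of a GPUDirect Storage read via the cuFile API.
 * Assumptions: CUDA toolkit and libcufile are installed; the
 * file path and 1 MiB size are placeholders for illustration. */
#include <fcntl.h>
#include <unistd.h>
#include <cuda_runtime.h>
#include <cufile.h>

int main(void) {
    const size_t size = 1 << 20;   /* 1 MiB, arbitrary */
    void *gpu_buf = NULL;

    cuFileDriverOpen();            /* initialize the GDS driver */
    cudaMalloc(&gpu_buf, size);    /* destination lives in GPU memory */

    /* O_DIRECT bypasses the page cache, as GDS requires */
    int fd = open("/mnt/weka/dataset.bin", O_RDONLY | O_DIRECT);

    CUfileDescr_t descr = {0};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;

    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);  /* register file with cuFile */
    cuFileBufRegister(gpu_buf, size, 0);    /* pin GPU buffer for DMA */

    /* Data moves storage -> GPU memory with no CPU bounce buffer */
    cuFileRead(handle, gpu_buf, size, /*file_offset=*/0, /*buf_offset=*/0);

    cuFileBufDeregister(gpu_buf);
    cuFileHandleDeregister(handle);
    close(fd);
    cudaFree(gpu_buf);
    cuFileDriverClose();
    return 0;
}
```

The contrast with the Figure 1 bounce-buffer path is the single `cuFileRead` call: a conventional read would land in a CPU-side buffer first and then require a separate `cudaMemcpy` into GPU memory.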
An essential part of delivering GPUDirect Storage is the ability to support the InfiniBand protocol, which most legacy storage systems do not. The Weka file system (WekaFS™) has native support for InfiniBand and was therefore able to quickly add GPUDirect Storage support. In our most recent testing, outlined in Figure 2, WekaFS demonstrated that it could fully saturate multiple 100Gbit InfiniBand connections over GPUDirect Storage, showing near-linear performance scaling across 8 Mellanox 100Gbit EDR InfiniBand network connections, NVIDIA GPUs, and networking and storage devices.
Figure 2: Performance measurements from WekaFS to GPUs utilizing GPUDirect Storage
WekaFS and GPUDirect Storage will provide valuable performance improvements as customers scale beyond a single system to larger configurations. A single data set can be shared across hundreds of GPU-based systems while maintaining maximum performance to the data-hungry GPUs.