GPUDirect Storage: How it Works and More

What is GPUDirect Storage?

NVIDIA Magnum IO GPUDirect Storage (GDS) enables a direct data path between GPU memory and local or remote storage devices, such as NVMe or NVMe over Fabrics (NVMe-oF). By bypassing the CPU and the bounce buffer in system memory that traditional data paths require, GDS reduces latency and CPU overhead in data-intensive applications.

NVIDIA GPUDirect Storage Design

NVIDIA GPUDirect Storage has a few core features:

Direct Memory Access (DMA) engine. DMA capabilities enable direct communication between GPU memory and storage devices. This bypasses the need for data to be copied through system memory, reducing latency and improving overall system performance.

RDMA capabilities. GPUDirect Storage leverages remote direct memory access (RDMA) technology to access data in remote storage without involving the CPU and to transfer it between GPUs and storage devices across the network.

NVIDIA kernel extensions and drivers. These facilitate the integration of GPUDirect Storage and enable efficient data transfer paths between storage and GPU memory.

Coherent memory access. GPUDirect Storage keeps memory and data consistent between GPUs and storage devices during data transfers.
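In practice, applications use these capabilities through NVIDIA's cuFile API (libcufile), the user-space entry point to the GDS kernel drivers. The following is a minimal sketch, not a production example: it assumes a CUDA-capable system with libcufile installed and a GDS-enabled file system, and the file path and transfer size are placeholders.

```c
/* Minimal GPUDirect Storage read sketch using the cuFile API.
 * Build (assuming the CUDA toolkit with libcufile):
 *   nvcc gds_read.c -o gds_read -lcufile
 */
#define _GNU_SOURCE /* for O_DIRECT */
#include <cufile.h>
#include <cuda_runtime.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    const size_t size = 1 << 20;             /* 1 MiB, arbitrary */
    const char *path = "/mnt/weka/data.bin"; /* placeholder path */

    /* Open the GDS driver before any other cuFile call. */
    if (cuFileDriverOpen().err != CU_FILE_SUCCESS) return 1;

    /* O_DIRECT is required so the kernel page cache is bypassed. */
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    /* Register the file descriptor with cuFile. */
    CUfileDescr_t descr = {0};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    if (cuFileHandleRegister(&handle, &descr).err != CU_FILE_SUCCESS)
        return 1;

    /* Allocate GPU memory and register it for DMA. */
    void *devbuf = NULL;
    cudaMalloc(&devbuf, size);
    cuFileBufRegister(devbuf, size, 0);

    /* DMA directly from storage into GPU memory: no CPU bounce buffer. */
    ssize_t n = cuFileRead(handle, devbuf, size,
                           /*file_offset=*/0, /*devbuf_offset=*/0);
    printf("read %zd bytes into GPU memory\n", n);

    cuFileBufDeregister(devbuf);
    cudaFree(devbuf);
    cuFileHandleDeregister(handle);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```

The key point the sketch illustrates is that `cuFileRead` is handed a GPU device pointer, so the DMA engine moves data from the storage device straight into GPU memory.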

GPUDirect Storage requirements are flexible:

  • Support for many compatible storage devices, architectures, drivers, and file systems.
  • Integration with existing storage solutions and frameworks.
  • Network connectivity for RDMA-based data transfers (in some cases).
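Whether a given platform, driver, and file system combination actually supports GDS can be verified with the gdscheck tool that ships with the CUDA toolkit; the install path below is typical but may vary by CUDA version.

```shell
# Probe GDS support on this system; reports driver, file system,
# and NIC compatibility. (Path is an assumption; adjust per install.)
/usr/local/cuda/gds/tools/gdscheck -p
```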

GPUDirect storage examples and use cases include:

  • Accelerated analytics. GPUDirect Storage significantly reduces data loading times to speed machine learning and AI model training and streamline the entire pipeline.
  • High-performance computing (HPC). High-speed data access supports HPC simulations.
  • Scientific research. GPUDirect Storage accelerates data processing and analysis for use in genomics, computational fluid dynamics, and climate modeling.
  • Real-time video processing. GPUDirect Storage can streamline video data transfers, allowing for real-time encoding, decoding, and processing of high-resolution streams.

GPUDirect Storage vs GPUDirect RDMA

Both GPUDirect Storage and GPUDirect RDMA improve performance without burdening the CPU, but they achieve that goal differently.

GPUDirect Storage specifically enables direct data transfers between storage devices and GPU memory, optimizing data access for GPU-powered applications regardless of whether the storage is local or remote.

In contrast, GPUDirect RDMA is a broader technology that enables direct memory access between GPUs and other devices, such as network adapters, facilitating high-speed data transfers across the network. Its behavior depends on the specific device or storage system it is accessing, rather than being focused on storage alone.

Benefits of GPUDirect Storage

Benefits of GPUDirect Storage include:

  • Reduced latency. Bypassing the CPU in data transfers reduces latency and can deliver two to eight times more bandwidth, resulting in faster data access, more rapid reads and writes, and overall performance improvement.
  • Increased throughput. GPUDirect Storage optimizes data transfer paths, allowing for higher throughput between storage devices and GPU memory, enhancing overall system efficiency.
  • Improved storage scale. GPUDirect Storage enables efficient data access in multi-GPU and distributed computing environments, supporting large-scale applications.
  • Lower CPU overhead. Direct data transfers between storage and GPU memory reduce CPU overhead, free up CPU resources for other tasks, and improve overall system efficiency.

WEKA and NVIDIA GPUDirect Storage

The architecture of the WEKA Data Platform is designed for scalability, allowing it to efficiently handle growing volumes of data and increasing workloads without sacrificing performance. WEKA’s massively parallel architecture enables simultaneous data access across multiple storage nodes, increasing data access speed and reducing latency. These high-speed data access capabilities accelerate data processing tasks, enabling faster analytics, model training, and inference for AI and machine learning workloads.

WEKA was among the first companies to implement, qualify, and use NVIDIA GPUDirect® Storage (GDS), beginning in 2020. The combination allows GPUs to access data directly from the WEKA Data Platform, further reducing latency and improving throughput. This acceleration enhances overall system performance, scalability, resource utilization, and data processing efficiency, particularly for data-intensive applications and AI workloads.

The combination of the WEKA Data Platform and NVIDIA GPUDirect Storage allows customers to use their current GPU environments to their maximum potential and to accelerate their future AI/ML and other GPU workloads. Data scientists and engineers can derive the full benefit of their GPU infrastructure and concentrate on improving their models and applications without being constrained by storage performance or left with idle GPUs.