The graphics processing unit (GPU) was originally invented for gaming and other graphics-intensive applications. GPUs perform vector calculations efficiently and support a high degree of parallelism. As a result, GPUs are used today to accelerate workloads such as genomic sequencing and cryo-EM in the life sciences, fraud detection and risk assessment in financial services, and AI/ML workloads such as self-driving cars and more.

GPUs require high volumes of data delivered at high speed. To ensure that storage is not the bottleneck, engineers have traditionally placed the data on a GPU server's local storage, which is expensive.

What is NVIDIA® GPUDirect® Storage?

GPUDirect Storage is a groundbreaking technology from NVIDIA that allows storage partners like WEKA to develop solutions that offer two significant benefits:

  • CPU and memory bypass–Traditionally, the central processing unit (CPU) loads data from storage into host memory and then copies it to the GPUs for processing. This can bottleneck application performance because the CPU can run only a limited number of concurrent tasks. GPUDirect Storage creates a direct path from storage to GPU memory, bypassing the CPU and memory complex, freeing the sometimes-overburdened CPU resources on GPU servers for compute rather than storage, and preventing extra memory copies, thereby potentially eliminating bottlenecks and improving real-time performance (see the sketch after this list).
  • Increased availability of aggregate bandwidth–Using GPUDirect Storage allows storage vendors to deliver considerably more throughput to the GPU servers.
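For readers who want to see what the direct path looks like from an application's point of view, below is a minimal sketch that reads a file straight into GPU memory using NVIDIA's cuFile API, the user-space interface to GPUDirect Storage. The file path and transfer size are placeholders, most error handling is omitted, and it assumes a GDS-enabled file system mount; it compiles with nvcc and links against -lcufile.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            // for O_DIRECT on Linux
#endif
#include <fcntl.h>
#include <unistd.h>
#include <cuda_runtime.h>
#include <cufile.h>

int main() {
    const char  *path = "/mnt/weka/dataset.bin"; // hypothetical path on a GDS-enabled mount
    const size_t size = 1 << 20;                 // 1 MiB read, illustrative only

    // Initialize the cuFile driver.
    CUfileError_t status = cuFileDriverOpen();
    if (status.err != CU_FILE_SUCCESS) return 1;

    // Open the file with O_DIRECT so the read bypasses the page cache.
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) return 1;

    // Register the file descriptor with cuFile.
    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    status = cuFileHandleRegister(&handle, &descr);
    if (status.err != CU_FILE_SUCCESS) return 1;

    // Allocate GPU memory as the read target and register it with cuFile.
    void *devPtr = nullptr;
    cudaMalloc(&devPtr, size);
    cuFileBufRegister(devPtr, size, 0);

    // DMA the data from storage directly into GPU memory:
    // no bounce buffer in host memory, no CPU-driven memcpy.
    ssize_t n = cuFileRead(handle, devPtr, size, /*file_offset=*/0, /*dev_offset=*/0);

    // Clean up.
    cuFileBufDeregister(devPtr);
    cudaFree(devPtr);
    cuFileHandleDeregister(handle);
    close(fd);
    cuFileDriverClose();
    return (n == (ssize_t)size) ? 0 : 1;
}
```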

Why Legacy Storage Solutions Are Not Sufficient for Achieving Maximum GPU Performance

Traditional parallel file systems such as Spectrum Scale, Lustre, Hadoop, and others support highly coordinated concurrent access from multiple clients and feature optimized I/O paths for maximum bandwidth. As a result, these file systems can deliver high throughput and scalability for large files. However, they were not designed with smaller files in mind and generally do not achieve the same performance on small files that they do on large ones.

In addition, these file systems are 20 years old and were not architected for advanced technologies such as GPUs and NVMe flash. At the same time, they are extremely complex, with many moving parts (metadata servers, multiple storage targets, tunable system parameters, etc.) that require ongoing optimization to run at peak efficiency. Data management in such environments is complicated and never-ending, a specialized task that is typically beyond the scope of a traditional storage administrator.

As a result, management of such installations requires dedicated, skilled architects and administrators, which can be a non-trivial expense.

Inadequate performance from the storage system results in poor GPU performance. GPU-based servers can cost between ~$50k and ~$200k each, an additional non-trivial expense. If the data-hungry GPUs are starved for data and compute cycles sit idle, the result is inefficient use of expensive resources. Parallel access over the POSIX, NFS, SMB, and HDFS protocols therefore becomes key, with GPUDirect Storage improving on these as well.

To fully satisfy the GPU servers' storage requirements, the storage solution must deliver massive single-client performance, aggregate performance in the hundreds of GB/s, and the massive IOPS that some of these AI/ML pipelines require.
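As a rough illustration of what it takes to keep a GPU cluster saturated, the sketch below turns a few assumed workload parameters (node count, samples consumed per second, sample size) into required per-node and aggregate bandwidth and IOPS figures. None of these numbers are measured values; they exist only to show the arithmetic.

```cpp
#include <cstdio>

int main() {
    // Assumed cluster and workload parameters (illustrative only).
    const double nodes             = 32;      // GPU servers in the cluster
    const double samples_per_sec   = 10000;   // training samples consumed per node per second
    const double sample_size_bytes = 1e6;     // ~1 MB per sample
    // Assume each sample also triggers one small metadata/index read.
    const double small_reads_per_sample = 1;

    // Required sequential read bandwidth, per node and aggregate.
    double per_node_gbps  = samples_per_sec * sample_size_bytes / 1e9;   // 10 GB/s
    double aggregate_gbps = per_node_gbps * nodes;                       // 320 GB/s

    // Small-I/O rate the storage must sustain alongside the streaming reads.
    double aggregate_iops = nodes * samples_per_sec * small_reads_per_sample;

    printf("Per-node bandwidth:  %.1f GB/s\n", per_node_gbps);
    printf("Aggregate bandwidth: %.1f GB/s\n", aggregate_gbps);
    printf("Aggregate small-read IOPS: %.0f\n", aggregate_iops);
    return 0;
}
```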

How WEKA Enables Groundbreaking Productivity with GPUs

When designing modern workload environments such as AI/ML, big data, IoT, financial analytics, and genomic sequencing, the most relevant consideration is the overall pipeline time. This pipeline time can include the initial extract, transform, and load (ETL) phase and the time it takes to copy the data to the GPU servers' local storage, or possibly only the time it takes to train the model on the data. Storage performance improves the overall pipeline time by accelerating some of these steps or removing them entirely (e.g., copying the data to GPU server local storage). Therefore, a storage solution that can deliver the required performance to the GPUs, and even more to a cluster of GPU servers, in a fast and efficient manner can reduce the total pipeline time significantly. A storage solution that supports GPUDirect Storage can potentially improve the overall pipeline time even further, allowing researchers to run significantly more pipelines (e.g., epochs) in a shorter period of time.
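As a simple way to reason about this, the sketch below models a staged pipeline (ETL, copy to local NVMe, train) against one in which the copy step disappears because the GPUs read a fast shared file system directly. All stage durations are invented for illustration only.

```cpp
#include <cstdio>

int main() {
    // Illustrative stage durations, in minutes, for one pipeline run.
    const double etl_min   = 30.0;   // extract/transform/load
    const double copy_min  = 45.0;   // staging data onto each GPU server's local NVMe
    const double train_min = 120.0;  // one training pass (e.g., one set of epochs)

    double staged_pipeline = etl_min + copy_min + train_min;  // copy to local storage first
    double shared_pipeline = etl_min + train_min;             // GPUs read shared storage directly

    // How many pipeline runs fit into a 24-hour window in each case?
    const double day_min = 24.0 * 60.0;
    printf("Staged pipeline:         %.0f min/run, %.1f runs/day\n",
           staged_pipeline, day_min / staged_pipeline);
    printf("Shared-storage pipeline: %.0f min/run, %.1f runs/day\n",
           shared_pipeline, day_min / shared_pipeline);
    return 0;
}
```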

File System | Throughput (GB/sec) | IOPS (4 KB random read)
Local NVMe (RAID 0, 8 drives) | 20.6 | 1,900,000
WEKA with DGX-2 | 23 | 400,000
WEKA with DGX A100 using GPUDirect Storage | 162 | 970,000

Table 1: Performance metrics comparing local NVMe, WEKA, and WEKA with GPUDirect Storage

Conclusion

The combination of WEKA and NVIDIA GPUDirect Storage allows customers to use their current GPU environments to their maximum potential, as well as to accelerate the performance of their future AI/ML or other GPU workloads. Data scientists and engineers can derive the full benefit from their GPU infrastructures and can concentrate on improving their models and applications without being limited by poor storage performance and idle GPUs.

Finally, to achieve an effective production environment using GPUs, organizations need to follow a defined process that covers assessment, a pilot program, and a scalability plan for anticipated workloads. Download this white paper and learn about The Things You Should Know When Assessing, Piloting, and Deploying GPUs.

Additional Helpful Resources

Microsoft Research Customer Use Case: WEKA and NVIDIA® GPUDirect® Storage Results with NVIDIA DGX-2™ Servers
GPU for AI
How to Rethink Storage Architecture for AI Workloads
How GPUDirect Storage Accelerates Big Data Analytics

Things to know when Assessing, Piloting, and Deploying GPUs

Scaling your GPU deployment