GPUs are at the core of AI/ML workloads. By using GPUs, AI/ML/DL models can be trained faster to provide better predictions and insights. GPUs are proliferating across multiple areas: they power autonomous vehicles, process massive volumes of images and video, and enable automated call centers that work through huge amounts of recorded speech, even analyzing customer sentiment. GPUs are also used on manufacturing lines to improve yield. The truth is that every industry can leverage GPUs to accelerate its pipelines. The one thing all of these use cases have in common is the massive amount of data that must be analyzed by the AI/ML models to reach a degree of accuracy that lets them run inference on new “real life” data and provide insights.

Additionally, there are many non-AI/ML use cases that benefit from GPUs: frameworks and applications that offload computations to the huge number of cores a GPU contains, parallelizing the work across those cores to complete tasks faster. One example is NVIDIA Clara Parabricks, which dramatically decreases the time required for genomic analysis. Running on GPUs, Parabricks can complete analysis of a whole genome at 30x coverage in under 30 minutes, compared to roughly 30 hours on CPUs. Clara Parabricks’ GPU workflows also show a 99.5% variant call concordance rate with clinically validated CPU workflows, which backs up its accuracy claims. Other examples include the cryoSPARC platform, which uses NVIDIA GPUs for automated, high-quality, high-throughput structure discovery of proteins, viruses, and molecular complexes in research and drug discovery; Blazing SQL, a GPU-accelerated SQL engine; and RAPIDS, a high-performance data analytics and machine learning framework, among many others. It is worth noting once again that the common factor among all of them is the massive amount of data they must interact with during ingest and compute, plus the importance and value of a faster time to insight.

Enabling GPUs to constantly ingest new data into their internal memory buffers is a key factor in making sure that AI/ML and GPU-accelerated workloads run efficiently and complete faster: the GPUs are never left idle, waiting to receive new data. With the just-announced NVIDIA Magnum IO GPUDirect Storage (GDS) technology, GPU-accelerated servers can now communicate directly with GDS-enabled storage resources and receive the requested data straight into GPU memory buffers using GPUDirect RDMA. GDS does this by bypassing server-internal components such as the CPUs, memory bounce buffers, and PCIe switches, and by avoiding crossings of NUMA regions. The result is an immediate, massive improvement in the amount of data that can be read from and written to the GPUs, since the I/O operations no longer pass through the server-internal components that act as bottlenecks. With WekaIO we see improvements both in the throughput of data the GPUs can ingest and in the small IOPS the GPUs can perform; both are important in these workloads.
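To make the direct data path concrete, below is a minimal sketch of how an application can read a file straight into GPU memory through the GDS cuFile API. The file path, transfer size, and build command are illustrative assumptions rather than details from this post; error handling is kept minimal for brevity.

```c
/* Minimal GPUDirect Storage (cuFile) read sketch.
 * Assumes a GDS-capable system with libcufile installed; build e.g. with:
 *   nvcc gds_read.c -o gds_read -lcufile
 */
#define _GNU_SOURCE              /* for O_DIRECT */
#include <cufile.h>
#include <cuda_runtime.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    const char  *path = "/mnt/weka/dataset.bin";  /* hypothetical file on GDS-enabled storage */
    const size_t size = 64UL << 20;                /* 64 MiB read, chosen for illustration */

    /* Open the cuFile driver and the file (O_DIRECT is required for the direct path). */
    cuFileDriverOpen();
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    CUfileDescr_t descr = {0};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t fh;
    cuFileHandleRegister(&fh, &descr);

    /* Allocate GPU memory and register it with cuFile so the DMA engine can target it. */
    void *gpu_buf = NULL;
    cudaMalloc(&gpu_buf, size);
    cuFileBufRegister(gpu_buf, size, 0);

    /* Read from storage directly into GPU memory -- no CPU bounce buffer involved. */
    ssize_t n = cuFileRead(fh, gpu_buf, size, 0 /* file offset */, 0 /* buffer offset */);
    printf("read %zd bytes directly into GPU memory\n", n);

    /* Teardown. */
    cuFileBufDeregister(gpu_buf);
    cudaFree(gpu_buf);
    cuFileHandleDeregister(fh);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```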

When working with customers on testing NVIDIA GPUDirect Storage with WekaIO storage, we clearly see the performance benefits in throughput and small IOPS, but there are further advantages, such as lower power consumption. Because the CPUs are no longer burdened with handling these I/O operations, and because the memory bounce buffer is not used for them, the servers do not need to draw power for that work. An additional benefit is releasing CPUs back to the applications: since CPU cycles are no longer wasted performing I/O operations, they are free to do additional work, which translates into CPU cores that the applications can now use. This immediately provides a significant cost saving. For example, in a large environment with multiple servers and hundreds or thousands of CPU cores, returning even as little as 10% of those cores to the applications can free tens or hundreds of cores for the OS and applications to use for even more computational throughput.
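For contrast, here is a sketch of the conventional, non-GDS I/O path that this eliminates: data is first read into a host bounce buffer and only then copied to the GPU, consuming CPU cycles and system-memory bandwidth on every transfer. The file path and size are again illustrative assumptions.

```c
/* Conventional (non-GDS) read path: storage -> host bounce buffer -> GPU.
 * Every byte crosses system memory and burns CPU cycles before reaching the GPU.
 */
#include <cuda_runtime.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    const char  *path = "/mnt/weka/dataset.bin";  /* hypothetical file, as above */
    const size_t size = 64UL << 20;

    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* Pinned host memory acts as the bounce buffer. */
    void *host_buf = NULL, *gpu_buf = NULL;
    cudaMallocHost(&host_buf, size);
    cudaMalloc(&gpu_buf, size);

    /* Step 1: the CPU performs the read into system memory. */
    ssize_t n = read(fd, host_buf, size);

    /* Step 2: a second copy moves the data from system memory to the GPU. */
    cudaMemcpy(gpu_buf, host_buf, (size_t)n, cudaMemcpyHostToDevice);
    printf("copied %zd bytes via the host bounce buffer\n", n);

    cudaFree(gpu_buf);
    cudaFreeHost(host_buf);
    close(fd);
    return 0;
}
```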

An additional crucial component that enables this massive data movement is the network interconnect. Most modern GPU servers are already equipped with 100/200Gb/s interconnects; with InfiniBand or Ethernet, this usually means multiple network adapters per server to move data to the GPUs efficiently. With the new 400Gb/s NDR InfiniBand released by NVIDIA, data can now move two or even four times faster than before. An additional advantage is the potential to double port density in the datacenter, since a single top-of-rack (ToR) switch can now provide up to 128 connections at 200Gb/s (each 400Gb/s port split into two).

By combining a fast network interconnect layer, such as the new NDR networking components, with an I/O acceleration technology that eliminates I/O bottlenecks, such as NVIDIA GPUDirect Storage as implemented in WekaIO, multiple industries can effectively decrease costs and perform more work in a shorter period of time, resulting in improved time-to-business results.

Additional resources: