How GPUDirect Storage Accelerates Big Data Analytics

Derek Burke. August 20, 2020

Performance of GPU Accelerated AI

GPU-accelerated computing has traditionally been associated with high performance computing (HPC) and, more recently, with AI and machine learning. However, the RAPIDS suite of software libraries now enables end-to-end data science and analytics pipelines to run entirely on GPUs.

The performance impact of GPU-accelerated computing and RAPIDS was recently illustrated by NVIDIA’s unofficial TPCx-BB result, which beat the previous record by 19.5x! TPCx-BB is a big data benchmark for enterprises that represents real-world ETL (extract, transform, load) and machine learning workflows. The benchmark’s 30 queries cover big data analytics use cases such as inventory management, price analysis, sales analysis, recommendation systems, customer segmentation and sentiment analysis.

TPCx-BB benchmark results across 30 queries

GPU-accelerated compute nodes have more than 10x the compute density of CPU-based systems. It is therefore a huge challenge for traditional data storage systems to keep those GPUs fed with data at high throughput and low latency. To address this challenge, NVIDIA has introduced a new feature, GPUDirect Storage, that bypasses the CPU and system memory, enabling GPUs to communicate directly with internal or shared data storage systems.

GPUDirect storage path
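
To make the difference between the two data paths concrete, here is a minimal sketch in Python. It is a toy model, not a measurement: the `Hop` class, the `transfer_seconds` helper and every bandwidth figure are hypothetical, and it assumes each copy finishes before the next one starts, which real systems avoid by pipelining. It only shows that removing the stop in CPU system memory removes one full copy of the data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Hop:
    """One copy of the payload across a link (hypothetical model)."""
    name: str
    gb_per_s: float  # assumed sustained bandwidth, not a measured value


def transfer_seconds(size_gb: float, hops: Sequence[Hop]) -> float:
    """Time to push size_gb through every hop in turn, with no overlap."""
    return sum(size_gb / hop.gb_per_s for hop in hops)


# Hypothetical bandwidths, picked only to keep the arithmetic readable.
BOUNCE_PATH = [
    Hop("storage -> CPU system memory", 10.0),
    Hop("CPU system memory -> GPU memory", 10.0),
]
DIRECT_PATH = [Hop("storage -> GPU memory", 10.0)]

if __name__ == "__main__":
    size_gb = 100.0
    for label, path in (("via CPU memory", BOUNCE_PATH), ("direct", DIRECT_PATH)):
        print(f"{label}: {len(path)} copies, {transfer_seconds(size_gb, path):.1f} s")
```

In practice the gain depends on how well the copies overlap and on how busy the CPU and its memory already are, so treat the output as an illustration of the idea rather than a prediction.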

NVIDIA is an investor in WekaIO (Weka), the maker of WekaFS. Today, WekaFS is one of only a handful of storage technologies that support GPUDirect Storage. WekaFS is also unusual in its ability to handle small files, random IO and metadata-intensive workloads as easily as it handles large files and sequential IO workloads.

GPUDirect Storage Results

In its recent GPUDirect Storage webinar, NVIDIA chose to illustrate further performance improvements on the TPCx-BB benchmark (separate from the previously referenced results) through the use of GPUDirect Storage and WekaFS. The image below is taken from that webinar and highlights that, with WekaFS, GPUDirect Storage provides a further 5x speedup on the TPCx-BB queries that are the most IO bound.

Accelerating the Data Path to the GPU Webinar
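
Why the biggest gains show up on the most IO-bound queries can be sketched with Amdahl's law. The snippet below is an illustration under stated assumptions, not a reconstruction of the webinar data: the function name `overall_speedup`, the IO fractions and the IO speedup factor are all hypothetical.

```python
def overall_speedup(io_fraction: float, io_speedup: float) -> float:
    """Amdahl's law: whole-query speedup when only the IO portion gets faster.

    io_fraction: share of the original runtime spent waiting on IO (0 to 1).
    io_speedup:  factor by which that IO portion is accelerated (> 0).
    """
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    if io_speedup <= 0.0:
        raise ValueError("io_speedup must be positive")
    return 1.0 / ((1.0 - io_fraction) + io_fraction / io_speedup)


if __name__ == "__main__":
    # Hypothetical queries: same IO acceleration, different IO-bound shares.
    for io_fraction in (0.1, 0.5, 0.9, 1.0):
        gain = overall_speedup(io_fraction, io_speedup=5.0)
        print(f"IO share {io_fraction:.0%}: query runs {gain:.2f}x faster")
```

A query that spends little of its time on IO barely moves, while one that is almost entirely IO bound approaches the full IO speedup. That is the pattern to expect when a faster data path is applied across a mixed set of queries.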

These results show that the data storage solution should be considered carefully when transitioning to GPU-accelerated computing. Big data analytics can hit the turbo button with GPU acceleration. But just as a car’s turbo forces more air into the combustion chamber, the right data storage solution feeds more data into the GPUs. WekaFS is GPUDirect Storage ready, container-enabled and built for modern analytics, with scale-out performance and scale-out capacity on industry-standard hardware.

WekaIO is seeking beta customers for NVIDIA GPUDirect Storage. If you would like to be considered, you can register your interest here.
