The World’s Fastest GPU requires the World’s Fastest File System

Barbara Murphy
April 2, 2018

It was a hectic week at the Nvidia GPU conference.  Jensen Huang did a marathon 2+ hour presentation on all the innovation that Nvidia is doing around deep learning.

Probably the most interesting announcement was the new DGX2, which Jensen profiled as the world’s largest GPU, 10x faster than the DGX1.  The system sports 16 Tesla V100s, 81,920 CUDA cores, 8x100Gbit Ethernet or IB networking, and 30TB of physical capacity, delivering 2PFlops of compute power and 14.4TBytes/second aggregate throughput.  Weighing in at 350lbs, it is the super-heavyweight of processing, with a price tag of $399K.

This got me thinking about how to keep a super-heavyweight DGX2 in prime condition.  One design element that struck me as odd was the amount of physical capacity the system holds. The system packs a powerhouse of compute punch, but the data storage is, let’s say, bantamweight at best. The internal capacity is 30TB of NVMe in a RAID 0 configuration, in other words with no data protection. That cannot hold enough data for most data-intensive analytics, so you can only assume that the system uses local storage as a cache and does not care if drives fail or data is lost. The 30TB is distributed across 16 local NVMe drives, so the maximum performance you can expect is between 3 and 5GBytes/second of bandwidth into the GPU.

A single DGX2 is capable of ingesting data at 100GBytes per second (8 x 100Gbit Ethernet or EDR InfiniBand), which is going to require a massively parallel, high-performance file system to keep the GPU node saturated.  But when you look at the field of possible candidates, only one file system can deliver that level of single-client performance to the DGX2: WekaIO Matrix.
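For anyone who wants to check the arithmetic, here is a minimal sketch of the link-rate conversion behind the 100GBytes/second figure. It assumes raw line rate with no protocol overhead, and the function name is my own, chosen just for illustration.

```python
# Convert aggregate network link speed to bytes per second.
# Assumption: raw line rate only; protocol overhead is ignored.
BITS_PER_BYTE = 8


def aggregate_gbytes_per_sec(num_links: int, gbits_per_link: float) -> float:
    """Return the combined bandwidth of all links in GBytes/second."""
    return num_links * gbits_per_link / BITS_PER_BYTE


# 8 x 100Gbit ports, as quoted for the DGX2.
dgx2_ingest = aggregate_gbytes_per_sec(num_links=8, gbits_per_link=100)
print(f"DGX2 network ingest: {dgx2_ingest:.0f} GBytes/s")

# Compare the local NVMe range quoted earlier (3 to 5 GBytes/s)
# with what the network ports can take in.
for local in (3, 5):
    share = local / dgx2_ingest
    print(f"{local} GBytes/s local = {share:.0%} of network ingest")
```

On these numbers the local drives cover only a few percent of what the network ports can absorb, which is why the external file system matters so much.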

NFS = Not For Speed

The bandwidth requirements being driven by DGX2 are far beyond the limits of what NFS was built to do.  NFS was developed in 1984 for group file shares over the 10Mbit Ethernet links of its day, and it did a great job for its intended purpose.  But feeding the DGX2 super-heavyweight requires a radical departure from legacy file storage.  Pure Storage benchmarketing claims a maximum of 75GBytes/second, but that is aggregate bandwidth out of their system, not into a single GPU node.  Even if we suspend reality and pretend they can hit this number over NFS into one node, the DGX2 is still left with 25% of its punch wasted.  With a price tag of $399K, that equates to $100K of wasted processing power at best, and more likely a lot more in reality.
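The wasted-dollars estimate can be sketched in a few lines. This is a back-of-the-envelope model, not a benchmark: it assumes that usable compute scales linearly with the data rate delivered to the node, and the function names are hypothetical.

```python
# Rough estimate of compute spend left idle by a storage bottleneck.
# Assumption: GPU utilization scales linearly with delivered bandwidth.


def idle_fraction(delivered_gbytes: float, required_gbytes: float) -> float:
    """Fraction of ingest capacity that storage cannot feed."""
    return max(0.0, 1 - delivered_gbytes / required_gbytes)


def wasted_dollars(price: float, delivered_gbytes: float,
                   required_gbytes: float) -> float:
    """Share of the purchase price corresponding to the idle fraction."""
    return price * idle_fraction(delivered_gbytes, required_gbytes)


# 75 GBytes/s delivered against 100 GBytes/s of ingest, on a $399K system.
idle = idle_fraction(75, 100)
wasted = wasted_dollars(399_000, 75, 100)
print(f"Idle fraction: {idle:.0%}")
print(f"Wasted spend: ${wasted:,.0f}")
```

Under that linear assumption the idle share is 25% and the wasted spend is $99,750, which rounds to the $100K quoted.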

Like a Bat out of Parallel

A little over a week ago, WekaIO demonstrated that it is the world’s fastest file system by delivering a knock-out blow to the very short-reigning parallel file system champion, IBM Spectrum Scale, doubling the performance IBM could deliver using industry-standard servers from Supermicro.  Just 6 Supermicro BigTwin servers will deliver over 112GBytes/second, saturating a DGX2.
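As a quick sanity check on the sizing, the sketch below works out the per-server share of the 112GBytes/second figure and how many servers it would take to reach the DGX2's 100GBytes/second ingest rate. It assumes throughput scales linearly with server count, which is a simplification, and the helper names are hypothetical.

```python
import math

# Sizing check for the server count quoted above.
# Assumption: throughput scales linearly with the number of servers.


def per_server_gbytes(total_gbytes: float, servers: int) -> float:
    """Average throughput contributed by each server."""
    return total_gbytes / servers


def servers_needed(target_gbytes: float, per_server: float) -> int:
    """Smallest whole number of servers that meets the target rate."""
    return math.ceil(target_gbytes / per_server)


per_server = per_server_gbytes(112, 6)
headroom = 112 / 100 - 1  # margin over the DGX2 ingest rate
print(f"Per server: {per_server:.1f} GBytes/s")
print(f"Servers needed for 100 GBytes/s: {servers_needed(100, per_server)}")
print(f"Headroom with 6 servers: {headroom:.0%}")
```

Five servers would fall short under this model, so six is the smallest configuration that covers the DGX2 with a little headroom.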

Here are the vital statistics of a WekaIO reference architecture required to saturate a DGX2, followed by the maximum scale of the Pure Storage FlashBlade architecture:

Doing a side-by-side comparison, WekaIO delivers 50% more performance, with 66% fewer storage nodes, in 44% less space, at 50% of the cost.  But more importantly, Pure Storage is all tapped out at 75 blades, while WekaIO can scale to thousands of nodes and exabytes of storage. And if you need to grow the namespace, WekaIO can internally tier to disk-based storage, giving the archive catalog the lowest-cost storage.

Put simply, WekaIO packs a knock-out punch with its Matrix software.