The World’s Fastest GPU requires the World’s Fastest File System
Barbara Murphy. April 2, 2018
It was a hectic week at Nvidia’s GPU Technology Conference. Jensen Huang gave a marathon 2+ hour keynote on all the innovation Nvidia is doing around deep learning.
Probably the most interesting announcement was the new DGX2, which Jensen billed as the world’s largest GPU, 10x faster than the DGX1. The system sports 16 Tesla V100s, 81,920 CUDA cores, 8 x 100Gbit Ethernet or InfiniBand networking, and 30TB of physical capacity, delivering 2PFlops of compute power and 14.4TBytes/second aggregate throughput. Weighing in at 350lbs, it is the super-heavyweight of processing, with a price tag of $399K.
This got me thinking about how to keep a super-heavyweight DGX2 in prime condition. One design element that struck me as odd was the amount of physical capacity the system holds. The system packs a powerhouse compute punch, but the data storage is, let’s say, bantamweight at best. With an internal capacity of 30TB of NVMe in a RAID 0 configuration (in other words, no data protection), it cannot hold enough data for most data-intensive analytics, so you can only assume that the system uses local storage as a cache and does not care if drives fail or data is lost. The 30TB is distributed across 16 local NVMe drives, so the maximum performance you can expect is between 3 and 5GBytes/second of bandwidth into the GPUs.
A single DGX2 is capable of ingesting data at 100GBytes per second (8 x 100Gbit Ethernet or EDR InfiniBand), which is going to require a massively parallel, high-performance file system to keep the GPU nodes saturated. But when you look at the field of possible candidates, only one file system can deliver full performance to a single DGX2 client: WekaIO Matrix.
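The 100GBytes per second figure is just link arithmetic. Here is a minimal sketch of that conversion, assuming raw line rate with no protocol or encoding overhead; the function name is made up for illustration and is not part of any Nvidia or WekaIO tool.

```python
def aggregate_gbytes_per_sec(num_links: int, gbits_per_link: float) -> float:
    """Raw aggregate bandwidth in GBytes/second across identical links.

    Assumption: ideal line rate. Protocol and encoding overhead are not modeled.
    """
    return num_links * gbits_per_link / 8  # 8 bits per byte


# Figures quoted in the article: 8 network ports at 100Gbit each.
dgx2_ingest = aggregate_gbytes_per_sec(8, 100)
print(f"DGX2 raw ingest: {dgx2_ingest:.0f} GBytes/second")
```

Real-world throughput will land somewhat below this raw number once protocol overhead is counted, so treat it as an upper bound.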
NFS = Not For Speed
The bandwidth requirements being driven by DGX2 are far beyond the limits of what NFS was built to do. NFS was developed in 1984 for group file shares over the 10Mbit Ethernet links of its day, and it did a great job for its intended purpose. But feeding the DGX2 super-heavyweight requires a radical departure from legacy file storage. Pure Storage’s benchmarketing claims a maximum of 75GBytes/second, but that is aggregate bandwidth out of their system, not into a single DGX2. But let’s suspend reality and pretend they can hit this number over NFS: the DGX2 is still left with 25% of its punch wasted. With a price tag of $399K, that equates to $100K of wasted processing power at best, and more likely double or more in reality.
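The $100K estimate can be reproduced in a few lines. This is a back-of-the-envelope sketch, assuming (as the paragraph above implicitly does) that the value of the system scales linearly with how much of its ingest bandwidth is kept fed; the function name is illustrative only.

```python
def idle_value(system_price: float, ingest_capacity: float, delivered: float) -> float:
    """Dollar value of ingest capacity left idle.

    Assumption: system value scales linearly with delivered bandwidth,
    which is a simplification rather than a measured relationship.
    """
    idle_fraction = max(0.0, 1.0 - delivered / ingest_capacity)
    return system_price * idle_fraction


# Article figures: $399K system, 100GBytes/second ingest, 75GBytes/second delivered.
wasted = idle_value(399_000, 100, 75)
print(f"Idle capacity is worth roughly ${wasted:,.0f}")
```

With the article's figures, a quarter of the ingest capacity sits idle, which is where the rounded $100K comes from.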
A little over a week ago, WekaIO demonstrated that it is the world’s fastest file system by delivering a knock-out blow to the very short-reigning parallel file system champion, IBM Spectrum Scale, doubling the performance IBM could deliver, using industry-standard servers from Supermicro. Just 6 Supermicro BigTwin servers will deliver over 112GBytes/second, saturating a DGX2.
Just 24 nodes of the WekaIO reference architecture are required to saturate a DGX2 with 112GBytes/second of performance, while Pure Storage can never get there to begin with, running out of steam at 75 nodes delivering 75GBytes/second.
At 75 nodes, scaling linearly from the 24-node result, WekaIO would deliver 350GBytes/second. That’s almost 5 times what Pure Storage can do with the same amount of hardware.
Just 24 nodes of WekaIO deliver 50% more performance than 75 nodes of Pure Storage, in 44% less space and at 50% of the cost. But more importantly, Pure Storage is all tapped out at 75 blades, while WekaIO can scale to thousands of nodes and exabytes of storage. And if you need to grow the namespace, WekaIO can internally tier to disk-based storage to hold the archive catalog at the lowest cost.
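The node comparisons above come down to simple ratios. The sketch below reproduces them, assuming throughput scales perfectly linearly with node count; that is an idealization for illustration, not a measured result, and the function name is hypothetical.

```python
def extrapolate_linear(measured_gbytes: float, measured_nodes: int, target_nodes: int) -> float:
    """Project throughput to a different node count.

    Assumption: perfectly linear scaling, so per-node throughput stays constant.
    """
    per_node = measured_gbytes / measured_nodes
    return per_node * target_nodes


# Article figures: 112GBytes/second on 24 WekaIO nodes, 75GBytes/second on 75 Pure Storage nodes.
weka_at_75 = extrapolate_linear(112, 24, 75)
speedup_same_hardware = weka_at_75 / 75   # WekaIO vs Pure Storage at 75 nodes each
advantage_24_vs_75 = 112 / 75             # 24 WekaIO nodes vs 75 Pure Storage nodes

print(f"WekaIO projected at 75 nodes: {weka_at_75:.0f} GBytes/second")
print(f"Ratio at equal node count: {speedup_same_hardware:.2f}x")
print(f"24 nodes vs 75 nodes: {advantage_24_vs_75:.2f}x")
```

The space and cost percentages are not derived here, since the article does not give the underlying rack-unit or pricing figures.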
Put simply, WekaIO packs a knock-out punch with its Matrix software.