Are Your GPU Servers Starved of Data?
David Hiatt. November 17, 2017
As machine learning and AI become more pervasive across business, corporations are deploying tens to hundreds of specialized servers in their enterprise clouds. These servers, which often cost six figures, use GPUs from companies such as NVIDIA and Intel. GPU-based servers can execute thousands of operations in parallel, making them ideal for training and inference; the challenge is making sure the expensive GPUs are not left idling. In short, the objective is to keep the workload compute-bound, not I/O-bound.
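One common way to keep a workload compute-bound is to prefetch data on background threads so the accelerator never waits on storage. The sketch below illustrates the idea in plain Python; `load_batch` and `train_step` are hypothetical stand-ins for real I/O and GPU work, not part of any product described here.

```python
# Minimal sketch: overlap I/O with compute via a bounded prefetch queue,
# so the "GPU" (train_step) stays busy while batches load in the background.
# load_batch() and train_step() are hypothetical stand-ins.
import queue
import threading
import time

def load_batch(i):
    # Stand-in for reading a batch from storage (the I/O-bound part).
    time.sleep(0.01)
    return list(range(i, i + 4))

def train_step(batch):
    # Stand-in for GPU compute on the batch.
    return sum(batch)

def prefetching_loader(num_batches, depth=4):
    q = queue.Queue(maxsize=depth)  # bounded queue caps memory use

    def producer():
        for i in range(num_batches):
            q.put(load_batch(i))
        q.put(None)  # sentinel: no more batches

    threading.Thread(target=producer, daemon=True).start()
    while (batch := q.get()) is not None:
        yield batch

total = sum(train_step(b) for b in prefetching_loader(8))
print(total)  # -> 160
```

Real training frameworks provide the same pattern with more machinery (worker pools, pinned memory), but the principle is identical: hide storage latency behind compute.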
GPUs are being widely adopted in applications such as training systems for autonomous vehicles. Sensor and camera data can run to hundreds of TBs per day and grows rapidly. The underlying storage subsystem must have the flexibility to seamlessly increase capacity under a single namespace, automatically tier old data to a lower-cost storage tier for future interrogation, and provide the required high throughput regardless of storage size.
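A quick back-of-the-envelope calculation shows what a daily volume like that implies for sustained ingest bandwidth; the 200 TB/day figure below is illustrative, not from the article.

```python
# Average sustained bandwidth required to absorb a daily data volume.
TB = 1e12  # bytes per terabyte (decimal)

def sustained_gbps(tb_per_day):
    """GB/s needed to ingest tb_per_day terabytes over 24 hours."""
    return tb_per_day * TB / (24 * 3600) / 1e9

# e.g. 200 TB/day of sensor data implies roughly 2.3 GB/s of sustained
# ingest -- before accounting for training reads on top of it.
print(round(sustained_gbps(200), 1))  # -> 2.3
```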
One of our recent design-ins involved an AI application for a training cloud. The complete system has a hundred GPUs deployed in a single cluster. To keep the expensive GPU systems busy, the I/O subsystem must deliver tens of GB/s of bandwidth (BW) at very low latency. As the cluster grows to a few hundred GPUs, the bandwidth and latency challenges become even more pronounced. The customer needed a storage architecture capable of feeding the GPU cluster today and scaling to accommodate growth without sacrificing performance.
Many storage appliances claim great bandwidth; the challenge is delivering that bandwidth at low latency, and for machine learning applications read latency is what matters most. In the customer's case, the WekaIO Matrix scale-out file system, running on commodity storage servers, was benchmarked against other all-flash NAS appliances and proved to be the only viable storage platform that could deliver high bandwidth at low latency. In tests, Matrix delivered over 4 GB/s of bandwidth for 1 MB files to a single GPU client.
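Measuring both sides of that claim, bandwidth and per-read latency, is straightforward. The toy sketch below is a stand-in for a real benchmark tool such as fio: it reads a scratch file in 1 MiB chunks and reports throughput plus a tail-latency percentile. The file path and sizes are illustrative, and a local temp file will not reflect the behavior of networked storage.

```python
# Toy per-client read benchmark: throughput plus p99 read latency.
import os
import tempfile
import time

CHUNK = 1 << 20  # 1 MiB reads

def bench_read(path, chunk=CHUNK):
    latencies = []
    read_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            t0 = time.perf_counter()
            data = f.read(chunk)
            latencies.append(time.perf_counter() - t0)
            if not data:
                break
            read_bytes += len(data)
    elapsed = time.perf_counter() - start
    gb_per_s = read_bytes / elapsed / 1e9
    p99 = sorted(latencies)[int(0.99 * (len(latencies) - 1))]
    return gb_per_s, p99

# Demo on a small scratch file; a real run would target the shared
# filesystem from each GPU client, with many files and parallel readers.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(8 * CHUNK))
bw, p99 = bench_read(tmp.name)
os.unlink(tmp.name)
print(f"{bw:.2f} GB/s, p99 read latency {p99 * 1e6:.0f} us")
```

Reporting a tail percentile alongside average throughput is the point: an appliance can post a good average while individual reads stall long enough to starve a GPU.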
WekaIO’s patent-protected technology includes distributed metadata and virtualization technology that eliminates most of the latency associated with the OS kernel. The software scales performance linearly, so it can maintain performance as the GPU cluster grows. Finally, its built-in tiering can migrate old, cold data to low-cost object storage, located either on-premises or in the cloud, for the best economics.
If you have a large cluster of GPUs powering your machine learning and AI applications, you cannot afford to have them idling; you need a storage system that keeps the cluster saturated. The only viable storage solution that matches this workload while reducing overall TCO is WekaIO’s Matrix scale-out file system.