If you have been reading any of my blogs, you can see that I am on a soapbox to let the world know how bad NFS is for any performance-oriented application. But let’s start at the beginning. The NFS protocol was developed by Sun Microsystems in 1984 to allow client computers to access data over a computer network. I want to be clear: I think NFS is a great protocol, and it has withstood the test of time for general-purpose file sharing. But the computational power of a single server has gone up 100x with the addition of GPUs, such as NVIDIA’s Tesla V100, that pack tens of TFLOPS of computational power into a single card. It requires an equally powerful storage architecture to feed that beast, and NFS simply cannot keep up with the I/O demands of AI and machine learning applications, delivering at best 1.5 GBytes/second of bandwidth per network link.
So you can imagine my reaction when I came across an archived article from 1989 in Unix News, titled “Auspex NFS Server Separates NFS From UNIX Kernel to Increase I/O”. It’s not just a recent phenomenon: NFS has been problematic since its inception.
The article reads “The growth in raw processing power of Unix workstations has not been accompanied by a corresponding growth in the network I/O processing power of the workstations that have been designated as file servers. The result is that the CPU power of the workstations runs into a bottleneck at the network level, unable to get files into and out of network storage fast enough to keep up with the workstations.”
Wow! All you have to do is replace the words “Unix workstations” with “GPU servers” and the entire rest of the paragraph applies to today’s challenges with the NFS protocol for AI and machine learning.
The problem in 1989, as defined by the Auspex team, was that “NFS file servers spend about 85% of their time dealing with NFS traffic instead of processing applications”, which earns a second “Wow”! When we talk to customers who have a metadata-intensive workload, NFS operations can bring the application servers to their knees. Again, not a lot has changed on that front in nearly 30 years. The Auspex 5000 solution bypassed the Unix kernel and did the NFS processing on hardware accelerators inside the server, leading to a 50x increase in performance.
Well, at WekaIO we believe the era of dedicated hardware accelerators is long gone, but our approach in software is similar. Firstly, we ditched NFS as a performance-oriented protocol and kept it solely for its intended purpose: data access. Secondly, we have parallelized all I/O to and from the storage system at 4K granularity, because single-threaded protocols like NFS cannot scale performance. Thirdly, we have fully distributed metadata across the storage cluster for the highest performance and lowest-latency operations, eliminating the metadata server as a bottleneck. Fourthly, we wrote a network protocol that leverages modern technologies like DPDK and 100Gbit networking for the lowest latency. Finally, just like Auspex, we bypass the kernel (in our case the Linux kernel) and leverage NVMe-over-Fabrics to get the highest performance. All of this is accomplished through our Matrix software, which runs on any commodity x86 server, so customers are not tied down to proprietary hardware platforms. Voilà, the result is a 10x increase in performance per Ethernet link compared to NFS, and for NVIDIA DGX-1 environments, performance can be aggregated to over 44 GBytes/second per DGX-1 server.
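To make the idea of parallelizing I/O at 4K granularity a little more concrete, here is a minimal sketch in Python. It is not how Matrix is implemented; it only illustrates the general technique on an ordinary local file. The function names `split_into_chunks` and `parallel_read`, the thread pool, and the default of 8 workers are all assumptions made for the illustration, and `os.pread` is only available on Unix-like systems.

```python
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4096  # 4 KiB, the granularity mentioned above


def split_into_chunks(offset, length, chunk_size=CHUNK_SIZE):
    """Split the byte range [offset, offset + length) at chunk boundaries.

    Returns a list of (offset, size) pairs (hypothetical helper).
    """
    end = offset + length
    pieces = []
    pos = offset
    while pos < end:
        # Stop at the next chunk boundary or at the end of the request.
        boundary = (pos // chunk_size + 1) * chunk_size
        stop = min(boundary, end)
        pieces.append((pos, stop - pos))
        pos = stop
    return pieces


def parallel_read(path, offset, length, workers=8):
    """Read a byte range by issuing the per-chunk reads concurrently."""
    pieces = split_into_chunks(offset, length)
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # os.pread takes an explicit offset and does not move the file
            # position, so the threads can share one descriptor.
            parts = pool.map(lambda p: os.pread(fd, p[1], p[0]), pieces)
            # pool.map yields results in request order, so a plain join
            # reassembles the original byte range.
            return b"".join(parts)
    finally:
        os.close(fd)
```

The point of the sketch is that no single request stream has to carry the whole transfer: each 4K piece is independent, so the pieces can be spread across threads here, or across many storage nodes and network links in a real distributed system.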
The end result is a high-performance parallel file system that satisfies I/O-intensive workloads while still making it easy to share the data with end users over NFS or SMB.