It’s Time to Replace Those Yugos
Andy Watson. April 1, 2019
NFS performance issues can be clearly illustrated by a product that became notorious, not for its success but for its dismal shortcomings and ultimate failure.
In the late 1980s a small car was sold in the USA under the now-anachronistic name “Yugo”. Manufactured in Serbia and based on an enfeebled version of the Italian Fiat design, it was arguably the worst mass-produced commercial automobile ever made, and there’s no shortage of automotive experts who would say so. But the price made it irresistible to some people. In fact, it was so inexpensive ($3,990 USD) that some owners considered it “disposable” and never performed any maintenance on it whatsoever.
The Yugo could accelerate to 60 mph in about 14 seconds. That’s not a typo: 14 seconds! With a single (presumably lightweight) person driving, and maybe with the interior seats and spare tire removed to lower the weight. That is not anybody’s concept of performance. Roll a Yugo onto the open road, put the pedal to the metal and don’t let up — eventually it would arrive at its destination if it didn’t collapse into a heap of rust before journey’s end.
Now let’s imagine putting two dozen Yugos onto a very wide highway, each in its own lane, all pointing in the same direction. (Maybe there really is a freeway that wide somewhere in Los Angeles, but for the sake of this thought experiment, please don’t get hung up on the availability of 24 lanes. And if you simply cannot let go of that issue, suppose instead that our sad array of Yugos will be proceeding in parallel across the Bonneville salt flats of Utah.)
And so: at Time Zero our mob of Yugos is idling hubcap-cheek by fender-jowl at the starting line when a green light and a waved flag trigger all the drivers to floor it. Each sluggish vehicle carries 5 people: 4 uncomfortably full-sized passengers plus a heavy-duty driver, their cumulative weight reducing the car’s potential acceleration by half. Nothing could be less encouraging than this fleet of crap cars creeping down the wide freeway (or across the salt flat) like a slow-motion wave of despair, a staged metaphor for the inevitability of death and taxes. At the 10-mile marker, though, these awful automobiles have undeniably shifted 120 people a considerable distance.
That is throughput.
Now imagine that the only available transportation is a racing motorcycle whose expertly trained and experienced rider can (relatively safely) control its 0-60 acceleration of about 3 seconds and its top speed of about 180 mph. However, unless you are in India, a motorcycle can realistically carry only 1 passenger at a time. To achieve the same effective throughput given the above road (or salt flat) conditions, i.e., conveying 119 people plus the 1 motorcycle driver, it would have to make 118 roundtrips plus a single final one-way trip across the 10-mile track.
Would the extreme speed (the low latency or “response time”) of the motorcycle be sufficient to cover about 2,370 miles, including the stops and starts, in less elapsed time than the 24 Yugos need for a single glacial 10-mile crossing? No. But from the perspective of each individual passenger, what is the experience?
That is the importance of latency (sometimes referred to as response time).
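For readers who want to check the arithmetic, here is a rough back-of-the-envelope sketch in Python. It is only an illustration under stated assumptions, not a measurement: the 60 mph Yugo cruising speed is a guess (the story gives no top speed), the loaded Yugo's 0-60 time is taken as double the quoted 14 seconds, every leg starts from a standstill, and braking and passenger loading time are ignored. The helper name `leg_seconds` is made up for this example.

```python
TRACK_MILES = 10.0  # length of the crossing, from the story


def leg_seconds(distance_mi, zero_to_sixty_s, cruise_mph):
    """Seconds for one standing-start leg: constant acceleration up to
    cruise_mph, then steady cruising. Braking is ignored (assumption)."""
    accel_mph_per_s = 60.0 / zero_to_sixty_s
    t_accel = cruise_mph / accel_mph_per_s            # seconds spent accelerating
    cruise_mi_per_s = cruise_mph / 3600.0
    d_accel = 0.5 * cruise_mi_per_s * t_accel         # miles covered while accelerating
    if d_accel >= distance_mi:
        # The vehicle never reaches cruising speed: solve d = a * t^2 / 2.
        accel_mi_per_s2 = accel_mph_per_s / 3600.0
        return (2.0 * distance_mi / accel_mi_per_s2) ** 0.5
    return t_accel + (distance_mi - d_accel) / cruise_mi_per_s


# Yugos: 24 cars x 5 people, all crossing at once.
# 0-60 doubles from 14 s to 28 s when fully loaded; 60 mph cruise is an assumption.
yugo_people = 24 * 5
yugo_leg = leg_seconds(TRACK_MILES, zero_to_sixty_s=28.0, cruise_mph=60.0)

# Motorcycle: 119 passengers, one at a time.
passengers = 119
moto_legs = 2 * (passengers - 1) + 1                  # 118 roundtrips + 1 one-way
moto_miles = moto_legs * TRACK_MILES
moto_leg = leg_seconds(TRACK_MILES, zero_to_sixty_s=3.0, cruise_mph=180.0)
moto_total = moto_legs * moto_leg

print(f"Yugos: {yugo_people} people delivered in {yugo_leg / 60:.1f} minutes")
print(f"Motorcycle: {moto_miles:.0f} miles, {moto_total / 3600:.1f} hours for everyone")
print(f"Motorcycle: a single crossing takes {moto_leg / 60:.1f} minutes")
```

Under these assumptions the Yugos deliver all 120 people in a little over 10 minutes, while the lone motorcycle needs roughly 13.5 hours to move the same crowd. Any single motorcycle crossing, though, takes about 3.4 minutes, a third of the Yugo's time. The first figure is throughput; the second is latency.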
In the best of all possible worlds, you would want both. Mapping this analogy onto data storage support for GPU-accelerated workloads (a hot topic for WekaIO these days, and one very important to the world of AI/ML), the 24 Yugos are like NFS file storage: excellent stats for aggregate throughput, but each individual mount point delivers insufficient performance to the I/O-starved GPUs. By comparison, WekaIO’s Matrix filesystem is like 120 motorcycles, giving each mount point, and each GPU behind each mount point, lower latency and more I/O data access than has ever been possible before by any other method.
And that’s how you can have both massive throughput at scale and amazingly fast response times with the kind of low latency statistics usually associated with raw hardware, even while leveraging all the rich features of a robust modern filesystem.