HPC Advisory Council video: Making Machine Learning Compute Bound Again
WekaIO: Making Machine Learning Compute Bound Again
In this video from the Stanford HPC Conference, Liran Zvibel from Weka.IO presents: Making Machine Learning Compute Bound Again.
"GPUs are getting faster on a yearly cycle. Networking was able to catch up and support linear scaling of models that fit in memory. Traditional storage has not caught up to the condensed performance needed by GPU-filled servers. The number of concurrent clients and the sheer amount of data required to effectively scale modern deep learning models keep growing.
We are going to present WekaIO, the lowest latency, highest throughput file system solution that scales to hundreds of PB in a single namespace, supporting the most challenging deep learning projects that run today. We will present real-life benchmarks comparing WekaIO performance to a local SSD file system, showing that we are the only coherent shared storage that is even faster than the current caching solutions, while allowing customers to linearly scale performance by adding more GPU servers. Also, we will view the complete ML project lifecycle, from collecting data, cleaning, tagging, exploring, training, validating, and finally archiving, and how customers can use cloud bursting to leverage public cloud infrastructure for improved economics."