NVIDIA, Mellanox, and WekaIO are innovative companies that have combined their respective technologies to support AI workloads. NVIDIA is the world leader in accelerated computing, enabling enterprises to run deep learning (DL) training up to 96 times faster than on a CPU server. Mellanox provides high-bandwidth, low-latency, dense data center switching as the common data fabric. WekaIO is an innovation leader in high-performance, scalable file storage for data-intensive applications. Its WekaFS file system transforms NVMe-enabled servers into a low-latency storage system for AI and high-performance computing (HPC) workloads. The combined solution, described in this Weka AI Reference Architecture (RA), enables the most demanding DL models to be trained faster, allows enterprises to iterate rapidly on their models, and shortens time to insight for greater business agility.
Designing and implementing an AI infrastructure can be time-consuming and daunting because DL workloads, like HPC workloads, consume significant data center resources. Additionally, DL workloads access large numbers of small files, which creates a more demanding metadata load than traditional HPC systems are designed to handle. The Weka AI Reference Architecture specifies the building blocks required to deliver a high-performance solution for DL training, leveraging industry-leading technologies while eliminating the complexity of typical HPC environments.