A Transformative Solution Framework for Accelerated DataOps
WEKA is a solutions ecosystem engineered to solve Accelerated DataOps challenges, delivering Reference Architectures and Software Development Kits developed with leading AI solutions partners. It provides a production-ready storage solution that runs the entire data pipeline workflow on a single platform, whether on-premises or in the public cloud: from data ingest, to batch feature extraction, to hyperparameter optimization, and finally to inferencing and versioning. Direct access to data for training and inferencing eliminates both data staging at the compute layer and storage silos, which results in shorter epoch and wall-clock times.
Reference Architectures And Technical Briefs
“Weka IO was the clear choice for our DNN training…standard NAS would not scale and Weka was the most performant of all the parallel file systems we evaluated…we really liked that it was hardware-independent allowing us better control over our infrastructure costs.”
Dr. Xiaodi Hou, Co-founder and CTO
“After comparisons with legacy NFS-based NAS storage solutions, Innoviz selected WekaFS because the performance improvements with WekaFS matched the company’s needs. The storage scalability and ability to grow the infrastructure without losing performance, was a key factor in choosing the Weka file system.”
Oren Ben Ibghei, IT Manager
“We looked at our legacy architecture and instead of taking an evolutionary step and upgrading every component, we took the revolutionary approach. Weka cost-effectively enables both the use of POSIX and object storage with performance and latency that is far superior to any other solution.”
Bridget Collins, Chief Information Officer
“We built a GPU farm, and we needed a high-speed data pipe to feed it. We evaluated open source solutions, HDFS, and the public cloud. We chose Weka for its ability to provide cost-effective, high-bandwidth I/O to our GPUs, product maturity, customer references, and stellar on-demand support.”
Paul Liu, Engineering Operations Lead
DataOps Workflow And Related Storage Challenges
Each stage of an AI data pipeline has distinct storage requirements, ranging from massive ingest bandwidth to mixed read/write handling and ultra-low latency. Meeting them often results in a separate storage silo for each stage. As a result, business and IT leaders must reconsider how they architect their storage stacks and make purchasing decisions for these new workloads.
Needs massive concurrency and write (WR) throughput
Needs labeling, indexing, search, and cloud bursting
Needs massive read (RD) throughput
Needs replay of a large number of streams
Needs low latency access
Needs lifecycle management, versioning, and reproducibility