How Certified Storage is Reshaping the AI Factory Floor


From DGX to GB200 and Beyond
AI factories are evolving at an unprecedented pace. What was once considered cutting-edge, like the NVIDIA DGX SuperPOD™ with NVIDIA H100 Tensor Core GPUs, has already given way to a new class of infrastructure powered by the NVIDIA Grace Blackwell Superchip and NVIDIA HGX™ systems with NVIDIA H200, B200, and B100 GPUs. These aren’t just more powerful servers; they represent a shift toward composable, rack-scale AI architectures designed to process trillions of tokens, power intelligent agents, and accelerate generative AI adoption across the enterprise.
But here’s the truth: No matter how fast your GPUs are, if your storage can’t keep up, your AI factory grinds to a halt.
That’s why WEKA’s certification for two new NVIDIA Cloud Partner (NCP) High-Performance Storage (HPS) reference architectures—GB200 NVL72 and HGX H100/H200/B200 systems—is a big deal.
From Proof-of-Concept to Production-Grade AI Factories
In the early days, AI infrastructure was a bit artisanal. Teams cobbled together a few DGX nodes, some fast local NVMe, and crossed their fingers. It was good enough for proof-of-concept training and benchmark runs—but nowhere near the scale needed to fuel the AI wave we’re in now.
Today, that artisanal approach has given way to something industrial.
Massive clusters. Token-scale pipelines. Petabytes of training data. Millisecond-latency demands. And zero tolerance for bottlenecks.
NVIDIA’s Cloud Partner program and the NCP Reference Architectures reflect this shift—raising the bar for what it takes to run AI at scale. And WEKA’s certified architectures provide the data layer to match.
The New Bar for Performance: 1 GB/s Per GPU
Why is 1 GB/s per GPU such a big deal?
Because it’s not just a benchmark—it’s a design principle. A single Blackwell GPU can process and move enormous amounts of data. Starve it of input or slow down output, and you’re wasting not just money, but energy, time, and opportunity.
WEKA delivers one of the few storage systems that can consistently sustain 1.0 GB/s or more per GPU, even at the massive scale today's clusters demand; a quick sizing sketch follows the list:
- 1,152 Blackwell GPUs (one scalable unit, or SU) → 1.2 TB/s read throughput
- 4,608 GPUs → 4.6 TB/s read throughput
- 18,432 GPUs → 18.4 TB/s read throughput and 9.2 TB/s write throughput
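To make the arithmetic concrete, here's a minimal sizing sketch in Python. The 1.0 GB/s-per-GPU read target comes straight from the reference architectures; the 0.5 GB/s-per-GPU write figure is inferred from the 18,432-GPU numbers (9.2 TB/s ÷ 18,432), and the helper name is ours, not WEKA's:

```python
GB = 1e9  # decimal gigabyte; storage throughput is quoted in decimal units

def aggregate_tput_tbps(num_gpus: int, per_gpu_gbps: float = 1.0) -> float:
    """Aggregate throughput (TB/s) needed to feed num_gpus at per_gpu_gbps GB/s each."""
    return num_gpus * per_gpu_gbps * GB / 1e12

for gpus in (1_152, 4_608, 18_432):
    read = aggregate_tput_tbps(gpus)        # 1.0 GB/s read target per GPU
    write = aggregate_tput_tbps(gpus, 0.5)  # 0.5 GB/s write, inferred from the 18,432-GPU figures
    print(f"{gpus:>6} GPUs -> {read:4.1f} TB/s read, {write:4.1f} TB/s write")
```

Run it and the cluster sizes above reproduce the published targets: 1,152 GPUs needs 1.2 TB/s of read throughput, 18,432 GPUs needs 18.4 TB/s read and 9.2 TB/s write.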
That performance is made possible by NeuralMesh™, built on a containerized, service-oriented microservices architecture that combines NVMe flash, a user-space DPDK networking stack, and a parallel file system with distributed metadata. The result? Lightning-fast performance at microsecond latencies, whether you're slinging small random 4KB I/O or giant multimodal checkpoints.
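WEKA doesn't detail NeuralMesh internals in this post, but the core idea behind distributed metadata can be illustrated with a simple hash-based placement sketch. Everything here (the shard count, the hashing scheme, the function names) is our illustration, not WEKA's design:

```python
import hashlib

# Illustrative only: spread filesystem metadata ownership across many
# independent metadata services by hashing the file path. Real parallel
# file systems use far more sophisticated schemes; this just shows why
# no single metadata server becomes the bottleneck at scale.

NUM_METADATA_SERVICES = 64  # assumed shard count for the sketch

def metadata_owner(path: str, shards: int = NUM_METADATA_SERVICES) -> int:
    """Map a file path to the metadata service responsible for it."""
    digest = hashlib.blake2b(path.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % shards

# Lookups for different files land on different services, so metadata
# operations scale out with the cluster instead of serializing.
for p in ("/data/tokens/shard-0001.bin",
          "/ckpt/step-120000/model.pt",
          "/data/tokens/shard-0002.bin"):
    print(p, "->", f"mds-{metadata_owner(p):02d}")
```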
Designed for the New Era of AI
WEKA’s storage reference architectures aren’t just validated—they’re precision-tuned for the demands of modern AI workloads:
- Composable storage clusters that scale linearly with compute, from a few HGX boxes to 18,000+ Blackwell GPUs
- Multitenancy with physical isolation enabling cloud providers to deliver AI-as-a-Service without compromise
- Certified configurations for both GB200 NVL72 and HGX H100/H200/B200 platforms that deploy easily, with cabling, rack-unit, and thermal guidance included
The new AI factory isn’t a monolith. It’s a dynamic, software-defined system that needs performance, flexibility, and rock-solid reliability at every layer. That includes the data layer.
High-Performance NVMe with Micron 9550 SSD
An essential element of delivering this level of performance is the right NVMe storage. The Micron 9550 SSD (PCIe Gen5) offers exceptional performance and power efficiency, making it a strong fit for NeuralMesh deployments with NCPs and other large-scale AI environments. Its outstanding sequential and random read/write speeds ensure rapid data movement to and from GPUs, maximizing utilization and throughput across complex AI workloads.
The Micron 9550’s power efficiency also helps reduce energy consumption and operating costs—critical advantages for hyperscale AI factories. Further validating its role in next-generation AI infrastructure, the Micron 9550 SSD has been qualified by NVIDIA for its Recommended Vendor List (RVL) for local storage on GB200 NVL72 systems.
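As a back-of-the-envelope exercise, here's a sketch of how many Gen5 drives it takes to hit the read targets above, before any filesystem, protection, or network overhead. The ~14 GB/s per-drive figure reflects Micron's published 9550 sequential-read spec; the 70% derating is our assumption, not a WEKA or Micron number:

```python
import math

DRIVE_SEQ_READ_GBPS = 14.0  # Micron 9550 spec-sheet sequential read (Gen5)
EFFICIENCY = 0.70           # assumed derating for protection/network overhead

def drives_needed(target_tbps: float) -> int:
    """Rough floor on drive count to sustain an aggregate read target."""
    usable_per_drive = DRIVE_SEQ_READ_GBPS * EFFICIENCY  # GB/s per drive
    return math.ceil(target_tbps * 1000 / usable_per_drive)

for target in (1.2, 4.6, 18.4):  # SU read targets from earlier in the post
    print(f"{target:5.1f} TB/s -> ~{drives_needed(target)} drives (raw floor)")
```

Even hitting 18.4 TB/s takes on the order of a couple thousand drives under these assumptions, which is why drive-level efficiency and density matter so much at AI-factory scale.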
Why This Matters
If you’re building AI infrastructure today, the playbook is changing fast. GB200 isn’t just another generation; it’s the start of a new era of unified memory, CPU-GPU fabric, and hyperscale inference. But none of that matters if your data can’t keep up.
With WEKA’s NCP-certified storage solutions, you can:
- Keep your GPUs fed with ultra-low-latency data access
- Hit and sustain 1 GB/s per GPU performance
- Scale from hundreds to tens of thousands of GPUs
- Do it all with a single, high-performance storage system purpose-built for AI
Welcome to the new AI factory floor. It’s faster, smarter, and now—certifiably better.
Ready to See What Certified Performance Looks Like? Dive into the full reference architectures for the GB200 NVL72 and HGX H100/H200/B200 systems to see how WEKA delivers AI-native storage at scale, validated by NVIDIA.