Building Gigascale AI Factories with NVIDIA BlueField-4 and WEKA NeuralMesh


TL;DR: NVIDIA BlueField-4 DPU brings 800 Gb/s networking, 6x more compute power than NVIDIA BlueField-3, and NVIDIA DOCA-powered microservices designed to accelerate infrastructure and zero-trust security at massive scale. WEKA’s next-generation NeuralMesh™, NeuralMesh Axon™, and Augmented Memory Grid™ are being developed to leverage this architecture — offloading data-plane workloads, improving concurrency and latency, and enabling stronger isolation and orchestration for multi-tenant AI environments. Together, NVIDIA BlueField-4 and WEKA’s next-generation NeuralMesh are expected to deliver breakthrough power efficiency — achieving over 100× improvement in tokens-per-watt for enterprise AI factories compared with traditional CPU-attached storage systems.
NVIDIA BlueField-4: a glimpse of the next-generation AI infrastructure
Earlier today at NVIDIA GTC Washington, D.C., NVIDIA unveiled BlueField-4, its next-generation data processing unit (DPU) built to power the era of gigascale AI factories. Delivering 800 Gb/s of networking bandwidth and 6x the compute power of BlueField-3, BlueField-4 introduces new levels of infrastructure acceleration, security, and efficiency for AI workloads through the NVIDIA DOCA software framework.
At WEKA, we view BlueField-4 as a pivotal step toward the future of software-defined, service-oriented AI storage. We are collaborating with NVIDIA and designing the next generation of NeuralMesh to take advantage of BlueField-4’s architecture — bringing data, compute, and networking together in a simpler, more efficient way than traditional server-plus-storage designs allow.
This alignment validates the architectural direction of NeuralMesh and underscores how the combination of BlueField-4 and WEKA will enable joint customers to accelerate AI pipelines even further — driving higher performance, lower latency, and greater efficiency across the entire data path. As AI factories evolve into gigawatt-scale infrastructure, power availability is becoming a critical constraint. By accelerating storage and data services directly on BlueField, WEKA and NVIDIA aim to unlock dramatically better performance-per-watt and transform power efficiency into a new dimension of AI competitiveness.
Why BlueField-4 matters for AI storage (and agents)
GPUs are only as fast as the data and control planes feeding them. BlueField-4 is designed to accelerate those planes with programmable, in-line services for networking, data access, and security via the DOCA framework — all at 800 Gb/s. It will be the right place to terminate, filter, transform, and secure data before it reaches GPU memory.
WEKA has been co-developing toward this vision for years — running early NeuralMesh components on BlueField-3 DPUs and NVIDIA Spectrum-X and Ethernet SuperNICs, integrating with DOCA, and validating improved storage efficiency for training and inference. BlueField-4 will scale that performance for gigascale AI factories, unlocking even greater acceleration potential.
As global datacenters approach power capacity limits, this DPU-driven design will also deliver far more useful AI tokens per watt — enabling enterprises to scale AI compute sustainably.
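The tokens-per-watt framing reduces to simple arithmetic. The sketch below shows the calculation with placeholder throughput and power figures — purely illustrative assumptions, not measured numbers and not a derivation of the 100× system-level claim:

```python
# Back-of-the-envelope tokens-per-watt arithmetic.
# All inputs are illustrative placeholders, not vendor measurements.

def tokens_per_watt(tokens_per_sec, watts):
    """Useful AI tokens produced per watt of sustained power draw."""
    return tokens_per_sec / watts

# Assumed figures for a CPU-attached storage path vs. a DPU-offloaded path.
baseline = tokens_per_watt(tokens_per_sec=50_000, watts=10_000)
offloaded = tokens_per_watt(tokens_per_sec=400_000, watts=7_000)

print(f"baseline: {baseline:.1f} tok/W, offloaded: {offloaded:.1f} tok/W, "
      f"gain: {offloaded / baseline:.1f}x")
```

The point of the exercise is that efficiency gains compound from both sides of the ratio: offloading raises token throughput while removing CPU-based storage servers lowers power draw.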
Three ways WEKA + BlueField-4 will change the game
1) A microservices storage mesh that thrives on programmable DPUs
Traditional storage systems rely on monolithic controllers that can’t keep pace with modern, GPU-dense workloads. WEKA’s NeuralMesh breaks that mold: a software-defined mesh of storage microservices that dynamically orchestrates data and metadata across the cluster.
With NVIDIA BlueField-4, WEKA’s next-generation NeuralMesh will be able to push key microservices — replication, data-path optimization, encryption, and policy enforcement — directly onto the DPU. BlueField-4’s 6x compute capacity and 800 Gb/s networking will enable these services to execute in-line where packets flow, rather than routing through host CPUs.
The result will be a distributed, DPU-accelerated data plane that reduces latency, increases GPU utilization, and performs multi-tenant isolation at line rate. It’s the opposite of the old filer bottleneck — and exactly what AI factories need as they scale, delivering more tokens and better outcomes per watt by eliminating redundant CPU hops and network overhead.
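The contrast between host-CPU hops and inline DPU execution can be sketched with a toy latency model. Every per-hop figure below is an assumption chosen for illustration; real latencies depend on hardware, firmware, and workload:

```python
# Toy latency model (all figures are assumptions, not measurements):
# a traditional path bounces each I/O through the host CPU for replication,
# encryption, and policy checks; an inline DPU path performs those services
# where packets already flow, removing the extra host hops.

HOST_HOP_US = 12.0   # assumed cost of crossing PCIe + host CPU per service
INLINE_US = 1.5      # assumed cost of the same service executed inline on the DPU
WIRE_US = 8.0        # assumed base network + media latency

SERVICES = ["replication", "encryption", "policy"]

def host_path_latency(services):
    """Each service is a separate trip through the host CPU."""
    return WIRE_US + HOST_HOP_US * len(services)

def dpu_path_latency(services):
    """Services are chained inline on the DPU data plane."""
    return WIRE_US + INLINE_US * len(services)

host = host_path_latency(SERVICES)
dpu = dpu_path_latency(SERVICES)
print(f"host-CPU path: {host:.1f} us, inline DPU path: {dpu:.1f} us")
```

Note how the host-path cost grows with every service added to the pipeline, while the inline path adds only a small increment per service — the structural reason in-line execution scales better as data services multiply.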
2) GPU-local persistence with DPU-accelerated control and security
NeuralMesh Axon aggregates NVMe drives on GPU servers into a unified, high-performance storage pool, keeping data physically close to compute. In its next generation, Axon will be engineered to leverage BlueField-4 to offload NVMe-oF, RDMA, and encryption functions to the DPU, eliminating control-plane overhead and freeing host CPUs for AI workloads.
Through DOCA’s native service-function chaining, BlueField-4 will be able to run storage and security microservices side by side. Combined with BlueField’s security model, WEKA expects to deliver per-tenant encryption, attestation, and isolation — all inline, with no added latency.
Together, Axon with BlueField-4 will provide GPU-local persistence with DPU-level efficiency and security, replacing layers of software proxies with a direct, zero-trust architecture for large-scale AI environments. By removing the need for separate CPU-based storage servers, this approach consolidates compute, networking, and data management functions — dramatically improving both performance density and tokens-per-watt efficiency.
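Service-function chaining composes small services into a pipeline on the data path. The sketch below models the idea in plain Python with a placeholder XOR cipher and checksum; it does not use the actual DOCA APIs, and every name and stage in it is illustrative:

```python
# Conceptual sketch of service-function chaining: each stage is a small
# service applied inline to a block I/O before it reaches the NVMe target.
# Stages are illustrative stand-ins, not real encryption or integrity code.

from functools import reduce

def encrypt(tenant_key):
    """Per-tenant 'encryption' stage (placeholder XOR to stay self-contained)."""
    def stage(buf):
        return bytes(b ^ tenant_key for b in buf)
    return stage

def checksum_tag(buf):
    """Append a 1-byte additive checksum as a stand-in for end-to-end integrity."""
    return buf + bytes([sum(buf) % 256])

def make_chain(*stages):
    """Compose stages left to right into one inline pipeline."""
    return lambda buf: reduce(lambda b, s: s(b), stages, buf)

# Per-tenant chain: encrypt with the tenant's key, then tag for integrity.
tenant_a = make_chain(encrypt(0x5A), checksum_tag)
out = tenant_a(b"block-0")
print(len(out))  # original 7 bytes + 1 checksum byte
```

Because each tenant gets its own chain (and its own key), isolation falls out of the composition itself — the design property the zero-trust model above relies on.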
3) Token-aware inference acceleration – at line rate
Augmented Memory Grid accelerates inference by reusing KV-cache and token data across models and sessions. In future iterations, it will leverage BlueField-4’s programmable data plane to host DPU-resident DOCA microservices that handle cache lookup, eviction, and access control directly on the data path.
This architecture will enable fast, deterministic token reuse while maintaining isolation between tenants and workloads. The result: higher concurrency, lower time-to-first-token (TTFT), and improved GPU efficiency across clusters — all without adding CPU or networking overhead. Each reused token represents not just a speed gain, but also measurable energy savings — compounding into the 100× tokens-per-watt efficiency improvement expected from BlueField-4–based NeuralMesh systems.
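The cache lookup, eviction, and tenant-isolation behavior described above can be sketched as a per-tenant LRU cache. This is a minimal illustration of the logic, not WEKA's implementation; the class name, capacity, and keys are assumptions:

```python
from collections import OrderedDict

# Minimal sketch of a tenant-isolated KV cache with LRU eviction — the kind
# of lookup/eviction/access-control logic the text describes running as
# DPU-resident microservices. All names and sizes are illustrative.

class TenantKVCache:
    def __init__(self, capacity_per_tenant=2):
        self.capacity = capacity_per_tenant
        self.tenants = {}  # tenant_id -> OrderedDict of token-prefix -> kv blob

    def put(self, tenant, prefix, kv):
        cache = self.tenants.setdefault(tenant, OrderedDict())
        cache[prefix] = kv
        cache.move_to_end(prefix)          # mark as most recently used
        if len(cache) > self.capacity:
            cache.popitem(last=False)      # evict least recently used

    def get(self, tenant, prefix):
        cache = self.tenants.get(tenant)
        if cache is None or prefix not in cache:
            return None                    # miss: tenants never see each other's entries
        cache.move_to_end(prefix)
        return cache[prefix]

c = TenantKVCache(capacity_per_tenant=2)
c.put("tenant-a", "prompt-1", "kv-a1")
c.put("tenant-b", "prompt-1", "kv-b1")
print(c.get("tenant-a", "prompt-1"))   # hit within tenant-a's namespace
print(c.get("tenant-b", "prompt-2"))   # miss: isolated namespace, no cross-tenant reuse
```

Keying the cache by tenant before token prefix is what gives isolation for free: a lookup can only ever hit entries the same tenant wrote.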
Why It Matters
WEKA is advancing what’s possible for AI infrastructure with NVIDIA. By offloading data-plane and security services to NVIDIA BlueField, WEKA turns what was once the slowest part of the AI stack into a programmable, parallel fabric that scales with GPU performance.
- No monolithic filer bottlenecks — just a distributed, BlueField-accelerated mesh.
- Higher GPU utilization and lower latency without additional servers or power.
- Zero-trust isolation built in through DOCA microservices.
- Breakthrough efficiency: Expected to deliver over 100x improvement in tokens per watt versus traditional CPU-attached storage — helping enterprise AI factories scale sustainably under real-world power constraints.
BlueField-4 will amplify what WEKA already does best: transform storage from a static subsystem into a high-performance, composable layer that keeps AI factories fast, efficient, and secure.
For more details, read our announcement.
Take the Next Step
Want to help shape the era of DPU-accelerated, energy-efficient AI infrastructure?
Join WEKA’s BlueField-4 Early Integration Program or schedule a technical briefing with our architecture team to discuss design patterns and roadmap alignment.
Talk to an Expert