Eliminate Infrastructure Sprawl and Accelerate AI on Dell PowerEdge


Scaling AI Requires More Than Just the Ability to Grow Bigger
As enterprises transition from AI experiments to production-scale deployments, they are encountering a new wave of infrastructure challenges. Models are growing larger, data pipelines are becoming more complex, and real-time performance is no longer optional; it is mission-critical.
Tier 0 storage, the layer where active data is accessed by thousands of GPUs for large-scale real-time inferencing, token processing, model loading and high-throughput training, demands data infrastructure that can match GPU speed without adding latency or complexity.
Scaling AI requires solutions that support dynamic workloads with unpredictable I/O patterns, keep GPUs saturated across distributed environments, and can be provisioned quickly enough to keep pace with innovation.
That’s why WEKA is delivering a better way forward with Dell Technologies: a powerful, converged AI solution that integrates NeuralMesh™ Axon™ by WEKA with Dell PowerEdge XE9680 servers—some of the most advanced AI servers on the planet.
This solution is purpose-built for large-scale GPU environments and is optimized for Tier 0 use cases—including massive inference pipelines, foundation model training and real-time retrieval-augmented generation (RAG)—where performance, efficiency and simplicity are non-negotiable.
Introducing NeuralMesh Axon on the Dell PowerEdge XE9680 Server
Today, we’re thrilled to announce that NeuralMesh Axon can now run embedded on the Dell PowerEdge XE9680. This next-generation offering uplevels AI-ready infrastructure by collapsing compute and Tier 0 storage into a single high-density system, dramatically simplifying deployments and delivering world-class performance at scale. The result redefines the Tier 0 layer of the AI data stack, making it faster, more efficient and easier to manage.
What Is NeuralMesh Axon?
NeuralMesh Axon is an advanced deployment architecture that embeds NeuralMesh™ directly into GPU servers, turning otherwise idle local resources, such as NVMe drives and CPU cores, into a high-performance, AI-native storage fabric. Because it runs on the same servers that power AI workloads, it dramatically simplifies the AI stack, reduces networking overhead and accelerates time to value, while still delivering enterprise-grade performance, data protection and tiering capabilities. With NeuralMesh Axon, every GPU server becomes both a compute and a storage node, creating a truly software-defined, high-efficiency infrastructure.
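From the application’s point of view, the embedded fabric simply looks like a local, POSIX-compliant filesystem, so existing training code needs no changes. Here is a minimal sketch of a data pipeline reading from such a mount; the mount point /mnt/weka and the .npy shard layout are illustrative assumptions, not prescribed by the product.

```python
# Minimal sketch: a training input pipeline reading from a converged node.
# Assumptions (illustrative): the embedded storage fabric is exposed as a
# POSIX mount at /mnt/weka, and training shards are stored as .npy files.
import glob

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class ShardDataset(Dataset):
    """Reads training shards straight from the local mount; because storage
    is embedded in the GPU server, reads stay on-node rather than crossing
    the network to an external filer."""

    def __init__(self, root: str = "/mnt/weka/train"):
        self.paths = sorted(glob.glob(f"{root}/*.npy"))

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx: int) -> torch.Tensor:
        # np.load goes through the ordinary POSIX interface to local NVMe.
        return torch.from_numpy(np.load(self.paths[idx]))

loader = DataLoader(ShardDataset(), batch_size=8, num_workers=4, pin_memory=True)
for batch in loader:
    if torch.cuda.is_available():
        batch = batch.to("cuda", non_blocking=True)  # feed the GPU
    break  # one step shown for brevity
```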
Why It Matters
AI isn’t just a compute problem—it’s a data problem. Training and serving today’s massive models requires more than GPU horsepower; it demands a data framework capable of saturating those GPUs with ultra-low-latency, high-throughput storage. That’s where NeuralMesh delivers unprecedented advantages:
- GPU Utilization Boosted by 80%
Stop letting your GPUs sit idle. A continuous data pipeline to every GPU eliminates I/O bottlenecks.
- Zero Storage Footprint and Lower Infrastructure Costs
NeuralMesh Axon runs natively on the internal NVMe, creating an efficient, high-performance storage tier within the GPU environment that leverages existing hardware. By collapsing storage and compute, the solution dramatically reduces hardware, power and cooling needs.
- Faster Time to AI
From rack to production, deployments are faster, more flexible and ready for scale.
- Integration with Dell ObjectScale
Snapshot integration with object storage solutions like Dell ObjectScale enables intelligent data lifecycle management, including cost-effective archival, tiering and long-term data retention (see the sketch after this list).
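Because ObjectScale exposes an S3-compatible API, snapshot and tiered data can be inspected with standard S3 tooling. Below is a minimal sketch using boto3; the endpoint URL, bucket name, prefix and credentials are illustrative assumptions, not values from either product’s documentation.

```python
# Minimal sketch: listing archived snapshot objects on an S3-compatible store.
# The endpoint URL, bucket, prefix and credentials are illustrative
# assumptions; substitute values from your own ObjectScale deployment.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectscale.example.com",  # assumed endpoint
    aws_access_key_id="ACCESS_KEY",                  # placeholder
    aws_secret_access_key="SECRET_KEY",              # placeholder
)

# Page through archived snapshot objects under an assumed prefix.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="ai-snapshots", Prefix="weka/"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"], obj["LastModified"])
```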
Why Dell PowerEdge XE9680
Designed to deliver performance without compromise, the PowerEdge XE9680 server complements WEKA’s NeuralMesh software. With up to eight NVIDIA HGX GPUs, 32 DDR5 DIMM slots, high-speed NVMe storage and enterprise-grade management via iDRAC, it is built from the ground up to drive demanding AI, ML and deep learning workloads.
Built for What’s Next
Compared to competing architectures, the combined solution offers dramatic efficiency gains: 10x lower power consumption, up to 1.75x greater storage bandwidth and zero-footprint storage that eliminates racks of unnecessary hardware. It lets you do more with less, accelerating time to insight while shrinking your infrastructure footprint.
One of the most powerful aspects of the solution is the integration of WEKA’s Augmented Memory Grid, which extends GPU memory by using ultra-fast local NVMe as a high-speed cache layer for inference workloads. By storing key-value (KV) caches and frequently accessed data close to the model at near-memory performance, Augmented Memory Grid enables dramatically larger context windows and more efficient token processing. Customers can achieve up to 30% more tokens per second per node while maintaining flat per-token latency, even as context windows scale to hundreds of thousands or millions of tokens, and up to 41x faster time-to-first-token (TTFT) on long prompts, delivering smoother, more responsive AI experiences.
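Conceptually, this works like any cache hierarchy: keys and values computed for earlier tokens are persisted on the fast local tier and reloaded on reuse instead of being recomputed. The toy sketch below illustrates only that idea; the mount point, file layout and tensor shapes are assumptions, and this is not WEKA’s implementation.

```python
# Toy illustration of the KV-cache offload idea: persist per-sequence
# key/value tensors to a fast local tier and reload them on reuse,
# rather than recomputing attention over the full prompt.
# The mount point and tensor shapes are illustrative assumptions.
from pathlib import Path

import torch

CACHE_DIR = Path("/mnt/weka/kv-cache")  # assumed NVMe-backed mount

def save_kv(seq_id: str, keys: torch.Tensor, values: torch.Tensor) -> None:
    """Spill a sequence's KV cache to the fast storage tier."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    torch.save({"k": keys, "v": values}, CACHE_DIR / f"{seq_id}.pt")

def load_kv(seq_id: str):
    """Reload a cached prefix, or return None on a cache miss."""
    path = CACHE_DIR / f"{seq_id}.pt"
    if not path.exists():
        return None
    kv = torch.load(path)
    return kv["k"], kv["v"]

# Usage: 32 layers' worth of keys/values for a 4,096-token prefix.
k = torch.randn(32, 4096, 128)
v = torch.randn(32, 4096, 128)
save_kv("session-42", k, v)
cached = load_kv("session-42")  # a later request reuses the prefix
```

The payoff comes from the relative cost: reloading a cached prefix from local NVMe is far cheaper than re-running attention over hundreds of thousands of prompt tokens.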
From LLM training to inference at scale, this solution future-proofs AI infrastructure. It supports elastic scaling, hybrid cloud integration, and massively parallel access patterns—making it ideal for next-gen applications.
Learn More
Want to learn more or schedule a demo? Talk to your WEKA or Dell rep today.
Visit https://www.weka.io/resources/solution-brief/weka-neuralmesh-axon-solution-brief/