Turn Your AI Factory Into a Lean, Mean Token Machine
Extend GPU memory into a persistent Token Warehouse™ with 1000x more memory capacity and radically increase token throughput.
1000x More KV Cache Capacity
Get 1000x more capacity than DRAM for KV cache data that remains persistent across sessions and node failures.
6.5x Higher Token Throughput
Sustain up to 6.5x more input-token throughput under concurrent inference load, keeping GPUs fed even as working sets grow far beyond DRAM capacity.
10x More Concurrent Users
Scale to 10x more simultaneous long-context sessions on the same GPU cluster, without adding hardware or sacrificing throughput.
“The economics of large-scale inference are a major hurdle for enterprises. WEKA’s Augmented Memory Grid directly confronts this challenge. The 20x improvement in time-to-first-token we observed in joint testing on OCI isn’t just a performance metric; it fundamentally reshapes the cost structure of running AI workloads. For our customers, this makes deploying the next generation of AI easier and cheaper.”
Purpose-Built for AI Inference
Scalable, Persistent, Efficient
By offloading and persisting KV cache data to a token warehouse in NeuralMesh, Augmented Memory Grid expands effective memory capacity to 1000x that of DRAM. This eliminates redundant prefill computations, sustains high cache-hit rates, and significantly improves GPU efficiency.
Microsecond Data Path with RDMA and GPUDirect Storage
By leveraging NVIDIA Magnum IO GPUDirect Storage (GDS) and RDMA, Augmented Memory Grid enables GPUs to fetch tokens directly from the NVMe-backed token warehouse with microsecond-scale latency and DRAM-class throughput.
Open Source Ecosystem Integrations
Integrates natively with NVIDIA Dynamo and NIXL through WEKA’s open-source NIXL plugin, and with TensorRT-LLM and LMCache through WEKA’s open-source projects, so it fits into existing inference pipelines.
Next-Gen AI Runs on NeuralMesh
Articles and Resources
Serve 10x More AI Users on the Same GPU Footprint
Production-scale benchmarks on OCI show how WEKA Augmented Memory Grid removes KV cache bottlenecks to unlock higher throughput, persistent context, and better token economics for agentic AI.
Break the AI Memory Barrier with Augmented Memory Grid
AI builders can now streamline long-context reasoning and agentic AI workflows, transforming inference workloads into profitable business value.
Fuel Your Inference Workloads
See how NeuralMesh with Augmented Memory Grid delivers a high-efficiency memory design that transforms inference scale, cost, and performance.