Since launching in October 2025, WARRP (the WEKA AI RAG Reference Platform) has gained significant momentum across the AI landscape, helping enterprises build scalable, production-grade Retrieval-Augmented Generation (RAG) infrastructure with unprecedented ease. Today, we’re excited to announce the next evolution of WARRP, packed with powerful new capabilities that make it faster to deploy, easier to scale, and even more aligned with real-world inference needs.

Why WARRP?

Inference is no longer a back-end concern—it’s the engine of enterprise AI. Whether you’re serving dynamic API requests, powering intelligent agents, or integrating RAG into customer-facing applications, fast, scalable inference is what turns your trained models into real business value.

But stitching together vector databases, embedding models, GPU orchestration, and storage pipelines is complex and costly—especially at scale. That’s why WEKA built WARRP: a modular, fully integrated reference architecture that lets enterprises move from experimentation to production with confidence.

From Blueprint to Turnkey: What’s New

Originally introduced as a powerful “build-your-own” reference architecture, WARRP gave enterprise AI teams a clear blueprint for deploying RAG at scale. Now, with this latest release, WARRP evolves into a fully automated, turnkey deployment solution.

One-Line Deployment with Wundler

The new Wundler bundler lets you deploy the entire RAG stack, on top of NeuralMesh™, with a single command.

Spin up your environment, ingest your own data, and get to inference in minutes—not months. No more assembling components or wrangling infrastructure.

The Benefits of Automation

  • Faster time to value – Deploy and scale with minimal effort
  • Lower operational burden – No specialized teams needed to integrate, tune, or maintain the stack
  • Repeatable and portable – Get the same architecture and performance across on-prem and cloud environments
  • Confidence in production – Built-in reliability, observability, and enterprise-grade support

Expanded Model and NVIDIA Ecosystem Support

WARRP now supports NVIDIA NIM microservices, NV-Ingest, and inference with popular LLMs such as DeepSeek and Llama 3, with a plug-and-play architecture that makes adding new models simple.
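
NIM microservices expose an OpenAI-compatible API, which is what makes the plug-and-play model swapping possible: changing models is a configuration change, not a code change. Here is a minimal sketch of querying a NIM-served LLM; the endpoint URL and model identifier below are assumptions, so substitute the values your own deployment exposes.

```python
# Minimal sketch: querying a NIM-served LLM through its OpenAI-compatible API.
# The base_url and model name are assumptions -- use the values your own
# deployment exposes.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local NIM endpoint
    api_key="not-used",                   # local NIM endpoints typically ignore the key
)

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",      # assumed NIM model identifier
    messages=[{"role": "user", "content": "Summarize our deployment options."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```

Swapping in DeepSeek or another Llama variant amounts to pointing `model` (and, if needed, `base_url`) at a different NIM.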

It also supports database-native inference, letting you serve responses grounded directly in your enterprise data.
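
To make that concrete, here is a conceptual sketch of the retrieve-then-generate pattern behind serving answers from your own data (not WARRP’s internal implementation): document chunks are ranked by cosine similarity to the query embedding, and the top matches become grounding context for the LLM. `embed` is a hypothetical stand-in for your deployed embedding model.

```python
# Conceptual retrieve-then-generate sketch (not WARRP internals).
import numpy as np

def cosine_top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Indices of the k stored vectors most similar to the query vector."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    return np.argsort(sims)[::-1][:k]

def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Prompt the model to answer only from the retrieved enterprise data."""
    context = "\n---\n".join(chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

# Usage, assuming `embed` returns vectors from your deployed embedding model:
# q_vec = embed("What is our refund policy?")
# top = cosine_top_k(q_vec, doc_embeddings)
# prompt = build_grounded_prompt("What is our refund policy?", [chunks[i] for i in top])
```

In production, the in-memory similarity search above would be handled by the vector database in the stack; the pattern is the same.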

Built for Production, Accelerated by WEKA

What sets WARRP apart is that it’s built on top of NeuralMesh, delivering the ultra-low-latency storage, massive parallelism, and GPU acceleration that together eliminate the typical bottlenecks in RAG pipelines.

From Time to First Token (TTFT) to cost-per-token efficiency, WARRP is tuned to meet the performance demands of today’s most advanced AI applications. And because the architecture is identical across cloud and on-prem environments, your AI workflows stay portable, performant, and production-ready—anywhere.
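
TTFT is also easy to verify for yourself: stream a completion and time the gap between sending the request and receiving the first token. A minimal sketch, assuming the same OpenAI-compatible endpoint as above:

```python
# Measure Time to First Token (TTFT) against a streaming OpenAI-compatible
# endpoint. base_url and model are assumptions -- use your deployment's values.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

start = time.perf_counter()
stream = client.chat.completions.create(
    model="meta/llama3-8b-instruct",  # assumed model identifier
    messages=[{"role": "user", "content": "Explain RAG in one paragraph."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(f"TTFT: {time.perf_counter() - start:.3f}s")
        break  # keep consuming chunks instead if you also want tokens/sec
```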

Pilot WARRP-as-a-Service: Help Shape the Future of Enterprise Inference

WEKA is currently offering an exclusive opportunity for a select number of enterprise customers to join our WARRP-as-a-Service Pilot Program. As a pilot customer, you’ll get hands-on access to WARRP deployed as a fully managed service—installed, configured, and supported by WEKA.

You bring your data. We bring the full stack.

  • One-line deployment
  • Inference-ready infrastructure
  • Full support for model integration and GPU orchestration
  • Managed by WEKA for speed, scale, and simplicity

Ready to Lead the Next Wave of AI?

If your organization is building AI factories, deploying intelligent agents, or seeking a faster path to scalable inference, WARRP-as-a-Service is the accelerator you’ve been waiting for.

Contact us today to apply for the WARRP-as-a-Service Pilot Program. Spots are limited.
