For WEKA and NVIDIA, Securing Agentic AI Starts at the Data Layer

Betsy Chernoff

Jun 1, 2026

Agentic AI doesn't work like the applications your traditional security stack was built to protect.

Consider what an AI agent actually does during a single inference session. It retrieves documents from a vector store. It reads tool-call outputs left by a previous agent. It loads a reasoning trace from a prior conversation. It writes new context back to shared memory that three other agents will read within seconds. It accesses model weights, evaluates policy, and makes decisions. All of this happens without a human in the loop, at machine speed, across AI infrastructure that was never designed to treat any of it as a security surface.

Security now has to move directly into the AI data path. Storage is no longer a passive repository behind an application. It's a real-time system deciding which agents access which data, which writes change the state future decisions depend on, and which paths stay isolated. Data exfiltration, poisoning, and unauthorized access don't stay contained in agentic systems. They flow into the agent's next action, the model's next decision, and the business process downstream.

NVIDIA Vera BlueField-4 STX, powered by NVIDIA DOCA, puts security controls directly in the data path, in silicon, and outside the host trust domain. Paired with NeuralMesh and Augmented Memory Grid from WEKA, the result is a foundation that secures agentic AI where it actually runs: persistent inference memory from WEKA, real-time data-path policy enforcement from NVIDIA, and defense in depth for the AI factory.

New Security Innovations for Agentic AI: NVIDIA Vera BlueField-4 STX & NVIDIA DOCA

NVIDIA Vera BlueField-4 STX brings together AI-native storage, networking, and in-silicon security for systems built with NVIDIA Vera BlueField-4. It is the foundation for NVIDIA CMX platforms, where KV cache, agent memory, and long-running context move across distributed GPU systems. As that context becomes shared inference infrastructure, the security boundary has to move with it. NVIDIA DOCA is how that detection and enforcement happens, through three new and enhanced capabilities that operate in or observe the data path.

DOCA Argus: runtime visibility for inference and agents

DOCA Argus monitors AI workload behavior at runtime across agent integrity, data access, network activity, and execution patterns. Agentic AI can turn a single bad action into a chain of downstream effects. Argus gives infrastructure teams a vantage point close to where agents interact with data.

DOCA Vault: policy where agents meet files

DOCA Vault enforces granular authorization on every file access request, inline and in silicon, independent of the host and storage system. The policy layer governs which programs can execute, prevents unauthorized file creation, and blocks model exfiltration. Because it runs outside the host, it stays in the data path even if the host is compromised.
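
To make the idea of per-request file authorization concrete, here is a minimal default-deny sketch in Python. It is not the DOCA Vault API and says nothing about how DOCA expresses policy; the `Rule` type, the `is_allowed` function, the workload names, and the paths are all hypothetical, and paths are assumed to be already normalized.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    """One allow rule. All field values below are hypothetical examples."""
    workload: str   # identity of the requesting workload
    operation: str  # "read", "write", "create", or "execute"
    path_glob: str  # glob pattern; "*" also matches "/" in fnmatch


def is_allowed(rules, workload, operation, path):
    """Default-deny: a request passes only if some rule matches all three fields."""
    return any(
        r.workload == workload
        and r.operation == operation
        and fnmatchcase(path, r.path_glob)
        for r in rules
    )


RULES = [
    Rule("inference-server", "read", "/models/llm/*"),
    Rule("inference-server", "execute", "/opt/runtime/bin/*"),
    Rule("agent-a", "read", "/context/tenant-a/*"),
    Rule("agent-a", "write", "/context/tenant-a/*"),
]

# The inference server may read model weights.
print(is_allowed(RULES, "inference-server", "read", "/models/llm/weights.bin"))  # True
# An agent may not read model weights (no matching rule, so denied).
print(is_allowed(RULES, "agent-a", "read", "/models/llm/weights.bin"))  # False
# File creation is denied unless a rule explicitly grants it.
print(is_allowed(RULES, "agent-a", "create", "/context/tenant-a/new.bin"))  # False
```

The point of the sketch is the default: anything not explicitly granted is refused, which is what lets a policy layer block unexpected file creation and model reads without enumerating every bad case.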

DOCA Flow: network isolation at AI speed

DOCA Flow provides line-rate policy enforcement and segmentation across agents, tenants, and inference pipelines at up to 800 Gb/s, giving that traffic a security boundary that keeps pace with the infrastructure.

Together, DOCA Argus, Vault, and Flow give the AI factory a data-path visibility and enforcement layer that operates in silicon, outside the host trust domain, and at the speed the infrastructure demands. But enforcement without a governed memory layer leaves half the problem unsolved.

NeuralMesh: The AI-Native Foundation for Secure, Stateful Inference

That's where NeuralMesh comes in. AI data doesn't stay in one place anymore, and that's where governance breaks down. NeuralMesh closes that gap at the data foundation, connecting to the identity systems organizations already trust (Active Directory, LDAP, and Kerberos) and applying consistent access policies everywhere AI runs, whether that's on-premises infrastructure, AWS, Azure, GCP, or Oracle Cloud.

NeuralMesh validates identity at both the user and client-server level using a dual-token system: short-lived access tokens for API access and file system mount, and long-lived refresh tokens for session continuity. Cluster membership uses joint secret authentication, so only containers with the correct key can join.

All stored data is encrypted with AES-256, with key management through KMIP 1.2+ compliant systems and HashiCorp Vault. Organizations keep control of their encryption keys regardless of which cloud hosts the data. All data in transit is secured with TLS.

Network segmentation isolates AI workloads from one another and restricts which systems can reach which workloads, keeping a shared infrastructure model from becoming a shared risk model. And compliance rests on evidence: NeuralMesh maintains a complete, immutable record of every access event and integrates directly with SIEM platforms, giving security teams the lineage they need to respond to incidents and demonstrate regulatory alignment.
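
The dual-token pattern mentioned above can be illustrated with a short Python sketch. This is a generic illustration of short-lived access tokens paired with long-lived refresh tokens, not NeuralMesh's token format or API. The signing key, the lifetimes (`ACCESS_TTL`, `REFRESH_TTL`), and the function names are assumptions chosen for the example.

```python
import hashlib
import hmac
import json
import secrets
import time

SECRET = secrets.token_bytes(32)  # hypothetical signing key held by the issuer
ACCESS_TTL = 300                  # assumed lifetime: 5 minutes
REFRESH_TTL = 7 * 24 * 3600       # assumed lifetime: 7 days


def issue(subject, kind, ttl, now=None):
    """Return a signed token of the given kind ("access" or "refresh")."""
    now = int(time.time() if now is None else now)
    payload = json.dumps({"sub": subject, "kind": kind, "exp": now + ttl}, sort_keys=True)
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return payload + "." + sig


def verify(token, kind, now=None):
    """Return the subject if the token is authentic, unexpired, and of the right kind."""
    now = int(time.time() if now is None else now)
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    claims = json.loads(payload)
    if claims["kind"] != kind or claims["exp"] <= now:
        return None
    return claims["sub"]


def refresh(refresh_token, now=None):
    """Exchange a valid refresh token for a new short-lived access token."""
    subject = verify(refresh_token, "refresh", now)
    if subject is None:
        raise PermissionError("refresh token invalid or expired")
    return issue(subject, "access", ACCESS_TTL, now)
```

The design choice is that a leaked access token is only useful for minutes, while the refresh token, which is presented far less often, carries the session across that expiry.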

Augmented Memory Grid: the memory layer that agents depend on

In most inference architectures, context state is constrained by GPU HBM or host DRAM. When memory pressure rises, teams face the same bad trade-offs: evict cache too early, pin sessions to specific GPU hosts, duplicate state across replicas, or recompute expensive prefill work. Augmented Memory Grid turns NVMe into a persistent token warehouse for KV cache, built on NeuralMesh and served back to GPU hosts over NVIDIA GPUDirect Storage and RDMA.

Three advantages follow for secure, stateful inference:

Persistent context: Long-running agent workflows preserve working memory instead of rebuilding full context on every call.
Session-free scaling: Any authorized GPU host can retrieve the right KV cache blocks, eliminating sticky routing across distributed GPU infrastructure.
Governed reuse: Cache reuse is scoped to the right tenant, project, or policy boundary, rather than an uncontrolled shared state.
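
One way to picture governed reuse is to fold the policy scope into the cache key itself, so a lookup from a different tenant can never hit another tenant's blocks. The sketch below is a simplified illustration under that assumption, not how Augmented Memory Grid or any particular KV cache manager derives keys. The function name `kv_block_key`, the scope fields, and the in-memory `store` dictionary are hypothetical.

```python
import hashlib


def kv_block_key(tenant, project, model, token_ids):
    """Derive a cache key from the policy scope plus the token prefix."""
    h = hashlib.sha256()
    for field in (tenant, project, model):
        data = field.encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    for token_id in token_ids:
        h.update(token_id.to_bytes(4, "big"))
    return h.hexdigest()


# Stand-in for a persistent KV cache tier.
store = {}
prefix = [101, 2023, 2003, 1037]
store[kv_block_key("tenant-a", "proj-1", "model-x", prefix)] = b"<kv block bytes>"

same_scope = kv_block_key("tenant-a", "proj-1", "model-x", prefix)
other_tenant = kv_block_key("tenant-b", "proj-1", "model-x", prefix)

print(same_scope in store)    # True: warm reuse inside the same scope
print(other_tenant in store)  # False: identical prompt, different tenant, no hit
```

Scoped keys keep reuse inside a boundary, but they are not access control on their own; they complement the identity checks and data-path enforcement described elsewhere in this article.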

In an AI factory, the secure path also has to be the fast path. NeuralMesh with Augmented Memory Grid has demonstrated a direct token warehouse-to-GPU path at over 350 GB/s reads per host. In joint testing with Oracle Cloud Infrastructure, it supported up to 7x more token throughput and 10x more concurrent users than DRAM-only in the tested configuration.
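
For a rough sense of scale, the back-of-the-envelope calculation below estimates how long it would take to read a long context's KV cache at the 350 GB/s figure. The model dimensions (80 layers, 8 KV heads, head dimension 128, 2-byte values) and the 128,000-token context are illustrative assumptions, not numbers from WEKA or NVIDIA, and the result ignores latency, protocol overhead, and contention.

```python
def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value):
    """KV cache footprint of one token across all layers."""
    # Factor of 2: one key vector and one value vector per layer and KV head.
    return 2 * layers * kv_heads * head_dim * bytes_per_value


# Illustrative model shape (assumed, not from the source).
per_token = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128, bytes_per_value=2)

context_tokens = 128_000   # assumed context length
read_bandwidth = 350e9     # 350 GB/s per host, the figure quoted above

total_bytes = per_token * context_tokens
seconds = total_bytes / read_bandwidth

print(per_token)                       # 327680
print(f"{total_bytes / 1e9:.2f} GB")   # 41.94 GB
print(f"{seconds * 1000:.0f} ms")      # 120 ms
```

Under these assumptions, restoring the full context is a sub-second read, which is the comparison that matters against recomputing prefill for the same tokens.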

Together, the architecture is straightforward: NeuralMesh with Augmented Memory Grid makes stateful inference memory persistent, fast, and governable. NVIDIA Vera BlueField-4 STX and NVIDIA DOCA make the infrastructure around that memory enforceable in silicon. Neither is sufficient alone, but together they make the secure path the high-performance path.

What a secure AI factory looks like with NVIDIA STX & NeuralMesh

A secure AI factory is not built around one control plane. It is built around many control points working together: identity to know who is acting, governance to decide what is allowed, persistent memory to preserve context, and in-silicon detection and policy enforcement to protect the data path where agents actually operate.

With NeuralMesh and NVIDIA STX, the result is a security model that maps to how agentic AI actually works, across data, memory, inference, and network movement.

Layer	Components	What it does and why it matters
Identity and governance	Customer AI platform, security gateway, or orchestration layer	Authenticates requests, maps them to tenant/project/workload identity, and applies policy. This binds data access and inference memory to the right workload context.
Inference runtime and KV cache manager	NVIDIA Dynamo*, NVIDIA TensorRT LLM*, vLLM, SGLang, LMCache, NVIDIA Dynamo KV Block Manager, or similar	Creates, stores, retrieves, and reuses KV blocks during prefill and decode. With identity-scoped cache keys, warm reuse stays inside the authorized boundary.
Persistent inference memory	Augmented Memory Grid	Persists KV cache in a NeuralMesh-backed token warehouse and serves it to GPU hosts over NVIDIA GPUDirect Storage and RDMA. This keeps state fast and available without unmanaged HBM pressure or per-replica duplication.
AI-native data foundation	NeuralMesh by WEKA	Provides the high-performance data layer for model weights, checkpoints, datasets, agent memory, and context memory across STX and CMX-style environments.
In-silicon data-path policy enforcement	NVIDIA Vera BlueField-4 STX and NVIDIA DOCA	DOCA Vault governs file access, DOCA Argus monitors inference and agent behavior, and DOCA Flow isolates network paths. Enforcement stays outside the host trust domain and close to the AI data path.

* NVIDIA Dynamo and NVIDIA TensorRT LLM

The Bottom Line

AI needs a security architecture that matches the way agents behave: continuous data access, persistent context, autonomous action, and distributed inference. WEKA and NVIDIA are building that architecture at the infrastructure layer, not as a patch on top of it.

With NeuralMesh and Augmented Memory Grid, WEKA gives stateful inference a persistent, high-performance memory tier. With NVIDIA Vera BlueField-4 STX and DOCA, NVIDIA gives that tier real-time, in-silicon protection for data, agents, and context memory.

Together, they make the secure path the high-performance path. That's what AI infrastructure needs now. Explore how WEKA and NVIDIA are solving some of the toughest problems in AI, and learn how Augmented Memory Grid can help you build security-first AI infrastructure.
