We Don’t Speak Milliseconds

At WEKA, we operate in microseconds (one millionth of a second)—because milliseconds are too slow for modern AI and data-driven workloads. When your GPUs are experiencing millisecond response times, your insights are delayed, and your budget is burning. That’s why we built NeuralMesh to accelerate data by eliminating latency at every level of the stack—getting the absolute most out of your compute, storage, and network investments.
Why Microseconds Matter
Today’s AI and HPC workloads don’t operate at millisecond pace. They demand instant data delivery—across petascale storage, GPU farms, and global networks. If your infrastructure isn’t moving in microseconds, it’s holding you back.
NeuralMesh delivers true microsecond latency by optimizing every part of the I/O path—from the application layer to the flash media. Unlike other storage platforms, which hand responsibility for I/O delivery off to the network and compute infrastructure, NeuralMesh takes ownership of the I/O path end to end. The latency savings at each stage of the I/O journey compound into a meaningful reduction in overall round-trip latency. In doing so, NeuralMesh eliminates the bottlenecks that leave other systems waiting.
Bypass the Client Operating System
Operating systems are excellent at maintaining compatibility across a wide range of hardware, but they were never designed for low-latency performance. For data-intensive applications, the kernel is a major bottleneck. NeuralMesh bypasses the operating system kernel completely, freeing it for other tasks while NeuralMesh’s data-centric operating system performs the data-path work that other solutions leave to the kernel.
Zero-Interrupt I/O with SPDK and DPDK
Because operating system kernels are interrupt-driven, many cycles are wasted waiting for the kernel to wake up and respond to a stimulus—and that wait adds latency to network traffic as well. NeuralMesh uses polling-based I/O built on SPDK (Storage Performance Development Kit) and DPDK (Data Plane Development Kit) to bypass the kernel and avoid costly interrupts and context switches. That means no kernel overhead, no CPU contention—just pure, direct, low-latency throughput.
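To make the polling model concrete, here is a minimal Python simulation of a poll-mode completion queue—the pattern SPDK and DPDK use instead of interrupts. All names here (`CompletionQueue`, `device_completes`, `poll`) are invented for illustration; this is not the SPDK API or NeuralMesh code.

```python
import collections

class CompletionQueue:
    """Simulated NVMe-style completion queue with no interrupt machinery."""

    def __init__(self):
        self._completions = collections.deque()

    def device_completes(self, io_id):
        # The "device" posts a completion entry; no thread is woken up.
        self._completions.append(io_id)

    def poll(self, max_entries=32):
        # The application thread reaps completions itself: no interrupt
        # handler, no context switch, no kernel involvement.
        reaped = []
        while self._completions and len(reaped) < max_entries:
            reaped.append(self._completions.popleft())
        return reaped

cq = CompletionQueue()
for io_id in range(4):
    cq.device_completes(io_id)

done = cq.poll()
print(done)  # [0, 1, 2, 3]
```

The trade-off is that a dedicated core spins on `poll()` instead of sleeping—burning CPU to buy back the microseconds an interrupt-driven wakeup would cost.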
Parallel Everything: Smart Queue Depths, Load-Balanced I/O
Every read and write in NeuralMesh is automatically balanced across all available resources—drives, threads, and nodes. We dynamically manage queue depths to prevent both congestion and idle cycles, ensuring uniform performance across the cluster even under extreme concurrency.
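The idea behind queue-depth-aware balancing can be sketched in a few lines: send each I/O to the least-loaded drive that still has queue-depth headroom, and apply back-pressure when every queue is full. This is an illustrative toy, not NeuralMesh's actual scheduler; the `Drive` class and `max_qd` cap are assumptions for the sketch.

```python
class Drive:
    """A drive with a bounded number of in-flight I/Os."""

    def __init__(self, name, max_qd=8):
        self.name = name
        self.max_qd = max_qd   # per-drive queue-depth cap
        self.inflight = 0

def dispatch(drives, n_ios):
    """Place each I/O on the least-loaded drive with headroom."""
    placements = []
    for _ in range(n_ios):
        candidates = [d for d in drives if d.inflight < d.max_qd]
        if not candidates:
            break  # back-pressure: every queue is at its cap
        target = min(candidates, key=lambda d: d.inflight)
        target.inflight += 1
        placements.append(target.name)
    return placements

drives = [Drive("nvme0"), Drive("nvme1"), Drive("nvme2")]
print(dispatch(drives, 6))
# ['nvme0', 'nvme1', 'nvme2', 'nvme0', 'nvme1', 'nvme2']
```

Keeping all queues similarly deep avoids both extremes the section describes: a hot drive with a congested queue, and an idle drive wasting cycles.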
Direct GPU Access
By leveraging NVIDIA Magnum IO™ GPUDirect Storage and direct pathing, NeuralMesh connects your storage to GPUs without detours. No staging, no bouncing between buffers—just high-speed, high-efficiency data delivery from NVMe to inference engines and training jobs.
Flash-Native Efficiency with 4K Alignment
NeuralMesh aligns I/O to 4K boundaries to squeeze every drop of performance out of NVMe SSDs. Because NeuralMesh was built for NVMe, it can take advantage of the fact that memory pages and NVMe blocks are exactly the same size: system memory can be pinned directly to NVMe blocks with no modifications required. By contrast, vendors that support larger block sizes are forced to adjust for the page-size mismatch, taking a performance hit. Combine that with support for modern fabrics like 400GbE, RoCE, and InfiniBand, and you’ve got a data path that can actually keep up with your GPUs.
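A quick sketch shows why alignment matters. With a 4 KiB block size (matching the x86-64 page size), an aligned I/O maps cleanly onto whole blocks, while a misaligned one drags in an extra block—and on writes, that extra block forces a read-modify-write. The helper below is illustrative arithmetic, not NeuralMesh code.

```python
BLOCK = 4096  # NVMe block size, equal to the x86-64 memory page size

def blocks_touched(offset, length):
    """Number of 4 KiB blocks an I/O at (offset, length) spans."""
    first = offset // BLOCK
    last = (offset + length - 1) // BLOCK
    return last - first + 1

# An aligned 8 KiB read touches exactly two blocks...
print(blocks_touched(0, 8192))    # 2
# ...but shift it by 512 bytes and it now spans three blocks.
print(blocks_touched(512, 8192))  # 3
```

At millions of IOPS, that one extra block per misaligned operation compounds into significant wasted bandwidth and flash wear.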
Distributed, Microservices-Based Metadata
Our metadata engine isn’t a bottleneck—it’s an accelerator. NeuralMesh breaks metadata services apart into lightweight, containerized microservices and runs them in parallel across all nodes. Multiple virtual metadata processes per node ensure low-latency access under even the most intense I/O loads—no single point of contention, ever.
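One common way to spread metadata ownership across many small services is hash-based sharding: each path deterministically maps to one virtual metadata process, so no single service sees all the traffic. The sketch below illustrates that general pattern under assumed names and shard counts—it is not NeuralMesh's actual placement scheme.

```python
import hashlib

N_NODES = 4          # assumed cluster size, for illustration
PROCS_PER_NODE = 8   # multiple virtual metadata processes per node

def metadata_owner(path):
    """Map a file path to a (node_id, process_id) pair via hashing."""
    digest = hashlib.sha256(path.encode()).digest()
    shard = int.from_bytes(digest[:8], "big") % (N_NODES * PROCS_PER_NODE)
    return divmod(shard, PROCS_PER_NODE)

node, proc = metadata_owner("/datasets/train/shard-0001.bin")
print(node, proc)
```

Because the mapping is deterministic, any client can compute the owner locally and contact it directly—no central metadata server sits in the hot path.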
The Result? Microseconds That Maximize Your Stack.
Every microsecond NeuralMesh shaves off translates into better GPU utilization, faster pipelines, and more value from your infrastructure. Whether you're scaling inference, training foundation models, or supporting real-time analytics—we keep data flowing with zero waste.
Ready to upgrade your latency language?
What's Next
Scale Production AI Faster with NeuralMesh
Your models aren't slow. Your data is. Fix AI bottlenecks with high-throughput infrastructure.


