We Don’t Speak Milliseconds

If that’s how you measure latency, talk to someone else
At WEKA, we operate in microseconds (one millionth of a second), because milliseconds are too slow for modern AI and data-driven workloads. When your GPUs are waiting on millisecond response times, your insights are delayed and your budget is burning. That's why we built the WEKA® Data Platform to accelerate data by eliminating latency at every level of the stack, getting the absolute most out of your compute, storage, and network investments.
Why Microseconds Matter
Today’s AI and HPC workloads don’t operate at millisecond pace. They demand instant data delivery—across petascale storage, GPU farms, and global networks. If your infrastructure isn’t moving in microseconds, it’s holding you back.
WEKA delivers true microsecond latency by optimizing every part of the I/O path, from the application layer down to the flash media. Unlike other storage platforms, which hand off responsibility for I/O delivery to the network and compute infrastructure, WEKA takes ownership of end-to-end I/O delivery. The latency saved at each stage of the I/O journey compounds into a meaningful reduction in overall round-trip latency, eliminating the bottlenecks that drag down other systems.
Bypass the Client Operating System
Operating systems are excellent at maintaining compatibility across a wide range of hardware, but they were never designed for low-latency performance. For applications that interact heavily with their data, the kernel is a major bottleneck. The WEKA Data Platform bypasses the operating system kernel completely, leaving it free for other tasks while WEKA's data-centric operating system handles the data-path work that other solutions depend on the kernel for.
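For a feel of what that kernel round trip costs, here is a minimal sketch (plain POSIX; the file path is hypothetical) that times repeated pread() calls. Every iteration pays for the syscall, the context switch, and the page-cache copy that a user-space client avoids:

```c
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void) {
    /* Hypothetical test file; the target doesn't matter, the syscall does. */
    int fd = open("/tmp/latency-probe.dat", O_RDONLY);
    if (fd < 0) return 1;

    char buf[4096];
    struct timespec t0, t1;
    const int iters = 100000;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++)
        pread(fd, buf, sizeof buf, 0);   /* each call crosses into the kernel */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("avg kernel round trip: %.0f ns\n", ns / iters);
    close(fd);
    return 0;
}
```

On a typical server this often measures a few microseconds per call before any media or network time is spent, which at microsecond scale is exactly the budget WEKA refuses to burn.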
Zero-Interrupt I/O with SPDK and DPDK
Operating system kernels are interrupt-driven, so many cycles are wasted waiting for the kernel to wake up and respond to a stimulus, and that wait adds latency to network traffic as well. WEKA uses polling-based I/O built on SPDK (Storage Performance Development Kit) and DPDK (Data Plane Development Kit) to bypass the kernel and avoid costly interrupts and context switches. That means no kernel overhead and no CPU contention: just pure, direct, low-latency throughput.
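To make the polling model concrete, here is a minimal DPDK-style receive loop, a sketch rather than WEKA source code; it assumes port 0 has already been configured and started (rte_eth_dev_configure, rte_eth_rx_queue_setup, and rte_eth_dev_start are elided for brevity):

```c
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <stdlib.h>

#define BURST 32

int main(int argc, char **argv) {
    /* Map the NIC into user space via DPDK's Environment Abstraction Layer. */
    if (rte_eal_init(argc, argv) < 0)
        exit(EXIT_FAILURE);

    /* Assumes port 0 is already configured and started (setup elided). */
    struct rte_mbuf *pkts[BURST];
    for (;;) {
        /* Poll the RX ring directly: no interrupt, no context switch.
           The core spins, trading CPU for deterministic low latency. */
        uint16_t n = rte_eth_rx_burst(0 /* port */, 0 /* queue */, pkts, BURST);
        for (uint16_t i = 0; i < n; i++) {
            /* ...hand the packet to the user-space I/O stack... */
            rte_pktmbuf_free(pkts[i]);
        }
    }
}
```

The core never sleeps and never takes an interrupt; it trades a busy CPU for deterministic, microsecond-scale response.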
Parallel Everything: Smart Queue Depths, Load-Balanced I/O
Every read and write in WEKA is automatically balanced across all available resources: drives, threads, and nodes. We dynamically manage queue depths to prevent congestion and idle cycles, ensuring uniform performance across the cluster, even under extreme concurrency.
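As a toy illustration of the policy, not WEKA internals, the following sketch steers each I/O to the queue with the fewest outstanding requests, capped by a hypothetical per-queue depth limit:

```c
#include <stdatomic.h>

#define NUM_QUEUES 16
#define MAX_DEPTH  64   /* hypothetical per-queue cap to prevent congestion */

/* Outstanding I/O count per drive/thread/node queue. */
static atomic_int inflight[NUM_QUEUES];

/* Choose the least-loaded queue; return -1 if every queue is at its cap,
   in which case the caller backs off rather than piling on more latency. */
int pick_queue(void) {
    int best = -1, best_depth = MAX_DEPTH;
    for (int q = 0; q < NUM_QUEUES; q++) {
        int d = atomic_load(&inflight[q]);
        if (d < best_depth) { best = q; best_depth = d; }
    }
    return best;
}
```

A real placement policy also weighs data locality and failure domains; the point is that every I/O is steered to wherever it will wait the least.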
Direct GPU Access
By leveraging Magnum IO™ GPUDirect Storage and direct pathing, WEKA connects your storage to GPUs without detours. No staging, no bouncing between buffers: just high-speed, high-efficiency data delivery from NVMe straight to inference engines or training jobs.
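The underlying mechanism is NVIDIA's cuFile API. Here is a minimal sketch (error handling trimmed, file path hypothetical) that registers a file and a GPU buffer, then DMAs a read from storage directly into GPU memory:

```c
#define _GNU_SOURCE
#include <cufile.h>
#include <cuda_runtime.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* O_DIRECT keeps the page cache out of the path (path is hypothetical). */
    int fd = open("/mnt/weka/training.dat", O_RDONLY | O_DIRECT);

    cuFileDriverOpen();

    /* Register the file descriptor with the cuFile driver. */
    CUfileDescr_t descr = {0};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t fh;
    cuFileHandleRegister(&fh, &descr);

    /* The destination buffer lives in GPU memory, not host RAM. */
    void *gpu_buf;
    size_t len = 1 << 20;                 /* 1 MiB */
    cudaMalloc(&gpu_buf, len);
    cuFileBufRegister(gpu_buf, len, 0);

    /* DMA straight from NVMe to the GPU: no host bounce buffer, no staging. */
    ssize_t n = cuFileRead(fh, gpu_buf, len, 0 /* file off */, 0 /* buf off */);
    printf("read %zd bytes directly into GPU memory\n", n);

    cuFileBufDeregister(gpu_buf);
    cuFileHandleDeregister(fh);
    cuFileDriverClose();
    cudaFree(gpu_buf);
    close(fd);
    return 0;
}
```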
Flash-Native Efficiency with 4K Alignment
WEKA aligns I/O to 4K boundaries to squeeze every drop of performance out of NVMe SSDs. Because WEKA standardized on NVMe, it can exploit the fact that 4K memory pages and 4K NVMe blocks are exactly the same size, so system memory can be pinned directly to NVMe blocks with no translation required. By contrast, vendors that support larger block sizes are forced to remap page sizes, taking a performance hit. Combine that with support for modern fabrics like 400GbE, RoCE, and InfiniBand, and you've got a data path that can actually keep up with your GPUs.
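On the client side, that alignment contract looks like the following minimal POSIX sketch (hypothetical path): the buffer, offset, and length all land on 4K boundaries, so one memory page maps cleanly onto one NVMe block:

```c
#define _GNU_SOURCE   /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK 4096    /* x86 page size == common NVMe block size */

int main(void) {
    void *buf;
    /* 4K-aligned buffer: one memory page pins cleanly to one NVMe block. */
    posix_memalign(&buf, BLOCK, BLOCK);

    /* O_DIRECT demands that buffer, offset, and length are all aligned. */
    int fd = open("/mnt/weka/data.bin", O_RDONLY | O_DIRECT);
    pread(fd, buf, BLOCK, 0);   /* aligned offset, one full block */

    close(fd);
    free(buf);
    return 0;
}
```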
Distributed, Microservices-Based Metadata
Our metadata engine isn't a bottleneck; it's an accelerator. WEKA breaks metadata services into lightweight, containerized microservices and runs them in parallel across all nodes. Multiple virtual metadata processes per node ensure low-latency access under even the most intense I/O workloads, with no single point of contention, ever.
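One way to picture the approach, as a hypothetical sketch rather than WEKA's actual scheme, is to hash every object identifier across many metadata shards so no single service ever serializes the load:

```c
#include <stdint.h>

#define SHARDS_PER_NODE 8
#define NODES           16
#define TOTAL_SHARDS    (SHARDS_PER_NODE * NODES)

/* FNV-1a: a simple, stable string hash. */
static uint64_t fnv1a(const char *s) {
    uint64_t h = 1469598103934665603ULL;
    while (*s) { h ^= (uint8_t)*s++; h *= 1099511628211ULL; }
    return h;
}

/* Route a file path to one of TOTAL_SHARDS metadata microservices.
   Lookups spread uniformly, so no shard ever becomes a hot spot. */
unsigned metadata_shard(const char *path) {
    return (unsigned)(fnv1a(path) % TOTAL_SHARDS);
}
```

Because every node runs several such shards in parallel, metadata lookups scale with the cluster instead of funneling through one service.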
The Result? Microseconds That Maximize Your Stack.
Every microsecond WEKA shaves off translates into better GPU utilization, faster pipelines, and more value from your infrastructure. Whether you’re scaling inference, training foundation models, or supporting real-time analytics—we keep data flowing with zero waste.
Ready to upgrade your latency language?