Luma AI

Building Intelligence That Sees and Creates

Generative AI company Luma AI achieves major efficiency gains with NeuralMesh™ by WEKA®, reaching 40x faster training startup times along with easier scaling and improved reliability.

“What excites me most is working with the team to scale up our training runs. In the last 12 months alone, we’ve made significant strides and enabled capabilities that weren’t possible before.”

Thomas Neff, Head of Systems Research & Engineering, Luma AI

Building Tomorrow’s AI on Yesterday’s Systems

As Luma AI’s research progressed, their infrastructure struggled to keep up. The company operates hundreds of NVIDIA H100 nodes for training across multiple clusters, processing petascale datasets. But as experimentation accelerated, critical bottlenecks emerged.

The multi-modality challenge was fundamental. “For natively multi-modal models, there’s pretty much nothing out there,” Neff noted. “We need to build a custom stack for both training and inference that supports these capabilities.”


While language models and pure video models have abundant infrastructure support, Luma AI’s pioneering work combining video, audio, and language required building everything from scratch.

Additionally, performance bottlenecks plagued their training infrastructure. FSx for Lustre, which had initially provided adequate IOPS, was topping out during large-scale training jobs. Environment loading caused long delays during training runs, leaving Luma AI's numerous H100 nodes sitting idle and burning compute cycles and energy while they waited.


Further, managing Python environments across hundreds of nodes had become troublesome. “Everyone tries to do something slightly differently. They need to change one small thing in their environment, then run a script that pushes it to all the nodes—which took 20 minutes,” Neff recalls. “Beyond being frustrating, it was extremely difficult to support. People contacted our experts daily because their environments weren’t working. It was painful.”

The impact was clear: “If it takes a researcher half a day to debug an environment setup issue, that delays a model training run or experiment by a day or more. These issues can directly impact shipping deadlines and product releases.”

The team needed infrastructure built for the demands they were placing on it.

NeuralMesh Provides Performance Without Compromise

The team at Luma AI knew they needed a different approach: storage that could handle thousands of simultaneous processes and make environment management simple. That's when they began exploring NeuralMesh.

The team ran a proof of concept, and the results spoke for themselves. The pain points they’d been living with—the environment distribution headaches, the IOPS limitations, the support burden—began to evaporate.

“Once we started the POC with WEKA, we saw that it resolved the issues that we had. Back then, there were not a lot of people to manage or even maintain what we had. WEKA freed us up so we could work on more important things.”

Thomas Neff, Head of Systems Research & Engineering, Luma AI

NeuralMesh provided what Luma AI had been looking for: an adaptive, high-performance storage system that could be accessed from all nodes, enabling seamless sharing of Python environments, code, and experimental data—all while maintaining the performance needed for intensive training workloads.

“We maintained the convenience of sharing environments on distributed storage without constantly moving files to the local disks of every node,” Neff notes. The system handled massive throughput without breaking a sweat—thousands of processes simultaneously pulling data without performance degradation.
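
To make the sharing model concrete, here is a minimal Python sketch of the general pattern: every node launches work through one interpreter that lives on a shared mount, rather than a per-node copy. The mount path, hostnames, script name, and SSH-based launch are illustrative assumptions, not details of Luma AI's actual tooling.

```python
# Illustrative sketch only: the mount path, hostnames, and SSH launch below
# are assumptions for the example, not Luma AI's actual setup.
import subprocess

# A single Python environment and the training code both live on the shared,
# every-node-visible mount; nothing is copied to local disks beforehand.
SHARED_PY = "/mnt/neuralmesh/envs/research/bin/python"  # hypothetical path
SMOKE_TEST = "/mnt/neuralmesh/code/smoke_test.py"       # hypothetical script

# Placeholder hostnames standing in for the training nodes.
nodes = [f"gpu-node-{i:03d}" for i in range(4)]

for node in nodes:
    # Every node runs the same interpreter and script straight from shared
    # storage, so a change made once to the environment is visible everywhere
    # without a lengthy per-node push.
    subprocess.run(["ssh", node, SHARED_PY, SMOKE_TEST], check=True)
```

In a setup like this, updating a package once on the shared mount is enough for every node's next run to pick it up.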

The implementation was straightforward, with strong support from the WEKA team throughout. The solution integrated smoothly with Luma AI’s existing infrastructure, working alongside their S3-based data pipelines for training data. “We had an excellent experience getting it set up and resolving our previous concerns,” Neff recalls.

Liberating Research from Infrastructure Limits

NeuralMesh transformed both the numbers and the daily experience of Luma AI’s researchers.

The performance gains were dramatic. Environment loading became 40 times faster, dropping from 10-20 minutes to 30-60 seconds. Training runs that once took 20 minutes to launch now start in a minute or less. H100 GPU nodes that frequently sat idle waiting for environments to load now reach 95-100% utilization far sooner, with GPUs consistently drawing 650-700 watts per run, delivering maximum computational efficiency and return on investment.

NeuralMesh resolved the rate-limit issues that had previously plagued Luma AI’s large-scale training jobs. The team can now run more experiments in parallel and iterate faster on model development, working seamlessly across their datasets.

“We can take advantage of all the scaling laws by launching larger deployments without having to worry about startup times,” Neff noted.

The business impact extended beyond raw performance. Researcher productivity improved dramatically. New team members become productive more quickly with access to shared environments—no complex onboarding, no waiting for environment configuration.

“When our research moves faster, we can iterate on our products faster, which directly impacts what we ship.”

Thomas Neff, Head of Systems Research & Engineering, Luma AI

“I don’t think we’ve had many or any issues with it at all,” he says. NeuralMesh presents itself like any other file system while delivering the speed of high-performance network storage.

The transformation freed the systems team from firefighting to focus on innovation. Beyond direct improvements, Luma AI anticipates indirect cost savings through more efficient compute utilization, reduced downtime, and eliminated idle GPU time.

The Next Frontier of Creative Intelligence

For Luma AI, the partnership with WEKA is about building the foundation for the future of worldbuilding and multimodal intelligence. As they continue pioneering AI systems that enable creative expression for millions worldwide, infrastructure purpose-built for petabyte-scale workloads becomes essential—not just for storing data, but for unlocking the breakthroughs that expand human imagination.

“Working with NeuralMesh helps me and my team run experiments faster,” Neff concludes. “If I could go back and tell my past self one thing about choosing NeuralMesh, it would be this: it’s faster than you think.”

Achieve Greater Efficiency

Learn more about Luma AI and WEKA