Breaking the MSA Storage Bottleneck to Accelerate AlphaFold


TL;DR For many AlphaFold users, overall runtime is limited not by GPU inference but by the Multiple Sequence Alignment (MSA) step. MSA workloads are dominated by highly concurrent, small-block, metadata-intensive reads against multi-terabyte reference databases—an access pattern that traditional file systems struggle to serve efficiently. In practice, this creates a “storage wall” that stalls pipelines, wastes CPU/GPU resources, and limits throughput. By contrast, WEKA’s NeuralMesh™ architecture is designed for exactly these I/O patterns, allowing MSA performance to scale with concurrency rather than collapsing under it.
DeepMind’s AlphaFold transformed biological research by taking a simple string of amino acids—the protein’s primary sequence—and predicting its complex 3D shape with high accuracy. By computationally determining this structure, AlphaFold eliminates the months of grueling laboratory work typically required by methods like X-ray crystallography and cryo-electron microscopy.
This capability is now the backbone of modern drug discovery, targeted therapeutics, and novel protein engineering. However, turning this breakthrough into a high-throughput workflow calls for more than just raw GPU power; it requires a powerful storage architecture that can keep up with the pipeline’s most demanding stages.
The AlphaFold workflow begins with the generation of a Multiple Sequence Alignment (MSA), which provides the evolutionary context the neural network relies on for accurate structure prediction. While the MSA stage is widely recognized as the slowest step, its performance is largely dictated by storage architecture—an often overlooked factor that can dramatically affect how quickly a lab can move from sequence to discovery. In this article, we examine why storage becomes the dominant limiter for MSA generation, and how removing that constraint changes the scaling behavior of the entire AlphaFold pipeline.
Why Storage Bottlenecks MSA Performance
During the MSA stage, a query protein sequence is searched against massive reference databases such as UniRef and MGnify to identify homologous sequences. Tools such as JackHMMER and MMseqs2 perform this search, each with unique performance profiles. In our analysis, we focused on MMseqs2 because it is optimized for speed, parallelism, and better CPU utilization. However, even with these advantages, bottlenecks persist.
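For orientation, the sketch below shows roughly how such a search is driven in practice, assuming a query FASTA file and a reference database already built with MMseqs2. The paths and thread count are placeholders, not our benchmark configuration.

```python
import subprocess

# Hypothetical paths; substitute a real query FASTA and a reference database
# previously built with `mmseqs createdb`.
QUERY_FASTA = "query.fasta"
TARGET_DB = "/data/uniref50/uniref50_db"
RESULT = "result.m8"
TMP_DIR = "/scratch/mmseqs_tmp"

# `easy-search` chains database creation, search, and result conversion,
# writing a BLAST-style tabular hit list. The thread count is illustrative.
subprocess.run(
    ["mmseqs", "easy-search", QUERY_FASTA, TARGET_DB, RESULT, TMP_DIR,
     "--threads", "16"],
    check=True,
)
```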
A Conflicting I/O Pattern
At a high level, MMseqs2 processes large sequence databases in a fixed number of logical “chunks” in a defined, sequential order to fit within available system memory. However, once a chunk is loaded, the search within that chunk is no longer sequential but driven by fine-grained, query-specific access. This creates two distinct—and conflicting—access patterns:
- Macro Level: A structured and sequential progression as the tool iterates through database chunks.
- Micro Level: Fine-grained, effectively random access within each chunk as the algorithm probes indices and associated sequence data.
This combination generates a massive volume of irregular I/O operations. Because each lookup is independent and the file system cannot use sequence metadata to productively predict access, traditional file system caching becomes largely ineffective.
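The toy sketch below illustrates that two-level pattern, assuming a database file laid out as fixed-size chunks: the outer loop advances through chunks in order, while the inner loop issues small reads at effectively random offsets within the current chunk. The chunk size, read size, and probe count are illustrative values, not MMseqs2 internals.

```python
import random

CHUNK_SIZE = 64 * 1024**3    # assumed size of one database chunk, in bytes
READ_SIZE = 4096             # small, index-driven read size
PROBES_PER_CHUNK = 100_000   # illustrative number of lookups per chunk

def scan_database(path: str, num_chunks: int) -> None:
    """Macro level: walk chunks in order. Micro level: random probes inside each chunk."""
    with open(path, "rb", buffering=0) as db:  # unbuffered, to mimic direct small reads
        for chunk in range(num_chunks):
            base = chunk * CHUNK_SIZE
            for _ in range(PROBES_PER_CHUNK):
                # Each lookup lands at a query-dependent offset with no locality,
                # so readahead and block caching recover almost nothing.
                offset = base + random.randrange(CHUNK_SIZE - READ_SIZE)
                db.seek(offset)
                db.read(READ_SIZE)
```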
Ultimately, this means the performance of the entire pipeline is determined not by the speed of compute, but by the storage system’s ability to retrieve these small data fragments and handle intense metadata volume. In most real-world deployments, this imbalance makes the MSA step the dominant contributor to total runtime: Traditional storage layers often cannot supply data fast enough to keep the alignment engine busy.
Benchmarking AlphaFold MSA on WEKA NeuralMesh
Test Environment
To evaluate how a modern parallel file system handles the demands of high-throughput MSA, we ran MMseqs2 and HHsearch on a WEKA NeuralMesh Storage cluster. We used the UniRef50 and PDB70 databases, totaling 2.7 TB, which mirrors standard high-accuracy AlphaFold and OpenFold configurations.
The testing infrastructure was hosted in the AWS public cloud, utilizing Elastic Kubernetes Service (EKS). WEKA’s NeuralMesh storage cluster ran on 6 x i3en.2xlarge EC2 instances, while the compute clients ran on m5.8xlarge and m5.4xlarge EC2 instances and mounted the WEKA file systems using the WEKA CSI driver. Argo Workflows was used to orchestrate multi-protein processing across the CPU-based MSA nodes.
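As a rough illustration of how compute clients attach to that shared file system, the snippet below creates a Kubernetes PersistentVolumeClaim against a WEKA-backed StorageClass using the official Python client. The storage class name, namespace, and capacity are assumptions for the sketch, not a prescribed configuration.

```python
from kubernetes import client, config

# Assumes kubeconfig access to the EKS cluster and a StorageClass backed by the
# WEKA CSI driver. The class name "weka-fs", namespace, and capacity are
# placeholders, not WEKA defaults.
config.load_kube_config()

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="msa-reference-dbs"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadOnlyMany"],  # many MSA pods read the same databases
        storage_class_name="weka-fs",
        resources=client.V1ResourceRequirements(requests={"storage": "3Ti"}),
    ),
)
client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="alphafold", body=pvc
)
```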
MMseqs2 I/O-Dominated Behavior
In our experiments, MMseqs2 was configured to process the UniRef50 database in six chunks based on available memory. We observed a striking trend: MMseqs2 execution time remained nearly constant regardless of protein length.
Whether the input sequence was a 15-amino-acid fragment or a 500-amino-acid chain, MMseqs2 consistently required approximately 1,050 seconds to complete. This strongly indicates that runtime is dominated by database access rather than alignment computation.
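A back-of-the-envelope sketch shows where a chunk count like this comes from: the database is split so that each piece fits within the memory budget. The sizes below are illustrative values chosen to reproduce a six-way split, not figures taken from MMseqs2 itself.

```python
import math

db_size_gb = 350        # assumed on-disk size of the UniRef50 search index
memory_budget_gb = 60   # assumed memory available to MMseqs2 on one node

# MMseqs2 splits the target database so that each chunk fits in memory
# (its --split / --split-memory-limit options control this behavior).
num_chunks = math.ceil(db_size_gb / memory_budget_gb)
print(num_chunks)  # -> 6 with these illustrative figures
```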
HHsearch: CPU-Bound by Design
HHsearch, which identifies structural templates using the smaller PDB70 database, exhibited overwhelmingly CPU-bound characteristics. Its runtime was strongly dependent on query length: a 500-amino-acid protein took seven times longer than a 15-amino-acid protein.
However, because PDB70 is orders of magnitude smaller than UniRef50, HHsearch generates far less random I/O. Even for long proteins, it contributes only about one-sixth of the pre-folding runtime. This highlights that while HHsearch is compute-intensive, it is not the primary factor limiting the pipeline’s overall throughput.
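For reference, the template stage is typically a single HHsearch invocation over the query’s MSA; the paths and CPU count below are placeholders rather than our exact settings.

```python
import subprocess

# Hypothetical paths: the input is the query's MSA in A3M format produced
# upstream, and the -d argument is the PDB70 database prefix.
subprocess.run(
    ["hhsearch",
     "-i", "query.a3m",          # query alignment
     "-d", "/data/pdb70/pdb70",  # template database prefix
     "-o", "templates.hhr",      # template hit report
     "-cpu", "8"],               # runtime grows with query length, not with I/O
    check=True,
)
```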
Evaluating End-to-End MSA Behavior
The Concurrency Challenge
Single-query benchmarks tell only part of the story. In real research environments, multiple users often launch AlphaFold jobs simultaneously, creating synchronized access to the same reference databases. On traditional NFS or legacy parallel file systems, these “thundering herd” scenarios typically result in severe metadata contention, elevated latency, and a sharp increase in per-job runtime.
To evaluate this effect, we launched seven concurrent MSA queries, each independently scanning the same database chunks.
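A minimal way to generate this kind of synchronized load, assuming a run_msa() wrapper around the MMseqs2 invocation sketched earlier, is to start all queries from one thread pool and record each job’s wall-clock time:

```python
import time
from concurrent.futures import ThreadPoolExecutor

QUERIES = [f"protein_{i}.fasta" for i in range(7)]  # seven concurrent MSA queries

def run_msa(query_fasta: str) -> float:
    """Hypothetical wrapper around the MMseqs2 search; returns wall-clock seconds."""
    start = time.monotonic()
    # ... invoke `mmseqs easy-search` for query_fasta here ...
    return time.monotonic() - start

# All jobs start together and scan the same database chunks, so any metadata
# or small-read contention shows up as a spread in the per-job runtimes.
with ThreadPoolExecutor(max_workers=len(QUERIES)) as pool:
    runtimes = list(pool.map(run_msa, QUERIES))
print(runtimes)
```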
Observed Behavior on WEKA NeuralMesh
Under this synchronized load, WEKA NeuralMesh demonstrated stable and predictable performance:
- Zero Performance Degradation: Each concurrent query completed in the same time as a standalone run.
- Aggregated Throughput: Peak read bandwidth reached approximately 1 GB/s, with an average sustained bandwidth of 260 MB/s.
- Massive Parallel I/O: WEKA handled over 900 IOPS of small-block random reads across the concurrent jobs without hitting a “metadata wall.”
| Number of Proteins | Per-Protein Read Performance | Total Read Performance |
|---|---|---|
| 78 | 66 MB/s | 5.1 GB/s |
| 120 | | 7.9 GB/s |
| 178 | | 11.7 GB/s |
| 12,120 | | 800 GB/s |
WEKA’s near-linear aggregate scaling enables researchers and organizations to unlock performance levels that are otherwise impractical on traditional systems. As shown in the results above, over 12,000 proteins can be processed concurrently while sustaining an aggregate read throughput of 800 GB/s.
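As a quick arithmetic check on that near-linear behavior, dividing each aggregate figure in the table by its protein count yields a roughly constant per-protein bandwidth:

```python
# (proteins, aggregate read throughput in GB/s) from the table above
results = [(78, 5.1), (120, 7.9), (178, 11.7), (12_120, 800.0)]

for proteins, aggregate_gbps in results:
    per_protein_mbps = aggregate_gbps * 1000 / proteins
    print(f"{proteins:>6} proteins -> ~{per_protein_mbps:.0f} MB/s each")
# Every row works out to roughly 65-66 MB/s per protein: near-linear scaling.
```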
Even at the level of a single researcher operating within a centralized lab infrastructure, eliminating I/O contention allows multiple workloads to run concurrently without queueing delays. With storage no longer a reason to serialize experiments, the pipeline scales in direct proportion to available CPU and GPU resources.
From Storage-Limited to Compute-Scaled AlphaFold
Together, these results demonstrate that the real-world performance of the AlphaFold pipeline is largely determined upstream of the neural network. While MSA generation includes both CPU-bound and I/O-bound components, overall throughput is dominated by storage-intensive database access during MMseqs2 searches.
By removing this storage bottleneck, WEKA NeuralMesh allows AlphaFold pipelines to scale in proportion to available CPU and GPU resources. For centralized research platforms, this means fewer queueing delays, higher resource utilization, and predictable performance even under heavy multi-user concurrency.
“By breaking the MSA storage bottleneck, WEKA helps transform AlphaFold from a workflow dominated by unpredictable stalls into a scalable, high-throughput engine for discovery.”
— Ali Syed, SVP Infrastructure, Danish Centre for AI Innovation (DCAI)
In practice, this shifts AlphaFold from an infrastructure-limited workflow into a repeatable, production-grade system—one that provides predictable performance and reliable throughput for research teams operating at scale.
Learn more about how NeuralMesh can help power life sciences research and accelerate discoveries, and see how organizations like the Swiss Institute of Bioinformatics and Atomwise put it into practice.