Why Genomics England Chose WekaIO for its 5 Million Genomes Project
Barbara Murphy. January 23, 2020
Barbara Murphy, Vice President of Marketing at WekaIO, shares the key reasons why Genomics England chose WekaIO for its 5 Million Genomes Project.
What do you do when you absolutely have to get something done, but you don’t have a way to do it? You have to invent or find a way to get it done!
Necessity is the mother of invention.
– Plato, as quoted in the dialogue Republic (from Dictionary.com)
Plato’s saying shows deep insight from way back in history. A strong need drives us to invent creative ways to meet that need. That’s exactly what brought Genomics England (GEL) and WekaIO together.
WekaIO had identified major unmet needs in the data storage industry and addressed them with an innovative product. GEL, meanwhile, had reached an impasse in its genetics research because it had run into those same storage issues: its data storage needs had grown beyond the capabilities of its existing infrastructure.
Genomics England (GEL), headquartered in London, England, is owned by the UK Department of Health and Social Care and tasked with running the 5 Million Genomes Project. The project was announced in 2018 and aims to sequence 5 million genomes from National Health Service (NHS) patients with rare diseases, their families, and patients with common cancers. With an ambitious goal of completing the sequencing within five years, GEL has already acquired 21 petabytes of genome data and is projected to amass over 140 petabytes by 2023. The research requires access to the entire data set and must allow researchers to query the data in a highly randomized fashion. Therefore, all data has to be stored in a single storage system.
GEL had implemented a scale-out NAS solution from a leading vendor to support its original 100,000 Genomes Project; however, that system had already hit its limit on storage node scaling, and performance suffered when it was near capacity. In addition, the existing solution had no viable disaster recovery strategy, as backing up all 21 petabytes of storage was infeasible. Key national data would be in a vulnerable position if a major disaster were to occur. The GEL infrastructure team determined that it needed a new storage strategy to support the anticipated growth through 2023.
“The Genomics England cluster required a new solution to allow scaling of the company’s DNA data bank in line with the anticipated five-year growth. Already at 25 petabytes, our existing Isilon solution had already reached its limit and performance had deteriorated. We needed a modern solution that could scale to 100s petabytes while maintaining performance scaling, and it had to be simple to manage at that scale.”
– David Ardley, Director of Infrastructure Transformation, Genomics England
GEL had to find a new solution, and it had to meet several key criteria:
- Scale in line with the anticipated growth to 140 petabytes in a single storage system
- Meet the demanding performance requirements of the bioinformatics pipeline and core research
- Provide a disaster recovery solution
- Be easy to manage while delivering the required enterprise features
- Fit within the allotted budget
Genomics England evaluated all the major contending solutions through a rigorous RFP process. It rejected parallel file systems because of their complexity and lack of enterprise features, and it rejected all-flash scale-out NAS because the economics were not viable at the scale of GEL’s requirements.
After an extensive evaluation process, GEL chose WekaIO because it was the only vendor that could meet all the performance and capacity scaling requirements within the budget constraints in a single architecture. GEL also needed a complete enterprise feature set, with the highest levels of data security and public cloud integration. WekaIO met every one of the RFP’s stringent requirements while also delivering the best cost economics.
“We needed something that’s much more scalable than existing NAS solutions – an infrastructure that could grow to hundreds of petabytes. Our existing solution couldn’t provide that scale and wasn’t performing as well in these magnitudes – that’s what drove us to Weka. With its clever combination of flash for performance and object store for scale, Weka has proven to be a great solution.”
– David Ardley, Director of Infrastructure Transformation, Genomics England
This customer success story clearly illustrates that, indeed, necessity is the mother of invention. Necessity motivated WekaIO to innovate a brand new storage solution for the modern world, and necessity compelled GEL to find and implement that solution.
As a result of choosing WekaIO, GEL gained the following benefits and return on investment:
- GEL enjoyed a 10x+ increase in performance over its legacy NFS-based NAS. The current system is capable of delivering over 135 GBytes/second from the NVMe tier and performance will continue to scale as the cluster grows.
- The storage cost per genome has dropped from £52 to £13, a 75% reduction in storage cost, and is anticipated to plunge to £2 by 2023, achieving a 96% reduction in cost. This has been accomplished because of the ability to integrate low-cost disk-based object storage into the solution.
- GEL can survive a major disaster at the primary site and still maintain access to the data. The object tier is geo-distributed across three sites, each 50 miles apart. Should the primary site fail, a small disaster recovery cluster is available at a second site and can re-hydrate the file system from the most recent snapshot.
- Critical data sets are fully encrypted from the high-performance compute cluster all the way to the permanent data store, with integration to a key management system. Performance measurements showed no discernible degradation in application performance with encryption enabled. In addition, the system is protected from rogue security threats through a robust authentication mechanism.
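The percentages in the storage-cost bullet above follow directly from the quoted per-genome figures. As a quick check, here is a minimal Python sketch; the function and variable names are illustrative only and are not part of any WekaIO or GEL tooling.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the percentage drop from `before` to `after`."""
    return (before - after) / before * 100


# Per-genome storage costs in GBP, as quoted in the article.
LEGACY_COST = 52.0          # cost on the legacy NAS
CURRENT_COST = 13.0         # cost at the time of writing
PROJECTED_2023_COST = 2.0   # anticipated cost by 2023

current_cut = percent_reduction(LEGACY_COST, CURRENT_COST)
projected_cut = percent_reduction(LEGACY_COST, PROJECTED_2023_COST)

print(f"Current reduction: {current_cut:.0f}%")
print(f"Projected 2023 reduction: {projected_cut:.0f}%")
```

Rounded to whole numbers, the two results match the 75% and 96% reductions stated above.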
The following graph shows the dramatic reduction in storage cost per genome that GEL was able to realize with WekaIO.
GEL’s Reduction in Storage Cost per Genome with WekaIO
How would you like to realize similar cost reductions and savings in data storage? In what ways is this true-life account of GEL’s experience with WekaIO applicable and relevant to your organization? Are there storage necessities that are compelling you to explore a new storage paradigm? If so, consider WekaIO. We saw a necessity and invented WekaFS™, a brand new file system designed for today’s storage needs!