Overview

Weka lets Genomics England scale performance and capacity to meet its mandate to sequence 5 million genomes by 2023.

Genomics England (GEL) aims to sequence 5 million genomes from National Health Service (NHS) patients with rare diseases. A team of over 3,000 researchers uses the DNA data acquired from the NHS for medical research. GEL expects the data to grow to over 140 petabytes by 2023. The research requires access to the entire data set, and researchers must be able to query the data in a highly randomized fashion. All data therefore has to be stored in a single storage system.

quotes

“We needed something that’s much more scalable than existing NAS solutions — an infrastructure that could grow to hundreds of petabytes. Our existing solution couldn’t provide that scale and wasn’t performing as well in these magnitudes — that’s what drove us to Weka”.

David Ardley, Director of Infrastructure Transformation

Challenge

In 2018, Genomics England approved a new storage platform to support the projected growth to 5 million genomes by 2023. GEL had previously implemented a scale-out NAS solution from a leading vendor to support the 100,000 Genomes Project. That system had already hit its limit on storage node scaling, and performance suffered when it was near capacity. The existing solution also had no viable disaster recovery strategy, because backing up all 21 petabytes of storage was infeasible. Key national data would have been vulnerable if a major disaster had occurred. The GEL infrastructure team determined that it needed a new storage strategy to support the anticipated growth through 2023.

THE NEW SOLUTION HAD TO MEET SEVERAL KEY CRITERIA:
Capacity

Scale capacity in a single namespace to store 140 petabytes of data

Data

Devise a strategy to protect valuable national data from a major disaster

Budget

Comply with tight budget constraints

Performance

Improve storage performance to accelerate innovation

SOLUTION

THE WEKA FILE SYSTEM SOFTWARE ON INDUSTRY-STANDARD SERVER INFRASTRUCTURE

WekaFS delivered a two-tier architecture that takes commodity flash and disk-based technologies and presents them as a single hybrid storage solution. The primary tier consists of 1.3 petabytes of high-performance NVMe-based flash storage, which supports the working data sets. The secondary tier consists of 40 petabytes of object storage, which provides a long-term data lake and repository. Weka presents the entire 41 petabytes as a single namespace. Should GEL require more performance on the primary tier, it can scale that tier independently of the data lake.
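The tier sizes quoted above can be checked with some simple arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical and are not part of any Weka interface, and the 140 petabyte figure is the projected 2023 data volume mentioned in the overview.

```python
# Illustrative capacity arithmetic using the figures quoted in this case study.
# All names here are hypothetical; nothing below calls a Weka API.

FLASH_TIER_PB = 1.3     # primary tier: NVMe-based flash
OBJECT_TIER_PB = 40.0   # secondary tier: object storage data lake
TARGET_PB = 140.0       # projected data volume by 2023


def namespace_capacity_pb(flash_pb: float, object_pb: float) -> float:
    """Total capacity presented as a single namespace."""
    return flash_pb + object_pb


def flash_fraction(flash_pb: float, object_pb: float) -> float:
    """Share of the namespace that sits on the flash tier."""
    return flash_pb / namespace_capacity_pb(flash_pb, object_pb)


total_pb = namespace_capacity_pb(FLASH_TIER_PB, OBJECT_TIER_PB)
flash_share = flash_fraction(FLASH_TIER_PB, OBJECT_TIER_PB)
remaining_pb = TARGET_PB - total_pb

print(f"Single namespace: {total_pb:.1f} PB")
print(f"Flash share of namespace: {flash_share:.1%}")
print(f"Growth still needed to reach the target: {remaining_pb:.1f} PB")
```

Under these figures the flash tier is only a few percent of the namespace, which is why being able to grow it separately from the object tier matters for cost.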

Benefits and ROI

Genomics England realized several benefits and a tremendous return on investment by choosing WekaFS:

No limit on capacity scaling

Improvement in performance

Reduction in storage cost per genome

Data
Full disaster recovery strategy in place

Cloud
Integration with public cloud for compute elasticity
