Overview

Weka provides the best I/O throughput to GPU-enabled servers, reducing research job time by up to 10x

Oklahoma Medical Research Foundation (OMRF) manages 950TB of data: 100TB active on Weka, 250TB on a slower scale-out NAS with standard file shares to satisfy a medium-term data retention strategy, and 600TB in an object storage archive. The cluster consists of standard x86 servers from Supermicro connected via 100Gb Ethernet switches. The system was architected to stream data from the lab to the Weka storage via SMB. The team, which has begun experimenting with GPUs, now has peace of mind knowing that Weka provides the best I/O throughput to GPU-enabled servers, rendering their updated cluster architecture future-ready.

Quotes

“Our scientists can more easily complete their analysis and no longer have to deal with the storage. Our old workflow was complex and time-consuming, involving staging data to a compute node’s local SSD for better application performance. With Weka we have so much performance and expandable capacity that we don’t have to think about it.”

Stuart Glenn, Assistant Director of Infrastructure & Research Computing, OMRF

Challenge

The key focus of the research computing team at OMRF is supporting high performance computing (HPC) for general bioinformatics work. One very common workflow is Next-Gen Sequencing (NGS) analysis using the GATK pipeline for sequence alignment and variant calling. However, the cluster also supports numerous research jobs running simultaneously, each with its own toolset. The challenge for the OMRF research computing team was to architect a system for scientists who have growing informatics needs for their research: more compute, more storage, faster storage, and “bigger” data.

THE NEW SOLUTION HAD TO MEET SEVERAL KEY CRITERIA:
Exascale Capacity

Increase in workload size, complexity, and throughput

Integrated Disk-Based Tiering

Tier seamlessly to object storage archive

Budget

Comply with tight budget constraints

Performance

Improve storage performance to accelerate innovation

SOLUTION

WEKA LIMITLESS DATA PLATFORM ON SUPERMICRO SERVERS

By implementing the Weka Limitless Data Platform, the OMRF team was able to achieve better throughput and run more research jobs concurrently without negatively impacting other jobs or workloads. Turnaround time is also better: because jobs finish faster, results reach scientists sooner, which accelerates the next stage of their research. The research workflows are greatly simplified, because using Weka eliminated the complexity of staging data in and out of a compute node’s local SSD. Research outcomes are no longer limited by how much data can be stored on a compute node’s local SSD. With WekaFS acting as a front end, researchers have faster and easier access to their object archive tier, ensuring that applications have access to all the data. Ultimately, OMRF now has so much performance and expandable capacity available to all nodes that nobody has to think about storage any longer, and the team can focus instead on saving lives.

Benefits and Return On Investment

OMRF research job run times were reduced by up to 10x: one job went from 70 days to 7 days, and another common analysis workflow went from 12 hours to 2 hours.

No limit on capacity scaling

Improvement in performance

REDUCTION IN STORAGE COST
