Scaling Genomic Sequencing Performance On-Premises or in the Cloud
Shimon Ben David. January 29, 2020
Shimon Ben David, Field Chief Technology Officer at WekaIO, shares his perspective on the company’s impressive achievements in this blog titled “How I Learned to Stop Worrying and Love the BOM.”
Disclaimer: be advised that this blog post contains a picture from and references to the 1964 movie “Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb.”
When we started WekaIO some years ago, we wanted to create a fully software-defined file system that would not depend on the kind of highly rigid hardware bill of materials (BOM) that most other software-defined storage vendors have come to rely on. We looked to the stars and made a bet on the cloud.
Fast forward a couple of years, and we made it: we created a file system that can be installed on practically any server, physical or virtual, that contains a flash SSD, whether it resides in the data center or in the cloud. We wrote our own network protocol and real-time operating system (RTOS), which let us tightly control the journey of every input/output (I/O) and metadata operation, from the application on the compute server, through all of our software-defined storage layers, down to the I/O landing in parallel on multiple flash devices. We accomplished all of that without being tied to a specific BOM. We kept our “Purity of Essence” (POE) and saw great success.
We turned standard AWS instances into a storage platform fit for any supercomputer, one that placed #1 on the IO-500 benchmark. Running on HPE Apollo servers, we helped a customer with a huge AI project shorten epoch time from two weeks to four hours. We took the Penguin FrostByte, submitted results for the STAC benchmark in the financial markets, and broke 8 records (with kdb+).
10x Improvement in Genomic Sequencing Pipeline
Weka’s parallel file system allowed Genomics England (GEL) to leverage Supermicro hardware and create a solution that reduced the storage cost per genome by 75% while improving the performance of their genomic pipeline by 10x. We also enabled another genomics customer to run Weka on top of Dell Technologies servers. Some of the examples listed above rely only on our flash system, and others use our native capability to expand the namespace to an on-premises and/or cloud object store, all managed by Weka software.
And while customers want a software-defined solution, we saw that they prefer to purchase our software-defined storage with the white-glove experience of an appliance (slight pause here to ponder the last sentence…), especially from our channel partners and OEMs. Like the unstoppable B-52, we made our way and gradually worked with multiple OEMs, starting with HPE and AWS, then Penguin Computing, followed by Supermicro and Lenovo, and finally: BOMs away. We now have a reference architecture BOM with all of the mentioned OEMs, so our customers can order a WekaIO BOM pre-sized for different capacities and performance levels, with a single owner for the hardware and software, whether it is in the data center or in the cloud.
You can learn about genomic sequencing and use of GPUs here.
For more details on all that we have achieved with partners, here are some links to documents on the solutions we delivered with our partners, including partner BOMs:
Here is how you can get started with our Weka storage solution:
It’s amazing that our storage solution can now be ordered through a BOM but without any of the rigidity, restrictions, and limitations that were associated with the BOMs of legacy storage solutions. Truly, we give you reasons to share your own story of “How I Learned to Stop Worrying and Love the BOM!”
And as the song says, “We’ll meet again…”.