Health and Life Sciences (HLS) organizations’ storage systems, infrastructure, and applications have grown organically over time as new data sources and their associated workflows were developed to meet the demands of innovative projects like genomic sequencing, drug discovery, and Cryo-EM. These changes have led to new discoveries and revenue, but they have also put considerable pressure on the management, flexibility, and cost of owning and operating these environments. Legacy solutions no longer meet the minimum operational needs, which constrains research in unacceptable ways. For example, the most expensive elements of a research environment are typically the instruments that create data, like microscopes and sequencers. It is unfortunate when the supporting infrastructure for these expensive devices is what limits their utilization.

Notes from the Field – Eliminating Bottlenecks 

We recently helped a customer who struggled with their storage platform daily. It was simply never designed to scale to the capacity, performance, and load their researchers were asking of the environment. At the start it was adequate, but growth outstripped the system’s ability to deliver performance. The net effect was limited performance and poor availability due to the constant mixed application load on the system.

After diagnosing the root cause, we worked hand-in-hand with the customer to understand their needs and goals. We proposed a Weka system and installed it, and they were able to run jobs without issue. For example, CryoSPARC, Relion, Gaussian, Schrödinger, and R jobs all returned results in less time, and researchers got more done. With Weka up and running, they immediately realized the bottleneck had moved from storage back to their CPU/GPU layer. Once this new bottleneck was identified, they were able to expand the environment, facilitating more research and discoveries.
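
To make the idea of a moving bottleneck concrete, here is a minimal sketch. It assumes a simplified model in which a job streams its dataset through storage and compute, and the slower of the two sets the pace. The function names and every number are hypothetical illustrations, not measurements from this customer.

```python
def job_seconds(dataset_gb: float, storage_gb_per_s: float, compute_gb_per_s: float) -> float:
    """Time to stream a dataset through a pipeline paced by its slowest stage."""
    return dataset_gb / min(storage_gb_per_s, compute_gb_per_s)


def bottleneck(storage_gb_per_s: float, compute_gb_per_s: float) -> str:
    """Name the stage that currently limits throughput."""
    return "storage" if storage_gb_per_s < compute_gb_per_s else "compute"


# Hypothetical figures: a 1,000 GB dataset and compute that can consume 10 GB/s.
before = job_seconds(1000, storage_gb_per_s=2, compute_gb_per_s=10)
after = job_seconds(1000, storage_gb_per_s=40, compute_gb_per_s=10)

print(bottleneck(2, 10), before)   # slow storage limits the job
print(bottleneck(40, 10), after)   # faster storage: compute is now the limit
```

In this toy model, faster storage shortens the job only until compute becomes the limit. After that, adding CPU/GPU capacity is what helps, which mirrors what this customer did.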

Where Flexibility Begins 

First and foremost, Weka is a software company and that is where the flexibility begins. We create innovative and integrated solutions with server vendors, partners, and public clouds. This enables customers to leverage their existing environment whether on-premises, hybrid, or cloud-native. We do NOT lock customers into specific hardware platforms or previous generations of technology. In fact, we expect the hardware and cloud technologies to move forward quickly and want customers to take advantage of these innovations so their business and research can benefit.  

Here is a quick example. PCIe Gen4, AMD EPYC processors, and 200Gb networking were all recently released. In conjunction with our technology partners, Weka contributed to and completed testing before the solutions hit the market.

Weka and our partners were ready to go because we did not have to spin new hardware platforms or silicon, or begin a massive software endeavor. Our new offerings doubled in performance. This meant our customers and potential customers did not have to wait for boat-anchor appliances and the code of yesteryear to be updated.

Tiering in the modern era is not about SSDs vs. hard drives 

Institutions need to be able to run their applications on a single platform. A modern platform must be able to service applications that are small-file, large-file, read- and write-intensive, and metadata-intensive. It must also handle parallel or embarrassingly parallel workflows in which small files sit next to large files in the same deep directory, accessed by a tremendous number of hosts and jobs, at capacities reaching hundreds of petabytes or even exabytes. In addition, modern solutions must be able to place data on the appropriate tier of storage. In today’s world, though, a “tier” is far less about the hardware and far more about the location, portability, scale, and cost of storing and processing data.

Often, I hear other companies talking about “no tiers” or “no tiers needed” or “put it on all-flash,” but that limits organizations because it does not allow them to decide where to store and process their data at their desired cost model. Think about this for a second: of course an appliance vendor is telling you to store and process your data on their platform, because it serves them, not you. This is likely why people are in the predicament they are in. The solutions of the past are limiting and inflexible and, most importantly, cannot run mixed workloads. They were designed long before someone said, “Hey, what’s this cloud thing?” and long before the hardware advancements we see in the marketplace today.
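
A rough way to see why the choice of tier matters to the cost model is to compare placements directly. The sketch below assumes a deliberately simple model in which each tier has a flat price per TB per month. The `Tier` class, the tier names, and the prices are all made up for illustration and are not Weka or cloud provider pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    usd_per_tb_month: float  # hypothetical price, for illustration only


def monthly_cost(placement: dict[Tier, float]) -> float:
    """Total monthly cost for a placement mapping each tier to the TB stored on it."""
    return sum(tier.usd_per_tb_month * tb for tier, tb in placement.items())


flash = Tier("flash", 100.0)            # assumed price
object_store = Tier("object", 20.0)     # assumed price

all_flash = {flash: 1000.0}                       # 1 PB kept entirely on flash
tiered = {flash: 200.0, object_store: 800.0}      # same 1 PB, mostly on object storage

print(monthly_cost(all_flash))
print(monthly_cost(tiered))
```

The same petabyte costs very different amounts depending on where it lives. That is the decision an all-flash-only platform takes out of an organization’s hands.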

Simplifying Data Ownership

Weka data placement is a zero-copy architecture with one namespace: write data once into the platform and we will manage it for you. End users and applications will always be able to get to their data as they first wrote it. Because Weka is software and was built in the cloud, running in the cloud or on-premises is easy for us; it was built this way from the beginning. Weka was built with the understanding that today’s applications are often a meat grinder, where anything goes all the time and platforms are expected to deal with it and perform without tuning. To deal with today’s meat-grinder workloads and data types, you must be able to leverage modern technology with software written to take full advantage of server and cloud platforms simultaneously. We support the shared protocols applications are using, such as NFS, SMB, POSIX, S3, and GPUDirect Storage, all to the same data, along with a CSI plugin for Kubernetes.

Weka’s flexibility and its ability to leverage today’s data sets, applications, and choice of infrastructure (physical or cloud) allow customers to be first to market, to accelerate results, or to be first to discover, on their terms. Weka is a strategic advantage.

Have questions? Email me, Greg Mazzu, at greg@weka.io 

Additional Helpful Resources

WekaFS for Life Sciences: Accelerate the data pipeline in Life Sciences
Modern Workloads in Pharma and Life Sciences
AI-based Drug Discovery with Atomwise and Weka on AWS
Accelerating Cryo-EM & Genomics Workflows
Accelerating Genomic Discovery with Cost-Effective, Scalable Storage
Accelerating Discovery and Improving Patient Outcomes With Next-Generation Storage
How to Analyze Genome Sequence Data on AWS with WekaFS and NVIDIA Clara Parabricks