Genomics and Cryo-EM Workflows in Life Sciences
Shimon Ben David. November 10, 2020
What is Cryo-EM?
Cryo-EM, short for cryogenic electron microscopy, is part of a larger field of research called Structure-Based Drug Design (SBDD). It is the process of taking organic tissue samples, freezing them, and then bombarding them with radiation (an electron beam). This generates multiple pictures of the proteins themselves. The sample can still move during this process even though it is frozen, so the technique can be compared to photographing a moving object from multiple angles, with all of the challenges that entails.
Scientists can then use these 2D pictures to generate 3D images of the proteins. Because the protein moves during capture, the result is effectively a movie-like output. Using it, researchers can design a drug that binds better to that protein.
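To make the "multiple pictures from multiple angles" idea concrete, here is a toy sketch of how a single 3D point appears in 2D pictures taken at different viewing angles. It is only an illustration: the function name `project_point` and the choice of rotating about the y-axis are assumptions made for this example, and real Cryo-EM reconstruction software solves the much harder inverse problem of recovering the 3D structure from many noisy 2D projections.

```python
import math

def project_point(point, angle_rad):
    """Rotate a 3D point about the y-axis, then drop the depth (z) axis.

    Hypothetical helper for illustration: returns the (x, y) position the
    point would have in a 2D picture taken at the given viewing angle.
    """
    x, y, z = point
    x_rot = x * math.cos(angle_rad) + z * math.sin(angle_rad)
    # y is unchanged by a rotation about the y-axis; z is discarded.
    return (x_rot, y)

if __name__ == "__main__":
    atom = (1.0, 2.0, 3.0)
    for degrees in (0, 45, 90):
        print(degrees, project_point(atom, math.radians(degrees)))
```

Each angle yields a different 2D position for the same 3D point, which is why many views are needed before the 3D structure can be inferred.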
Cryo-EM Workflows (Bioimaging) Explained
Cryo-EM workflows include three stages:
- Input: Very large image data is captured from slides. Cryo-EM microscopes generate gigapixel and terapixel images, and a typical capture produces 2-4 TB of output. Because these microscopes are in high demand, they are usually operated 24/7.
- Image Processing and Analysis: Image distortions are corrected and 2D images are converted to 3D. The images are then analyzed to measure size, composition, etc.
- Output: Image composites and measurement data are generated. Multiple 2D images are stitched into 3D images; because particles move during capture, this can eventually produce a 3D "movie".
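The 2-4 TB per capture figure above translates directly into storage planning. The following is a rough back-of-the-envelope sketch, not a sizing tool: the function name `raw_storage_tb` and the number of captures per day are assumptions for illustration, since the real rate depends on the microscope and the lab's schedule.

```python
def raw_storage_tb(captures_per_day, days, tb_per_capture=(2.0, 4.0)):
    """Estimate the RAW data volume (low, high) in TB over a period.

    tb_per_capture is the 2-4 TB range quoted in the text;
    captures_per_day is a hypothetical input chosen by the reader.
    """
    low, high = tb_per_capture
    total_captures = captures_per_day * days
    return total_captures * low, total_captures * high

if __name__ == "__main__":
    # Assumed example: one capture per day for a 30-day month.
    low, high = raw_storage_tb(captures_per_day=1, days=30)
    print(f"Estimated RAW data per month: {low:.0f}-{high:.0f} TB")
```

Even at an assumed single capture per day, a microscope running around the clock accumulates tens of terabytes of RAW data per month.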
NOTE: In both Cryo-EM and genomic sequencing, the RAW data is valuable and should be retained wherever possible, since there is little or no possibility of generating it again (by getting another sample or looking at the same tissue/material). The data needs to be saved because new methods and algorithms developed over time can yield new insights from the original RAW data.
Learnings from Benchmarking Cryo-EM data storage and processing
Run on modern CPU/GPU platforms
Cryo-EM pipelines have adapted to take advantage of modern compute capabilities such as GPUs and newer CPU instruction sets. Multiple software frameworks, such as RELION and cryoSPARC, already offload part of the processing to GPUs, significantly reducing the time to completion of the Cryo-EM pipeline. Additionally, compilers can improve performance by leveraging newer CPU instruction sets such as AVX-512, which can yield significant improvements.
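Before building software with AVX-512 optimizations, it helps to confirm that the target CPU supports them. The sketch below assumes a Linux host, where `/proc/cpuinfo` lists CPU feature flags and `avx512f` denotes the AVX-512 Foundation subset. The helper names `cpu_flags` and `has_avx512` are made up for this example.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Line format: "flags<TAB>: fpu vme de ..."
            return set(line.split(":", 1)[1].split())
    return set()

def has_avx512(cpuinfo_text):
    """True if the AVX-512 Foundation flag (avx512f) is present."""
    return "avx512f" in cpu_flags(cpuinfo_text)

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("AVX-512 available:", has_avx512(f.read()))
    except FileNotFoundError:
        print("/proc/cpuinfo not found; this check assumes Linux")
```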
Cryo-EM Workloads on WekaIO
WekaFS is used by multiple companies in the life sciences for genomics and Cryo-EM pipelines.
Life science workloads such as genomics and Cryo-EM pipelines are characterized by large numbers of files whose size and count vary with each step in the pipeline. WekaFS is the only parallel filesystem that is optimized for mixed workloads, accelerating time to insight and supporting concurrent pipelines without any performance impact.
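One way to see how mixed a pipeline stage really is would be to profile the file sizes it produces. The following is a minimal sketch using only the Python standard library. The bucket boundaries (1 MiB and 1 GiB) and the helper names are arbitrary assumptions for illustration and are not tied to WekaFS or to any particular pipeline.

```python
import os
from collections import Counter

MIB = 1024 ** 2
GIB = 1024 ** 3

def size_bucket(num_bytes):
    """Classify a file size into a coarse, arbitrarily chosen bucket."""
    if num_bytes < MIB:
        return "small"
    if num_bytes < GIB:
        return "medium"
    return "large"

def profile_sizes(sizes):
    """Count how many files fall into each size bucket."""
    return Counter(size_bucket(s) for s in sizes)

def file_sizes(root):
    """Yield the size in bytes of every regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                yield os.path.getsize(path)

if __name__ == "__main__":
    # Profiles the current directory; point it at a pipeline output directory.
    print(profile_sizes(file_sizes(".")))
```

Running this against the output directory of each pipeline step would show how the ratio of small to large files shifts from step to step.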
The WekaFS data management capabilities are also important for life science use cases.
Two capabilities are especially important: the ability to retain RAW data perpetually and cost-effectively without having to move and manage it manually, and the snapshot-to-object-store capability, which provides backup and disaster recovery in a simple and scalable way.