HPC Storage Use Cases
Barbara Murphy. August 5, 2020
This article is part of a three-part series on High-Performance Computing (HPC) Storage.
HPC encompasses a range of use cases with a common thread: the need for floating-point calculations, interconnected servers and storage, and fast access to data before, during, and after simulation. The following HPC use cases all require an HPC storage system that can respond to application demands.
- Life Sciences
- Scientific Research
- Energy Exploration and Extraction
- Financial Services
Historically, simulation has been critical to designing optimized products within manufacturing companies. Computer-Aided Engineering (CAE), which includes finite element analysis (FEA) and computational fluid dynamics (CFD), uses mathematical techniques to model a set of objects and predict how the physical product will perform in real-world environments. These applications can store terabytes of data per time step, and since a run may last for days, intermediate data must be checkpointed and stored reliably. CAE applications are highly parallel, scaling well into the hundreds of computing cores.
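Reliable checkpointing is the key storage requirement here: if a multi-day run fails, the simulation should resume from the last saved state rather than restart from scratch. As a minimal sketch (using a hypothetical toy simulation loop, not any particular CAE package), a checkpoint can be written atomically so that a crash mid-write never corrupts the previous good copy:

```python
import os
import pickle
import tempfile

def write_checkpoint(state, path):
    """Atomically persist simulation state: write to a temp file,
    then rename over the old checkpoint (illustrative sketch)."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace is atomic on POSIX: the old checkpoint stays valid
    # until the new one is fully on disk.
    os.replace(tmp, path)

def run_simulation(steps, checkpoint_every, path):
    # Toy state: a step counter and a small "field" array standing in
    # for the terabytes of per-time-step data a real solver would hold.
    state = {"step": 0, "field": [0.0] * 4}
    # Resume from an earlier run if a checkpoint exists.
    if os.path.exists(path):
        with open(path, "rb") as f:
            state = pickle.load(f)
    for step in range(state["step"], steps):
        state["field"] = [v + 1.0 for v in state["field"]]  # one time step
        state["step"] = step + 1
        if state["step"] % checkpoint_every == 0:
            write_checkpoint(state, path)
    return state
```

The write-then-rename pattern matters because a checkpoint interrupted halfway through must not destroy the last good one; a parallel file system lets many ranks perform such writes concurrently at scale.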
Drug design is an area that affects almost everyone, especially during a pandemic, when the world anxiously awaits a vaccine. Large databases describing known compounds are brought into applications that simulate their interactions with viruses or other molecules. When developing a new drug, for instance, many thousands of simulations are needed to zero in on potential solutions. The amount of data per compound is in the terabyte range and must be brought into the compute cluster in parallel.
Rapid genomic sequencing has created significant amounts of data that scientists need for further research. Affordable methods are required in order to store, manage, and share the data between teams that are working collaboratively to understand diseases and work on cures. Due to the need to keep drug discovery data available for very long periods, storage solutions that are integrated with the compute clusters and managed globally are critical to this emerging industry.
Learn more about HPC for Life Sciences: a modern file system that accelerates the data pipeline, whether for next-generation sequencing, microscopy, or bio-imaging, lowers the cost of research and keeps data secure.
HPC plays an extraordinary role in scientific research. The fastest supercomputers today dedicate significant computing cycles and hundreds of thousands of cores to unlocking the mysteries and origins of the universe, exploring new energy possibilities, and simulating the health of the United States nuclear stockpile. Extensive climate simulations are used to determine the possible effects of global warming using increasing amounts of data. All of these require massive amounts of data to be brought into the compute systems and stored as both hot and cold data during long-running applications.
HPC is used extensively in locating new energy deposits deep within the earth. Seismic waves are sent into the ground, and the reflected acoustic wave signatures are collected, producing massive amounts of data that must be stored and analyzed. Different materials below the earth’s surface reflect these waves differently, enabling geologists to determine the content deep underground. The data is then combined to determine whether oil can be extracted and in what quantity.
Ultra-low-latency programmed trading has become the norm for many financial institutions. The more historical data that can be fed into buy or sell algorithms, the better the decisions they make. High-frequency trading depends on fast data access from a number of databases and sources. The information is fed into sophisticated algorithms that determine very quickly whether to buy or sell. The competition is so fierce that many of these institutions are placing their data centers physically closer and closer to the stock exchange data centers to shave off single microseconds of latency. The latency budget to retrieve data is measured in microseconds as well, necessitating high-performance parallel file systems.
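As an illustrative sketch only (real trading systems are vastly more sophisticated, and this toy function is not drawn from any actual trading platform), a buy/sell decision of the kind described above might compare short- and long-window averages over recent price history; the storage system's job is to deliver that history within the microsecond budget:

```python
def trade_signal(prices, short=3, long=5):
    """Toy moving-average crossover: 'buy' when the short-window mean
    is above the long-window mean, 'sell' when it is below, else 'hold'.
    Purely illustrative of feeding historical data into a decision rule."""
    if len(prices) < long:
        return "hold"  # not enough history retrieved yet
    short_ma = sum(prices[-short:]) / short
    long_ma = sum(prices[-long:]) / long
    if short_ma > long_ma:
        return "buy"
    if short_ma < long_ma:
        return "sell"
    return "hold"
```

The point of the sketch is the data dependency: the decision cannot be made until the price history arrives, so the file system's read latency sits directly on the critical path.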
Learn more about HPC for Financial Services: dramatically reduce time to trade with a high-bandwidth, low-latency file system for financial analytics.
Protecting sensitive data, whether in transit or at rest on storage tiers, is of critical importance. With industrial espionage on the rise, protecting valuable data is on the mind of every CIO. If proprietary information is stolen, competitors can quickly jump to the front of the innovation line. And as the amount of data generated each year increases and becomes more distributed throughout an organization, there are more opportunities to steal it.
Advanced authentication and encryption are suitable for many of the most demanding enterprises. Various key management system (KMS) types should be supported in HPC storage environments so that organizations can use familiar tools and processes.
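One common pattern a KMS supports is envelope encryption: each object is encrypted with its own data key, and only that small data key is wrapped by a master key held in the KMS. The sketch below shows just this key hierarchy; the XOR "cipher" is a deliberately trivial stand-in, not secure, and not any real KMS API:

```python
import os

def xor_bytes(data, key):
    """Toy stand-in for a real cipher (NOT secure) used only to
    demonstrate the envelope-encryption key hierarchy."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def envelope_encrypt(plaintext, master_key):
    data_key = os.urandom(16)                      # fresh per-object key
    ciphertext = xor_bytes(plaintext, data_key)    # encrypt the data
    wrapped_key = xor_bytes(data_key, master_key)  # a KMS would wrap this
    return wrapped_key, ciphertext

def envelope_decrypt(wrapped_key, ciphertext, master_key):
    data_key = xor_bytes(wrapped_key, master_key)  # unwrap via the KMS key
    return xor_bytes(ciphertext, data_key)
```

Because only the small wrapped key ever involves the KMS, bulk data encryption stays on the fast storage path while key custody stays centralized.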
Additional Helpful Resources
FSx for Lustre
HPC Architecture Explained
BeeGFS Parallel File System Explained
Learn About HPC Storage, HPC Storage Architecture and Use Cases
Worldwide Scale-out File-Based Storage 2019 Vendor Assessment Report
5 Reasons Why IBM Spectrum Scale is Not Suitable for AI Workloads
Isilon vs. Flashblade vs. Weka
Gorilla Guide to The AI Revolution: For Those Who Are Solving Big Problems
NAS vs. SAN vs. DAS
Network File System (NFS) and AI Workloads
Hybrid Cloud Storage Explained
Block Storage vs. Object Storage