GPFS Parallel File System Explained

Barbara Murphy. October 31, 2020

What is GPFS?

IBM Spectrum Scale, formerly the General Parallel File System (GPFS), is high-performance clustered file system software developed by IBM in 1998. It was originally designed to support high-throughput streaming applications in the media and entertainment space. It can be deployed in shared-disk or shared-nothing distributed parallel modes, or a combination of the two. The predominant deployment is as a shared-disk solution that uses SAN block storage for persistence.
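
As a rough illustration of the two deployment modes, the toy Python sketch below models which nodes can service a block read directly; the node names and function are hypothetical, not GPFS interfaces:

```python
# Toy model of the two GPFS deployment modes (hypothetical names, not GPFS APIs).
SHARED_DISK = "shared-disk"        # every node sees every LUN over the SAN
SHARED_NOTHING = "shared-nothing"  # each node owns only its local disks

def nodes_that_can_serve(block_owner, nodes, mode):
    """Return the nodes able to read a given block directly."""
    if mode == SHARED_DISK:
        # SAN block storage is visible to all nodes, so any node can
        # service the I/O without a hop to another server.
        return nodes
    # Shared-nothing: only the owning node reads the disk; all other
    # nodes must forward the request to it over the cluster network.
    return [block_owner]

nodes = ["nsd1", "nsd2", "nsd3"]
print(nodes_that_can_serve("nsd2", nodes, SHARED_DISK))     # all three nodes
print(nodes_that_can_serve("nsd2", nodes, SHARED_NOTHING))  # ['nsd2']
```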

Advantages of a Modern Parallel File System over GPFS

Although IBM Spectrum Scale (aka IBM GPFS) is a parallel file system, it was developed over 20 years ago to support the high-throughput needs of multimedia applications, and for the subsequent 15+ years it was optimized to deliver the highest large-file bandwidth from hard disk drives. But the world has changed dramatically in the last ten years: flash technology has revolutionized the data center, and workloads have shifted from large sequential streams to metadata-intensive, small-file, I/O-heavy big data workloads. A technology designed and fine-tuned for streaming applications is not suitable for the age of big data and analytics. IBM continues to develop its Spectrum Scale product, yet the file system still cannot deliver performance for small-file, low-latency, I/O-intensive workloads.

A modern parallel file system leverages new technologies such as NVMe and high-speed, low-latency networking to service small-file I/O in a massively parallel manner. In addition to managing small files efficiently, it must handle the metadata demands of AI and GPU workloads, which are orders of magnitude higher. A modern parallel file system utilizes a distributed metadata model in which metadata is spread evenly across all the storage nodes, and each node contains equal amounts of data and metadata. By contrast, the traditional GPFS model maintains metadata on separate metadata servers, which creates a performance bottleneck when the system contains millions of files.
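
To make the contrast concrete, here is a minimal Python sketch of hash-based metadata distribution in which every node owns an equal slice of the namespace; the node count, names, and hashing scheme are illustrative assumptions, not the internals of either file system:

```python
import hashlib
from collections import Counter

# Minimal sketch of distributed metadata placement (node names and the
# hashing scheme are illustrative assumptions, not WekaFS internals).
NODES = [f"node{i}" for i in range(8)]  # every node holds data AND metadata

def metadata_owner(path: str) -> str:
    """Map a file path to the node that owns its metadata.

    Hashing spreads lookups evenly across all nodes, so metadata
    capacity and throughput grow with the cluster. Routing every
    lookup through a few dedicated metadata servers instead is the
    bottleneck described above.
    """
    digest = hashlib.sha256(path.encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

# A million small files land almost uniformly across the eight nodes.
load = Counter(metadata_owner(f"/data/file_{i}") for i in range(1_000_000))
print(load.most_common())
```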

Advantages of Modern File Systems

  • Best choice for GPU workloads, which contain millions of files, large and small
  • Full-featured, easy to use, and easy to manage, with no need for specialized administrators to configure, manage, and tune the storage system
  • High throughput and high IOPS with ultra-low latency, fully saturating the network and delivering linear application performance at exabyte scale, especially for mixed workloads

GPFS vs. Modern Parallel File System (WekaIO)

Hardware                           | Spectrum Scale (on ESS 3000)      | WekaFS (COTS Hardware)
Server Nodes per Chassis           | 1 in 2RU, Custom                  | 1 in 1RU or 4 in 2RU, COTS
# of Server Nodes                  | 1 to Thousands                    | 8 to Thousands
Server Generation                  | Intel Skylake                     | Intel Cascade Lake Refresh, AMD EPYC
Memory per Node                    | 768GB                             | 192GB, 6 channels
SSD Interface                      | NVMe (still has SAS/SATA baggage) | Natively NVMe
Front-End Network                  | Up to 100Gb                       | Up to 200Gb
NVMe over Fabric                   | Requires 3rd-Party Add-On         | Native

Protocol Support                   | Spectrum Scale (on ESS 3000)      | WekaFS (COTS Hardware)
POSIX                              | Yes                               | Yes
GPUDirect                          | No                                | Yes
NFS                                | Yes                               | Yes
SMB                                | Yes                               | Yes, SMB 2/3
HDFS                               | Yes                               | Not Required: Mount from POSIX
S3                                 | Yes                               | Yes, via Gateway

Performance (1)                    | Spectrum Scale (on ESS 3000)      | WekaFS (COTS Hardware)
Read Throughput (1,2)              | 40GB/s                            | 56GB/s
Write Throughput (1,2)             | 32GB/s                            | 20GB/s
Read IOPS (1)                      | No Data from Vendor               | 5.8M
Write IOPS (1)                     | No Data from Vendor               | 1.6M
Single Mount Point, Full Coherency | No Data from Vendor               | 82GB/s
#1 on IO500 and SPEC               | No                                | Yes

(1) At smallest cluster configuration
(2) Spectrum Scale on ESS 3000; performance will vary on other hardware

Feature Comparison: GPFS vs. Parallel File System (WekaIO)

Feature                                  | Spectrum Scale (on ESS 3000)               | WekaFS
Ease of Use                              | Hard                                       | Easy
Snapshots                                | Yes, with Performance Impact               | Yes, Instantaneous
Snapshot-to-Object Store/S3              | No                                         | Yes
Tiering/Hybrid Disk Storage              | No, Tiering Impacts Performance            | Yes
Independent Capacity/Performance Scaling | No                                         | Yes
Thousands of Nodes                       | Yes                                        | Yes
Dynamic Performance Scaling              | No                                         | Yes
Encryption                               | In-Flight and At-Rest                      | In-Flight and At-Rest
Replication                              | Yes                                        | Yes (via Snapshot)
Data Protection                          | N+3, Reed-Solomon EC only on ESS Appliance | N+4 Distributed Data Protection with High-Performance EC
Compression/Deduplication                | Limited/No                                 | Future Release
S/W Only, H/W Independent                | Yes                                        | Yes
End-to-End Checksum                      | No                                         | Yes
Active Directory and LDAP Authentication | Yes                                        | Yes
S3                                       | Via Swift                                  | Via Gateway
SMB                                      | Limited/No                                 | Future Release
User Quotas                              | Yes                                        | Yes
Authenticated Mounts                     | Yes                                        | Yes
High Availability Bonding                | Yes                                        | Yes
IB Support (Ideal for GPUs)              | Yes                                        | Yes

Learn how Weka’s file system delivers the highest performance for the most data-intensive workloads.

Additional Helpful Resources

  • Worldwide Scale-out File-Based Storage 2019 Vendor Assessment Report
  • 5 Reasons Why IBM Spectrum Scale is Not Suitable for AI Workloads
  • NAS vs. SAN vs. DAS
  • Learn About HPC Storage, HPC Storage Architecture and Use Cases
  • What is Network File System (NFS)?
  • Network File System (NFS) and AI Workloads
  • Block Storage vs. Object Storage
  • BeeGFS Parallel File System Explained
  • Introduction to Hybrid Cloud Storage

Related Resources

  • [Webinar] Accelerating Cryo-EM & Genomics Workflows
  • [Webinar] Accelerating AI Training Models
  • [White Paper] A Buyer’s Guide to Modern Storage