BeeGFS Parallel File System Explained

Barbara Murphy. August 25, 2020

What is BeeGFS?

BeeGFS is a parallel clustered file system, developed with a strong focus on performance and designed for very easy installation and management. It originated in 2005 as an internal project at the Fraunhofer Center for HPC and was originally known as the Fraunhofer file system (FhGFS).
If I/O-intensive workloads are your problem, BeeGFS is often proposed as a solution because of its parallelism. A BeeGFS-based storage system is currently ranked #7 on the IO500, behind Lustre, WekaFS, and Intel DAOS.

Why BeeGFS?

BeeGFS transparently spreads user data across multiple servers. By increasing the number of servers and disks in the system, you can simply scale the performance and capacity of the file system to the level that you need, seamlessly from small clusters up to enterprise-class systems with thousands of nodes. Similar to the Lustre file system, BeeGFS separates data services from metadata services. Once a client has received the metadata information from the metadata servers, it can access the data directly. This direct client-to-storage data path provides higher performance than traditional NAS systems, where all traffic passes through a single filer head.
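The effect of spreading a file across servers can be sketched with a toy model. The chunk size, target names, and simple round-robin mapping below are illustrative assumptions for this sketch, not BeeGFS internals:

```python
# Illustrative sketch (not BeeGFS source code): how a striped file's
# chunks map onto storage targets. Target names and chunk size are
# hypothetical values chosen for the example.

def chunk_to_target(offset, chunk_size, targets):
    """Return the storage target holding the byte at `offset`
    under a simple round-robin striping pattern."""
    chunk_index = offset // chunk_size
    return targets[chunk_index % len(targets)]

targets = ["storage01", "storage02", "storage03", "storage04"]
chunk_size = 512 * 1024  # 512 KiB chunks, an assumed value

# A 4 MiB file spans 8 chunks spread round-robin over all 4 targets,
# so a sequential read can pull from every server in parallel.
file_size = 4 * 1024 * 1024
used = {chunk_to_target(off, chunk_size, targets)
        for off in range(0, file_size, chunk_size)}
print(sorted(used))  # all four targets participate in the read
```

Because every target holds a share of the file, adding servers adds aggregate bandwidth, which is the scaling property the paragraph above describes.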

Disadvantages of BeeGFS

BeeGFS is an open-source project designed to cater to academic HPC environments, but it lacks many of the features required in an enterprise environment. The following is a summary of the limitations BeeGFS suffers from:

  • Does not support any kind of data protection such as erasure coding or distributed RAID
  • Does not offer file encryption, either at rest or in flight
  • No native NVMe-over-Fabrics support; a third-party NVMe-over-Fabrics layer must be purchased separately
  • Needs separate management and metadata servers
  • Limited to legacy storage interfaces such as SAS, SATA, and FC
  • Does not support enterprise features such as snapshots, backup, or data tiering
  • Does not support enterprise protocols such as NFS or SMB (requires separate services)

BeeGFS & AI

As noted previously, BeeGFS splits data and metadata into separate services, allowing HPC clients to communicate directly with the storage servers. This was a common practice for parallel file systems developed in the past and is similar to both Lustre and IBM Spectrum Scale (GPFS). While separating data and metadata services was a significant improvement for large-file I/O, it created a scenario where the metadata services became the bottleneck. Newer workloads in AI and machine learning (ML) are very demanding on metadata services, and many of the files involved are tiny (4KB or below); consequently, the metadata server is often the performance bottleneck, and users do not enjoy the design benefits of a parallel file system like BeeGFS. Studying the IO500 numbers for BeeGFS, it is evident that it cannot reach high IOPS, achieving a lower score on the md test (metadata) than on the bw test (bandwidth).
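The small-file pressure on metadata services can be illustrated with simple arithmetic. The per-file operation count below is an assumption for illustration (a lookup, an open, and a close per file), not a measured BeeGFS figure:

```python
# Back-of-the-envelope sketch (assumed numbers, not vendor data):
# why tiny files shift load from the data servers to the metadata
# server. Reading any file costs a few metadata operations
# regardless of its size.

def metadata_ops_per_gib(file_size_bytes, md_ops_per_file=3):
    """Metadata operations needed to read 1 GiB of data stored
    in files of the given size (3 assumed ops per file)."""
    files = (1024 ** 3) // file_size_bytes
    return files * md_ops_per_file

# 4 KiB files: 262,144 files per GiB -> 786,432 metadata ops.
print(metadata_ops_per_gib(4 * 1024))
# One 1 GiB file: just 3 metadata ops for the same data volume.
print(metadata_ops_per_gib(1024 ** 3))
```

Under these assumptions, reading a gigabyte of 4KB files generates several hundred thousand times more metadata traffic than reading the same gigabyte as one large file, which is why metadata-heavy AI workloads expose this bottleneck.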

AI and ML workloads also require small-file access with extremely low latency. Unfortunately, BeeGFS does not support newer transport technologies like NVMe-over-Fabrics or NVIDIA® GPUDirect® Storage, which deliver extremely low latency to GPU-based systems. The result is that GPUs are starved of I/O, leading to long epoch times and inefficient utilization of those very expensive GPU resources.

Additionally, most mainstream enterprise customers expect a certain level of data protection that BeeGFS was never designed for. BeeGFS is commonly referred to as a scratch-space file system, meaning that if there is a major crash, the analysis is simply restarted, with no consideration for data protection. For many ML use cases, the cost of data acquisition is so high that the data has to be fully protected. Imagine if the entire training set for an autonomous vehicle were lost: it could take millions of dollars and many person-years to replace. Consequently, enterprise customers look for table-stakes features that BeeGFS does not offer.

Some common enterprise tasks that are not possible with BeeGFS include:

  • User authentication – imagine a disgruntled employee deleting an entire training set – it happens
  • Snapshots – commonly used as a way to save specific training runs for comparison with others
  • Backup – immutable copies of data that can be retrieved at a later date
  • Disaster recovery – saving data from a major disaster and ensuring it can be recovered
  • Encryption – protecting sensitive data (such as patient MRIs or X-rays) from threats or rogue actors
  • Containerization – integrating with container services for stateful storage
  • Quotas – ensuring groups do not consume excessive storage due to bad practices

Bottom line: BeeGFS was designed as a file system for research environments, but it does not scale to the needs of commercial high-performance computing (HPC), including AI and ML.

Comparing Parallel File Systems: BeeGFS vs. WekaFS

                                       ThinkParq (BeeGFS)      WekaIO (WekaFS)
Architecture
  Small Footprint Configuration        5 servers in 9RU        8 servers in 4RU
  # of Server Nodes                    2 to hundreds           8 to thousands
  Supported Storage Interfaces         Legacy SAS, SATA, FC    Natively NVMe
  NVMe over Fabrics                    3rd-party add-on        Built-in
  Optimized for Mixed Workloads        No                      Yes
Protocol Support
  POSIX                                Yes                     Yes
  GPUDirect Storage                    No                      Yes
  NFS                                  No                      Yes
  SMB                                  No                      Yes, SMB 2.1
  S3                                   No                      Yes
Filesystem
  Directories per Directory            No data from vendor     6.4T
  Files per Directory                  No data from vendor     6.4B
  File Size                            No data from vendor     4PB
  Filesystem Size                      No data from vendor     8EB (512PB on flash)
  Snapshots                            No data from vendor     Thousands
  CSI Plugin for Kubernetes            No                      Yes
Security
  Data Encryption                      No                      At-rest and in-flight
Performance
  Read Throughput                      25.2GB/s, 20 servers    56GB/s, 8 servers
  Write Throughput                     24.8GB/s, 20 servers    20GB/s, 8 servers
  Read IOPS                            No data from vendor     5.8M
  Write IOPS                           No data from vendor     1.6M
  Single Mount Point, Full Coherency   No data from vendor     82GB/s
  #1 on IO500 and SPEC                 No                      Yes

Learn how Weka’s parallel file system delivers the highest performance for the most data-intensive workloads.
