WekaIO Matrix and Univa Grid Engine – Extending your On-Premise HPC Cluster to the Cloud
David Hiatt. June 21, 2019
This blog on Weka.IO Matrix and Univa Grid Engine is being republished with permission, courtesy of Univa. It was originally published by Bill Bryce, VP of Products, Univa, on the Univa website here.
For Univa Grid Engine customers deploying hybrid clouds, getting the storage environment right is a key challenge. While some applications require very high bandwidth (CFD or crash simulations reading and writing massive files), other applications (think life sciences and machine learning) are much more IOPS-intensive, needing to process millions of reads and writes per second with very low latency.
In an earlier article, I discussed a variety of file systems and data synchronization solutions and their tradeoffs. In this article, I want to take a closer look at WekaIO Matrix™ and explain how Matrix works with Univa Grid Engine and Navops Launch to deliver high-performance compute and storage clusters.
About WekaIO Matrix
WekaIO is a Univa partner and the creator of WekaIO Matrix, a distributed high-performance file system. Matrix is exciting to HPC users for several reasons:
- It is exceptionally fast, delivering both high bandwidth and high IOPS
- It is designed specifically for modern SSD drives and interconnects
- It is flexible, supporting bare-metal, containerized or virtual environments and hybrid clouds
- It sidesteps much of the complexity usually associated with high-performance file systems
Much like Univa Grid Engine, WekaIO Matrix is deployed in many industries requiring high-performance compute and filesystem services, including EDA, life sciences, and machine learning/AI. While file system performance depends on the number of cluster nodes and how they are configured, at the time of this writing Matrix is the clear performance champ, beating all comers on the latest Q1 2019 SPEC SFS benchmarks.
Performance and reliability
Unlike distributed file systems that use 3-way block replication, Matrix stripes data across 4 to 16 SSDs and protects it with either 2 or 4 parity drives. N+4 data protection (where N is the number of data drives) delivers much better reliability than triple replication schemes, and N+2 protection delivers similar reliability to 3-way replication but is more space efficient.
Part of the reason that Matrix is so fast is that it uses a small 4K block size that matches the block size of the underlying SSD devices. Coupled with the low latency facilitated by an embedded real-time operating system (RTOS), the small block size helps Matrix read and write data efficiently. It also avoids the “write amplification” problem found in competing file systems, where even small changes in data cause large blocks to be unnecessarily overwritten. Avoiding those extra writes extends the life of the SSD drives in the cluster.
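To make the space-efficiency point concrete, here is a minimal sketch that compares the share of raw capacity left for user data under N+P striping and under replication. The function names are my own for illustration and are not part of any WekaIO tooling; the stripe layouts are just examples within the ranges mentioned above.

```python
def striped_efficiency(data_drives: int, parity_drives: int) -> float:
    """Fraction of raw capacity that holds user data in an N+P stripe."""
    return data_drives / (data_drives + parity_drives)


def replicated_efficiency(copies: int = 3) -> float:
    """Fraction of raw capacity that holds user data with full replication."""
    return 1 / copies


if __name__ == "__main__":
    # Example layouts only; pick the ones that match your own cluster.
    for n, p in [(4, 2), (6, 2), (16, 2), (16, 4)]:
        print(f"{n}+{p} striping: {striped_efficiency(n, p):.0%} of raw capacity usable")
    print(f"3-way replication: {replicated_efficiency(3):.0%} of raw capacity usable")
```

A 6+2 stripe keeps 75% of raw capacity for data, while 3-way replication keeps about a third.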
Deploying WekaIO Matrix
Matrix is designed to run on standard x86 Linux servers using 10GbE or faster network components. It supports off-the-shelf SATA, SAS or NVMe SSDs found in most computer systems and cloud instances. Customers can deploy Matrix in multiple configurations:
- Matrix can be deployed on dedicated storage nodes equipped with suitable SSD drives. Storage clusters can range in size from a minimum of six servers to clusters with thousands of servers. If a dedicated Matrix cluster is deployed, Univa Grid Engine hosts will need to run the WekaIO System Client to mount the Matrix file system(s) served by the Matrix cluster as a POSIX filesystem to achieve maximum performance. Clients can also access the filesystem through NFS or SMB network protocols.
- If the Univa Grid Engine cluster already has suitable SSD drives, Matrix can also be deployed directly on the Univa Grid Engine hosts. This is referred to as a “converged model” because no additional hardware is required to provide high-performance file system services for Grid Engine. The Matrix real-time operating system (RTOS) is deployed in an LXC container, which makes it easy to deploy and avoids host compatibility issues on Grid Engine clusters.
When deploying Matrix on a Univa Grid Engine cluster, administrators can control the resources on the host assigned to the Matrix container (# of cores, memory, network interface cards, and SSDs). Matrix will only use the resources assigned to it, ensuring that it doesn’t interfere with existing Grid Engine jobs.
The capacity of the file system will depend on the number of cluster hosts, and the size and number of SSDs on each host. For example, if I need 100TB of shared usable storage, I can accomplish this by using a cluster of 12 x i3.16xlarge storage-dense instances on AWS, where each instance has 8 x 1,900GB NVMe SSDs for a total of 15.2 TB of SSD storage per instance. A twelve-node cluster would, in theory, have 182.4 TB of raw capacity, but if we deploy a stripe size of eight (6N + 2P), 25% of that capacity will be dedicated to storing parity. It is also recommended that 10% of the Matrix file system be held in reserve for internal housekeeping and caching, so the usable capacity of the twelve-node cluster is ~123 TB (assuming no backing store).
182.4TB * (6/(6+2)) * 0.9 = ~123 TB
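The same arithmetic can be wrapped in a small helper so it is easy to try other instance counts or stripe layouts. This is only a sketch of the calculation above, not the WekaIO sizing calculator; the function and parameter names are my own, and the 10% reserve is the recommendation quoted in this article.

```python
def usable_capacity_tb(
    instances: int,
    ssds_per_instance: int,
    ssd_size_tb: float,
    data_drives: int,
    parity_drives: int,
    reserve_fraction: float = 0.10,  # held back for housekeeping and caching
) -> float:
    """Estimate usable capacity (TB) of an SSD-only cluster with N+P striping."""
    raw_tb = instances * ssds_per_instance * ssd_size_tb
    after_parity_tb = raw_tb * data_drives / (data_drives + parity_drives)
    return after_parity_tb * (1 - reserve_fraction)


if __name__ == "__main__":
    # 12 x i3.16xlarge, 8 x 1.9 TB NVMe each, 6+2 striping
    capacity = usable_capacity_tb(12, 8, 1.9, data_drives=6, parity_drives=2)
    print(f"Usable capacity: {capacity:.1f} TB")
```

Any object-store tier behind the SSDs is deliberately left out, matching the "no backing store" assumption above.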
As described above, WekaIO Matrix provides a lot of deployment flexibility. It can be deployed on dedicated hosts or in a converged model with different storage technologies and interconnects. Matrix also supports transparent management of tiered storage, where the SSD cluster stores warm data but is backed by a local or remote S3-compatible object store. By augmenting the SSD cluster with object storage, the capacity of the file system can grow with the amount of available object storage. With tiered storage, users can manage tradeoffs related to performance, cost, and capacity. The management of tiered storage is automatic and transparent to users and applications.
Hybrid Cloud Deployments with Univa Grid Engine and Navops Launch
In a Univa Grid Engine environment, a sample hybrid-cloud deployment with Matrix is illustrated below. In this example, the Matrix software is installed on each on-premise Univa Grid Engine host. Whether this is practical for your environment will depend on the storage available on your cluster hosts. Customers can optionally set up a dedicated storage cluster as described above.
In the cloud (Matrix supports AWS), there are similar deployment options. It’s also possible to set up a converged environment in the cloud, but since Univa Grid Engine users often scale compute capacity dynamically based on application demand, it’s more practical to deploy a dedicated storage cluster on storage-dense AWS i3 cloud instances.
WekaIO provides an online calculator to help select optimal storage configurations and estimate IOPS and bandwidth. If I need 100TB of usable capacity in the cloud for my Univa Grid Engine applications, and assume that 80% of this capacity can be on S3, I can deploy ten i3.4xlarge instances in the cloud backed by S3. According to the Matrix calculator, this configuration would cost $21.22 per hour (a relatively modest $15K per month) but would deliver over 1.2M IOPS and 6 GB/sec of bandwidth to my Grid Engine cluster. Depending on the bandwidth, IOPS, and back-end storage capacity required, users can select cluster sizes and instance types tailored to their workloads.
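For a quick sanity check of these numbers, the sketch below converts the hourly price to a monthly figure and splits the 100TB between the SSD tier and S3. It does not reproduce the WekaIO calculator; the helper names are mine, and the 730 hours per month is my own averaging assumption (365 days x 24 hours / 12 months).

```python
HOURS_PER_MONTH = 730  # assumption: average month, 365 * 24 / 12


def monthly_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Convert an hourly on-demand price into an approximate monthly cost."""
    return hourly_rate * hours


def ssd_tier_tb(total_tb: float, object_store_fraction: float) -> float:
    """Capacity that must live on the SSD tier when the rest sits in object storage."""
    return total_tb * (1 - object_store_fraction)


if __name__ == "__main__":
    print(f"Monthly cost: ${monthly_cost(21.22):,.0f}")
    print(f"SSD tier needed: {ssd_tier_tb(100, 0.80):.0f} TB")
```

With 730 hours, the monthly figure comes out at roughly $15.5K, in line with the ~$15K quoted above, and only about 20TB of the 100TB needs to sit on SSD.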
Using Navops Launch to scale compute and storage
Using the configuration above, I can automatically deploy different cloud instance types depending on the nature of the workload under Grid Engine control. For example, using cloud bursting policies in Univa Grid Engine, when there is an application workload that can’t be fulfilled by the on-premises cluster, Grid Engine can signal Navops Launch to auto-provision AWS hosts using a custom Amazon Machine Image (AMI) that has Grid Engine, Docker, and the Matrix client software pre-installed along with other prerequisites. These instances can be up and running in minutes and be bound to a cloud or on-premises cluster, expanding capacity.
The new applet functionality in Navops Launch is a game changer for hybrid cloud environments. Rather than simply scheduling workloads to optimize the use of compute and storage, the built-in automation engine and applet facilities provide dynamic marshaling of cloud storage services based on application metrics collected from Univa Grid Engine and usage and cost metrics extracted from the cloud provider. User-defined applets can make decisions at runtime related to data locality and data movement, optimizing performance and cost, and even provisioning, de-provisioning, or scaling the Matrix storage cluster.
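To illustrate the kind of decision a bursting policy makes, here is a purely hypothetical sketch of the sizing logic: how many cloud instances to request when pending work exceeds free on-premises capacity. It does not use or represent the Univa Grid Engine or Navops Launch APIs; the function, its parameters, and the example numbers are all assumptions made for illustration.

```python
import math


def instances_to_provision(
    pending_slots: int,       # job slots waiting in the queue (hypothetical metric)
    free_onprem_slots: int,   # idle slots on the on-premises cluster
    slots_per_instance: int,  # slots one cloud instance would contribute
    max_instances: int,       # budget cap on burst size
) -> int:
    """Return how many cloud instances to request to cover the slot shortfall."""
    shortfall = max(0, pending_slots - free_onprem_slots)
    needed = math.ceil(shortfall / slots_per_instance)
    return min(needed, max_instances)


if __name__ == "__main__":
    # 200 pending slots, 72 free on-premises, 16 slots per instance, cap of 20
    print(instances_to_provision(200, 72, 16, 20))
```

In a real deployment the inputs would come from the scheduler and the provisioning would be handled by Navops Launch. The cap is there so that a sudden spike in pending jobs cannot create an unbounded cloud bill.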