Why Multiprotocol Matters

Joel Kaufman. June 15, 2022

I’m just going to come right out and say it: a single high-performance computing (HPC) workload doesn’t need multiprotocol access. Your typical database, and even most HPC scientific computing workloads, run over a single protocol.

However, we mustn’t confuse workload with workflow. High-performance workflows typically move data through a pipeline in which each step hands it off from one application to the next for essential processing.

In many of these high-performance workflows, the applications are disparate and speak different protocols because each is optimized for a single use case. Many of these workflows also impose performance requirements on every application in the chain, which forces customers to copy data to whichever location best serves each application.

Let’s look at a couple of examples:

In the bio-pharma world, one of the most data-intensive tools in use today is the cryo-electron microscope (CryoEM). The most popular CryoEM systems use the SMB protocol during ingest and can write out 1 million high-resolution images per run, pushing up to 10 GB/sec. In most cases, however, the analytics tools are Linux-based and use AI/ML algorithms to perform computer vision processing, for example to identify the best match for structures such as protein fold receptors. For these tools, POSIX or NFS is the protocol of choice.

A second, rapidly growing use case is IoT sensor data. Large manufacturers of everything from LCD screens to automotive components (and even finished vehicles) produce enormous amounts of data and send it natively over S3, creating an immediate data lake repository. That data may then need to be accessed by multiple tools over different protocols for analytics, pre-processing, and even relatively mundane tasks such as archiving.

So… how does WEKA fit in? While we’ve offered multiprotocol data access for a long time, with the fourth-generation WEKA Data Platform release we’re now offering a high-performance SMB-W stack, support for NFSv4.1, and improved S3 capabilities:

  • Our new scale-out SMB-W stack uses SMB multichannel to ingest data from SMB 3 clients at full wire speed. This lets the world’s highest-performance applications, such as the aforementioned CryoEM systems, video editing and CGI animation suites, and GPU-centric AI/ML analytics, share a common data platform.
  • New support for NFSv4.1 gives our customers a broader range of methods to access their data and share it within the WEKA Data Platform.
  • The WEKA Data Platform’s enhanced S3 support delivers the industry’s first multiprotocol access for high-performance workflows: it provides massive ingest performance in the native S3 language that many applications and sensors are moving to, then lets other applications use that same data via POSIX or NFS.

WEKA 4 delivers a complete multiprotocol stack, including POSIX, S3, NFS, and SMB. This lets customers centralize their data and eliminates the need to copy data between systems just so different applications can access and act on it. In a previous post, “Making and Breaking Records: Do Benchmarks Matter?”, I explained that delivering high performance across multiple types of workloads without additional tuning is a foundation for a zero-copy architecture. Multiprotocol access complements this performance: by enabling simultaneous access to the same data across all protocols, it prevents data silos from forming. This zero-copy philosophy simplifies data operations and improves time-to-value for the data itself.

Want to know more? Ping me at joel@weka.io, on Twitter @thejoelk or contact info@weka.io for more information.