WEKA 4 – Multiple Dimensions of Choice 

Colin Gallagher. June 15, 2022
Choice.  Human beings have an interesting relationship with choice. On the one hand, we crave it. We like to be able to choose to control our own destiny, and we generally believe that having more choices makes people more satisfied – that choice represents freedom. Many studies support this, showing that when denied choice in areas like investments and healthcare, people become dissatisfied and often refuse to choose.

On the other hand, too much choice can, paradoxically, lead to analysis paralysis and buyer’s remorse. A famous study in 2000 showed that while a display of 24 jam types attracted more shoppers to browse the choices, a 6-choice display actually produced more sales.

The challenge in bringing products to market is to find the right balance: offering enough choice to grant that control of destiny without overwhelming people. With our fourth-generation release of the WEKA Data Platform, we have done just that.

Choice of Clouds

The WEKA Data Platform was born in AWS and launched at AWS re:Invent 2017. It is a fully containerized, cloud-native, software-defined platform that combines compute, storage, and high-speed networking to deliver a seamless, simplified, highly performant data management experience with best-in-class economics. For on-premises deployments, we use the same code that we run in the cloud to deliver the same features on our customers’ choice of hardware. But many of you have been telling us that you want a choice of cloud providers. If you’re like most companies, you have a multicloud strategy and you need access to vendor-specific capabilities of different clouds.

With WEKA 4, we are delivering the industry’s first multicloud data platform for AI and next-generation workloads. We’re extending the same epic performance, scale, and reliability that our customers have enjoyed for years on-premises and in Amazon Web Services (AWS) to Google Cloud Platform (GCP), Oracle Cloud Infrastructure (OCI), and very soon Microsoft Azure. For the first time, organizations can benefit from a single, unified data platform that performs consistently across edge, core, and multicloud environments. Unlike competitive offerings, the WEKA Data Platform provides the same code base across all deployments, standardizing management and capabilities across all clouds.

Each cloud provider has a different philosophy on building and automating its cloud, be it highly virtualized/containerized or composable bare-metal infrastructure. We have done the work to integrate the WEKA Data Platform with each major cloud provider to give you a full set of capabilities in the cloud of your choice without having to choose different types of software.

For example, WEKA provides deep integration with AWS CloudFormation scripts and APIs, whereas in GCP, where Terraform-based deployment and automation models are the norm, we adopt a Terraform-based approach. And in OCI, we support a scripted deployment on composed bare-metal hardware that delivers insane performance of up to 2 TB per second.

With each provider, WEKA seamlessly combines the low latency performance of NVMe-enabled cloud compute instances with the massive scalability of their native object storage – all presented in a single namespace for best economics at hyperscale.  

More Choice within AWS

In WEKA 4, we also expanded our AWS integration, adding support for S3 Glacier Instant Retrieval. This new storage class improves cold data economics for rarely accessed data that still needs immediate access in performance-sensitive use cases like medical imaging, media assets and genomics. It is Amazon’s lowest-cost storage for long-lived data that is rarely accessed but requires retrieval in milliseconds. With S3 Glacier Instant Retrieval, you can save up to 68% on storage costs compared to using the S3 Standard-Infrequent Access (S3 Standard-IA) storage class, when your data is accessed once per quarter.

New Cloud Use Cases

Cloud is a deployment model, not a feature set or a single use case, and there are many reasons to leverage the cloud to meet your business needs and technology strategy. Maybe you just want to limit your cloud use to backup or disaster recovery; maybe you are just now migrating workloads to the cloud; or maybe you are already all-in on cloud. With WEKA, you don’t have to choose between different data infrastructure approaches to support all of your different use cases – we support them all to streamline and accelerate how you manage data at every stage of your cloud journey, on whichever cloud you choose.

With the WEKA Data Platform you can:

  • Move or back up data to clouds
  • Use the cloud for data tiering from on-premises
  • Run natively on the cloud
  • Tier and reduce data within a cloud
  • Migrate or replicate data between clouds or availability zones

Greater Hybrid Cloud Flexibility

The WEKA Data Platform gives you the flexibility to run on-premises or in the cloud – it also enables you to easily run sophisticated hybrid workflows across both. By augmenting a WEKA cluster deployed on-premises, you can easily create an entire usable copy of data in the cloud that can be used concurrently for compute and burst workloads like high-performance data analytics, medical imaging, ETL and video production. When your work is complete, this cloud dataset can be turned off.

Some WEKA customers are even building sophisticated hybrid cloud workflows where they start processing data on-premises, then use the WEKA Data Platform to move it to the cloud for final analysis, and vice versa.

New in WEKA 4, we are simplifying updates of data for these hybrid workloads with the ability to incrementally and continuously promote data changes to the cloud – updating the “remote” cloud dataset non-disruptively. Only changed files or changed parts of files are updated, without having to unmount and remount the filesystem in the cloud. This both minimizes the bandwidth needed and simplifies hybrid workflows, keeping them running non-stop.
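To make the idea of sending only the changed parts of a file concrete, here is a minimal Python sketch of chunk-level change detection. It is an illustration only and not a description of how WEKA implements incremental updates: the function names, the fixed-size chunking, the use of SHA-256, and the tiny `CHUNK_SIZE` are all assumptions chosen for readability.

```python
import hashlib

# Assumption: a tiny chunk size so the example is easy to follow by eye.
CHUNK_SIZE = 4  # bytes


def chunk_digests(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Return one SHA-256 digest per fixed-size chunk of `data`."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def changed_chunks(old: bytes, new: bytes, chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Return indices of chunks in `new` that differ from, or do not exist in, `old`."""
    old_digests = chunk_digests(old, chunk_size)
    new_digests = chunk_digests(new, chunk_size)
    return [
        index
        for index, digest in enumerate(new_digests)
        if index >= len(old_digests) or old_digests[index] != digest
    ]


if __name__ == "__main__":
    previous = b"aaaabbbbcccc"
    current = b"aaaaBBBBccccdddd"
    # Only the modified second chunk and the newly appended fourth chunk need to move.
    print(changed_chunks(previous, current))
```

In this sketch, only the chunks whose indices are returned would need to cross the network, which is the bandwidth saving the paragraph above describes.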

More Ways to Lower Your TCO 

The WEKA Data Platform’s tiering capabilities deliver the best economics at hyperscale. Because you can scale performance and capacity independently, it is typically more than twice as cost-effective as other storage services at comparable performance levels. In the WEKA 4 release, we are introducing two new ways to be even more cost-effective:

Data Reduction

First, new filesystem-wide data reduction – on-premises and in the cloud – makes deployments even more affordable while maintaining the high performance our customers rely on, delivering significant reduction on a range of workloads. The WEKA Data Platform looks for similar blocks of data (unlike traditional data reduction techniques, it does not require blocks to be 100% identical) and reduces them, storing any differences separately.
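The general idea of reducing similar (not just identical) blocks and storing the differences separately can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not WEKA’s algorithm: the `ReducingStore` class, the byte-by-byte `similarity` measure, the linear search over references, and the 0.75 threshold are all hypothetical.

```python
def similarity(a: bytes, b: bytes) -> float:
    """Fraction of positions where two equal-length blocks hold the same byte."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def make_delta(reference: bytes, block: bytes) -> list[tuple[int, int]]:
    """Record (offset, byte value) for every position where `block` differs."""
    return [(i, y) for i, (x, y) in enumerate(zip(reference, block)) if x != y]


def apply_delta(reference: bytes, delta: list[tuple[int, int]]) -> bytes:
    """Rebuild the original block from its reference plus the stored differences."""
    out = bytearray(reference)
    for offset, value in delta:
        out[offset] = value
    return bytes(out)


class ReducingStore:
    """Toy block store: similar blocks share one reference and keep only a delta."""

    def __init__(self, threshold: float = 0.75):  # threshold is an assumption
        self.threshold = threshold
        self.references: list[bytes] = []
        self.entries: list[tuple[int, list[tuple[int, int]]]] = []

    def put(self, block: bytes) -> int:
        """Store a block and return its id."""
        for ref_index, reference in enumerate(self.references):
            if similarity(reference, block) >= self.threshold:
                self.entries.append((ref_index, make_delta(reference, block)))
                return len(self.entries) - 1
        # No sufficiently similar reference exists: keep the block whole.
        self.references.append(block)
        self.entries.append((len(self.references) - 1, []))
        return len(self.entries) - 1

    def get(self, block_id: int) -> bytes:
        """Return the original bytes for a stored block id."""
        ref_index, delta = self.entries[block_id]
        return apply_delta(self.references[ref_index], delta)
```

Storing `b"hello world!"` and then `b"hello World!"` in this sketch keeps one full reference block plus a one-byte delta, while a dissimilar block is stored whole as a new reference.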

Data reduction can be enabled per filesystem. Compression ratios will be workload-dependent, but we expect excellent compression with text-based data, large-scale unstructured datasets, log analysis, databases, code repositories, and sensor data. We are providing a Data Reduction Estimation Tool (DRET) that can be run on existing filesystems to calculate the reduction rate of your datasets. It runs on any filesystem and will soon support scanning of S3 buckets as well.

QLC Support

For on-premises customers with significant capacity needs that also require flash-level performance, we now support QLC drives that lower the cost of the flash tier. As we see customers deploying increasingly large environments, QLC can help to make higher-density systems more economical. Initially, we are supporting select Intel QLC drives but will roll out others in the future. And yes, we did the work to ensure that we aggregate writes from our native 4k block size to match the 64k writes needed to get the best endurance from these drives.
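The write aggregation mentioned above can be pictured with a small Python sketch. It is a simplified illustration, not WEKA’s I/O path: the `WriteAggregator` class is hypothetical, the `flushed` list stands in for writes issued to a drive, and “4k” and “64k” are assumed to mean 4 KiB and 64 KiB.

```python
BLOCK_SIZE = 4 * 1024    # assumed meaning of the native "4k" block size
FLUSH_SIZE = 64 * 1024   # assumed meaning of the "64k" drive-friendly write size


class WriteAggregator:
    """Collect small block writes and emit them as larger, fixed-size writes."""

    def __init__(self, flush_size: int = FLUSH_SIZE):
        self.flush_size = flush_size
        self._buffer = bytearray()
        # Stand-in for the device: each element is one large write.
        self.flushed: list[bytes] = []

    def write(self, block: bytes) -> None:
        """Buffer a small write; emit a large write whenever enough data is queued."""
        self._buffer.extend(block)
        while len(self._buffer) >= self.flush_size:
            self.flushed.append(bytes(self._buffer[:self.flush_size]))
            del self._buffer[:self.flush_size]

    def pending(self) -> int:
        """Number of buffered bytes not yet emitted as a large write."""
        return len(self._buffer)


if __name__ == "__main__":
    aggregator = WriteAggregator()
    for _ in range(16):
        aggregator.write(bytes(BLOCK_SIZE))
    # Sixteen 4 KiB blocks have been combined into a single 64 KiB write.
    print(len(aggregator.flushed), aggregator.pending())
```

A real system would also need to flush partial buffers on a timer or on sync and protect buffered data against failure; the sketch leaves those concerns out.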

Expanded Protocol Support

The WEKA Data Platform delivers broad multi-protocol support, allowing multiple applications (Windows, Linux, Mac, POSIX, or S3 native) to simultaneously share and collaborate on a single dataset, with full data consistency across protocols. This enables you to stop making copies of data for each application and reduce data movement to decrease the complexity of your environment.

WEKA 4 gives customers more choice in protocols for their data pipelines. We now offer NFS v4 support, which is rapidly becoming a key protocol for HPC, database, and highly virtualized applications. We also offer a new scale-out SMB stack (SMB-W – ‘W’ is for WEKA) optimized for high throughput and low latency with Windows applications, which remain common in video and imaging workflows. SMB-W can achieve multi-gigabyte-per-second throughput. In the words of one of our beta customers: “I am seeing 9 GB/s on Windows – that’s bananas. But I can do it on WEKA.”

For more details on multi-protocol use cases, check out the “Why Multiprotocol Matters” blog post from my colleague Joel Kaufman.

Better Management Choices

We provide multiple ways for you to manage your WEKA Data Platform system: robust API and CLI capabilities for automation and integration with your IT ecosystem, and a modern graphical user interface (GUI) for quick and easy access to information and configuration.

The WEKA 4 GUI has been completely redesigned to allow a single administrator to quickly and easily manage hundreds of petabytes of data without any specialized storage training. Co-developed with some of our largest customers who were asking for better ways to view and manage their exascale data environments, the new GUI provides better, faster visibility into system status and simplifies finding the information you need. It features a fully revamped dashboard and filtering tools, and lays the foundation for easily managing multi-tenant environments.

The Right Amount of Choice

With WEKA 4, we have delivered a balanced set of choices to help you leverage the WEKA Data Platform in more ways, for more applications and use cases, with better economics and improved usability.

  • WEKA 4 delivers the speed, scale, and simplicity customers have enjoyed on-premises – now in their choice of cloud.
  • Offering a choice of data tiering, data reduction and media types, WEKA 4 delivers class-leading economics, both on-premises and in the cloud.
  • An expanded choice of protocols enables more applications in your environment to take advantage of the benefits of the WEKA Data Platform.
  • WEKA 4 gives you a simpler way to monitor and manage at hyperscale.

We can’t wait to see what you create with it.
