Redefining Scale for Modern Storage

Shani Shoham. March 26, 2021

Data today is diverse, dynamic, and distributed, which means companies need to plan for growth in the volume of data, the types of data, and the number of users accessing that data. With data growing exponentially, companies need to grow their infrastructure to meet these demands. That is where the ability to scale, or scalability, comes in. Gartner defines scalability as the measure of a system’s ability to increase or decrease in performance and cost in response to changes in application and system processing demands.

Scale Up vs. Scale Out Storage

Traditionally, customers have looked at scale along two dimensions:

  1. Vertical scaling, or scale up
  2. Horizontal scaling, or scale out

This is true for servers as well as storage.

Vertical Scaling or Scale Up

As the name implies, in this mode, if you need more capacity or processing, you buy a bigger system with more memory, more processing power, more bandwidth, and so on. While this seems simple, it involves a rip-and-replace of existing infrastructure, and even if it can be done non-disruptively, it presents challenges that need to be overcome.

Horizontal Scaling or Scale Out

Scale out is an alternative approach to scaling in which you connect several smaller machines and make them act like one large machine. If you need more processing or bandwidth or capacity, you add another node, and the system gets more powerful. While this has its benefits, it also demands more sophisticated software to ensure that scaling stays linear as nodes are added.
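
To make "linear scaling" concrete, here is a minimal sketch in Python (the per-node throughput and efficiency factor are invented numbers, not measured Weka figures) of how aggregate throughput ideally grows with node count, and how even a small per-node coordination overhead erodes that linearity:

```python
# Hypothetical scale-out throughput model. per_node_gbps and the
# efficiency factor are invented numbers, not measurements of any system.

def aggregate_throughput(nodes: int, per_node_gbps: float, efficiency: float) -> float:
    """Cluster throughput: linear when efficiency is 1.0; each added
    node loses a small fraction to coordination overhead otherwise."""
    return nodes * per_node_gbps * (efficiency ** (nodes - 1))

for n in (1, 4, 8, 16):
    ideal = aggregate_throughput(n, per_node_gbps=10.0, efficiency=1.0)
    real = aggregate_throughput(n, per_node_gbps=10.0, efficiency=0.98)
    print(f"{n:2d} nodes: ideal {ideal:6.1f} GB/s, with overhead {real:6.1f} GB/s")
```

The software's job in a scale-out design is to keep that efficiency factor as close to 1.0 as possible.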

So, we have these two modes of scaling, scale up and scale out. Are we done? That would make for a very short blog post, wouldn’t it? Advances in cloud, flash, and GPU compute over the last decade have the industry rethinking the dimensions of scale as we know them.

Scalability for Modern and Hybrid Workloads

When you think about modern workloads, as opposed to those of 20-30 years ago, how do data centers need to scale to accommodate them, and what new dimensions of scale are customers grappling with today?

Weka’s highly scalable storage solution is at the forefront of helping customers scale in every dimension possible, especially as they think about incorporating cloud into their environments. Here are some use cases in which WekaIO shines as customers look to scale to the cloud:

Scaling for Disaster Recovery

One of the common reasons customers scale to the cloud is disaster recovery (DR). Instead of keeping all of their backup copies in a data center, they can keep a full copy in the cloud to protect their data in case of a disaster. Many vendors can move a copy of your data to the cloud, but is that copy still in the same format it had in the data center, metadata included? Can you actually use it to recreate the environment, or do you have to bring all of your data back on-premises to restart after a data loss? Weka can keep a backup copy in the cloud, giving you access to your data during a disaster, and, even better, you can attach compute nodes to that copy and start an instance of your environment in the cloud. This is huge: it effectively turns disaster recovery into business continuity, and that is a new dimension of scalability. Did we mention that we don’t charge separately for the software that copies data to the cloud?
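
Weka’s snapshot-to-object and replication features have their own interfaces; purely to illustrate the workflow, here is a sketch (the bucket and prefix names are hypothetical, and this is generic boto3, not any Weka API) of verifying that the DR copy exists in object storage before launching compute against it:

```python
# Hypothetical sketch: confirm a DR copy exists in object storage before
# launching cloud compute against it. Bucket and prefix names are made up;
# this is generic boto3, not Weka's snapshot-to-object interface.
import boto3

s3 = boto3.client("s3")

def dr_copy_present(bucket: str, prefix: str) -> bool:
    """Return True if at least one object exists under the DR prefix."""
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return resp.get("KeyCount", 0) > 0

if dr_copy_present("dr-backup-example", "snapshots/prod-fs/"):
    print("DR copy found; safe to bring up cloud nodes against it.")
else:
    print("No DR copy found; replication may not have completed.")
```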

Scaling for Capacity

Let’s say that you have a certain number of terabytes of data stored on-premises, but you’re experiencing a spike: you need to ingest more data and process it. For a short period of time you need more capacity. The choices are (1) to buy more on-premises storage or (2) to extend to the cloud and pay for just what you need, only for as long as you need it. This means that your system should be able to extend seamlessly and tier between on-premises storage and the cloud for capacity. In fact, tiering between on-premises storage and the cloud is something very few vendors can do, and even fewer can do it well and present all of it in a single, unified namespace. This ability is often called bursting for capacity: using cloud services for a short time without changing your processes or infrastructure to accommodate the spike.
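
The economics behind bursting for capacity are simple enough to sketch. Every figure below is an invented placeholder, not real pricing, but the comparison shows why paying per terabyte-month for a short spike can beat a permanent purchase:

```python
# Back-of-the-envelope cost comparison for a temporary capacity spike.
# Every dollar figure here is a hypothetical placeholder, not a quote.

spike_tb = 200                  # extra capacity needed during the spike
spike_months = 2                # how long the spike lasts

on_prem_cost_per_tb = 100.0     # one-time purchase price, $/TB (invented)
cloud_cost_per_tb_month = 5.0   # object-storage rate, $/TB-month (invented)

buy_cost = spike_tb * on_prem_cost_per_tb
burst_cost = spike_tb * cloud_cost_per_tb_month * spike_months

print(f"Buy on-prem   : ${buy_cost:,.0f} (hardware you keep after the spike ends)")
print(f"Burst to cloud: ${burst_cost:,.0f} (released when the spike ends)")
```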

Bursting for Compute

Similarly, there may be a time when, instead of extra storage, you need extra compute instances. This scenario might arise when you’re running very large batch-processing or HPC applications but don’t have enough cores in your data center and want to leverage cloud compute. Weka enables you to burst to the cloud for compute and continue your business on-premises after the spike is done.
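
A deliberately simplified sketch of the sizing decision (all numbers are hypothetical): if the queued work exceeds what the on-premises cores can finish by the deadline, burst the remainder to cloud cores.

```python
# Hypothetical burst sizing: all numbers below are illustrative.

queued_core_hours = 120_000   # work waiting to run
deadline_hours = 24           # when it must finish
on_prem_cores = 2_000         # cores available in the data center

on_prem_capacity = on_prem_cores * deadline_hours        # core-hours on-prem
shortfall = max(0, queued_core_hours - on_prem_capacity)

if shortfall:
    cloud_cores = -(-shortfall // deadline_hours)  # ceiling division
    print(f"Burst: rent ~{cloud_cores:,} cloud cores for {deadline_hours} hours.")
else:
    print("On-prem capacity is sufficient; no burst needed.")
```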

Scaling Across Availability Zones (AZs)

This ability is fairly unique to Weka, although the term “availability zones” is AWS terminology. Cloud providers divide their regions into multiple availability zones, and it is common practice that when you request a large number of compute cores for a project, the cloud vendor allocates them across different zones, whether for physical or economic reasons. That is, if you ask for 10,000 cores, you may not get all of them in one AZ, at least not at the price you want, so you opt to span two or three AZs to reach the 10,000 cores. The challenge then lies in the fact that your actual data resides in one availability zone. Most file systems fall to their knees and show a significant drop in performance in this situation. Weka does not: your performance with the Weka platform stays intact when you scale across AZs, and that is unique. In fact, one of Weka’s large pharmaceutical customers is currently bursting across availability zones for a project of great importance to the world today, and in their experience, Weka was the only company that enabled them to burst for capacity and compute without limiting their performance.
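
To illustrate the 10,000-core scenario, here is a small sketch (the per-AZ core availabilities are invented for the example) of greedily spreading a request across zones when no single zone can satisfy it:

```python
# Hypothetical illustration: satisfy a large core request by spanning AZs.
# The per-AZ availability numbers are invented for the example.

def span_azs(requested: int, az_capacity: dict[str, int]) -> dict[str, int]:
    """Greedily take cores from the zones with the most headroom."""
    allocation: dict[str, int] = {}
    remaining = requested
    for az, free in sorted(az_capacity.items(), key=lambda kv: -kv[1]):
        take = min(free, remaining)
        if take:
            allocation[az] = take
            remaining -= take
        if remaining == 0:
            break
    if remaining:
        raise RuntimeError(f"{remaining} cores short even across all AZs")
    return allocation

print(span_azs(10_000, {"us-east-1a": 6_000, "us-east-1b": 3_000, "us-east-1c": 2_500}))
# -> {'us-east-1a': 6000, 'us-east-1b': 3000, 'us-east-1c': 1000}
```

Once the cores span zones like this, the file system serving them must deliver full performance to every zone at once, which is exactly where most systems fall down.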

Scaling Across Mixed Workloads: IOPS, Bandwidth, Large, and Small Files

With that in mind, there is also a dimension of scale that doesn’t necessarily involve the cloud. Where Weka really shines is in the ability to scale across different workload needs, whether you’re looking for IOPS, bandwidth, or both. Today’s workload consolidation puts a large number of diverse workloads onto a single system. Weka is unique in this aspect because the Weka File System (WekaFS™) can manage truly diverse workloads: those that need IOPS alongside those that need bandwidth, large files alongside small files. The intelligence in the Weka software can manage all of your workload needs, no matter how diverse they are.
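
The IOPS-versus-bandwidth distinction comes down to I/O size, since bandwidth is simply IOPS multiplied by the size of each I/O. A short worked example (the workload profiles are arbitrary) shows why a system tuned for one can look terrible on the other:

```python
# Bandwidth = IOPS x I/O size. The two workload profiles are arbitrary
# examples, not measurements of any particular system.

workloads = {
    "small-file, metadata-heavy": (400_000, 4 * 1024),    # 400K IOPS at 4 KiB
    "large-file streaming":       (10_000, 1024 * 1024),  # 10K IOPS at 1 MiB
}

for name, (iops, io_bytes) in workloads.items():
    gbps = iops * io_bytes / 1e9
    print(f"{name:27s}: {iops:7,d} IOPS x {io_bytes // 1024:4d} KiB = {gbps:5.2f} GB/s")
```

Forty times the IOPS can still mean a fraction of the bandwidth, which is why consolidating both kinds of workloads on one system is hard.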

Modern Storage Solution: Scalable Across NVMe Flash, Disk, File, Object, and Cloud

When you are looking for a highly scalable storage system, there will be vendors who show up and tell you to rip and replace all the components of your data center and move to an all-flash solution. That’s great if you can financially support it, and Weka can certainly support your all-flash system. Maybe in the future flash will match the economics of disk for all workloads, but until then Weka can live in harmony with a certain percentage of your data on flash and the rest spread across different, more affordable storage media. All of this is transparent to the application, so Weka truly scales across storage tiers with different characteristics: NVMe flash and disk.

Weka also supports file and object, another key area in which our customers benefit: they have access to files and objects through one namespace. In some ways, Weka actually makes object storage usable and predictably performant across different object sizes. One of our large customers, Genomics England, needed to scale to 140PB, but less than 21PB of that was a high-performance file tier on NVMe flash; the rest sat on object storage. The user is completely oblivious to where the data is coming from, whether high-performance flash or a slower object tier, because the intelligence in the data platform software takes care of bringing the data in and providing consistent, predictable, high performance for all data.
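
Using the numbers from that example, plus an invented flash-to-object cost ratio purely for illustration, a quick sketch shows why tiering the namespace this way changes the economics:

```python
# Blended capacity math for a tiered namespace, loosely modeled on the
# 140 PB / 21 PB example above. The cost ratio is an invented placeholder.

total_pb = 140
flash_pb = 21
object_pb = total_pb - flash_pb

flash_cost_unit = 8.0   # hypothetical relative cost per PB for NVMe flash
object_cost_unit = 1.0  # hypothetical relative cost per PB for object storage

all_flash = total_pb * flash_cost_unit
tiered = flash_pb * flash_cost_unit + object_pb * object_cost_unit

print(f"Hot flash tier is {flash_pb / total_pb:.0%} of the namespace.")
print(f"Tiered layout costs about {tiered / all_flash:.0%} of an all-flash build.")
```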

WekaFS: Redefining Scale

All of these attributes make up the new definition of scalability, based on what customers look for as they run modern workloads in a truly hybrid cloud environment. Weka is unique because we provide all of these attributes to the customer at the same time. You may find a vendor who does one or the other. However, the question is this: what limits do those solutions impose on you? Weka truly eliminates the limits imposed on customers, no matter which dimension of scale they want and need.
