Kubernetes Storage Provisioning: What you should know before deploying containerized applications

WEKA. September 3, 2020

Historically, storage has been a big challenge in the IT industry. It comes with problems of its own, like consistency, retention, replication, and migrating large data sets. None of these challenges is new, and none has gone away, especially now that we have modern distributed systems using containers.

Kubernetes has become the orchestrator of choice to run containerized applications. But containers themselves are ephemeral. This ephemerality represents a big challenge for certain types of workloads (e.g., relational databases like PostgreSQL). Therefore, there are still companies that are skeptical about using Kubernetes for all of their workloads.

In this post, I’ll discuss the importance of architecture around storage, specifically when deploying containerized applications. I’ll cover the peculiarities of working with storage in Kubernetes, along with the key concepts you need to understand and how they influence your application’s architecture.

Let’s start!

Peculiarities of Storage in Kubernetes

The idea of containers is that your applications become immutable, meaning that when a container is running, its state doesn’t change. If you want to make a change in the application, you have to spin up a new version of that container. What this means in terms of storage is that a container’s own storage is ephemeral. When the container stops running, you can pretty much say that its storage goes away, or at least eventually will. You can restart a stopped container, but that goes against how containers are meant to be used.

To solve the ephemerality of container storage, you have volumes, which decouple storage from the container. When you spin up a new container with an existing volume, all data from a previous container is still there, without any hacks. Because Kubernetes is a container orchestrator, it inherits the volume functionality from containers and adds a few features of its own.

To understand better how storage works in Kubernetes, and why some say it’s difficult to implement, let’s take a look at the storage provisioning workflow in Kubernetes.

Storage Provisioning Workflow in Kubernetes

Currently, there are two ways to provision storage for a pod in Kubernetes: statically and dynamically. The main difference lies in when you want to configure storage. For instance, if you need to pre-populate data in a volume, you choose static provisioning. If you need to create volumes on demand, you go for dynamic provisioning.

But before you can even start provisioning storage, your Kubernetes cluster needs to have a storage class. By creating a StorageClass object, you are effectively telling Kubernetes which storage technology you want to use, and you configure the reclaim policy, mount options, and volume binding mode, among other settings. Once you have a storage class defined, you can proceed with storage provisioning.
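
As a rough illustration of what such a storage class might contain, here is a minimal sketch that builds the object in Python and prints it as JSON, which kubectl accepts alongside YAML. The class name "fast-storage", the provisioner "csi.example.com", and the "noatime" mount option are placeholders chosen for this example, not values from any specific vendor; check your storage provider's documentation for the real ones.

```python
import json

# Hypothetical names: "fast-storage" and "csi.example.com" are placeholders.
storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "fast-storage"},
    # Which storage technology (volume plugin) backs this class.
    "provisioner": "csi.example.com",
    # What happens to a dynamically provisioned volume when its claim is deleted.
    "reclaimPolicy": "Retain",
    # Delay binding and provisioning until a pod actually uses the claim.
    "volumeBindingMode": "WaitForFirstConsumer",
    # Mount options applied to volumes of this class (assumed to be supported
    # by the driver in this example).
    "mountOptions": ["noatime"],
    # Provisioner-specific settings; left empty in this sketch.
    "parameters": {},
}


def to_manifest(obj):
    """Serialize a manifest dict to indented JSON."""
    return json.dumps(obj, indent=2)


print(to_manifest(storage_class))
```

You could save the output to a file and pass it to kubectl apply -f, assuming the provisioner named in it is actually installed in the cluster.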

So the static provisioning workflow is as follows:

Create a PersistentVolume that has volume properties like capacity, access modes, and storage class.
Create a PersistentVolumeClaim to request storage and bind to a persistent volume.
Configure a Pod to use the volume claim to mount a volume in a container.
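
The three static provisioning steps above can be sketched as three objects. This is only an illustration under stated assumptions: every name here ("data-pv", "data-claim", "fast-storage", "csi.example.com", "existing-volume-id") is hypothetical, and the csi section stands in for whatever volume source your storage provider actually uses.

```python
import json

# All names in this sketch are hypothetical placeholders.
CLASS_NAME = "fast-storage"

# Step 1: a PersistentVolume describing storage that already exists.
persistent_volume = {
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {"name": "data-pv"},
    "spec": {
        "capacity": {"storage": "10Gi"},
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": CLASS_NAME,
        "persistentVolumeReclaimPolicy": "Retain",
        # Points at a volume created beforehand on the storage system.
        "csi": {"driver": "csi.example.com", "volumeHandle": "existing-volume-id"},
    },
}

# Step 2: a PersistentVolumeClaim that requests storage and binds to the volume.
persistent_volume_claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "data-claim"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": CLASS_NAME,
        "volumeName": "data-pv",  # pin the claim to the pre-created volume
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

# Step 3: a Pod that mounts the claim into a container.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "app"},
    "spec": {
        "containers": [
            {
                "name": "app",
                "image": "busybox",
                "command": ["sleep", "3600"],
                "volumeMounts": [{"name": "data", "mountPath": "/data"}],
            }
        ],
        "volumes": [
            {"name": "data", "persistentVolumeClaim": {"claimName": "data-claim"}}
        ],
    },
}

# Bundle the objects into one List so they can be applied together.
manifests = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [persistent_volume, persistent_volume_claim, pod],
}
print(json.dumps(manifests, indent=2))
```

Note how the three objects reference each other only by name: the claim names the volume, and the pod names the claim.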

And the dynamic provisioning workflow is similar, but with a small difference:

Create a PersistentVolumeClaim.
Configure a Pod to use the volume claim.
Kubernetes takes care of actually creating a PersistentVolume and coordinates the process necessary to bind that PersistentVolume to the actual storage entity, which might involve external components, e.g., provisioning a dedicated LUN or filesystem on a third-party storage vendor product.
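
In the dynamic case, the sketch gets shorter because there is no PersistentVolume to write. The example below is a minimal illustration, assuming a storage class named "fast-storage" already exists in the cluster; the helper functions make_claim and make_pod and all object names are made up for this example.

```python
import json


def make_claim(name, size, class_name):
    """Build a PersistentVolumeClaim that asks a storage class for a new volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": class_name,
            "resources": {"requests": {"storage": size}},
        },
    }


def make_pod(name, image, claim_name, mount_path):
    """Build a Pod that mounts the given claim at mount_path."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "command": ["sleep", "3600"],
                    "volumeMounts": [{"name": "data", "mountPath": mount_path}],
                }
            ],
            "volumes": [
                {"name": "data", "persistentVolumeClaim": {"claimName": claim_name}}
            ],
        },
    }


# Hypothetical names; no PersistentVolume is defined anywhere in this sketch.
claim = make_claim("scratch-claim", "5Gi", "fast-storage")
pod = make_pod("worker", "busybox", "scratch-claim", "/data")
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [claim, pod]}, indent=2))
```

The claim carries no volumeName: the provisioner behind the storage class is expected to create a matching volume and bind it.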

Notice that the main difference is whether you want to create a persistent volume in advance or if you want to create it on demand. With static provisioning, you have to make calls to the storage provider to create a new storage volume. Then you create a representation of that volume with a PersistentVolume object in Kubernetes. In contrast, dynamic provisioning does that for you on demand.

So how do these types of provisioning change the way you think about storage?

Dynamic Storage Provisioning Changes Everything

Dynamic provisioning not only simplifies storage management in Kubernetes; it also means that storage provisioning is no longer tied to the lifecycle of a pod. Additionally, dynamic provisioning hides the implementation details from your workloads. For your applications, the storage technology you’re using in the cluster is transparent. Therefore, you can even use different technologies depending on the workload type.

Moreover, you can use different providers depending on whether your workloads are running on-premises, in the cloud, or with a hybrid approach. The cluster administrator has the freedom to choose the technology that best answers the needs of a particular workload in terms of capacity, performance, exclusivity of access, and so on.

However, dynamic provisioning doesn’t just affect how users think about storage. Storage providers like WekaIO also have to think about how to make it easier for users to provision storage. Initially, storage providers had to add their volume plugins to the Kubernetes source code itself (known as “in-tree” plugins) and follow the Kubernetes release cycle. That’s why Kubernetes adopted the Container Storage Interface (CSI), which providers can implement without changing the Kubernetes source code. Today, CSI brings dynamic and static provisioning under a single umbrella, so both are easier to perform. Users simply install the provider’s volume plugin within the cluster and use it with a StorageClass object.

So, with all these options, which one should we choose?

CSI Provisioning Is the Way to Go

CSI is becoming the preferred method of provisioning. Based on the post for the general availability announcement of CSI, the plan is to move all “in-tree” volume plugins to CSI. In other words, users need to install a storage provider’s plugin within the cluster. For future versions of Kubernetes, if you don’t have a CSI provider in the cluster, you won’t be able to provision storage for pods, either statically or dynamically. For storage providers, this means that they’ll have to implement the CSI features. To provision storage with a product from a specific storage vendor, you will need a CSI plugin compatible with that product, like the Weka CSI Plugin.

So, from now on, always go for the CSI plugin option.

Now, as for dynamic versus static provisioning, as previously stated, it depends on what your requirements are. If your company has, or wants to implement, a separation of duties with clear permission boundaries, go for static provisioning. You can configure storage settings in advance or even prepopulate volumes with data.

On the other hand, if you want to reduce the number of YAML files and provision volumes on demand, go for dynamic provisioning. However, there are a few default configurations, like the reclaim policy, that you also need to consider. For instance, the default behavior is that when you remove the persistent volume claim, the volume is deleted too, although you can change the reclaim policy of a volume after it has been provisioned.
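
To make the reclaim policy point concrete, here is a small sketch that builds the patch you would send to an existing PersistentVolume to switch it from Delete to Retain, together with the matching kubectl patch command. The volume name "pvc-1234" is a made-up placeholder, and the helper names are assumptions for this example.

```python
import json
import shlex

# "Recycle" also exists but is deprecated, so this sketch leaves it out.
ALLOWED_POLICIES = {"Retain", "Delete"}


def reclaim_policy_patch(policy):
    """Return the patch body that sets a PersistentVolume's reclaim policy."""
    if policy not in ALLOWED_POLICIES:
        raise ValueError(f"unsupported reclaim policy: {policy!r}")
    return {"spec": {"persistentVolumeReclaimPolicy": policy}}


def kubectl_patch_command(pv_name, policy):
    """Return the kubectl command, as an argument list, that applies the patch."""
    patch = json.dumps(reclaim_policy_patch(policy))
    return ["kubectl", "patch", "pv", pv_name, "-p", patch]


# "pvc-1234" is a hypothetical PersistentVolume name.
command = kubectl_patch_command("pvc-1234", "Retain")
print(shlex.join(command))
```

With Retain in place, deleting the claim leaves the volume and its data behind for an administrator to reclaim manually.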

Another factor to take into account is the number of discrete volumes provided to pods. When the cluster uses only a few volumes, static provisioning could be the right answer for you. But when hundreds or thousands of different workloads run in parallel, dynamic provisioning makes life much easier.

No matter whether you go with the static or dynamic approach, you might want to create a CI/CD pipeline to define storage settings through YAML manifests. From this perspective, you can treat your storage layer in the same way as you do with your application or infrastructure code.
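
As one example of what such a pipeline step might do, the sketch below checks claim manifests before they are applied. The rules it enforces (every claim must name a storage class, request a size, and declare access modes) are a hypothetical team policy invented for this illustration, not a Kubernetes requirement, and check_claim is a made-up helper.

```python
def check_claim(manifest):
    """Return a list of policy problems found in a PersistentVolumeClaim manifest.

    Manifests of any other kind are ignored and yield an empty list.
    """
    if manifest.get("kind") != "PersistentVolumeClaim":
        return []

    name = manifest.get("metadata", {}).get("name", "<unnamed>")
    spec = manifest.get("spec", {})
    problems = []

    if not spec.get("storageClassName"):
        problems.append(f"{name}: storageClassName is not set")
    if not spec.get("resources", {}).get("requests", {}).get("storage"):
        problems.append(f"{name}: no storage size requested")
    if not spec.get("accessModes"):
        problems.append(f"{name}: accessModes is empty")

    return problems


# Hypothetical manifests, as they might be loaded from files in a repository.
good_claim = {
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "data-claim"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "fast-storage",
        "resources": {"requests": {"storage": "10Gi"}},
    },
}
bad_claim = {
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "loose-claim"},
    "spec": {"accessModes": ["ReadWriteOnce"]},
}

for claim in (good_claim, bad_claim):
    for problem in check_claim(claim):
        print(problem)
```

A pipeline could fail the build whenever the function returns a non-empty list, so that storage settings get the same review as application code.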

Stateful Workloads Can Run in Kubernetes

The good news is that stateful workloads can run in Kubernetes. As with everything, there’s still room for improvement, but you can rest assured that the community is working on improving storage management in Kubernetes. Proof of that is the development of the CSI, which, for the end user, is somewhat transparent, as the only prerequisite is to install the storage plugin in the cluster.

If you want to have a better understanding of how all concepts discussed here fit together, there’s no better way than participating in hands-on labs. Weka has some self-guided labs that you can use to practice static and dynamic provisioning using a CSI plugin.

This post was written by Christian Meléndez. Christian is a technologist who started as a software developer and has more recently become a cloud architect focused on implementing continuous delivery pipelines with applications in several flavors, including .NET, Node.js, and Java, often using Docker containers.