Block Storage vs. Object Storage: When to Use Each
WekaIO Inc. February 22, 2021
There are multiple ways of storing data in computer systems. Some of the most common include file systems, block devices, object stores, and various types of databases. Each approach has its own advantages, disadvantages, and requirements, along with the use cases it suits best. Moreover, each approach has evolved in parallel with the technologies that enable it (e.g., block devices with Fibre Channel, network file systems with the NFS protocol, etc.).
To compound the complexity, different storage media types exist, each with its own performance, durability, and price, such as Hard Disk Drives (HDDs), Solid State Drives (SSDs and NVMes), and more.
Storage companies have designed and built solutions that implement these approaches and add features on top of them, e.g., SAN (Storage Area Network) appliances or all-flash NAS (Network Attached Storage) appliances. IT and storage admins usually need to evaluate the needs of their internal workloads and choose the most suitable storage appliance, or even several different appliances, to cover the organization’s different needs.
This blog takes a look at block storage and object storage, discusses the technologies behind each, and then covers the use cases where each applies.
What Is Block Storage?
As we know, computer data is written in units called “bits”: 8 bits make up a “byte,” and 1024 bytes make up a “kilobyte” (KB). The units then increase in size through megabytes (MB), gigabytes (GB), terabytes (TB), petabytes (PB), and so on.
A “block” is a fixed number of bytes grouped together. A traditional block was 512 bytes on older storage systems, while the more commonly accepted block size today is 4KB.
For example, if you have a picture that is 128KB in size, that picture will be saved on 32 blocks of 4KB each. In this way any storage media (HDD, SSD, NVMe, other) can usually be exposed to the computer using it as a number of blocks (e.g., an 8TB HDD or SSD would be exposed as roughly 2 billion 4KB blocks).
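The arithmetic above can be checked with a few lines of Python. This is only a sketch: the helper name `blocks_needed` is ours, and the drive size is computed in binary units (1TB = 1024^4 bytes), which is an assumption, since drive vendors often quote decimal units.

```python
BLOCK_SIZE = 4 * 1024  # 4KB block, in bytes


def blocks_needed(size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of whole blocks required to hold size_bytes (rounds up)."""
    return (size_bytes + block_size - 1) // block_size


picture_blocks = blocks_needed(128 * 1024)    # the 128KB picture
drive_blocks = (8 * 1024**4) // BLOCK_SIZE    # an 8TB drive, binary units

print(picture_blocks)  # 32
print(drive_blocks)    # 2147483648
```

A file that is even one byte over a block boundary occupies a whole extra block, which is why the helper rounds up.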
Since individual block devices such as HDDs and SSDs have only specific sizes and performance levels, storage companies created storage appliances that are composed of multiple block devices that can then be “exposed” to servers as a different number of block devices. These are called LUNs (Logical Unit Numbers).
For example, let’s consider a storage appliance with 10 X 10TB HDDs that is exposing a 1 X 100TB block device (LUN) to a server or 100 X 1.0TB block devices (LUNs) to 100 different servers. This scenario allows servers to “see” and use block devices that are bigger than any single individual block device on the market. It also allows for better performance, as utilizing multiple devices concurrently provides higher performance than a single device, as well as additional sophisticated features (such as protection and more) that the storage appliance can implement.
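As a rough illustration of the example above, the sketch below pools ten drives, carves LUNs out of the pooled capacity, and spreads consecutive LUN blocks across the drives round-robin. The `Pool` class and `locate` function are hypothetical names, and the model ignores the capacity a real appliance reserves for protection and spares.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical appliance pool: identical drives aggregated into one capacity."""
    drive_count: int
    drive_tb: int
    allocated_tb: float = 0.0

    @property
    def capacity_tb(self) -> int:
        return self.drive_count * self.drive_tb

    def create_luns(self, count: int, size_tb: float) -> list:
        """Carve `count` LUNs of `size_tb` each, refusing to over-allocate."""
        needed = count * size_tb
        if self.allocated_tb + needed > self.capacity_tb:
            raise ValueError("not enough free capacity in the pool")
        self.allocated_tb += needed
        return [size_tb] * count


def locate(lun_block: int, drive_count: int) -> tuple:
    """Round-robin striping: map a LUN block number to (drive index, block on drive)."""
    return lun_block % drive_count, lun_block // drive_count


pool = Pool(drive_count=10, drive_tb=10)
luns = pool.create_luns(count=100, size_tb=1.0)   # 100 x 1TB LUNs
print(pool.capacity_tb, len(luns))                # 100 100
print(locate(13, pool.drive_count))               # (3, 1)
```

Because neighboring blocks land on different drives, a large read or write keeps several drives busy at once, which is where the performance gain over a single device comes from.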
A block storage appliance can expose multiple “block devices” (LUNs) to a server, while the server would work with these LUNs as if they were local only to it (that is, as if they are its local HDD or SSD). Therefore, these LUNs cannot be (easily) shared between different servers, and each server would see only its own assigned LUNs and no others.
Because a LUN is actually a group of blocks, the servers will usually (but not always) choose to create a file system on top of it, which means that some of the blocks in the LUN will be used for data and some for metadata (data about the data).
An example of that would be saving a 128KB image under the user’s pictures directory, using 128KB for the image and an additional 4KB for data about the image file (e.g., creation time, access time, permissions, etc.), as well as 4KB for the directory structure. An important distinction is that the block device is not aware of which of its used blocks include data and which include metadata; therefore, this data can be accessed only through a server that is familiar with the file system used on this block device. That is, a LUN that is formatted with an ext4 filesystem can be used only by servers that are familiar with the ext4 filesystem, and even then, as mentioned above, no two servers will use this LUN concurrently.
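The split between data and metadata blocks can be sketched with a toy model. Everything here is hypothetical (`BlockDevice`, `ToyFileSystem`, and the `inode:`/`dirent:` payloads), and a real file system such as ext4 packs metadata far more efficiently; the point is only that the device sees used blocks while the file system alone knows what they mean.

```python
BLOCK_SIZE = 4096


class BlockDevice:
    """Toy block device: a flat array of fixed-size blocks with no notion of files."""

    def __init__(self, block_count: int):
        self.blocks = [None] * block_count

    def write(self, index: int, payload: bytes) -> None:
        if len(payload) > BLOCK_SIZE:
            raise ValueError("payload larger than one block")
        self.blocks[index] = payload

    def used(self) -> int:
        return sum(block is not None for block in self.blocks)


class ToyFileSystem:
    """Hypothetical file system layer: only it knows which blocks hold metadata."""

    def __init__(self, device: BlockDevice):
        self.device = device
        self.next_free = 0
        self.metadata_blocks = set()

    def _allocate(self, payload: bytes, is_metadata: bool) -> None:
        self.device.write(self.next_free, payload)
        if is_metadata:
            self.metadata_blocks.add(self.next_free)
        self.next_free += 1

    def save_file(self, name: str, data: bytes) -> None:
        for offset in range(0, len(data), BLOCK_SIZE):
            self._allocate(data[offset:offset + BLOCK_SIZE], is_metadata=False)
        self._allocate(f"inode:{name}".encode(), is_metadata=True)   # file attributes
        self._allocate(f"dirent:{name}".encode(), is_metadata=True)  # directory entry


device = BlockDevice(block_count=64)
fs = ToyFileSystem(device)
fs.save_file("image1.jpg", bytes(128 * 1024))

print(device.used())            # 34 blocks in use, as far as the device can tell
print(len(fs.metadata_blocks))  # 2 of them are metadata, known only to the file system
```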
Additionally, servers would need to be connected to the block storage appliances in some way. Traditionally that would be using Fibre Channel optical cables or using Ethernet connectivity.
Different protocols can be used here, mostly FCP (Fibre Channel Protocol) or iSCSI, while more modern NVMe-based block storage appliances will use NVMe over Fabrics (NVMe-oF).
Block Storage Advantages and Disadvantages
The ability to aggregate multiple block devices and expose them to multiple servers with different sizes and additional capabilities is a huge benefit to organizations that want to centralize all of their storage devices into a smaller number of block storage appliances connected using high-speed interconnects, yet there are some disadvantages as well.
Here are the benefits of block storage:
- Centralization: Servers no longer need multiple, different storage devices, each of which presents its own challenge in terms of operations, cost, and protection. By centralizing all data into a smaller number of block storage appliances the IT and storage admins gain better control and utilization of the storage.
- Backups and replication: Because the data is centralized it is easy to use the block storage appliances’ advanced features to create point-in-time snapshots of specific LUNs, to replicate LUNs to remote storage appliances, or even to use backup protocols such as NDMP to more efficiently back up the data.
- Performance: Because block storage appliances contain multiple devices and lay out the blocks across multiple devices in parallel, the performance that the servers experience from their LUNs is much faster than that of a single block device. These appliances are considered to be the most performant storage type in terms of throughput and latency, as they perform only simple reads and writes over what is considered a high-performance interconnect.
- Advanced features: Some applications can issue advanced block commands to speed up specific repeated operations on a block storage appliance. For example, the VMware(R) vSphere Storage APIs Array Integration (VAAI) lets a host tell the storage to perform a copy operation or to write zeros to multiple block ranges without moving the data through the host (client).
Here are the disadvantages of block storage:
- Dedicated interconnect layer: Traditionally, block storage appliances required their own interconnect network, which was Fibre Channel, and each server that connected to the storage needed a Fibre Channel Host Bus Adapter (HBA). This also meant that the customer needed to purchase Fibre Channel switches and deploy Fibre Channel cables, essentially creating a Fibre Channel fabric used only for accessing the storage. It was a very expensive purchase and very difficult to manage (multiple FC zones, zonesets, and zoning by WWNN and WWPN). Newer configurations can use Ethernet instead of Fibre Channel, which simplifies operations somewhat.
- Complicated management: Block storage appliances map specific LUNs to specific servers. The IT and storage admin needs to make sure that each server can see only its dedicated LUNs; otherwise two servers might write to the same LUN, which could result in data corruption.
- Not shared: As mentioned previously, even if two servers are connected to the same storage appliance, they cannot easily share a LUN. This is even more problematic if there is a need to share data between different operating systems (for example, for Mac, Windows, and Linux machines to see the same data).
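The mapping check described under complicated management can be sketched as a small audit over a mapping table. The function name `find_shared_luns` and the server and LUN names are made up for illustration; a real appliance exposes its mappings through its own management interface.

```python
from collections import defaultdict


def find_shared_luns(mappings):
    """Return {lun: sorted servers} for every LUN mapped to more than one server."""
    servers_by_lun = defaultdict(set)
    for server, lun in mappings:
        servers_by_lun[lun].add(server)
    return {lun: sorted(servers)
            for lun, servers in servers_by_lun.items()
            if len(servers) > 1}


# Hypothetical mapping table: (server, LUN) pairs.
mappings = [("db01", "lun-0"), ("db02", "lun-1"), ("web01", "lun-1")]
shared = find_shared_luns(mappings)
print(shared)  # {'lun-1': ['db02', 'web01']}
```

Any LUN reported here is a candidate for the corruption scenario above, unless the servers run software that is explicitly designed to coordinate access to a shared LUN.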
What Is Object Storage?
Many modern use cases require a location for data placement without the relationship of folders and directory structures. That’s where object storage comes in handy. Additionally, some workloads do not need low-latency access to the data but do need high capacity at a very reasonable price.
One example might be a backup environment in which a backup application takes multiple files from a highly performant, expensive environment and backs them up to a more reasonably priced environment. Another example could involve multiple surveillance cameras that periodically send the last several minutes of a recording as one file to a centralized, high-capacity location.
Object storage was designed for these use cases and more.
Simply put, an object store is a location that can accept and provide objects. An object can be a file or multiple files aggregated to a single object. Each object has a unique object name, but there is no directory or folder relationship between the objects. For example, a 128KB image can exist as an object with object name “image1.” It does not reside under any directory for any user or group. The permission scheme for accessing objects is simpler than on file systems: a user just needs to provide the correct credentials to access an object.
The object storage appliance can retain metadata (data about the data) for every object. For example, all images that have a car in them can have the metadata tag “car” applied to them so that the object storage can then return all of the images with the tag “car.”
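The flat namespace and metadata tags described above can be sketched with an in-memory stand-in. `ToyObjectStore` and its method names are hypothetical and do not correspond to any particular product’s API.

```python
class ToyObjectStore:
    """In-memory stand-in for an object store: flat names, no directories."""

    def __init__(self):
        self._objects = {}  # object name -> (data, set of tags)

    def put(self, name: str, data: bytes, tags=()) -> None:
        self._objects[name] = (data, set(tags))

    def get(self, name: str) -> bytes:
        return self._objects[name][0]

    def find_by_tag(self, tag: str) -> list:
        """Return the names of all objects carrying the given metadata tag."""
        return sorted(name for name, (_, tags) in self._objects.items()
                      if tag in tags)


store = ToyObjectStore()
store.put("image1", b"...", tags={"car", "street"})
store.put("image2", b"...", tags={"tree"})
store.put("image3", b"...", tags={"car"})

cars = store.find_by_tag("car")
print(cars)  # ['image1', 'image3']
```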
Object stores contain logical entities called buckets. While each bucket contains objects, a bucket can also be mirrored, or erasure coded, across multiple object storage appliances and data centers, as well as have its own credentials.
Object storage is best suited to large objects (megabytes in size or larger) and can sometimes provide high throughput, but still at a higher latency than block devices.
The most commonly used access protocol for object stores is called S3 (originally created by Amazon). This protocol describes a number of connectionless commands for objects, including PUT, GET, LIST, DELETE, and more; each command is a self-contained HTTP request, so no session has to be kept open between commands. Recently, there has been growing acceptance of the S3 protocol, with applications natively using S3 as their storage (without the need for a file system).
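To make the idea of self-contained commands concrete, here is a minimal sketch of a handler that serves one request at a time and keeps no session between calls. It is not the S3 wire format: the `handle` function, the request dictionary, and the `access_key` field are assumptions for illustration, and real S3 requests are signed HTTP requests. The status codes simply mimic HTTP conventions.

```python
VERBS = {"PUT", "GET", "LIST", "DELETE"}


def handle(bucket: dict, valid_keys: set, request: dict):
    """Serve one self-contained request; no session state survives between calls."""
    if request.get("access_key") not in valid_keys:
        return 403, None                      # wrong or missing credentials
    verb, name = request.get("verb"), request.get("name")
    if verb not in VERBS:
        return 400, None
    if verb == "LIST":
        return 200, sorted(bucket)
    if verb == "PUT":
        bucket[name] = request["body"]
        return 200, None
    if name not in bucket:
        return 404, None
    if verb == "GET":
        return 200, bucket[name]
    del bucket[name]                          # only DELETE remains
    return 204, None


bucket, keys = {}, {"demo-key"}
put = handle(bucket, keys, {"verb": "PUT", "name": "image1",
                            "body": b"jpeg-bytes", "access_key": "demo-key"})
listing = handle(bucket, keys, {"verb": "LIST", "access_key": "demo-key"})
denied = handle(bucket, keys, {"verb": "GET", "name": "image1",
                               "access_key": "wrong"})
deleted = handle(bucket, keys, {"verb": "DELETE", "name": "image1",
                                "access_key": "demo-key"})
print(put, listing, denied, deleted)
# (200, None) (200, ['image1']) (403, None) (204, None)
```

Because every request carries its own credentials and target, a client can send one command and disconnect, with nothing to mount or keep alive.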
Object Storage Advantages
High-capacity, inexpensive storage can be used in multiple ways, even if its performance is not as fast as that of a block storage appliance.
- Scalability: Object storage appliances are built to accommodate multiple petabytes of capacity with billions of objects, partially because they do not need to handle many of the things that file systems must handle. Additionally, object storage appliances are very cost effective, with a price per gigabyte that makes them ideal for capacity tiering with some types of data.
- Protection: Data in the object store is usually erasure coded, which means that even as the object store grows in capacity, the data remains protected and can still be served through multiple hardware failures, making it a reliable form of storage.
- DR capabilities: Object stores are built to mirror across data centers as well. Because there is no high-performance expectation for the data, the fact that some of it must traverse the WAN between data centers is not a limitation. This means that object stores are ideal for data that should remain available in case of a data center failure, allowing work to continue in the DR data center.
- IoT device support: Because the S3 protocol is connectionless, it is widely used with IoT devices that cannot create and maintain a file system connection (NFS, SMB, etc.) and simply need to open a connection, upload or download objects, and then close the connection.
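The erasure coding mentioned under protection can be illustrated with its simplest form: split an object into k data shards and add a single XOR parity shard. This is a sketch under that assumption only. With one parity shard the object survives the loss of any one shard; surviving multiple failures, as described above, requires more parity shards and a stronger code such as Reed-Solomon. The function names are ours.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal shards (zero-padded) and append one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def recover(shards: list) -> list:
    """Rebuild at most one missing shard (marked None) by XOR-ing the survivors."""
    missing = [i for i, shard in enumerate(shards) if shard is None]
    if len(missing) > 1:
        raise ValueError("single parity can only tolerate one lost shard")
    if missing:
        survivors = [shard for shard in shards if shard is not None]
        shards[missing[0]] = reduce(xor_bytes, survivors)
    return shards


shards = encode(b"object storage!", k=3)  # 3 data shards + 1 parity shard
shards[1] = None                          # simulate one failed device
restored = recover(shards)
print(b"".join(restored[:3]))             # b'object storage!'
```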
When To Use Block Storage
Block storage is highly efficient in the following scenarios.
- Single-server, high-performance: For data that is accessed from a single server and does not need to be shared across multiple servers, block storage provides fast, high-throughput, low-latency access. Typical examples are databases and hypervisors connected to block storage appliances.
- Sync-mirror data: Since most storage appliances support synchronous block mirroring between storage arrays in different data centers, a widely used scenario for block storage is to have servers access data on a LUN in one data center that is synchronously mirrored to a remote data center. Servers in the remote data center are connected to the replicated LUN, so in case of a data center failure the remote LUN becomes active and applications continue working in the remote data center, thereby ensuring business continuity even during a data center failure.
- Centralization: Sometimes it’s essential to centralize data from multiple applications into a single location that is protected, resilient, and uses advanced features such as snapshots, deduplication, compression, and more. In many cases, even applications that are built to distribute across multiple local storage appliances (such as Cassandra, Splunk, and Elasticsearch) are being moved to centralized storage for ease of management.
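The sync-mirror scenario above can be sketched as follows: a write is acknowledged only after both arrays hold it, so the remote copy is already current when the primary site fails. The `Array` and `SyncMirror` classes and the data center names are hypothetical, and the sketch leaves out what a real array must also handle, such as a remote write that fails partway through.

```python
class Array:
    """Toy storage array holding one LUN as a dict of block number -> data."""

    def __init__(self, name: str):
        self.name = name
        self.lun = {}
        self.online = True

    def write(self, block: int, data: bytes) -> None:
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        self.lun[block] = data


class SyncMirror:
    """Acknowledge a write only after both arrays have stored it."""

    def __init__(self, primary: Array, remote: Array):
        self.primary, self.remote = primary, remote

    def write(self, block: int, data: bytes) -> str:
        self.primary.write(block, data)
        self.remote.write(block, data)  # synchronous: wait for the remote copy
        return "ack"

    def active(self) -> Array:
        """Fail over to the remote array when the primary site is down."""
        return self.primary if self.primary.online else self.remote


primary, remote = Array("dc-east"), Array("dc-west")
mirror = SyncMirror(primary, remote)
mirror.write(0, b"order #1")

primary.online = False                  # simulate a data center failure
survivor = mirror.active()
print(survivor.name, survivor.lun[0])   # dc-west b'order #1'
```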
When To Use Object Storage
Object storage is ideal for the following workloads.
- High scalability at a cost-effective price: Object stores provide massive scalability with affordable economics.
- Backup and DR: Due to their efficient mirroring and erasure coding capabilities across data centers, as well as their versioning capabilities, object stores are suitable for backup and DR scenarios that require data to be restored or used following objects’ deletion or data center failures.
- Core-to-edge use cases: Due to the connectionless properties of the S3 protocol, a common use case involves multiple edge IoT environments that push data to (and sometimes pull data from) a centralized object store in the core. That data is then analyzed in the core environment, either directly on the object store or after first being copied to a high-performance storage system.
The Future of Storage
An ideal storage appliance combines all of the above properties and more. It has the ability to provide a shared file system between multiple servers in a highly performant manner (similar to a block device but without all of the complication associated with managing multiple LUNs, mapping, and FC). Additionally, it has the economics and scalability of object storage, but without the performance limitations. WekaFS was designed with all of this in mind, with record-breaking performance and scalability across NVMe devices, as well as scalability across object storage appliances. All of this sits under a single namespace that provides capacity as well as backup and DR capability, while maintaining ease of use and management.