Time Series Database (How It Works & Use Cases)
December 8, 2021
Interested in time series databases? We explain what time series databases are, how they work, and give examples of what they can be used for.
What are time series databases? A time series database is a system for storing and retrieving metrics or events that are each associated with a specific timestamp; a sequence of such records is a “time series.” For example, time series data can come from manufacturing facility sensors, autonomous car sensors, real-time financial market feeds, and IoT devices.
What Is a Time Series Database?
Traditional databases store data in tables, and that data is connected through relational schemas. More importantly, that data is relatively static, holding critical information that connected applications can retrieve at any time.
This kind of data representation has several benefits. However, one of its limitations is that it does not represent time series data efficiently or effectively. Time series data is further broken down into two types of organization:
- Linear: Any data point can be viewed as the result of past data and as a precursor for future data on a time continuum.
- Nonlinear: Data points measured through nonlinear equations, asymmetric cycles, and other approaches to measurement.
Time series data is information that represents repeated measurements over time. Some examples of this data include:
- Financial Data: Stock trading and speculation require modeling data based on time—how trends change over a period of time. This information is particularly useful for modern automated trading platforms that compute trends as part of their decision-making processes.
- Machine Learning Models: Many machine learning systems measure changes over time to model the world they interact with, such as systems that operate self-driving cars or physics models.
- Debugging and Trace Calls: Programmers developing complex applications will use tracing subroutines to track bugs or other aspects of the flow of execution. Time information is critical to understanding where and when critical events happen.
- Healthcare: Critical health information, such as electrocardiograms, is measured on a time axis so that practitioners can track changes in patient health status.
- Internet of Things: IoT collection devices will often regularly gather data based on time and date stamps to support accurate modeling and data management.
The most important thing to understand is that time series data measures change over time, and as such, the information must be stored in a way that makes indexing and processing by time easier.
A time series database emphasizes organizing information by time, using data objects like timestamps. Strictly speaking, this doesn’t exclude using a relational database as a time series database, but a dedicated time series database will often include different kinds of information, data types, or structural components.
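To make the idea of organizing data by time concrete, here is a minimal sketch in Python. It is not how any particular time series database is implemented; the `TinySeries` class and its method names are hypothetical, and it assumes points arrive in timestamp order. Keeping timestamps sorted lets a time-range lookup use binary search instead of scanning every record.

```python
from bisect import bisect_left, bisect_right


class TinySeries:
    """Hypothetical in-memory series: sorted timestamps with values alongside."""

    def __init__(self):
        self.timestamps = []  # epoch seconds, ascending
        self.values = []

    def append(self, ts, value):
        # Assumption: points arrive in time order, as with most sensor feeds.
        if self.timestamps and ts < self.timestamps[-1]:
            raise ValueError("out-of-order timestamp")
        self.timestamps.append(ts)
        self.values.append(value)

    def range(self, start, end):
        """Return (timestamp, value) pairs with start <= timestamp <= end."""
        lo = bisect_left(self.timestamps, start)
        hi = bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))


series = TinySeries()
for ts, temp in [(1000, 20.5), (1060, 20.7), (1120, 21.0), (1180, 21.4)]:
    series.append(ts, temp)

window = series.range(1060, 1120)
print(window)  # [(1060, 20.7), (1120, 21.0)]
```

The sample readings are made up for illustration. Production systems layer persistence, partitioning, and indexing on top of this basic time-ordered idea.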
What Are the Components of a Time Series Database Architecture?
While a time series database can, technically, be a traditional relational database, these databases don’t necessarily have the ideal infrastructure for the applications where time series data is the most useful.
Ideal components of these databases include the following:
- Columnar layouts: Relational databases are typically structured in table rows, with each row representing an entry and each column representing a different attribute of that entry. Some applications, like those used for high-frequency stock trading, rely on column layouts instead. Because column-oriented storage suits time series information, many specialized databases emphasize columnar organization.
- ACID Transactions: “Atomic,” “consistent,” “isolated,” and “durable” are principles of database transaction management that ensure data integrity during high-volume transactions, the types of workloads most associated with time series applications.
- Time Series Analysis: Time series databases should support special types of analysis related to time series workloads for finance, retail, or data analysis applications.
- Native Data Types: Using timestamps, such a database will organize information based on timed events. This approach allows applications to stream information based on timestamps. While relational databases can create timestamps, time series databases can house system-specific data types that represent timestamps.
- Time Series Compression: Systems managing time series data often collect immense quantities of information to the tune of hundreds of gigabytes per day. Time series platforms and databases can deploy compression methods that leverage time series attributes. For example, delta encoding can minimize data storage sizes by compressing data based on the differences between database objects from one timestamp to the next.
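The delta encoding mentioned above can be sketched in a few lines of Python. This is an illustrative sketch rather than the scheme used by any specific product; the function names and the sample timestamps are assumptions made for the example. The first value is stored as-is, and every later value is stored as its difference from the previous one.

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store each later value as a difference."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    return list(accumulate(deltas))


# Hypothetical epoch-second timestamps sampled roughly every 10 seconds.
timestamps = [1638921600, 1638921610, 1638921620, 1638921630, 1638921641]

encoded = delta_encode(timestamps)
print(encoded)  # [1638921600, 10, 10, 10, 11]
assert delta_decode(encoded) == timestamps
```

Because regularly sampled timestamps differ by small, repetitive amounts, the encoded list is mostly small integers. On its own that does not shrink a Python list, but small integers need far fewer bits than full timestamps once a storage engine applies a compact binary encoding on top.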
What Are the Benefits and Challenges of a Time Series Database?
Like any other technology, a time series database has several advantages and disadvantages. It isn’t a technology that works across all applications but fits into a specific high-demand niche.
Some of the advantages that come with time series databases include the following:
- Support for High-Performance Analytics: The key benefit of a time series database is that it provides the storage logic and organization that applications handling time series data need. While a relational database may fit that bill, it may not do so as well as its time series counterpart.
- Scalability and Volume: Because these databases support more rapid insertion and deletion, they can scale much more readily with demand. Scalability is a significant issue when dealing with these workloads and not an area where developers and administrators should cut corners.
- Accuracy: Because time series databases are geared for time-based information, they help developers and analysts get more accurate measurements of change over time—a critical aspect of these workloads.
- Efficiency: Time-based queries made against time series databases are typically fast and often remain so even with data compression implemented. Since performance is such an important part of time series applications, these databases are critical to their effectiveness.
With all of that said, some challenges come with time series databases:
- Data Input Limitations: While these databases scale well, they don’t scale infinitely. Accordingly, it is up to admins to curtail high-volume data input depending on the source so the database isn’t overwhelmed or corrupted.
- Optimizing Reads and Writes: While ACID principles can protect data integrity, it’s still up to database administrators to optimize how frequently reads and writes occur so that they don’t overlap. Significant overlap causes slowdowns and a loss of performance.
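One possible way to curtail high-volume input, offered here as an assumption rather than something the list above prescribes, is to downsample raw readings before they reach the database. The Python sketch below averages points into fixed time windows; the `downsample` function, the 60-second window, and the sample readings are all hypothetical.

```python
from collections import defaultdict


def downsample(points, window_seconds):
    """Average (timestamp, value) points into fixed windows keyed by window start."""
    buckets = defaultdict(list)
    for ts, value in points:
        window_start = ts - (ts % window_seconds)
        buckets[window_start].append(value)
    return [
        (start, sum(vals) / len(vals))
        for start, vals in sorted(buckets.items())
    ]


# Five raw readings collapse into two one-minute averages.
raw = [(0, 10.0), (20, 14.0), (40, 12.0), (60, 20.0), (80, 22.0)]
reduced = downsample(raw, 60)
print(reduced)  # [(0, 12.0), (60, 21.0)]
```

The trade-off is resolution: averaging reduces write volume but discards short-lived spikes, so whether it is acceptable depends on the data source.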
High-Performance Cloud Computing from WEKA
One of the best ways to support a time series database is to deploy scalable and high-performance systems to run it on. Such modern data platforms can not only provide critical scalability and efficiency, but they can also give you tools to help you manage these databases with analytics and robust administrator features.
WEKA supports high-demand databases with the following capabilities:
- Streamlined and fast cloud file systems to combine multiple sources into a single high-performance computing system
- Industry-best GPUDirect performance (113 Gbps for a single DGX-2 and 162 Gbps for a single DGX A100)
- In-flight and at-rest encryption for governance, risk, and compliance requirements
- Agile access and management for edge, core, and cloud development
- Scalability up to exabytes of storage across billions of files
Contact us to learn more about WEKA cloud platforms and how we can support heavy-duty time series workloads.