CASE STUDY: How WekaIO Delivered High Performance and Best Cost for kdb+ on AWS
Barbara Murphy. August 14, 2018
KDB+ is Possible on AWS with WekaIO Matrix™
Kx has built the world’s fastest time series database for ingesting, analyzing and storing massive amounts of data. Its flagship product kdb+ is a leading solution for time series databases, machine learning, streaming analytics, operational intelligence and IoT analytics.
Several of Kx’s customers migrated their IT infrastructure to the public cloud, which prompted Kx to investigate ways to deliver on-premises-like performance with public cloud resources. Kx observed more performance variation in a virtualized cloud environment than on dedicated hardware, so the company has had to create carefully architected cloud solutions that ensure its users still get cutting-edge performance.
Kx discovered WekaIO by way of the AWS solutions architecture team, who positioned WekaIO as a solution that could address end customer pain points when running kdb+.
Matrix is a distributed file system that supports full POSIX semantics. Files are broken into chunks and distributed across a cluster of EC2 instances (r3 or i3). After an initial review, the Kx team began an extensive evaluation of Matrix. In particular, the team evaluated a unique feature of Matrix that tiers data to S3 within a single namespace according to operator-selected tiering rules. This feature allows users to store a very large data lake on S3 at considerable cost savings compared to block storage, while a high-performance flash tier holds the working data set. It also eliminates the need to copy data back and forth from S3, because the Matrix file system manages all data movement.
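The single-namespace tiering idea can be illustrated with a toy model. The sketch below is not the Matrix implementation or API: the class name `TieredNamespace`, the age-based rule `max_flash_age_s`, and the in-memory dictionaries standing in for flash and S3 are all assumptions made for illustration. It shows only that the application uses one path for a file regardless of which tier currently holds it.

```python
from dataclasses import dataclass, field


@dataclass
class TieredNamespace:
    """Toy model of one namespace over two tiers (hypothetical, not the Matrix API)."""
    max_flash_age_s: float                             # assumed operator-selected rule
    flash: dict = field(default_factory=dict)          # path -> (data, last_access_time)
    object_store: dict = field(default_factory=dict)   # path -> data (stands in for S3)

    def write(self, path, data, now):
        # New data always lands on the flash tier.
        self.flash[path] = (data, now)

    def apply_tiering(self, now):
        # Demote files not accessed within max_flash_age_s to the object tier.
        for path in list(self.flash):
            data, last_access = self.flash[path]
            if now - last_access > self.max_flash_age_s:
                self.object_store[path] = data
                del self.flash[path]

    def read(self, path, now):
        # Same path whichever tier holds the file; a cold read promotes it to flash.
        if path not in self.flash:
            self.flash[path] = (self.object_store.pop(path), now)
        data, _ = self.flash[path]
        self.flash[path] = (data, now)   # refresh the access time
        return data
```

For example, with `max_flash_age_s=3600`, a file written at time 0 is demoted by `apply_tiering(now=6000)`, while a file written at time 5000 stays on flash. A later `read` of the demoted file returns the same bytes through the same path.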
The Kx team tested Matrix extensively and observed latency issues when data had to be read from the S3 tier into the SSD flash tier, which reduced overall performance. Kx worked with WekaIO development to create an innovative solution in which “pre-fetch” hints identify data on the S3 tier and pre-stage it on the flash tier for optimal application performance. This innovation delivered the best combination of performance for the kdb+ database and capacity scaling on S3.
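The effect of a pre-fetch hint can be sketched in a few lines. This is a minimal illustration, not the mechanism Kx and WekaIO built: the class `PrefetchingStore`, its `prefetch` method, and the two latency constants are hypothetical, and the latency figures are arbitrary placeholders rather than measurements.

```python
class PrefetchingStore:
    """Toy two-tier store with pre-fetch hints (hypothetical, not the Matrix API)."""

    # Arbitrary placeholder latencies for illustration only, not measured values.
    S3_LATENCY_MS = 100.0
    FLASH_LATENCY_MS = 1.0

    def __init__(self, cold_objects):
        self.cold = dict(cold_objects)   # stands in for the S3 tier
        self.flash = {}                  # stands in for the SSD flash tier

    def prefetch(self, paths):
        # Hint: stage these paths on flash before the application asks for them.
        for path in paths:
            if path in self.cold and path not in self.flash:
                self.flash[path] = self.cold[path]

    def read(self, path):
        # Return (data, latency_ms) as the application would observe it.
        if path in self.flash:
            return self.flash[path], self.FLASH_LATENCY_MS
        data = self.cold[path]
        self.flash[path] = data          # a cold read also stages the data on flash
        return data, self.S3_LATENCY_MS
```

In this model, a path that was named in a `prefetch` call is served at flash latency on its first read, while a path that was not hinted pays the object-store latency once and is fast only on later reads.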
Kx was impressed with the performance of the combined AWS EC2 instances and WekaIO Matrix file system. In particular, the team was impressed with the file system’s ability to scale linearly as the number of nodes increased.
The team measured a streaming bandwidth of 1.03 GBytes/second, representing wire speed to a single kdb+ client node. The following chart shows how the bandwidth scaled from 1 to 8 kdb+ nodes, delivering an aggregate bandwidth of over 9 GBytes/second.
WHY KDB+ WITH MATRIX ON AWS?
In summary, the Matrix file system on AWS provides a high-performance solution for customers using the kdb+ database from Kx. The addition of advanced features that accelerate data movement from S3 to SSD further enhances the solution, enabling customers to pair an S3 data lake with high-performance flash-based storage for their applications, all managed as a single namespace.
Read more about migrating a kdb+ historical database to the Amazon Cloud.
For 10 years, Amazon Web Services has been the world’s most comprehensive and broadly adopted cloud platform. AWS offers more than 90 fully featured services for compute, storage, databases, analytics, mobile, Internet of Things (IoT) and enterprise applications from 42 Availability Zones (AZs) across 16 geographic regions in the U.S., Australia, Brazil, Canada, China, Germany, India, Ireland, Japan, Korea, Singapore and the UK. AWS services are trusted by millions of active customers around the world monthly (including the fastest-growing startups, largest enterprises, and leading government agencies) to power their infrastructure, make them more agile and lower costs. To learn more about AWS, visit aws.amazon.com.
WekaIO helps companies manage scale and future-proof their data centers so that they can solve real problems that impact the world. WekaIO Matrix™ is the world’s fastest shared parallel file system, leapfrogging legacy storage infrastructures by delivering simplicity, scale, and faster performance for a fraction of the cost. In the cloud or on-premises, WekaIO’s NVMe-native, high-performance, software-defined storage solution removes the barriers between the data and the compute layer, accelerating artificial intelligence, machine learning, genomics, research, and analytics workloads.