Highest query performance and concurrency with flash, at the economics of HDD-based object storage

Financial organizations that have adopted a cloud-first strategy increasingly use the public cloud for its elasticity, scalability, and ease of use in quantitative analytics, backtesting, and algorithmic trading. However, these workloads are highly latency-sensitive and require consistently high performance to execute trading strategies reliably. With its record-breaking STAC performance on AWS, WekaFS has shown that latency-sensitive financial workloads can run effectively on AWS while giving institutions the elasticity and scalability of Amazon EC2. With this capability, financial institutions can run more complex models, backtesting, and algorithmic trading to derive actionable intelligence and match machine trading requirements.

Tick Data Analytics and its Use Cases

Tick data, the deep, granular time series of market prices and transactions, is the lifeblood of trading in liquid markets. It is used for a wide range of purposes, from developing and backtesting trading strategies to assessing execution quality and measuring risk. Recent trends, such as the growth and sophistication of automated trading and the proliferation of new regulations, place a premium on technology that can accelerate the analysis of tick data or broaden its use at a lower cost. Tick data analytics runs on historical data sets as well as on real-time streaming data from stock exchange feeds such as NYSE and NASDAQ and from aggregators such as Bloomberg and Thomson Reuters. Trading organizations need to understand how new technologies can help, whether those relate to storage (including non-volatile RAM, parallel file systems, and advanced storage architectures), servers, or tick-database software.

Some of the use cases for high-performance tick data analytics are as follows:

  • Risk Analytics – Banks, Insurance, Retail, Exchanges for credit risk and fraud detection
  • Trading Systems – Brokerage, Hedge Funds, Exchanges for quant trading, backtesting
  • Banking Systems – Banks, Clearing houses for electronic payment processing, compliance reporting etc.

Challenges with Existing Approaches for Time-series Financial Analysis on AWS

Several technology strategies have been used to run time-series financial analytics on AWS, including block-based solutions built on EBS, database sharding across EC2 instances with local NVMe storage, and cloud file systems. Each carries significant tradeoffs in cost, performance, data set size, and model complexity.

Block Storage-based Solutions: Existing approaches use block storage built on Amazon EC2 and Amazon EBS. They can be expensive at scale and do not support highly concurrent workloads.

Database sharding: The database divides the dataset by time interval across direct-attached servers (or AWS EC2 instances). Time-series sharding works well when datasets are small. Queries that fall within a single time interval can be served well by an individual server; queries that span several intervals are far slower, because the data must be aggregated across servers before a response is returned. The problem worsens as models start to look at multi-year datasets.
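The scatter-and-gather cost of time-based sharding can be illustrated with a minimal Python sketch. It is not taken from any particular database: the `SHARDS` layout (one in-memory list per calendar year standing in for one server), the sample rows, and the function names are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical shards: one list per calendar year, standing in for one
# server (or EC2 instance) per time interval. Rows are
# (quote_date, symbol, bid_price); the values are made up.
SHARDS = {
    2011: [(date(2011, 3, 1), "AAA", 10.0), (date(2011, 9, 1), "BBB", 20.5)],
    2012: [(date(2012, 2, 1), "AAA", 12.5), (date(2012, 7, 1), "BBB", 19.0)],
    2013: [(date(2013, 5, 1), "AAA", 11.0)],
}


def shards_for_range(start, end):
    """Return the shard keys (years) that a query over [start, end] must touch."""
    return [year for year in sorted(SHARDS) if start.year <= year <= end.year]


def high_bid(symbol, start, end):
    """Scatter the query to every overlapping shard, then gather partial maxima."""
    partials = []
    for year in shards_for_range(start, end):
        bids = [bid for day, sym, bid in SHARDS[year]
                if sym == symbol and start <= day <= end]
        if bids:
            partials.append(max(bids))  # partial result from one shard
    # Final aggregation step across shards; None if no quote matched.
    return max(partials) if partials else None
```

A query confined to 2011 touches a single shard, while a query from 2011 through 2013 touches all three and needs the extra aggregation step, which is the overhead that grows with multi-year datasets.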

NAS and Parallel File System-based Solutions: NFS performs poorly because of limitations in the NFS protocol, which was not designed for high-performance workloads. Parallel file systems, which are optimized for throughput at the expense of latency and IOPS, perform very poorly for latency-sensitive time-series tick analytics.

Weka Data Platform – STAC-M3 Antuco and Kanaga Benchmark Performance

Weka, along with its partner KX Systems, recently participated in the STAC-M3 benchmark using KX's cloud-native time-series database kdb+ 4.0, also known as KX Insights. The benchmark was run on AWS on the Weka shared parallel file system, showing that it can meet the latency, performance, and concurrency challenges described above. The results for both the Antuco (1-year dataset) and Kanaga (5-year dataset) benchmarks were exceptionally good, outperforming previous records set by WekaFS on on-premises hardware and setting 6 new records overall on AWS.

The following chart compares WekaFS on AWS with a popular open-source parallel file system, a cluster of direct-attached servers, and an all-flash scale-out NAS. In all cases, Weka outperformed the on-premises solutions on I/O-intensive benchmarks.

The multi-year high bid benchmark query returns the highest bid price for a certain 1% of symbols over a range of years. The range for 2YRHIBID is from the first day of 2011 to the last day of 2012, and the range for 5YRHIBID is from the first day of 2011 to the last day of 2016. The benchmark is I/O-intensive, and its random access pattern puts significant stress on the storage system. In the following graph, WekaFS on AWS outperformed both a Lustre-based parallel file system appliance and the direct-attached servers with Optane. WekaFS was up to 20x better than the Optane solution.
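The shape of a multi-year high bid query can be sketched in a few lines of Python. This is an illustrative sketch only, not the STAC-M3 implementation (which runs in kdb+): the function names, the tuple layout of the quotes, and the use of seeded random sampling to choose the 1% of symbols are all assumptions, since the benchmark's actual symbol selection is not described here.

```python
import random
from datetime import date


def sample_symbols(symbols, fraction=0.01, seed=0):
    """Pick a reproducible subset of symbols (at least one). Assumed method."""
    count = max(1, int(len(symbols) * fraction))
    return set(random.Random(seed).sample(sorted(symbols), count))


def multi_year_high_bid(quotes, symbols, start, end):
    """Return {symbol: highest bid} over quotes dated within [start, end].

    `quotes` is an iterable of (quote_date, symbol, bid_price) tuples.
    """
    highs = {}
    for day, symbol, bid in quotes:
        if symbol in symbols and start <= day <= end:
            if symbol not in highs or bid > highs[symbol]:
                highs[symbol] = bid
    return highs


# Made-up quotes for illustration; the range matches the 2YRHIBID window.
quotes = [
    (date(2011, 1, 3), "AAA", 10.0),
    (date(2012, 6, 1), "AAA", 14.0),
    (date(2013, 1, 2), "AAA", 99.0),  # outside the window, ignored
    (date(2012, 3, 5), "BBB", 7.5),
]
two_year_highs = multi_year_high_bid(
    quotes, {"AAA"}, date(2011, 1, 1), date(2012, 12, 31)
)
```

Because the selected symbols are scattered through the dataset and the date range spans years, the storage system sees many small, non-contiguous reads rather than one sequential scan.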

Running many concurrent queries is a common way for trading firms to improve time to results, and the Kanaga volume-weighted average bid (VWAB) benchmark tests a storage system’s ability to handle concurrency. The VWAB test is the most demanding storage stress test because its I/O pattern is highly random. In the following chart, WekaFS on AWS is compared with the Lustre appliance and the direct-attached servers with Optane described above. On the single-client test, WekaFS was over 40% faster than the other solutions, and when concurrency was increased to 100 clients, Weka was over 3x faster than the other two solutions.
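As a rough illustration of what concurrent VWAB queries involve, the Python sketch below computes a volume-weighted average of bid prices, sum(bid × size) / sum(size), and issues one query per symbol from a thread pool. The exact STAC-M3 definition of VWAB is not given here, so this formula, the (symbol, bid, size) tuple layout, and the function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def vwab(quotes, symbol):
    """Volume-weighted average bid for one symbol (assumed definition)."""
    notional = 0.0
    volume = 0
    for sym, bid, size in quotes:
        if sym == symbol:
            notional += bid * size
            volume += size
    return notional / volume if volume else None


def run_concurrent(quotes, symbols, workers=4):
    """Issue one VWAB query per symbol across a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda s: vwab(quotes, s), symbols)
        return dict(zip(symbols, results))


# Made-up quotes: (symbol, bid_price, bid_size).
quotes = [("AAA", 10.0, 100), ("AAA", 12.0, 300), ("BBB", 20.0, 50)]
answers = run_concurrent(quotes, ["AAA", "BBB"])
```

In this toy version every query reads the same in-memory list. In a real deployment each client would read different regions of the tick database at the same time, which is why the storage layer's behavior under random, concurrent I/O dominates the result.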

The results show that high-performance trading strategies can be executed in the public cloud on AWS with even more performance than on bare-metal on-premises infrastructure. Running on native cloud services, WekaFS outperformed the other solutions, and because compute and storage resources can be added dynamically, it is an ideal platform for responding to market shifts in a highly elastic and agile manner.

Weka Data Platform – Cloud Native Performance and Data Management Solution

WekaFS was born in the cloud and is architected for low-latency, high-throughput, and high-concurrency workloads, such as kdb+. WekaFS provides the best of all three storage approaches: the simplicity of NAS solutions; the performance of block; and the scale, economics, and durability of object stores. Some of the key feature highlights are as follows:

  • #1 Fastest file system on AWS: set an IO-500 record using WekaFS on AWS at SC’2019
  • Presents a file interface, with highest IOPS and best latency at S3 economics
  • Offers on-premises and hybrid cloud on the same software bits
  • Can separate capacity scaling and performance scaling in a single namespace
  • Supports autoscaling storage for on demand performance
  • Seamlessly integrates performance tier with S3 storage in a single namespace with full back and forth movement of data
  • Supports shutting down and restarting the file system on demand, from flash to S3 and back
  • Protects cloud encrypted data with on-premises KMS
  • Has a rich protocol set (POSIX, SMB, NFS, S3, GDS) with complete data shareability

WekaFS is now available on AWS Marketplace and can serve deployments from as small as 24 TB of all-NVMe flash to multi-petabyte hybrid NVMe and S3 solutions. Customers can apply their enterprise discount programs to Weka's AWS Marketplace offerings. For customers already running WekaFS on-premises, we offer BYOL (bring your own license) for hybrid cloud environments.

For more information on Weka’s record-breaking results, go to:
Press Release: Weka Sets 6 Records on STAC-M3 with WekaFS Parallel File System on Amazon EC2

STAC Research Report: https://www.stacresearch.com/kdb210507

Additional resources: