Accelerating Machine Learning for Financial Services
Shimon Ben David. October 6, 2020
Top Use Cases in Machine Learning (ML) for Finance
The development of high-performance computing and the falling cost of storage have led to a number of data-intensive workloads in banks, hedge funds, trading firms and stock exchanges. Some of these data-intensive processes include:
- Quantitative analysis – Analysis using complex statistical models that do not take company-specific information, such as board members, executive team or location, into account.
- Monte Carlo simulation – Modeling the probability of an outcome in a process that cannot easily be predicted because of random variables. The model is run against a high volume of random numbers to produce a probability distribution on which a decision can be based.
- Predictive analytics – The use of data, statistical algorithms and machine learning to estimate the likelihood of a future outcome based on historical data.
- Backtesting – Testing the accuracy of a strategy or model on historical (ex-post) data to build confidence in the model going forward.
- Fraud detection – The rise of multi-channel banking and increasingly sophisticated fraud create the need for self-learning detection models that ingest enormous amounts of behavioral data, passive data, third-party data and more.
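The Monte Carlo approach described above can be sketched in a few lines of Python. This is a minimal illustration, not a production risk model: it assumes the asset follows geometric Brownian motion (GBM), and the function names and parameters (start price, drift, volatility, horizon, threshold) are hypothetical. It estimates the probability that a price finishes below a threshold and, because GBM happens to have a closed-form answer, compares the estimate with the exact value as a sanity check.

```python
import math
import random
from statistics import NormalDist


def simulate_terminal_prices(s0, mu, sigma, t, n_paths, seed=42):
    """Draw terminal prices S_T under geometric Brownian motion (an assumed model)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    drift = (mu - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    # One standard normal draw per path: S_T = S_0 * exp(drift + vol * Z)
    return [s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0)) for _ in range(n_paths)]


def prob_below(prices, threshold):
    """Fraction of simulated outcomes that finish below the threshold."""
    return sum(p < threshold for p in prices) / len(prices)


def analytic_prob_below(s0, mu, sigma, t, threshold):
    """Closed-form P(S_T < K) under the same GBM assumption, for comparison."""
    z = (math.log(threshold / s0) - (mu - 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    return NormalDist().cdf(z)


if __name__ == "__main__":
    # Hypothetical inputs: start price 100, 5% drift, 20% volatility, 1-year horizon
    prices = simulate_terminal_prices(100.0, 0.05, 0.20, 1.0, n_paths=100_000)
    print("Monte Carlo P(S_T < 90):", round(prob_below(prices, 90.0), 4))
    print("Analytic    P(S_T < 90):", round(analytic_prob_below(100.0, 0.05, 0.20, 1.0, 90.0), 4))
```

With 100,000 paths the standard error of an estimate near 0.25 is roughly 0.0014, so the two figures should agree closely. Real risk models replace the single GBM draw with full simulated paths and richer dynamics, which is what makes these workloads so compute- and data-intensive.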
The introduction of time series databases
Financial institutions need to ingest massive amounts of data frequently and at scale. Time series databases are purpose-built to support fast, high-frequency queries at scale, compared with relational or NoSQL databases. Time series solutions include KDB+, InfluxDB and TimescaleDB.
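A typical time series query, such as "one-minute bars for this symbol", groups timestamps into fixed-width buckets and aggregates each bucket (TimescaleDB exposes this as its `time_bucket()` function, for example). The Python sketch below shows the same downsampling logic in memory to make the idea concrete; it is not how any of these databases implement it internally, and the tick layout `(timestamp_seconds, price, size)`, the `Bar` class and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Bar:
    bucket_start: int  # start of the bucket, in seconds since epoch
    open: float
    high: float
    low: float
    close: float
    volume: float


def time_bucket(ts: int, width: int) -> int:
    """Floor a timestamp to the start of its fixed-width bucket."""
    return ts - (ts % width)


def resample_ticks(ticks: Iterable[Tuple[int, float, float]], width: int) -> List[Bar]:
    """Aggregate (timestamp, price, size) ticks into OHLCV bars.

    Assumes ticks arrive sorted by timestamp, so the first tick in a bucket
    is its open and the last is its close.
    """
    bars: Dict[int, Bar] = {}
    for ts, price, size in ticks:
        start = time_bucket(ts, width)
        bar = bars.get(start)
        if bar is None:
            # First tick seen in this bucket opens the bar
            bars[start] = Bar(start, price, price, price, price, size)
        else:
            bar.high = max(bar.high, price)
            bar.low = min(bar.low, price)
            bar.close = price
            bar.volume += size
    # Return bars in time order
    return [bars[k] for k in sorted(bars)]


# Usage: five hypothetical ticks downsampled into 60-second bars
ticks = [(0, 100.0, 10.0), (15, 101.5, 5.0), (59, 99.0, 20.0), (60, 99.5, 8.0), (125, 100.2, 3.0)]
for bar in resample_ticks(ticks, 60):
    print(bar)
```

A time series database performs this aggregation close to the data, so only the compact bars, not every raw tick, travel back to the analysis or model-training code.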
What are the challenges of machine learning-based workloads in financial services?
We work with many financial institutions and see the following key challenges:
- Speed – Every investment strategy has a limited lifetime. Faster analysis can result in faster trade execution and increased profitability.
- Data sources – Institutions need to ingest multiple data sources into their models for analysis. Data can arrive in different formats (e.g. XML, PDF, XBRL).
- Capacity – Ingested data can amount to terabytes per day. More data from more sources leads to better models.
- Data quality – Data needs to be usable, secure and well managed.
- Data silos – Different types of data come from different sources into various systems and need to be integrated into a single source.
The use of FPGAs in financial services
A field-programmable gate array (FPGA) is an integrated circuit that can be programmed after manufacturing for specific tasks. FPGAs can accelerate simulations for near-real-time decision making in areas such as regulatory compliance, sentiment analysis and risk modeling, running a small number of tests very quickly. We see them used in a growing number of domains, such as risk escalation, sentiment analysis and regulatory fraud.
Accelerating machine learning workloads using Weka
Weka is the market-leading shared parallel file system for quantitative analysis, risk modeling and time series database use cases. Weka has run multiple benchmarks (STAC, SPEC SFS, IO-500) measuring its performance across various workloads and file sizes. Attached are the most recent IO-500 benchmark results.
Learn more about WekaIO for Financial Services Analytics.