Accelerated DataOps with Weka AI
Edge to Core to Cloud Pipelines – Part 3
Shailesh Manjrekar. April 21, 2020
Shailesh Manjrekar, Head of AI and Strategic Alliances at WekaIO, shares his perspective on Accelerated DataOps with Weka AI in the final blog of a three-part series titled “Accelerated DataOps with Weka AI for Edge to Core to Cloud Pipelines.”
Weka is excited to be launching Weka AI, a transformative solution framework for Accelerated DataOps. In this final entry of my three-part series leading up to our launch of Weka AI, I will explain how new workloads are driving the need for modern underlying architectures and also explain how Weka AI enables Accelerated DataOps.
Weka AI for Accelerated DataOps
Weka AI is architected to enable Accelerated DataOps by solving these storage challenges and delivering production-ready solutions with Reference Architectures and Software Development Kits (SDKs). It breaks down storage silos so that High-Performance Computing (HPC), High-Performance Data Analytics (HPDA), Business Intelligence (BI), and Artificial Intelligence (AI) workloads can converge on the same storage substrate. Weka AI delivers operational agility through versioning, explainability, and reproducibility, and supports governance and compliance through in-line encryption and data protection. Working with technology alliance partners, Weka AI provides a production-ready solution in which the entire AI data pipeline workflow (data ingestion, batch feature extraction, training, hyperparameter optimization, and finally inference and versioning) can run on the same storage platform, whether on-premises or in the public cloud.
Explainable AI (XAI) – Integration with Valohai’s Deep Learning Pipeline Management
Explainable AI is paramount in use cases such as autonomous driving, healthcare, and genomics because they have social impact. Deep Neural Networks (DNNs) are, for the most part, black boxes composed of several hidden layers. The only way to explain an experiment is to examine the dataset it was trained on, its audit trails, and its lineage.
Weka demonstrates this through our integration with Valohai, a Deep Learning pipeline management system. The demo shows how Valohai and WekaFS are integrated in an AWS Virtual Private Cloud (VPC) to run an image classification pipeline using the popular CIFAR-10 dataset and a TensorFlow model. Data scientists can use the popular Jupyter Notebook or the Valohai GUI to build pipelines for:
- data transformation and model training
- hyperparameter optimization and finally
Valohai DLMS integrates seamlessly with Weka’s powerful snap2object capabilities, so each data science experiment is version-controlled together with its code, data, audit logs, and lineage, and can be easily reproduced and explained whenever needed. In addition, Weka’s in-line encryption, used with leading Key Management Systems such as HashiCorp Vault, provides data security, compliance, and governance.
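To make the idea of versioning an experiment together with its code, data, and lineage more concrete, here is a minimal Python sketch. It is not the Valohai or Weka API: the `ExperimentRecord` class, its field names, and the snapshot identifier format are all hypothetical, chosen only to illustrate that an experiment pinned to a code version, an immutable data snapshot, and a set of hyperparameters can be given a stable fingerprint and an audit entry.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ExperimentRecord:
    """Hypothetical lineage record; not a Valohai or Weka data structure."""

    code_version: str    # e.g. a Git commit hash (assumed convention)
    data_snapshot: str   # identifier of an immutable dataset snapshot (assumed)
    hyperparameters: dict = field(default_factory=dict)

    def fingerprint(self) -> str:
        # Serialize with sorted keys so identical inputs always hash identically,
        # regardless of the order in which hyperparameters were supplied.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def audit_entry(record: ExperimentRecord, event: str) -> dict:
    """Build one audit-log entry that ties an event to an experiment fingerprint."""
    return {"event": event, "experiment": record.fingerprint()}


if __name__ == "__main__":
    run = ExperimentRecord(
        code_version="abc123",
        data_snapshot="snapshot-2020-04-01",
        hyperparameters={"learning_rate": 0.01, "batch_size": 128},
    )
    print(audit_entry(run, "training_started"))
```

Two runs with the same code version, snapshot, and hyperparameters produce the same fingerprint, while changing any one of them produces a different one. That is the property that lets an experiment be matched to its exact inputs later.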
Solution Reference Architectures Powered by Weka AI
Weka AI is based on proven deployments with customers and Weka Innovation Network (WIN) Partners and addresses several vertical use cases:
- ADAS – semantic segmentation for annotating Advanced Driver Assistance Systems datasets
- Deep Learning pipeline management solution with Valohai
- Life Sciences – next-generation sequencing solution with Parabricks and HPE
- Healthcare – integrated medical imaging solution with NVIDIA Clara
- FSI – STAC-M3 testing with Kx Systems and HPE
- Retail – RAPIDS and BlazingSQL-based solution, fraud analytics
- Oil and Gas – HPE AI DataNode with Weka and Scality
- Weka AI Reference Architecture for training and inferencing
- Public sector HPC solution with Penguin.
This list will continue to expand as we work with additional partners and use cases.
Weka AI benefits new personas as follows:
Chief Data Officers (CDOs), Chief Analytics Officers (CAOs), and Line of Business Data Scientists
- Reducing epoch times from days to hours, while delivering the lowest inferencing times and maintaining the highest images/sec benchmarks. This is enabled by industry-best GPUDirect Storage performance of 80 GB/sec to a single DGX-2 client
- Explainability and reproducibility for experiments using instant, space-efficient snapshots
- Hybrid workflows – Dev and Test experiments in the public cloud and seamless movement to on-premises for production
- Data Compliance and Governance with in-flight and at-rest encryption.
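As a rough illustration of why storage throughput matters for epoch times, the sketch below computes the minimum time needed simply to read a dataset once at a given sustained throughput. The 80 GB/sec figure comes from the list above; the 10 TB dataset size and the 2 GB/sec comparison point are hypothetical values chosen for the arithmetic, not measurements. The result is only an I/O lower bound, since real epoch times also depend on preprocessing and GPU compute.

```python
def min_epoch_read_seconds(dataset_gb: float, throughput_gb_per_s: float) -> float:
    """Lower bound on the time to read a dataset once: size divided by throughput."""
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return dataset_gb / throughput_gb_per_s


if __name__ == "__main__":
    dataset_gb = 10_000  # hypothetical 10 TB dataset (decimal units)
    for throughput in (2, 80):  # 2 GB/sec is an assumed comparison point
        seconds = min_epoch_read_seconds(dataset_gb, throughput)
        print(f"{throughput} GB/sec -> at least {seconds:.0f} s per pass")
```

Under these assumptions, one pass over the data takes at least 125 seconds at 80 GB/sec, compared with 5,000 seconds at 2 GB/sec.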
Data Engineers and IT Leaders
- Best TCO by leveraging NVMe flash for performance and HDD object for capacity, with built-in data protection
- Eliminating silos and multiple copies by providing a single storage platform for the entire data pipeline
- Best agility with data management across the edge, core, and cloud
- Best scalability, with up to exabytes of storage and trillions of files across directories, including billions in a single directory
- Ease of management, a single point for support, and easy-to-consume as small, medium, and large bundles.
Weka, an NVIDIA Partner Network Solution Advisor, is uniquely positioned to anticipate the needs arising from market transitions and provide transformative solutions. By delivering these solutions, Weka makes it easy for our customers to monetize their data, achieve faster time-to-market, and gain competitive differentiation with the best TCO. Weka AI is a transformative solution framework that caters to these market transitions and delivers compelling benefits to Line of Business personas.
This concludes my three-part series on “Accelerated DataOps with Weka AI for Edge to Core to Cloud Pipelines.”