From Pi to AI

Colin Gallagher. March 14, 2022

There was a “eureka” moment when we realized that the next release of our data platform was version 3.14 – AND it was scheduled for March. We knew we had to announce it today in recognition of Pi Day, but we didn’t want to celebrate our “Pi” release with anything banal like a pie-eating contest or a Pi recitation competition. Instead, we want to talk about how the quest to calculate Pi applies to the challenges of AI.

Pi represents infinite possibilities and challenges us to find meaning in seemingly random numbers. For four thousand years, mathematicians have been searching for ways to efficiently calculate Pi. While Pi may be irrational (its digits go on forever without showing a consistent pattern), it contains all sorts of interesting sub-patterns:

  • At position 17,387,594,880 you find the digits 0123456789 in order
  • At position 60 you find these same ten digits in a jumbled order
  • Starting at position 762 there are six 9s in a row
  • Our VP of Product Marketing’s birthdate occurs 4 times in the first 200M digits of Pi
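
If you want to hunt for patterns like these yourself, here is a minimal Python sketch. It is only an illustration, not how record-setting Pi calculations are done: it uses Machin's arctangent formula with plain integer arithmetic, which is fine for a few thousand digits but far too slow for the billions needed to reach the first bullet. The helper names pi_digits and find_pattern are our own inventions for this example. Positions are counted from the first digit after the decimal point, so results can differ by a few places from lists that count differently.

```python
def pi_digits(n: int) -> str:
    """Return the first n digits of Pi after the decimal point, as a string."""
    guard = 10  # extra digits to absorb integer truncation error
    scale = 10 ** (n + guard)

    def arctan_inv(x: int) -> int:
        # arctan(1/x) * scale, summed from its Taylor series with integers only
        term = scale // x
        total = term
        x_squared = x * x
        k = 1
        sign = -1
        while term:
            term //= x_squared
            k += 2
            total += sign * (term // k)
            sign = -sign
        return total

    # Machin's formula: pi = 4 * (4 * arctan(1/5) - arctan(1/239))
    pi_scaled = 4 * (4 * arctan_inv(5) - arctan_inv(239))
    return str(pi_scaled)[1:n + 1]  # drop the leading "3"


def find_pattern(digits: str, pattern: str) -> int:
    """Return the 1-based position where pattern first starts, or -1 if absent."""
    index = digits.find(pattern)
    return index + 1 if index >= 0 else -1


if __name__ == "__main__":
    digits = pi_digits(1000)
    print("six 9s start at position", find_pattern(digits, "999999"))
    print("ten digits from position 60:", digits[59:69])
```

Swap in your own birthdate as the pattern and raise the digit count to see how far you have to go before it turns up.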

In many respects, AI is similar to the search for Pi – it is the search for meaning and underlying relationships in data. Even the methods that have been developed for AI mimic the ways in which Pi has been calculated – starting with brute-force “exhaustion” calculations and eventually being catalyzed by the advent of computing, which radically simplified the process.

A Better Data Pipeline or Is It a Data πpline?

A well-running data pipeline – or, for today, a data πpline – is key to successfully deploying AI to transform business processes. A data pipeline is the collection of steps needed to reliably make use of data for analytics: copying data from various source locations, reformatting it to make it usable, and merging it with data from other sources. Each step of a data pipeline generally requires separate software and storage tuned for that step. But data silos and delays become inherent as data is copied, and the management challenges of disparate platforms make it hard to keep AI data pipelines fully utilized.

While not quite as transcendental as the quest for Pi, the WEKA® Data Platform for AI can radically transform your data pipelines and accelerate what’s possible by giving you the ability to rapidly process large amounts of data with lower epoch times to achieve faster time to insights.

So… What’s New in WEKA’s Pi Release?

Now that we have opined about Pi and data pipelines, let’s get down to brass tacks on what’s new in WEKA 3.14.

We focused the latest release in our 3.x series on driving greater interoperability and more choice, enabling better sharing of resources as customers scale out their AI deployments, and enhancing our Zero Copy architecture with even more multiprotocol support.

Evolving as Fast as Our Customers’ Needs Do

One of the key benefits of a software-defined solution is that the platform is designed to be future-proof and extensible: it evolves with customer needs so that customers can take advantage of new innovations to benefit their business and research. Late last year, several of our customers asked for more choice in network adapters as a workaround for supply chain challenges, since many were waiting extended periods for servers because of industry NIC shortages.

Today, WEKA is happy to announce that we now support three new cards: the Mellanox ConnectX-6 Dx, the Intel E810 series, and Broadcom 57810s, giving our customers more variety and the flexibility to choose servers that can be deployed according to their schedules. In some cases, choosing a new NIC can decrease lead times from months to weeks.

We have also updated the WEKA Data Platform to use more CPU cores by running our WekaFS software in multiple containers, so it can take full advantage of our partners’ increasingly powerful servers. This enhancement will enable the WEKA Data Platform to drive even more performance and scalability – this was key to our recent leading results in the SPEC Storage 2020 benchmark. It also provides the means to increase scalability to 8000 total nodes, improving on how a large cluster can be created.

Helping AI to Go Mainstream

As the global pandemic has accelerated the digital initiatives of nearly every enterprise on the planet, AI is rapidly becoming a strategic imperative for modern businesses and research organizations. According to Gartner Inc.: “By 2025, AI will be the top category driving infrastructure decisions, due to the maturation of the AI market, resulting in a tenfold growth in infrastructure requirements.”

As customers continue to consolidate multiple AI workloads (and multiple AI data pipelines) onto the same WEKA system, they are asking for more granular controls to help reduce the impact of one workflow on another. To enable better sharing of resources, we now offer client-side QoS to throttle bandwidth from any client into the WEKA system. When you combine this QoS functionality with granular capacity quotas and organizational role restrictions, you have a powerful set of tools to help manage resources on the WEKA system.

Sharing Data Across Protocols Equally

A true data platform should be able to deliver the best support for multiple types of workloads without the need for tuning, re-configuration, or partitioning your system. Storage systems, on the other hand, are usually architected to excel in a limited number of performance dimensions while others languish, requiring different storage arrays for different workloads. The WEKA Data Platform performs across all dimensions without the need for tuning or re-configuration for multiple workloads across the entire data pipeline.

With WEKA 3.14, our multiprotocol support gets even better with interoperability for S3 workloads. We now support the creation of S3 buckets in multiple file systems and the ability to create an S3 bucket from an existing file system, sharing quotas and user IDs. We already offer the ability to share the same file system across other protocols such as NFS, SMB, and POSIX.

“Pi” for Now

The patterns and uniqueness of Pi can seem arbitrary and daunting, just like the information trapped within your data. At WEKA, we find the limitless equations Pi presents to be exciting, and harnessing that infinite potential represents the next frontier for innovation and business transformation. We believe the way to do this is through the power of artificial intelligence and machine learning. And with every release, we strive to better help our customers organize and make sense of infinite amounts of their seemingly random data.