I Am AI, but Who Am “I”

April 17, 2018
Barbara Murphy
I love Nvidia’s palindrome “I am AI” emblazoned across its marketing literature, but it raises the question: who exactly is driving the AI phenomenon? WekaIO collected an insightful profile of Nvidia GPU customers at the recent GTC 2018 conference. We asked attendees to tell us a little about their GPU experience in return for a chance to win a cool prize. This blog post summarizes the profiles we collected from over 350 show attendees. While this data may not be representative of the entire show, it highlights those who expressed an interest in WekaIO’s high-performance parallel file system for AI and technical compute.

“I” Cut Across all Industries

Deep learning and AI are not exclusive to the couple of sectors – autonomous vehicles and healthcare – that have captured the media’s attention. Our research shows that GPU performance users cut across a whole spectrum of markets, including manufacturing, life sciences, government, security, surveillance, education, financial, and semiconductors. No single vertical dominated, and several of the more traditional HPC markets actually had the highest level of maturity when it came to data under management.

“I” Run Most of My GPU Workloads On-premises

In stark contrast to most emerging technology trends, GPU workloads have seen limited cloud adoption. Our research showed that 64% of respondents use on-premises GPU clusters. Amazon was the leader in cloud-based GPU workloads, with Google in second place and Azure a distant third at 4%. With the addition of new Volta V100s to cloud infrastructure, we may see a pickup soon. Cloud is good for bursty GPU workloads and small data sets but can get expensive at scale. This is also reflected in customer responses about where their data sets reside.

“I” Use Cloud When My Data Sets Are Small

Over half the attendees profiled had less than 10TB of data under management, which suggests that they are still developing their AI practice. But once a project matures and its data sets grow, customers run their GPU workloads on premises. The next chart shows that as data sets get larger, fewer of the respondents in that category store their data in the cloud. While 55% of respondents with less than 10TB store data in the cloud, only 17% of respondents with over 1PB had data in the public cloud.

“I” am Interested in GPUs for Deep Learning

GPGPUs have been around for quite a while and have been used extensively in more traditional HPC markets to provide better visualization capabilities. They have helped scientists explore space, predict weather and model the earth’s surface for oil and gas. But the predominant use case that drew attendees to GTC 2018 was GPUs for deep learning. Again, little surprise here, as the biggest investment in new techniques and parallel computing is being driven by industry’s desire to gain greater insight from the vast amounts of data inside an enterprise.

“I” Span Across Many Levels of the Organization

The conference attracted a broad range of people wanting to learn more about AI. The “other” category had a cross section of people including engineers, venture capitalists, press, analysts, marketing managers and students who were there to learn more about this hot new space. Over 30% of the respondents were responsible for managing or architecting the underlying infrastructure for deep learning. The main problem facing this group is how to keep up with the increase in processing power being driven by Nvidia. The new DGX-2 is 10X faster than the prior DGX-1, and this level of compute density puts huge pressure on infrastructure teams to keep the GPU cluster saturated with data.

“I” am Big Data

The final cut of our data revealed that attendees from the retail sector consistently had the largest data sets of any vertical. 75% of the retail customers profiled had more than 100TB of data in their analytics cluster. The next largest category was financial, followed by cloud providers. No big surprise here: these verticals have been collecting data on customers for a long time and have very mature data sets to provide analytics insights. As Andrew Ng said, “It’s not who has the best algorithms that wins. It’s who has the most data.”

