“We’re willing to be misunderstood for long periods of time.”
– Jeff Bezos, 2011 Amazon Annual Shareholder Meeting

Actually, based on the amount of generative AI activity coming out of the recent AWS Summit in New York, that period of time is more like six months. The spate of GenAI-focused announcements and launches from the likes of Microsoft-funded OpenAI and Google during the first half of 2023 has put Amazon under a lot of pressure. So it's no surprise to most industry insiders that Amazon used one of its largest customer forums to lay out its own offerings and strategy in the race to win generative AI workloads.

For those who've been watching, Amazon has been signaling this for some time. In late June, AWS announced a new program with $100M in funding for GenAI startups building on AWS. In July, Amazon announced a new Amazon-wide machine learning organization under Deepak Singh, a longtime lieutenant of Amazon CEO Andy Jassy. Then, as AWS Summit was kicking off, the Financial Times published an interview with AWS CEO Adam Selipsky that set the tone for the entire week.

Here’s a rundown of the action at this year’s AWS Summit that gives us a look at what’s next for generative AI in the cloud:

Developer Tools for Building Generative AI Workloads

Updates to the AWS ML/AI stack made up more than 75% of the main keynote. The launch highlights included:

Amazon Bedrock is Amazon's service for making LLMs available to customers via APIs, and it is Amazon's answer to Azure OpenAI. It's currently available as a limited preview. Amazon's strategy here is meaningfully different: rather than anchoring on a single model provider, Bedrock gives customers access to LLMs from AI21 Labs, Anthropic, Cohere, Stability AI, and Amazon itself, and it integrates with Amazon SageMaker, which is growing in popularity with developers.
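
Bedrock was still in limited preview at the time of the Summit, so the request shape below is an assumption based on Bedrock's published runtime API; the region, model ID, and prompt format are illustrative rather than a statement of availability. A minimal sketch of calling a hosted model through the Bedrock runtime with boto3 might look like this:

```python
import json

import boto3

# Assumes your account has Bedrock preview access and model access enabled.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Anthropic models on Bedrock expect the Human/Assistant prompt convention.
request = {
    "prompt": "\n\nHuman: In one sentence, what is a foundation model?\n\nAssistant:",
    "max_tokens_to_sample": 200,
}

response = bedrock.invoke_model(
    modelId="anthropic.claude-v2",  # any Bedrock-hosted model ID works here
    body=json.dumps(request),
)

# The response body is a JSON stream; Anthropic models return a "completion" field.
print(json.loads(response["body"].read())["completion"])
```

The notable design choice is that swapping model providers is just a change of model ID and payload, which is exactly the multi-model bet Bedrock is making.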

Amazon Titan is Amazon's own LLM and will be available to developers directly through the Amazon Bedrock APIs. The story behind Titan is notable: according to Amazon, it has been under development internally since the mid-2000s, starting out as the algorithms behind the recommendation engine in the original Amazon.com storefront. So Titan, in theory, carries almost 20 years of development experience behind it. This is very consistent with AWS's longtime strategy of positioning itself as the provider with the most experience in the space, and with its pattern of productizing internal tools when market opportunities open up. It also supports Selipsky's original thesis that there is no one model to rule them all.
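
Because Titan is served through the same Bedrock runtime, switching to it should only change the model ID and payload shape. The model ID and response fields below are assumptions drawn from Bedrock's later public documentation, not from the preview:

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Titan text models take an "inputText" payload (assumed from public docs).
response = bedrock.invoke_model(
    modelId="amazon.titan-text-express-v1",  # illustrative Titan model ID
    body=json.dumps({"inputText": "Suggest three names for a data analytics blog."}),
)

result = json.loads(response["body"].read())
# Titan returns its generations as a list under "results".
print(result["results"][0]["outputText"])
```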

Beyond Bedrock and Titan, Amazon had a slew of new updates all focused on AI use cases. Amazon HealthScribe is AWS's first foray into verticalized offerings that incorporate generative AI. We expect to see many more of these in other industry verticals over the next 12-18 months, and that they'll be a big focus of news at AWS re:Invent in November. Generative BI with QuickSight Q is a super interesting example of generative AI applied to a real business use case – generating data visualizations from simple natural language questions (i.e., no need to write complex SQL queries). A sketch of how that surfaces to developers follows below.
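
The QuickSight Q natural-language bar can already be embedded in applications via the existing QuickSight API; the account ID, user ARN, and topic ID below are placeholders for illustration. A rough sketch of generating an embed URL for the Q search bar with boto3:

```python
import boto3

quicksight = boto3.client("quicksight", region_name="us-east-1")

# All identifiers below are placeholders for illustration.
response = quicksight.generate_embed_url_for_registered_user(
    AwsAccountId="123456789012",
    UserArn="arn:aws:quicksight:us-east-1:123456789012:user/default/analyst",
    SessionLifetimeInMinutes=60,
    ExperienceConfiguration={
        # QSearchBar embeds the natural-language question bar backed by a Q topic.
        "QSearchBar": {"InitialTopicId": "sales-topic-id"}
    },
)

# Load this URL in an iframe to let users ask questions in plain English.
print(response["EmbedUrl"])
```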

Infrastructure for Generative AI Workloads

On the infrastructure side of the house, AWS's VP of Data and Machine Learning Swami Sivasubramanian focused on AWS's history of infrastructure innovation to support ML/AI workloads – mainly custom silicon and GPUs.

Highlights included:

  • First to bring NVIDIA GPUs to the cloud on a pay-as-you-go basis (for the fact-checkers, November 2010 was the launch of EC2 cg1.4xlarge instances with NVIDIA Fermi GPUs)
  • Launched Amazon EC2 P5 instances, powered by NVIDIA H100 Tensor Core GPUs, for complex LLMs with hundreds of billions of parameters
  • Updates to Amazon's purpose-built silicon, Amazon Inferentia and Amazon Trainium, with headline messaging focused on price-performance benefits. It's worth noting that AWS does not compare these chips directly to NVIDIA GPUs, presumably to stay a good partner; the implication of these offerings, however, is that AWS is coming for the workloads where NVIDIA is strong today. Inferentia delivers “up to 2.3x higher throughput and up to 70% lower cost per inference than comparable EC2 instances,” while Trainium delivers “faster time to train while offering up to 50% cost-to-train savings over comparable Amazon EC2 instances.” (See the sketch after this list for how these instance families are requested.)
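
To make the hardware story concrete, here is a minimal sketch of requesting accelerated instances with boto3. The AMI ID is a placeholder, and actual availability depends on your region and account quotas:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Placeholder AMI; in practice you'd pick an AWS Deep Learning AMI for your region.
AMI_ID = "ami-0123456789abcdef0"

# The same call works for NVIDIA-backed P5 instances ("p5.48xlarge"),
# Trainium instances ("trn1.32xlarge"), or Inferentia instances ("inf2.xlarge").
response = ec2.run_instances(
    ImageId=AMI_ID,
    InstanceType="trn1.32xlarge",
    MinCount=1,
    MaxCount=1,
)

print(response["Instances"][0]["InstanceId"])
```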

Vendors are scrambling to provide a credible alternative to NVIDIA GPU accelerators, which are notoriously hard to find, not to mention very expensive. AWS is trying to position itself as the provider with the most experience in the ML/AI space, in the same way it has positioned itself for many years as first in the cloud, and to position Inferentia and Trainium instances as credible alternatives to NVIDIA GPUs. This is the same strategy other cloud providers are following with programs like Microsoft's Athena. The notable difference is that Amazon has a head start: Trainium and Inferentia are both GA products today, though with limited regional availability.

Clearly the race is on for AI workloads in the cloud. While Microsoft and Google had splashy launches during the first half of 2023, Amazon has now laid out a pretty clear roadmap of its own AI strategy – developer tools for AI workloads, and infrastructure to build, train, and run the models themselves. The notable exception is that Amazon has yet to offer a clear alternative to ChatGPT or Google Bard; in fact, if you dig into the keynote remarks, it's not clear whether this is even a focus.

But in the marathon to win generative AI, we've barely hit the 5K mark. That first 5K was a sprint, so we're eagerly anticipating the next leg of the race in fall 2023, with another slate of big announcements expected at Google Cloud Next, Microsoft Ignite, and AWS re:Invent.

Explore WEKA for Generative AI