
GPU Capacity Planning and Compute Market Dynamics

Women AI leaders at AI Infra Summit discuss capacity planning for the future while addressing current-day market realities.

Women Leaders in AI Infrastructure Panel Speakers


  • Lauren Vaccarello: Chief Marketing Officer, WEKA
  • Elisa Chen: Data scientist at Meta with over five years in AI infrastructure, building the foundation that powers machine learning models serving ads to hundreds of millions of users daily
  • Carmen Li: Founder and CEO of Silicon Data and CEO of Compute Exchange, transforming global compute markets through data transparency and creating an independent marketplace for GPU compute trading
  • Rebecca “Bink” Naughton: Leading data center capacity strategy at Lambda, with three decades of experience across Google, Meta, Yahoo, and Microsoft, having built infrastructure for everything from Meta's first AI research cluster to multi-billion dollar supercomputers

Transcript

00:00

Navigating GPU Shortage: Flexibility in AI Compute Planning

Lauren: Carmen, as you think about GPU capacity and how hard it is to get and find GPUs right now, what are you seeing in your world?

Carmen: Yeah, I think Elisa and Bink had really good points. I love that everyone keeps saying this is moving too slow. I was like, this moves really fast for me! With Huang’s Law, every year we expect a new chip. I think that’s pretty fast.

So the way Silicon Data does it is we publish GPU indexes on Bloomberg and Refinitiv, and there are actually going to be futures and options trading on those indexes. We run about six months delayed, because obviously we need the data first. Every year we stand up a few more indexes to keep up and catch up: there’s AMD, there are newer chips from NVIDIA and other providers. So there’s a lot going on.
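
As a rough illustration of what such an index involves (a hypothetical sketch in Python, not Silicon Data’s actual methodology), a volume-weighted average over rental quotes might look like this:

    # Hypothetical sketch of a volume-weighted GPU rental price index.
    # Illustrative only; this is not Silicon Data's actual methodology.
    from dataclasses import dataclass

    @dataclass
    class Quote:
        provider: str
        gpu_model: str          # e.g. "H100"
        price_per_gpu_hr: float
        gpu_hours: float        # volume behind the quote

    def index_value(quotes: list[Quote], model: str) -> float:
        """Volume-weighted average $/GPU-hour for one GPU model."""
        relevant = [q for q in quotes if q.gpu_model == model]
        total_hours = sum(q.gpu_hours for q in relevant)
        if total_hours == 0:
            raise ValueError(f"no volume for {model}")
        return sum(q.price_per_gpu_hr * q.gpu_hours for q in relevant) / total_hours

    quotes = [
        Quote("cloud_a", "H100", 2.99, 50_000),
        Quote("cloud_b", "H100", 2.49, 120_000),
        Quote("cloud_c", "H100", 3.25, 30_000),
    ]
    print(f"H100 index: ${index_value(quotes, 'H100'):.2f}/GPU-hr")  # $2.73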

Every single layer matters: every GPU change triggers changes in the code libraries and everything else up the stack. It will get better, but it takes time. So I feel like it’s moving very fast.

Obviously most of our Compute Exchange clients are Series B, C, and D startups using tons of compute. They’re looking for a lot of nodes at once, not just on-demand compute where you test one GPU at a time, and that requires planning. So that’s my two cents here.

02:08

GPU Utilization Best Practices: Balancing Planning and Agility

Lauren: Yeah, and it’s so interesting, because so much of this requires forward-looking planning. We map out what’s going to happen in six months, three years, five years, but the reality of the world we live in is that we don’t really know what next month will look like.

How should people in the audience start thinking about, yes, I need planning for five years, but I need agility because maybe I’m doing a ton of training today and then all of a sudden I switch to inferencing, or my user base explodes. How should they start thinking about that?

Elisa: I think, from my perspective, number one: you will never be at 95% utilization of your capacity. That’s a dream state you will probably never reach. So there will always be a gap between what you’re actually using and resources that are sitting idle.

But I think there are a lot of use cases for those idle resources as well. You’ll run into operational spikes. You’ll have below-the-line projects or new experiments you can run. There are always additional use cases for resources that aren’t needed immediately. As I mentioned earlier, you can build your own elastic pool that serves these ad-hoc workloads. That’s one lever for using your resources more efficiently.
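
As a minimal sketch of that elastic-pool idea (the names and the preemption policy here are hypothetical): idle GPUs back preemptible ad-hoc jobs, and production reclaims them the moment a spike hits.

    # Minimal, hypothetical sketch of an elastic pool over idle GPUs.
    # Ad-hoc jobs borrow idle capacity and are preempted when production spikes.

    class ElasticPool:
        def __init__(self, total_gpus: int):
            self.total = total_gpus
            self.production = 0          # GPUs pinned by production workloads
            self.adhoc: list[str] = []   # preemptible ad-hoc jobs, one GPU each

        @property
        def idle(self) -> int:
            return self.total - self.production - len(self.adhoc)

        def submit_adhoc(self, job: str) -> bool:
            """Run an ad-hoc job only if a GPU is sitting idle."""
            if self.idle > 0:
                self.adhoc.append(job)
                return True
            return False  # no spare capacity; caller queues or rejects

        def production_spike(self, gpus_needed: int) -> list[str]:
            """Reclaim capacity for production, preempting ad-hoc jobs as needed."""
            preempted = []
            while self.idle < gpus_needed and self.adhoc:
                preempted.append(self.adhoc.pop())  # evict the newest ad-hoc job
            if self.idle < gpus_needed:
                raise RuntimeError("pool exhausted even after preemption")
            self.production += gpus_needed
            return preempted

    pool = ElasticPool(total_gpus=16)
    pool.production = 10                 # steady-state production load
    pool.submit_adhoc("experiment-123")  # soaks up one idle GPU
    victims = pool.production_spike(6)   # spike reclaims it
    print(victims)                       # ['experiment-123']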

03:36

Understanding Peak Load vs. Actual Performance Needs

Bink: Yeah, I’ll double down on that. The one thing that has proven true over and over again is that there’s peak load, and then there’s how fast you need to deliver: How wide does the pipe need to be in terms of data delivery and computation? And then there’s, okay, what can I actually live with?

And as applications evolve, given everything happening with the algorithm improvements being discussed and all the optimizations the various hardware contributors are making, this is going to keep changing, and efficiency will keep increasing. But you will always have that lever: the ability to investigate and interrogate your workload and how fast that computation actually needs to run.
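
A back-of-the-envelope version of that peak-versus-what-can-I-live-with question, with made-up numbers: sizing for peak leaves the fleet mostly idle at average load, which is exactly the lever Bink is pointing at.

    # Back-of-the-envelope peak vs. sustained fleet sizing (all numbers illustrative).
    import math

    peak_tokens_per_s = 2_000_000   # worst-case inference demand
    avg_tokens_per_s = 600_000      # typical sustained demand
    gpu_tokens_per_s = 5_000        # assumed throughput of one GPU for this model

    gpus_for_peak = math.ceil(peak_tokens_per_s / gpu_tokens_per_s)  # 400
    gpus_for_avg = math.ceil(avg_tokens_per_s / gpu_tokens_per_s)    # 120

    # If you build for peak, average utilization is only:
    util = avg_tokens_per_s / (gpus_for_peak * gpu_tokens_per_s)
    print(f"peak fleet: {gpus_for_peak} GPUs, average fleet: {gpus_for_avg} GPUs")
    print(f"utilization of a peak-sized fleet at average load: {util:.0%}")  # 30%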

04:30

The Two-Tier GPU Market: Enterprise Underutilization vs. Startup Scarcity

Carmen: Yeah, I think there are two pools of clients we’re talking about here. One is enterprises, which may be only 60% utilized. I literally had a conversation yesterday where the CIO of one of the largest Fortune 500 companies said, “How do I monetize my GPU clusters when they’re not in use?” And I said, “What do you want to do?”

“I also have CPUs too.”

“Don’t even worry about CPUs.”

“Can I do multi-tenant?”

“Wait, what? You want to solve the world’s hardest problem right now?”

It is a lot.

So I think the large companies do have an underutilization problem. I told him, you can run benchmarks; I don’t know, we can talk about this.

On the other side, you have startups who need more compute and can’t source it fast enough. No one will rent them B200s month-to-month. So think about all those imbalances in the demand-supply curves we’re talking about.

So that’s why I keep saying: reserve what you really need to get done. You kind of know what that is, right? Sign reserve contracts for the things you know, and make sure the rest is flexible. Can you go on-demand? Can you go multi-cloud? Can you trust a provider to spin up nodes very quickly? And then have transparency: before you take on those nodes and transfer your workloads onto them, can you verify them first? All of those things, I think, are critical.
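
To make that reserve-what-you-know advice concrete, here is a toy cost comparison with hypothetical prices and hours: reserve the predictable baseline, and let on-demand absorb the bursts.

    # Toy reserved-vs-on-demand mix (prices and hours are hypothetical).
    hours_in_month = 730
    baseline_gpus = 64      # steady workload you know you need
    burst_gpus = 32         # occasional extra demand
    burst_hours = 120       # hours per month the burst actually runs

    reserved_rate = 2.10    # $/GPU-hr with a reserve contract
    on_demand_rate = 3.50   # $/GPU-hr on demand

    # Option A: reserve for peak (baseline + burst), pay even when idle.
    all_reserved = (baseline_gpus + burst_gpus) * hours_in_month * reserved_rate

    # Option B: reserve the baseline, burst on demand only when needed.
    mixed = (baseline_gpus * hours_in_month * reserved_rate
             + burst_gpus * burst_hours * on_demand_rate)

    print(f"reserve everything: ${all_reserved:,.0f}/mo")               # $147,168/mo
    print(f"reserve baseline + on-demand burst: ${mixed:,.0f}/mo")      # $111,552/mo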

05:58

Hardware Quality Verification: A Critical Infrastructure Challenge

Bink: Yeah, I’ll touch on the verification piece. One of the things Lambda really prides itself on is delivering functional AI hardware. It has proven to be one of the more difficult things to land with quality, and it’s something we’re going to need to keep watching, not just us but the whole industry, because it matters a lot for the time between a facility being ready to accept equipment and that equipment actually being used.

The quality of your hardware is a very big piece of that, and being able to mature that quality quickly is a superpower.

06:35

Third-Party Verification and Transparency in GPU Markets

Carmen: Can I double-click on that? One of the things smaller companies are dealing with right now is the transparency piece. You can say, “Hey, I’m going to run MLPerf on my cluster. I think I’m great. Look at my results, they’re great, right?” But that’s not third-party verified. Or you can donate a cluster to a third party for a month and let them run it, and you still don’t have full transparency on that; plus a month is a long time for expensive machines, and that’s revenue you’re not earning.

So one of the things we do, and I don’t usually promote this, is machine-level verification for some banks. Say I’m going to refinance my 100 nodes. Silicon Data comes in and verifies them. It’s on-premises SaaS, contained within a container, and it runs in about 90 seconds. We’ll report all the UUIDs, all the information, and the banks say, “Okay, the property is verified. Check, check, check. This is the performance, this is the likely decay curve, and this is how much it can be refinanced for.” Things like that, I think, are very helpful transparency-wise.
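
As a rough sketch of what machine-level verification might check (illustrative only, not Silicon Data’s product): record each GPU’s UUID via nvidia-smi, then time a known workload and compare the result against an expected baseline for that hardware.

    # Rough, hypothetical sketch of machine-level GPU verification.
    # Requires an NVIDIA driver and PyTorch with CUDA support.
    import subprocess
    import time
    import torch

    def gpu_inventory() -> list[str]:
        """List GPU UUIDs and models via nvidia-smi."""
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=uuid,name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip().splitlines()

    def matmul_tflops(device: str = "cuda:0", n: int = 8192) -> float:
        """Time a large FP16 matmul and report achieved TFLOP/s."""
        a = torch.randn(n, n, dtype=torch.float16, device=device)
        b = torch.randn(n, n, dtype=torch.float16, device=device)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(10):
            a @ b
        torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
        return 10 * 2 * n**3 / elapsed / 1e12  # 2*n^3 FLOPs per matmul

    print(gpu_inventory())
    achieved = matmul_tflops()
    print(f"achieved: {achieved:.0f} TFLOP/s")  # compare against expected baseline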

Like This Discussion? There’s More!

This clip was taken from a longer conversation at AI Infra Summit 2025, where Lauren, Elisa, Carmen, and Bink covered a broad range of topics for anyone building the AI infrastructure of the future.