VIDEO

Driving AI Innovation While Scaling Capacity with Meta

WEKA’s Val Bercovici speaks with Meta’s Elisa Chen at AI Infra Summit 2025 about how to balance rapid AI innovation with hardware procurement cycles, sharing strategies for GPU utilization, workload optimization, and regional capacity planning.

Speakers:

  • Val Bercovici - Chief AI Officer, WEKA
  • Elisa Chen - Data Scientist, Meta’s AI Infrastructure Team

Below is a full transcript of the conversation, which has been lightly edited for clarity.

Transcript

00:00

The Gap Between AI Innovation Speed and Hardware Procurement Cycles

Val Bercovici: I’m Val Bercovici, Chief AI Officer at WEKA.

Elisa Chen: And my name is Elisa Chen. I am a data scientist at Meta on the AI infrastructure team.

Val: So you spoke today, Elisa. Why don’t you tell us what you spoke about? Because it was a really fascinating introduction you were giving earlier on.

Elisa: Yeah, thank you for having me. Earlier today, I was giving a talk about the gap we currently face between AI innovation and the long procurement cycles for hardware. To put it more concretely: AI innovation is moving so quickly that models get iterated on a monthly and even weekly basis, but hardware procurement, from fulfillment to actually making it production-ready, takes a very long time. So how do you balance these two? How do we make sure that capacity planning keeps pace with AI innovation?

Val: I love this topic because it’s so important, and it’s one I really haven’t dived into deeply at all. So definitely a chance to learn more here today.

01:17

How to Predict AI Capacity Requirements Without Perfect Foresight

Val: Do you have to be a bit of a clairvoyant to predict the future, or is there some science to it? You know there’s going to be a lot of variability in the algorithms, and yet, to your point, at this scale you’ve got to get the infrastructure right, you’ve got to get the supply chain right, and you’ve got to line it up in time.

Elisa: Yeah, good question. There’s a lot that we don’t know in today’s world, but what we can do is start setting up the correct data foundations, the right telemetry, and even define the right metrics and benchmarks that we can set goals against. For instance, what is good enough? We don’t necessarily always need top-shelf machinery to power inference.

Val: It’s such a lazy answer to just buy the biggest, most expensive hardware, right?

Elisa: Which is tempting, because we do want to be at the frontier and we do want to be the most efficient, but that’s not always needed. So what strategy do we want to deploy? That’s something we can think about separately. And that assumes we have good measurements, which is itself already really challenging, because you don’t necessarily have all the definitions in mind yet. For instance, how do you think about capacity ROI? One machine is used to train multiple models, and these models have different performance. How does that translate to business value?
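
As a rough illustration of the measurement problem Elisa raises, here is a minimal sketch of one way to frame capacity ROI for a machine that trains several models over a period. All names and numbers are hypothetical, not a metric Meta has described; the arithmetic is trivial, and the hard part is the inputs.

```python
from dataclasses import dataclass

@dataclass
class TrainedModel:
    name: str
    gpu_hours: float            # GPU-hours the model consumed on this machine
    est_business_value: float   # estimated value delivered (hard to define!)

def capacity_roi(models: list[TrainedModel], machine_cost: float) -> float:
    """Naive capacity ROI: value produced per dollar of machine cost.

    The challenge is not this division but attributing business value
    to each model and attributing GPU-hours to each machine.
    """
    total_value = sum(m.est_business_value for m in models)
    return total_value / machine_cost

# Hypothetical example: one machine, three models trained on it this quarter.
models = [
    TrainedModel("ranker-v7", gpu_hours=1200, est_business_value=400_000),
    TrainedModel("ranker-v8", gpu_hours=1500, est_business_value=550_000),
    TrainedModel("exp-distill", gpu_hours=300, est_business_value=50_000),
]
print(f"ROI: {capacity_roi(models, machine_cost=250_000):.2f}x")  # 4.00x
```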

Val: There’s really no mature observability in the industry yet for GPUs and these kinds of workloads.

Elisa: Yeah, so it’s a really tough problem. And without the data foundations, or just the right measurement, I would say instrumentation, it’s hard to make those decisions. As a data scientist, you need data, good data, quality data, to make decisions, and that doesn’t necessarily exist yet. So what are some other strategies that you can deploy?
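
On the instrumentation point, here is a minimal sketch of the kind of per-GPU telemetry you can start collecting today, assuming NVIDIA hardware and the pynvml NVML bindings. This is only a starting point, not the full observability stack Val notes is still missing.

```python
import time
import pynvml  # pip install nvidia-ml-py

def sample_gpu_telemetry() -> list[dict]:
    """Take one utilization/memory/power sample per GPU via NVML."""
    pynvml.nvmlInit()
    try:
        samples = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # percentages
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes
            power = pynvml.nvmlDeviceGetPowerUsage(handle)       # milliwatts
            samples.append({
                "gpu": i,
                "sm_util_pct": util.gpu,
                "mem_util_pct": util.memory,
                "mem_used_gb": mem.used / 2**30,
                "power_w": power / 1000,
                "ts": time.time(),
            })
        return samples
    finally:
        pynvml.nvmlShutdown()

# Poll periodically and ship the samples to whatever metrics store you use.
for s in sample_gpu_telemetry():
    print(s)
```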

03:02

Elastic Resource Strategies: Using GPU-as-a-Service for Flexible Capacity

Elisa: There are some other levers, I would say, probably in everyone’s back pocket. Given that there are gaps in this knowledge and you don’t necessarily have all the information to predict what you might need in the future, you can work around this for immediate needs through elastic resource supplies, using GPU-as-a-service. These offer some flexibility for your capacity planning.

But on top of that, you can also start thinking about efficiency as another lever. One thing about the industry is that we are constantly figuring out ways to become better at what we do.

So now we have hardware that is tailored to different workloads. You have your H100s, dense GPUs used for training foundation models, and then you have your A100s, which are probably better suited to more lightweight models or fine-tuning. So matching the right hardware to your workload is crucial.
You learn to become more efficient with the machinery itself, and then you realize, oh, I can actually train this specific model a lot faster on this hardware, so now I can free it up for other use cases. That’s another lever you can use.
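
A toy sketch of the hardware-to-workload matching Elisa describes. The tiers and thresholds here are invented for illustration; real placement decisions also weigh interconnect, memory capacity, and availability, not just model size.

```python
def pick_gpu_tier(workload: str, model_params_b: float) -> str:
    """Route a job to a GPU tier. Thresholds are illustrative only."""
    if workload == "pretraining" and model_params_b >= 7:
        return "H100"   # dense training of large foundation models
    if workload in ("pretraining", "fine-tuning"):
        return "A100"   # lighter training and fine-tuning
    return "L4"         # hypothetical lightweight inference tier

assert pick_gpu_tier("pretraining", 70) == "H100"
assert pick_gpu_tier("fine-tuning", 7) == "A100"
assert pick_gpu_tier("inference", 1) == "L4"
```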

And lastly, I just want to mention that I think dynamic quota allocation is extremely important. I don’t think there will ever be a case where you are running your capacity at 95% utilization; that is a dream state most companies cannot achieve. So understanding which teams are underutilizing their capacity and reallocating it to higher-usage teams is also a good strategy to keep in mind.
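
Here is a minimal sketch of that dynamic quota idea: reclaim spare GPUs from teams running well below their allocation and grant them to teams that are maxed out. The threshold, team names, and numbers are all hypothetical.

```python
def rebalance_quota(quota: dict[str, int], used: dict[str, int],
                    donate_below: float = 0.5) -> dict[str, int]:
    """Move spare GPUs from underutilized teams to oversubscribed ones."""
    new_quota = dict(quota)
    # Reclaim from teams using less than `donate_below` of their quota.
    pool = 0
    for team, q in quota.items():
        if used[team] < donate_below * q:
            spare = q - used[team]
            new_quota[team] -= spare
            pool += spare
    # Grant the pool, one GPU at a time, to teams at or over their quota.
    needy = [t for t, q in quota.items() if used[t] >= q]
    while pool > 0 and needy:
        for team in needy:
            if pool == 0:
                break
            new_quota[team] += 1
            pool -= 1
    return new_quota

quota = {"ranking": 100, "ads": 100, "research": 100}
used  = {"ranking": 100, "ads": 30,  "research": 95}
print(rebalance_quota(quota, used))  # ads donates its spare 70 GPUs to ranking
```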

Val: So many follow-up questions.

05:06

GPU Elasticity vs CPU Cloud Elasticity: Key Differences for Large Models

Val: I’m curious about elasticity. I have a lot of experience in the cloud, the CPU cloud world. With giant model sizes and large datasets, elasticity in the GPU world is very different from elasticity in the CPU world. What are some of your main lessons learned about these differences?

Elisa: Yeah, I think that’s a good point to make. There are a lot of dependencies within the machinery itself. You can’t just separate, let’s say, GPU from memory or I/O or storage; all of these have to play together in a tightly coupled way. Even if you have, let’s say, more GPUs, it doesn’t mean you can necessarily free them up for other workloads. So it is a little tricky, I will say. I think there’s probably some work that can be done on how you build your workloads, how you think about using these clusters and for which workloads, and how you keep them as separate as possible, while obviously acknowledging that this is a very tough problem to solve.

Val: And I saw this hands-on. I was at a really fun PyTorch meetup at Meta on the campus there about six months ago, and it was one of the first examples I’ve seen publicly, and now it’s really widely known, of disaggregated prefill and decode: at scale, you have a lot of capacity management for prefill clusters and, distinctly, separate capacity management for decode clusters. At Meta’s scale, that was necessary even six months ago. This is very young and embryonic, but it’s becoming a best practice right now. So that was one tangible example for sure that I saw at Meta.
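
For readers unfamiliar with the pattern Val mentions, here is a highly simplified sketch of disaggregated serving: prefill (compute-bound prompt processing) and decode (memory-bandwidth-bound token generation) run on separately sized pools, with the KV cache handed off between them. This is a toy illustration of the architecture, not Meta’s or PyTorch’s implementation.

```python
from dataclasses import dataclass

@dataclass
class KVCache:
    """Stand-in for the attention KV cache produced by prefill."""
    request_id: str
    prompt_tokens: int

class PrefillPool:
    """Compute-bound: processes the full prompt in one pass."""
    def run(self, request_id: str, prompt: str) -> KVCache:
        return KVCache(request_id, prompt_tokens=len(prompt.split()))

class DecodePool:
    """Memory-bandwidth-bound: generates output tokens one at a time."""
    def run(self, cache: KVCache, max_new_tokens: int) -> str:
        # Real systems stream tokens; here we just fake an output.
        return f"<{max_new_tokens} tokens for {cache.request_id}>"

# The two pools are capacity-planned independently: prompt-heavy traffic
# grows the prefill pool; long generations grow the decode pool.
prefill, decode = PrefillPool(), DecodePool()
cache = prefill.run("req-1", "Summarize this quarterly capacity report ...")
print(decode.run(cache, max_new_tokens=128))
```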

Elisa: Yeah, that’s very interesting.

Val: Yeah, very cool.

06:43

How User Metadata from Instagram and WhatsApp Informs AI Model Training

Val: I use Instagram a lot, and WhatsApp, with a lot of these really cool AI features. You’re able to collect a lot of data, metadata, from that. How is that informing these metrics you talked about?

Elisa: Yeah, you know, that’s the user side. I think this is more probably related to—

Val: Does it feed into it, or is it so disconnected that you don’t see that sort of roll-up of that data?

Elisa: I’m trying to think—we take data privacy very seriously.

Val: Totally anonymized metadata in this case, not the data itself.

Elisa: Absolutely, yeah. So that would probably be more related to the training side itself. This is information that would inform our model improvements. Obviously, we don’t use every piece of data that we get; it will be selected, I would say filtered, for different use cases.

Val: And I’m thinking from a capacity manager’s perspective: you know, 9 to 5 India time, there’s a lot of WhatsApp usage, right? So that means we need a certain amount of capacity for the AI features we want to offer in that region. That’s the kind of metric I’m imagining here as you describe it.

Elisa: Yeah, that’s a good point as well. Figuring out your capacity load for different regions, or even just peak hours, is very challenging. And different regions, I think, also have different sets of requirements because of policy. In the EU, for instance, you have GDPR, so you need to make sure that your capacity is compliant with the requirements there versus other regions.
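
A back-of-the-envelope sketch of the regional sizing Val and Elisa are gesturing at: take each region’s peak request rate, divide by per-GPU throughput, and add headroom. Every number here is invented for illustration, and real planning also has to fold in the policy constraints Elisa mentions.

```python
import math

def gpus_needed(peak_req_per_s: float, tokens_per_req: float,
                gpu_tokens_per_s: float, headroom: float = 1.3) -> int:
    """GPUs to serve a region's peak load, with a safety margin."""
    token_demand = peak_req_per_s * tokens_per_req
    return math.ceil(headroom * token_demand / gpu_tokens_per_s)

# Hypothetical regional peaks (requests/sec at local busy hours).
regions = {"apac": 4000, "eu": 2500, "us": 3000}
for region, peak in regions.items():
    n = gpus_needed(peak, tokens_per_req=300, gpu_tokens_per_s=2500)
    print(f"{region}: ~{n} GPUs at peak")
```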

Val: That’s a great point. Capacity planning is not global; it’s not homogeneous or universal. It’s very much driven by policy, which is driven by jurisdiction. So it’s not cookie cutter, right?

Elisa: It’s not cookie cutter.

Val: At Meta scale, for sure. Very cool.

08:43

Alternative Energy for Data Centers: Fuel Cells as Cost-Effective Power Solutions

Val: What other cool topics have you seen at the show today that caught your attention?

Elisa: Yeah, I have been talking to a few other players in the field. I’m actually not really familiar with the space, but I learned that there are various ways you can power your data center. I learned more about fuel cells, I believe, which are a new way to power data centers that is a lot more cost-effective and scalable, as an alternative to the grid we know.

Val: Especially agility, which is a really big problem.

Elisa: Yeah, and it’s interesting to see how this technology that has been present in other industries can be leveraged for AI.

Val: Exactly. That’s cool. Right now it seems like, when you really zoom out, the ultimate bottleneck is energy. And you mentioned efficiency earlier on: tokens per watt, classically. That efficiency, against very hard energy budgets (even gigawatt-scale budgets are still fixed), seems to be the ultimate benchmark that we all have to measure ourselves against.
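
To make the tokens-per-watt framing concrete, here is a tiny worked example: under a fixed power budget, efficiency directly caps sustainable throughput. The numbers are illustrative, not measurements.

```python
def max_throughput(power_budget_w: float, tokens_per_joule: float) -> float:
    """Tokens/sec sustainable under a fixed power budget.

    tokens/sec = (joules/sec available) * (tokens/joule achieved)
    """
    return power_budget_w * tokens_per_joule

# Hypothetical cluster: 10 MW budget, 5 tokens per joule end-to-end.
print(f"{max_throughput(10e6, 5.0):,.0f} tokens/sec")  # 50,000,000 tokens/sec
```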

Elisa: That is true. And energy isn’t cheap by any means.

Val: No, not at all. And again, it’s not consistently priced anywhere. It’s variable by time of day, by region and so forth. So capacity planning for that must be a full-time job.

Elisa: Yeah, we have dedicated teams for that.

Val: Awesome. All right, well, thank you, Elisa. This has really been a fascinating insight into this under-discussed topic of capacity planning and management for AI, particularly at the scale we all aspire to someday.

Elisa: Yeah, well thank you so much for having me.

Val: It’s a pleasure.

Like This Discussion? There’s More!

Hear additional insights from Meta’s Elisa Chen and other AI industry leaders during a panel discussion hosted by WEKA CMO Lauren Vaccarello at AI Infra Summit 2025.