VIDEO

Driving AI Innovation While Scaling Capacity with Meta

WEKA’s Val Bercovici speaks with Meta’s Elisa Chen at AI Infra Summit 2025 about how to balance rapid AI innovation with hardware procurement cycles, sharing strategies for GPU utilization, workload optimization, and regional capacity planning.

Speakers:

  • Val Bercovici - Chief AI Officer, WEKA
  • Elisa Chen - Data Scientist, Meta’s AI Infrastructure Team

Below is a full transcript of the conversation, which has been lightly edited for clarity.

Transcript

00:00

The Gap Between AI Innovation Speed and Hardware Procurement Cycles

Val Bercovici: I’m Val Bercovici, Chief AI Officer at WEKA.

Elisa Chen: And my name is Elisa Chen. I am a data scientist at Meta on the AI infrastructure team.

Val: So you spoke today, Elisa. Why don’t you tell us what you spoke about? Because it was a really fascinating introduction you were giving earlier on.

Elisa: Yeah, thank you for having me. Earlier today, I was giving a talk about the gap we currently face between AI innovation and the long procurement cycles for hardware. To put it more concretely: AI innovation is moving so quickly that models get iterated on a monthly and even weekly basis, but hardware procurement, from fulfillment to actually making it production-ready, takes a very long time. So how do you balance these two? How do we make sure that capacity planning keeps pace with AI innovation?

Val: I love this topic because it’s so important, and it’s one I really haven’t dived into deeply at all. So definitely a chance to learn more here today.

01:17

How to Predict AI Capacity Requirements Without Perfect Foresight

Val: Do you have to be a bit of a clairvoyant to predict the future, or is there some science to it? You know there’s going to be a lot of variability in the algorithms, and yet, to your point, at this scale you’ve got to get the infrastructure right, you’ve got to get the supply chain right, and you’ve got to line it up in time.

Elisa: Yeah, good question. There’s a lot that we don’t know in today’s world, but what we can do is start setting up the correct data foundations, the right telemetry, and even define the right metrics and benchmarks that we can set goals against. For instance, what is good enough? We don’t necessarily always need top-shelf machinery to power inference.

Val: It’s such a lazy answer to just buy the biggest, most expensive hardware, right?

Elisa: Which is tempting, because we do want to be at the frontier and we do want to be the most efficient, but that’s not always needed. So what strategy do we want to deploy? That’s something we can think about separately. And that assumes we have good measurements, which is itself already really challenging, because you don’t necessarily have all the definitions in mind yet. For instance, how do you think about capacity ROI? One machine is used to train multiple models, and these models have different performance. How does that translate to business value?
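
As a rough illustration of the measurement problem Elisa raises, here is a minimal sketch of one way to frame capacity ROI for a machine that trains several models over a period. All names and numbers are hypothetical, not a metric Meta has described; the arithmetic is trivial, and the hard part is the inputs.

```python
from dataclasses import dataclass

@dataclass
class TrainedModel:
    name: str
    gpu_hours: float            # GPU-hours the model consumed on this machine
    est_business_value: float   # estimated value delivered (hard to define!)

def capacity_roi(models: list[TrainedModel], machine_cost: float) -> float:
    """Naive capacity ROI: value produced per dollar of machine cost.

    The challenge is not this division but attributing business value
    to each model and attributing GPU-hours to each machine.
    """
    total_value = sum(m.est_business_value for m in models)
    return total_value / machine_cost

# Hypothetical example: one machine, three models trained on it this quarter.
models = [
    TrainedModel("ranker-v7", gpu_hours=1200, est_business_value=400_000),
    TrainedModel("ranker-v8", gpu_hours=1500, est_business_value=550_000),
    TrainedModel("exp-distill", gpu_hours=300, est_business_value=50_000),
]
print(f"ROI: {capacity_roi(models, machine_cost=250_000):.2f}x")  # 4.00x
```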

Val: There’s really no mature observability in the industry yet for GPUs and these kinds of workloads.

Elisa: Yeah, so it’s a really tough problem. And without the data foundations, or just the right measurement, I would say instrumentation, it’s hard to make those decisions. As a data scientist, you need data, good data, quality data, to make decisions, and that doesn’t necessarily exist yet. So what are some other strategies that you can deploy?
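
On the instrumentation point, here is a minimal sketch of the kind of per-GPU telemetry you can start collecting today, assuming NVIDIA hardware and the pynvml NVML bindings. This is only a starting point, not the full observability stack Val notes is still missing.

```python
import time
import pynvml  # pip install nvidia-ml-py

def sample_gpu_telemetry() -> list[dict]:
    """Take one utilization/memory/power sample per GPU via NVML."""
    pynvml.nvmlInit()
    try:
        samples = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # percentages
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes
            power = pynvml.nvmlDeviceGetPowerUsage(handle)       # milliwatts
            samples.append({
                "gpu": i,
                "sm_util_pct": util.gpu,
                "mem_util_pct": util.memory,
                "mem_used_gb": mem.used / 2**30,
                "power_w": power / 1000,
                "ts": time.time(),
            })
        return samples
    finally:
        pynvml.nvmlShutdown()

# Poll periodically and ship the samples to whatever metrics store you use.
for s in sample_gpu_telemetry():
    print(s)
```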

03:02

Elastic Resource Strategies: Using GPU-as-a-Service for Flexible Capacity

Elisa: There are some other levers, I would say, probably in everyone’s back pocket. Given that there are gaps in this knowledge and you don’t necessarily have all the information to predict what you might need in the future, you can work around this for immediate needs through elastic resource supplies, using GPU-as-a-service. These offer some flexibility for your capacity planning.

But on top of that, you can also start thinking about efficiency as another lever. One thing about the industry is that we are constantly figuring out ways to become better at what we do.

So now we have hardware that is tailored to different workloads. You have your H100s, dense GPUs used for training foundation models, and then you have your A100s, which are probably better suited to more lightweight models or fine-tuning. So matching the right hardware to your workload is crucial.
You learn to become more efficient with the machinery itself, and then you realize, oh, I can actually train this specific model a lot faster on this hardware, so now I can free it up for other use cases. That’s another lever you can use.
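
A toy sketch of the hardware-to-workload matching Elisa describes. The tiers and thresholds here are invented for illustration; real placement decisions also weigh interconnect, memory capacity, and availability, not just model size.

```python
def pick_gpu_tier(workload: str, model_params_b: float) -> str:
    """Route a job to a GPU tier. Thresholds are illustrative only."""
    if workload == "pretraining" and model_params_b >= 7:
        return "H100"   # dense training of large foundation models
    if workload in ("pretraining", "fine-tuning"):
        return "A100"   # lighter training and fine-tuning
    return "L4"         # hypothetical lightweight inference tier

assert pick_gpu_tier("pretraining", 70) == "H100"
assert pick_gpu_tier("fine-tuning", 7) == "A100"
assert pick_gpu_tier("inference", 1) == "L4"
```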

And lastly, I just want to mention that I think dynamic quota allocation is extremely important. I don’t think there will ever be a case where you are running your capacity at 95% utilization; that is a dream state most companies cannot achieve. So understanding which teams are underutilizing their capacity and reallocating it to higher-usage teams is also a good strategy to keep in mind.
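
Here is a minimal sketch of that dynamic quota idea: reclaim spare GPUs from teams running well below their allocation and grant them to teams that are maxed out. The threshold, team names, and numbers are all hypothetical.

```python
def rebalance_quota(quota: dict[str, int], used: dict[str, int],
                    donate_below: float = 0.5) -> dict[str, int]:
    """Move spare GPUs from underutilized teams to oversubscribed ones."""
    new_quota = dict(quota)
    # Reclaim from teams using less than `donate_below` of their quota.
    pool = 0
    for team, q in quota.items():
        if used[team] < donate_below * q:
            spare = q - used[team]
            new_quota[team] -= spare
            pool += spare
    # Grant the pool, one GPU at a time, to teams at or over their quota.
    needy = [t for t, q in quota.items() if used[t] >= q]
    while pool > 0 and needy:
        for team in needy:
            if pool == 0:
                break
            new_quota[team] += 1
            pool -= 1
    return new_quota

quota = {"ranking": 100, "ads": 100, "research": 100}
used  = {"ranking": 100, "ads": 30,  "research": 95}
print(rebalance_quota(quota, used))  # ads donates its spare 70 GPUs to ranking
```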

Val: So many follow-up questions.

05:06

GPU Elasticity vs CPU Cloud Elasticity: Key Differences for Large Models

Val: I’m curious about elasticity. I have a lot of experience in the cloud, the CPU cloud world. With giant model sizes and large datasets, elasticity in the GPU world is very different from elasticity in the CPU world. What are some of your main lessons learned about these differences?

Elisa: Yeah, I think that’s a good point to make. There are a lot of dependencies within the machinery itself. You can’t just separate, let’s say, GPU from memory or I/O or storage; all of these have to play together in a tightly coupled way. Even if you have, let’s say, more GPUs, it doesn’t mean you can necessarily free them up for other workloads. So it is a little tricky, I will say. I think there’s probably some work that can be done on how you build your workloads, how you think about using these clusters and for which workloads, and how you keep them as separate as possible, while obviously acknowledging that this is a very tough problem to solve.

Val: And I saw this hands-on. I was at a really fun PyTorch meetup at Meta on the campus there about six months ago, and it was one of the first examples I’ve seen publicly, and now it’s really widely known, of disaggregated prefill and decode: at scale, you have a lot of capacity management for prefill clusters and, distinctly, separate capacity management for decode clusters. At Meta’s scale, that was necessary even six months ago. This is very young and embryonic, but it’s becoming a best practice right now. So that was one tangible example for sure that I saw at Meta.
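
For readers unfamiliar with the pattern Val mentions, here is a highly simplified sketch of disaggregated serving: prefill (compute-bound prompt processing) and decode (memory-bandwidth-bound token generation) run on separately sized pools, with the KV cache handed off between them. This is a toy illustration of the architecture, not Meta’s or PyTorch’s implementation.

```python
from dataclasses import dataclass

@dataclass
class KVCache:
    """Stand-in for the attention KV cache produced by prefill."""
    request_id: str
    prompt_tokens: int

class PrefillPool:
    """Compute-bound: processes the full prompt in one pass."""
    def run(self, request_id: str, prompt: str) -> KVCache:
        return KVCache(request_id, prompt_tokens=len(prompt.split()))

class DecodePool:
    """Memory-bandwidth-bound: generates output tokens one at a time."""
    def run(self, cache: KVCache, max_new_tokens: int) -> str:
        # Real systems stream tokens; here we just fake an output.
        return f"<{max_new_tokens} tokens for {cache.request_id}>"

# The two pools are capacity-planned independently: prompt-heavy traffic
# grows the prefill pool; long generations grow the decode pool.
prefill, decode = PrefillPool(), DecodePool()
cache = prefill.run("req-1", "Summarize this quarterly capacity report ...")
print(decode.run(cache, max_new_tokens=128))
```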

Elisa: Yeah, that’s very interesting.

Val: Yeah, very cool.

06:43

How User Metadata from Instagram and WhatsApp Informs AI Model Training

Val: I use Instagram a lot, and WhatsApp, with a lot of these really cool AI features. You’re able to collect a lot of data, metadata, from that. How is that informing these metrics you talked about?

Elisa: Yeah, you know, that’s the user side. I think this is more probably related to—

Val: Does it feed into it, or is it so disconnected that you don’t see that sort of roll-up of that data?

Elisa: I’m trying to think—we take data privacy very seriously.

Val: Totally anonymized metadata in this case, not the data itself.

Elisa: Absolutely, yeah. So that would probably be more related to the training side itself. This is information that would inform our model improvements. Obviously, we don’t use every piece of data that we get; it will be selected, I would say filtered, for different use cases.

Val: And I’m thinking from a capacity manager’s perspective: you know, 9 to 5 India time, there’s a lot of WhatsApp usage, right? So that means we need a certain amount of capacity for the AI features we want to offer in that region. That’s the kind of metric I’m imagining here as you describe it.

Elisa: Yeah, that’s a good point as well. Figuring out your capacity load for different regions, or even just peak hours, is very challenging. And different regions, I think, also have different sets of requirements because of policy. In the EU, for instance, you have GDPR, so you need to make sure that your capacity is compliant with the requirements there versus other regions.
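
A back-of-the-envelope sketch of the regional sizing Val and Elisa are gesturing at: take each region’s peak request rate, divide by per-GPU throughput, and add headroom. Every number here is invented for illustration, and real planning also has to fold in the policy constraints Elisa mentions.

```python
import math

def gpus_needed(peak_req_per_s: float, tokens_per_req: float,
                gpu_tokens_per_s: float, headroom: float = 1.3) -> int:
    """GPUs to serve a region's peak load, with a safety margin."""
    token_demand = peak_req_per_s * tokens_per_req
    return math.ceil(headroom * token_demand / gpu_tokens_per_s)

# Hypothetical regional peaks (requests/sec at local busy hours).
regions = {"apac": 4000, "eu": 2500, "us": 3000}
for region, peak in regions.items():
    n = gpus_needed(peak, tokens_per_req=300, gpu_tokens_per_s=2500)
    print(f"{region}: ~{n} GPUs at peak")
```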

Val: That’s a great point. Capacity planning is not global; it’s not homogeneous or universal. It’s very much driven by policy, which is driven by jurisdiction. So it’s not cookie cutter, right?

Elisa: It’s not cookie cutter.

Val: At Meta scale, for sure. Very cool.

08:43

Alternative Energy for Data Centers: Fuel Cells as Cost-Effective Power Solutions

Val: What other cool topics have you seen at the show today that caught your attention?

Elisa: Yeah, I have been talking to a few other players in the field. I’m actually not really familiar with the space, but I learned that there are various ways you can power your data center. I learned more about fuel cells, I believe, which are a new way to power data centers that is a lot more cost-effective and scalable, as an alternative to the grid we know.

Val: Especially agility, which is a really big problem.

Elisa: Yeah, and it’s interesting to see how this technology that has been present in other industries can be leveraged for AI.

Val: Exactly. That’s cool. Right now it seems like, when you really zoom out, the ultimate bottleneck is energy. And you mentioned efficiency earlier on: tokens per watt, classically. That efficiency, against very hard energy budgets (even gigawatt-scale budgets are still fixed), seems to be the ultimate benchmark that we all have to measure ourselves against.
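
To make the tokens-per-watt framing concrete, here is a tiny worked example: under a fixed power budget, efficiency directly caps sustainable throughput. The numbers are illustrative, not measurements.

```python
def max_throughput(power_budget_w: float, tokens_per_joule: float) -> float:
    """Tokens/sec sustainable under a fixed power budget.

    tokens/sec = (joules/sec available) * (tokens/joule achieved)
    """
    return power_budget_w * tokens_per_joule

# Hypothetical cluster: 10 MW budget, 5 tokens per joule end-to-end.
print(f"{max_throughput(10e6, 5.0):,.0f} tokens/sec")  # 50,000,000 tokens/sec
```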

Elisa: That is true. And energy isn’t cheap by any means.

Val: No, not at all. And again, it’s not consistently priced anywhere. It’s variable by time of day, by region and so forth. So capacity planning for that must be a full-time job.

Elisa: Yeah, we have dedicated teams for that.

Val: Awesome. All right, well, thank you, Elisa. This has really been a fascinating insight into this under-discussed topic of capacity planning and management for AI, particularly at the scale we all aspire to someday.

Elisa: Yeah, well thank you so much for having me.

Val: It’s a pleasure.

Like This Discussion? There’s More!

Hear additional insights from Meta’s Elisa Chen and other AI industry leaders during a panel discussion hosted by WEKA CMO Lauren Vaccarello at AI Infra Summit 2025.