VIDEO

AI Infrastructure Management and ROI Measurement

Women AI leaders at AI Infra Summit discuss how to manage and measure AI infrastructure.

Women Leaders in AI Infrastructure: Panel Introduction


  • Lauren Vaccarello: Chief Marketing Officer, WEKA
  • Elisa Chen: Data scientist at Meta with over five years in AI infrastructure, building the foundation that powers machine learning models serving ads to hundreds of millions of users daily
  • Carmen Li: Founder and CEO of Silicon Data and CEO of Compute Exchange, transforming global compute markets through data transparency and creating an independent marketplace for GPU compute trading
  • Rebecca “Bink” Naughton: Leading data center capacity strategy at Lambda, with three decades of experience across Google, Meta, Yahoo, and Microsoft, having built infrastructure for everything from Meta's first AI research cluster to multi-billion dollar supercomputers

Transcript

00:00

Understanding the True Cost of AI Infrastructure

Lauren: I love that you brought up cost. I’m going to do an audience question right now. Do you know how much you are spending on your AI infrastructure?

Do you know the relative costs of the outputs of the models you’re running, the programs you’re running, the AI you’re building?

So what’s fascinating about the world we’re in right now is how do you measure costs? What is cost? What is the value of what you’re doing? Is the cost your GPUs? Is the cost your tokens? Is the cost the number of queries that you’re able to do? So I would love to talk about that first big macro question. How should I even think about cost?

Carmen: I actually have a talk later about cost.

Again, we’re really agnostic, right? We don’t host any models. It’s purely a pricing point of view, because we have token indexes; that’s one thing we do keep track of. Do you always need the latest, greatest model? Maybe yes, maybe not. Maybe you want to use an open source model that will give you more deterministic, faster results; a lot of the time there’s not as much thinking involved. So it really depends on the total cost. There are human costs, and there are others, but everything else is much, much lower compared to the actual token cost, input and output tokens. We can talk about all those things.

I think another thing people don’t consider is the human piece. If you keep running something, the engineers are very expensive. So that’s one of the things I want to highlight.

Elisa: Well, maybe I’ll be replaced by AI in a few years. I agree with that. There are a lot of hidden costs that aren’t necessarily discussed. And just to echo that, there are also a lot of fixed costs, let’s say in power and cooling, in running data centers. Those are really expensive as well and not always accounted for, though this might apply mainly to bigger companies.

02:05

Opportunity Cost Framework for AI Investment Decisions

Bink: So, wearing my pre-Lambda hat, cost is opportunity cost. So in the mode that I am seeing conversations happen right now—and this is not me speaking from direct engagement with Lambda’s customers, at least not yet—when you have a clear value proposition, it becomes much easier to justify what you need and where.

But I think our company has to be very conscious of the cost that Elisa mentioned, right? Building, working with our cloud providers to get our capacity, potentially building in the future—all of those things are a financial game as much as they are a physical constraints game. And so being intelligent in terms of picking the right infrastructure at the right time, in the right place, that’s our best bet in terms of managing cost, because our customers are then going to influence what we need to do next.

03:12

Calculating ROI for AI and Machine Learning Infrastructure

Elisa: Maybe just to add on to that, I really like that idea. I think something else that we don’t have a good sense of is, what is the ROI of our capacity? How do we measure the sort of upside of using our capacity that is very expensive? That’s not always easy to measure either.

Carmen: I would say margin is hard to measure to begin with, and ROI is even more challenging. We don’t even have clear costs yet. The whole chain is very murky at this point.

03:42

Defining AI Infrastructure Metrics and Currency

Lauren: And from an ROI perspective, should I be thinking about the number of tokens that we’re generating if we’re doing inferencing? Should I be thinking about how many users I can support simultaneously? Should I be thinking about GPU utilization and this idea of what is the currency of AI? What is it? And if I’m an enterprise, how do I know how to understand if this is good or bad?

Elisa: I think it does vary, depending on the machine itself as well. I think tokens are a very common unit of cost. But then you also have, let’s say, network costs, I/O costs, or storage costs, which could be measured in petabytes. So the unit varies a lot depending on what type of machine we’re talking about.

Bink: Yeah, I’m going to go back to the opportunity cost discussion. I did my basic training in that at Google. So I think the issue for an enterprise is understanding—I don’t want to sound like a broken record—understanding what you need to do. What is AI going to accomplish for you? That dictates effectively your input, understanding of what it’s worth, and then what you’re prepared to spend.

Carmen: And I would just caution: it doesn’t matter which metrics you pick, just know that any metric has blind spots that can lead you down the wrong path. If you measure token output, not all tokens are the same quality, and precision can differ, right? A longer output doesn’t mean it’s better.

Another thing I would caution on is utilization: how do you precisely measure that? So if I were anyone, I would be a hoarder of measurements. Measure almost everything, then try to triangulate and see what thesis the numbers support.

As a business owner—maybe it’s the wrong thing to do—I’m not concerned about cost. I’m concerned about output. I’m concerned about, will my users pay for this? Once users validate it, then let me try to lower the cost. My default is to use the best, the latest and greatest. Once I achieve what I want, we’ll break the workflow down into different models. We’ll try different things. But right now I’m not concerned about cost.

06:14

Learning from History While Embracing the New

Lauren: So we brought up a couple of interesting things: this idea of what our blind spots are, what we don’t know yet. And there’s also a piece of this where only so much here is new. We are building AI infrastructure that did not exist a few years ago, so there are blind spots in that, but there’s only so much that’s new. There’s a lot that’s pattern recognition.

Like This Discussion? There’s More!

This clip was taken from a longer conversation at AI Infra Summit 2025, where Lauren, Elisa, Carmen, and Bink covered a broad range of topics for anyone building the AI infrastructure of the future.