Women Leaders in AI Infrastructure: Panel Introduction
- Lauren Vaccarello: Chief Marketing Officer, WEKA
- Elisa Chen: Data scientist at Meta with over five years in AI infrastructure, building the foundation that powers machine learning models serving ads to hundreds of millions of users daily
- Carmen Li: Founder and CEO of Silicon Data and CEO of Compute Exchange, transforming global compute markets through data transparency and creating an independent marketplace for GPU compute trading
- Rebecca “Bink” Naughton: Leading data center capacity strategy at Lambda, with three decades of experience across Google, Meta, Yahoo, and Microsoft, having built infrastructure for everything from Meta's first AI research cluster to multi-billion dollar supercomputers
Transcript
Understanding the True Cost of AI Infrastructure
Lauren: I love that you brought up cost. I’m going to do an audience question right now. Do you know how much you are spending on your AI infrastructure?
Do you know what the relative costs of the output of the models you’re running, the programs you’re running, the AI you’re building?
So what’s fascinating about the world we’re in right now is how do you measure costs? What is cost? What is the value of what you’re doing? Is the cost your GPUs? Is the cost your tokens? Is the cost the number of queries that you’re able to do? So I would love to talk about that first big macro question. How should I even think about cost?
Carmen: I actually have a talk later about cost.
Again, we're really agnostic, right? We don't host any models, so this is purely from a pricing point of view, because a token index is one thing we do keep track of. Do you always need the latest, greatest model? Maybe yes, maybe not. Maybe you want to use one of those open source models that gives you more deterministic, faster results; there's not as much thinking involved a lot of the time. So it really depends on the total cost. Beyond tokens there are human costs and others, but everything else is much, much lower compared to the actual token costs, input and output. We can talk about all those things.
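The trade-off Carmen describes between frontier and open source models can be made concrete with a back-of-the-envelope comparison. A minimal sketch, assuming hypothetical per-million-token prices (the model names and rates below are placeholders, not real quotes from any token index):

```python
# Illustrative cost comparison: frontier model vs. smaller open-source model.
# All prices are hypothetical placeholders, not real quotes.
PRICE_PER_MTOK = {                 # USD per million tokens: (input, output)
    "frontier": (5.00, 15.00),
    "open_source": (0.30, 0.60),
}

def query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, given input and output token counts."""
    p_in, p_out = PRICE_PER_MTOK[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

# A 2,000-token prompt with an 800-token answer:
frontier = query_cost("frontier", 2_000, 800)       # 0.022
open_src = query_cost("open_source", 2_000, 800)    # 0.00108
print(f"frontier: ${frontier:.5f}, open source: ${open_src:.5f}")
```

Even with made-up numbers, the point survives: at high query volume, a roughly 20x per-query gap is what makes the "do you really need the latest model?" question worth asking.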
I think another thing people don’t consider is the human piece. If you keep running something, the engineers are very expensive. So that’s one of the things I want to highlight.
Elisa: Well, maybe I’ll be replaced by AI in a few years. I agree with that. There are a lot of hidden costs that are not necessarily discussed. Just to echo that, there are also a lot of fixed costs, let’s say, in power and cooling, in running data centers. Those are really expensive as well and are not necessarily accounted for. But this might also be applicable only to bigger companies.
Opportunity Cost Framework for AI Investment Decisions
Bink: So, wearing my pre-Lambda hat, cost is opportunity cost. So in the mode that I am seeing conversations happen right now—and this is not me speaking from direct engagement with Lambda’s customers, at least not yet—when you have a clear value proposition, it becomes much easier to justify what you need and where.
But I think our company has to be very conscious of the cost that Elisa mentioned, right? Building, working with our colo providers to get our capacity, potentially building in the future—all of those things are a financial game as much as they are a physical constraints game. And so being intelligent in terms of picking the right infrastructure at the right time, in the right place, that’s our best bet in terms of managing cost, because our customers are then going to influence what we need to do next.
Calculating ROI for AI and Machine Learning Infrastructure
Elisa: Maybe just to add on to that, I really like that idea. I think something else that we don’t have a good sense of is, what is the ROI of our capacity? How do we measure the sort of upside of using our capacity that is very expensive? That’s not always easy to measure either.
Carmen: I would say margin is hard to measure to begin with. The ROI is even more challenging; we don’t have clear cost numbers, and the whole chain is very murky at this point.
Defining AI Infrastructure Metrics and Currency
Lauren: And from an ROI perspective, should I be thinking about the number of tokens that we’re generating if we’re doing inferencing? Should I be thinking about how many users I can support simultaneously? Should I be thinking about GPU utilization and this idea of what is the currency of AI? What is it? And if I’m an enterprise, how do I know how to understand if this is good or bad?
Elisa: I think it does vary, depending on the machine itself as well. I think tokens are a very common cost unit here. But then you also have, let’s say, your network costs, I/O costs, or your storage costs, which could be measured in terms of petabytes. So the unit varies a lot depending on what type of machines we are talking about.
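Elisa's point that the cost unit varies by resource can be sketched by normalizing everything into one blended cost-per-query figure. This is a toy model, not anyone's actual billing formula; every rate name and the amortization scheme are assumptions made for illustration:

```python
# Sketch: folding tokens, network, and storage into one cost-per-query number.
# All rates and the amortization model are illustrative assumptions.
def cost_per_query(tokens: int, bytes_moved: int, bytes_stored: int,
                   tok_rate_per_mtok: float, net_rate_per_gb: float,
                   store_rate_per_gb_month: float,
                   queries_per_month: int) -> float:
    gb = 1024 ** 3
    token_cost = tokens * tok_rate_per_mtok / 1_000_000
    network_cost = bytes_moved / gb * net_rate_per_gb
    # Storage is a fixed monthly cost, amortized across the month's queries.
    storage_cost = (bytes_stored / gb * store_rate_per_gb_month) / queries_per_month
    return token_cost + network_cost + storage_cost

# 1,000 tokens, 1 GB moved, 100 GB stored, 1M queries/month:
print(cost_per_query(1_000, 1024**3, 100 * 1024**3, 10.0, 0.05, 0.02, 1_000_000))
```

The useful part of the exercise is seeing which term dominates: here the per-query token and network terms dwarf the amortized storage term, which is exactly the kind of comparison the unit mismatch otherwise hides.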
Bink: Yeah, I’m going to go back to the opportunity cost discussion. I did my basic training in that at Google. So I think the issue for an enterprise is understanding—I don’t want to sound like a broken record—understanding what you need to do. What is AI going to accomplish for you? That effectively dictates your understanding of what it’s worth, and then what you’re prepared to spend.
Carmen: And I would just caution: it doesn’t matter what metrics you pick, just know that any metric has blind spots that can lead you down the wrong path. If you use token output, not all tokens are the same quality, and the precision can be different, right? A longer output doesn’t mean a better one.
Another thing I would caution about is utilization. How do you precisely measure that? So if I were anyone, I would be a hoarder of measurements. Measure almost everything, and then try to triangulate and see what kind of thesis emerges.
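Carmen's "hoard everything, then triangulate" approach can be sketched as logging several metrics per workload and comparing the ranking each one produces. The workload names and numbers below are invented for illustration; the point is that no single metric tells the whole story:

```python
# Sketch: log several metrics per workload and compare them side by side,
# rather than trusting one number. All workloads and figures are invented.
workloads = {
    "chatbot":   {"tokens_out": 900,    "gpu_util": 0.35, "queries": 400},
    "batch_job": {"tokens_out": 12_000, "gpu_util": 0.92, "queries": 5},
}

def rank_by(metric: str) -> list[str]:
    """Workloads ordered best-to-worst on a single metric (higher = 'better')."""
    return sorted(workloads, key=lambda w: workloads[w][metric], reverse=True)

# Each metric produces a different ranking — any one alone would mislead.
print(rank_by("tokens_out"))  # ['batch_job', 'chatbot']
print(rank_by("gpu_util"))    # ['batch_job', 'chatbot']
print(rank_by("queries"))     # ['chatbot', 'batch_job']
```

Token output and GPU utilization favor the batch job; query count favors the chatbot. Triangulating across all three is what surfaces the blind spots of each.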
As a business owner (maybe it’s the wrong thing to do), I’m not concerned about cost. I’m concerned about output. I’m concerned about, will my users pay for this? Once users validate it, then I try to lower the cost. My default is to use the best, the latest and greatest. Once I achieve what I want, we’ll break the workflow down into different models and try different things. But right now I’m not concerned about cost.
Learning from History While Embracing the New
Lauren: So we brought up a couple of interesting things. This idea of, what are our blind spots? What are the things we don’t know yet? And there’s also a piece of this that is, there’s only so much new here. We are building infrastructure for AI that did not exist a few years ago. So there are some blind spots in that, but there’s only so much that’s new. There’s a lot that’s pattern recognition.
Like This Discussion? There’s More!
This clip was taken from a longer conversation at AI Infra Summit 2025, where Lauren, Elisa, Carmen, and Bink covered a broad range of topics for anyone building the AI infrastructure of the future.