VIDEO

The Real Cost of AI: How Smart Companies Are Maximizing Token ROI

Hi, everyone. It's great to be with all of you. This is going to be a really valuable session on the return on intelligence, or return on investment. We're going to have a bit of a play on words here, but really the mission of this session is to talk about how we pay for all of this. How do we pay for it now? How do we pay for it in the future? And is the word "investment" too limited if we talk about it just in a financial sense? There's obviously a lot more that you get in return for investing in AI today than just your bottom line. So we'll get into all of that. But before we get started, I want to introduce myself. I'm Hope King. I'm the moderator. I'm the founder of MacroTalk News. I went independent last year, so please subscribe on YouTube and find me on LinkedIn to connect. And I'll have each of my esteemed panelists introduce themselves. Jonah, I'll start with you.

Great. Thank you, Hope. My name is Jonah Serks. I'm a director on the growth equity team at Generation. We are fortunate enough to be an investor in WEKA as well as a bunch of other companies that are tackling this challenge. And I spend a lot of my day job, at a higher level or a dumber level, thinking about AI efficiency and infrastructure. We were founded by Vice President Al Gore, and so I spend a lot of my time also thinking about the climate impact of AI.

Amazing. Val Bercovici, chief AI officer at WEKA. Previously, I helped co-create the Cloud Native Computing Foundation for bringing Kubernetes out of Google and open sourcing it, and I was even CTO at NetApp in the cloud era after their SolidFire acquisition.

Awesome. Thank you, Hope. My name is Dan Lawrence. I am the general manager of Nebius for the Americas, and prior to joining Nebius, I was at Akamai and AWS. Nebius, if you're not familiar, is an AI cloud. We build massive GPU clusters and capabilities around the world.

Alright. Let's start with the big question, which is the cost.
On the surface right now, the economics seem to be improving. Right? Models are getting cheaper, GPUs faster, token costs falling, but many organizations are still finding it hard to keep up with their AI bills. So Val, I want to start with you. What is happening under the surface if all of these elements seem to be falling?

There are a bunch of very conflicting narratives out there right now. On the one hand, the unit cost per token, with open-weight models from China and other innovations, is definitely dropping quite dramatically. On the other hand, AI itself has fundamentally changed since the ChatGPT Cambrian explosion. We were doing very small chat sessions, then we started adding PDFs, then we started adding agent turns, thousands of turns per session versus one back-and-forth. We're starting to add video and multimodal models right now. So the volume of tokens is exploding at the same time as some reductions in unit costs are happening. And the net effect is that we even saw, at GTC a couple of weeks ago in San Jose, Jensen talking, conservatively on one end, of budgeting for tokens at the same rate as a full-time engineer's salary. So let's just say two fifty k. Don't laugh; here in the Bay Area that's a low number, but in other parts of the world it's high. But at the other end of the spectrum, and I think it is a polar end of the spectrum, is SemiAnalysis boasting that some of their top analysts are at about a fifteen thousand to twenty thousand dollar daily spend on tokens. So that's annualized to about five million. And Dylan is proudly saying they're worth it. Right? The output and the revenue they generate from that investment are worth it.

So really interesting data points already. I thought that the AI spend was supposed to replace the human cost. So Jonah, what's going on? The demand, it sounds like, is obviously very much overtaking even the unit cost going down.

Yeah.
I didn't want to be the first person to bring up token maxing, but...

It's gonna be the other topic and toys.

So that's... we know we can do. We are in this funny moment where the measure of return is usage for many companies who are still not able to calculate whether that usage is turning into something productive. And when the measure of return is usage, and the measure of how well I am doing my job is usage, and the measure of how much I might get paid is usage, you're going to use as much as you can. And so there's this human emotion and human psychological desire to max right now, which is a huge part of it. And the second is, I would just double-click on what Val said. We're seeing a huge change in the architecture of what AI is doing for us, and the requirement on token use is therefore far, far higher. What's weird is that no one really knows why something should be a hundred thousand tokens for a task versus ten thousand or a million. Sometimes there's no real sense of that. And that didn't matter as much when it was a simple chat session and the difference was twenty or forty or fifty. But now, when it's an AI agent running overnight and you wake up in the morning and it's used one point eight million tokens versus one point eight k tokens, you know what I mean? There's a misunderstanding of how useful that is.

Well, who knows the answer? I mean, you can also look into the future with what you're building at Nebius, so maybe you can shed light on this.

Yeah. So certainly the price of tokens is on everyone's mind, and as an infrastructure provider, we're trying to make it as palatable as possible. And part of that can happen a number of ways, in terms of being horizontally or vertically integrated across the supply chain.
So if you're building the AI factory, if you're breaking ground, if you're designing the servers and whatnot, then there's a cost advantage to doing that versus if you colo a site and pay someone else for hardware, if you think of all the margin across the profile. So that's one way to drive the economics down. The other is automation: orchestration, and spending money to develop software that's going to actually make it automated, bring the cost down, and make the hardware efficient as well.

So a lot of that is out of the control of the builders in the room and the founders. What's within their control? I mean, you can't really pull back on demand. You've got to move faster than your competitor. So Jonah, if you want to start with that, and Dan also. We're going to talk a little bit about resource constraints, because the earth has got finite resources for now. But how would you advise companies today to manage the cost internally when they do have to continue to ship?

So there are multiple ways of answering this question. There are two things that we tend to recommend. The first is trying to think about the usefulness of each token that you spend and trying to tie it closely to a business outcome rather than just to usage. And that business outcome will be very different depending on the task, whether it's customer support or code generation. The outcome will be different, but try to tie it really closely. And the second, which I'm sure we'll talk more about, is the architectural decisions that enterprises make, whether it is being able to vertically integrate and build their own AI factory, giving them more control, or being able to work with someone like WEKA, for example, to effectively make their storage more efficient.
There are a number of architectural things people can do, but those are the two vectors that we recommend attacking from.

And if you rewind the clock a little bit, when cloud was taking off, companies took advantage of it and scaled, and departments got involved, and costs overran. And from that, people learned, and FinOps was born as a category. And so companies hired people to put controls around the financial implications of using cloud. I think similarly, from an AI perspective, token cost, AI cost, etcetera, I fully expect that humans inside organizations are going to have to have responsibility around governance and creating the guardrails.

Yeah. And they're doing that now. And by the way, we're going to have time for Q and A at the end, so for those who've just entered the room, we're going to have the moderated panel and then we'll have ten minutes before the end. Yeah. Go ahead.

If you don't mind, Hope, let me give a very specific answer to follow up on this, since there are builders in the audience here. I love the reference to FinOps. I think what we need to do now is, if we're going to use OpenClaw as a framework, you want to give your agent a particular skill here, an AI FinOps skill, which is: look at OpenRouter, for example, look at the category of models you're interested in, typically coding-agent-style models, and look at the pricing. OpenRouter is really transparent about not just the price per million input tokens and output tokens. They've been at the forefront of cache read pricing, and there's a kind of empty column, one that's about to change soon with some innovations being introduced as well, called cache writes. And have the skill optimize for model providers that are giving you a much closer gap between the input price and the cache read price.
Because as the caching technologies mature, there almost shouldn't need to be a difference between those two prices; they should very much converge. That's a very particular skill you can implement right now.

I was going to go to you for the next question too. As Dan was talking about, there's still a big human problem in executing AI. And it's one of the big discussions in a lot of companies. You know, you're a chief AI officer. Not every company has a chief AI officer. There's a lot of debate: who owns the work stream? Who owns the strategy? What do you see working well inside the companies that are moving fastest? And I'll start with Val first and then Jonah. Is it a CFO, CTO? Is it CIO, CEO? I mean, every company of course is different. Industries are different. But generally speaking, do you see certain dynamics playing out better in order to move faster on AI?

Yeah. This moves so fast. I have a cyber background, so the way I apply it, call it CAIO for short, this chief AI officer, is the CISO model, the four-letter C-level model, where you're more an advisor to various other C-levels that really own budgets and own workflows. So I work very, very closely with our CIO. I work very, very closely with our CTO, in the room here. I work very, very closely with a whole bunch of folks within WEKA. The job last year was to help evangelize the usage of agents. The job this year is very much token maxing. And so it's a consultative role, I find, but you have to keep adapting, because this industry adapts at a pace I've never seen before.

I think the real question is who has the veto power? When it comes to the big decision and someone needs to make that decision, who makes that call? Who is the best person?

Interestingly enough, I haven't seen veto be exercised yet right now.

Nobody's vetoing?
In the spirit of token maxing, in the spirit of transparency here, we are one of the first or one of the few companies of our size that actually has an Anthropic enterprise license. It's kind of an honor to have a phone call returned from their sales team nowadays, but we have them. We have OpenAI. We have Google Workspace and Gemini. We have a lot of models. We use open-weight models as well, so it's not so much veto as experimenting, measuring the results of that experiment, and then just doubling down on what's working.

Yeah. Sorry. Real quick. Jonah, go ahead.

Yeah. This may be a more controversial take, but we've seen in some of our portfolio companies, and actually some of our enterprise partners, that the chief people officer has a really interesting role to play. As you think about the role of agents in replacing human labor, the decision around human resource allocation actually becomes the question. And so when you pair a chief people officer with a CTO, and maybe you have a CISO together in that sort of triangle of power, it's really interesting what we've seen companies get done.

But in that triangle of power, who's ultimately writing the checks? I mean, the CFO has to, but also, if there's an initiative and two of them don't agree, where do you see that falling?

I think what's worrying right now is everyone's writing checks, and so, I mean, you're seeing...

It's a good thing, I think, for a lot of people.

Oh, you're seeing, you know... Yeah. Anybody in an organization is able to get things done. I think when something happens negatively, the CISO will start to take more power.

Dan?

So I agree with both of you that having a leader or leaders responsible is a big lever. However, one of the other things I'm seeing in our customer base that's quite interesting is a convergence of product management and engineering.
And so for a long time, product managers were the ones responsible for talking to customers, understanding requirements, etcetera, and the engineers were in the back office building and writing code. We're seeing now that those two personas are converging, and so the fastest-working companies with AI are ones where the engineers are actually going out and talking to customers, helping define product, because things happen so rapidly with model changes and automated code, etcetera. That is really a powerful phenomenon.

That's great stuff. I feel like that's something that everyone can take away. Any other best practices that you've seen work generally? Quickly, and then Val? Anything else that you want to add to that?

I'll pass.

Yeah. Just to follow up on what Jonah was saying, a trend we're absolutely seeing is the convergence, not just at an individual level of product manager and engineering skill sets, but at a group level. I joke it's like the revenge of the MBAs, or the fact that middle managers were very much considered unnecessary. They're almost critical path right now because you have...

Amazon that.

Exactly. Well, you have these ambitious goals now that you can set upon an agent swarm, but the skill in making that swarm productive and successful is understanding how to divide and conquer the tasks that support the goal, how to supervise, how to measure, how to coordinate. That's really the state of the art in terms of skilling your agent swarm, and that is a middle management skill set.

That is a very contrarian take. Jonah, what do you think?

I think right now, it's right. I mean, there is a question as to whether it's right in one year or two years. I think the other question is that the premise it relies on is that the way things are currently done is the way things should be done once agents are doing them too.
And I think the reality is that many processes that enterprises use to do things, that we take for granted, will change fundamentally once humans are no longer in control. And so trying to get agents to replicate the process that humans have been doing for twenty or thirty years within an enterprise may not actually be the right approach either.

And I'm hearing there's a lot of friction in trying to train agents. I mean, that institutional knowledge deep within a senior manager, it's innate within them, and to have to document that... Because you brought it up, what are we doing today that you think will be obsolete in two years?

I think we're seeing a forward deployed engineer explosion right now, which could well be obsolete in two years. I think there is effectively a kind of process documentation period that we're going through. I don't know that that needs to last forever. At some point, there is a self-learning capability of agents.

Any of you want to add on to that too?

Yeah. I mean, I'd say from just a planning perspective, the biggest fallacy you can make is that the state of the art will remain static.

Right. Yep.

Right? It's breathtaking how the state of the art keeps improving every month or so. But exponential progress means that these models are going to be incredibly powerful by the end of this year, and so the agent loops that are going to be running on these models, if we even still have to run agent loops for these models, are just not going to be the same as they are today. So to Jonah's point, even what a forward deployed engineer does today, which is a really critical skill set today, will either evolve or go away by the end of the year.

Dan?

Yeah. And I think that as we mature with AI, certainly we're in an augmented workplace now where AI is augmenting and helping productivity. As we automate, then that'll change.
But I think the place where humans go next is verification. Workloads happen, changes happen, and ultimately agents may be verifiers as well, but I think as humans it's our responsibility to verify what's happening.

And I just want to go back into the technical side of things and start with you, Val, on this portion of our chat here. You've written something called the AI memory wall. For those in the audience maybe not so deep in the infrastructure weeds, what do you mean by this AI memory wall? Why does it matter to this topic of return on intelligence and investment?

It's pretty fundamental. If you talk to a lot of experts in infrastructure in general and inference, they'll tell you that memory is a bottleneck, and what they mean by that literally is that in the past, two to three years ago, we had this gentle balance in scientific computing, GPU computing, between GPU floating point operations per second and memory capacity and bandwidth. AI, ChatGPT, and reasoning models kind of broke that balance, and then agents and multimodal agents completely broke that balance. So we're in a situation right now, and the stock markets, the stocks in the supply chain reflect it, where we physically don't have enough materials in the world to fabricate the memory needed by agents yesterday. So the memory wall is this collision of how do we actually get agents to run, how do we get them to run on Nebius and so forth, with the physical capacity that we have for memory. And the good news is that a lot of innovations at the software layer and elsewhere let you reorient how you use a standard GPU server. For example, you can take a lot of stranded storage, that is, non-volatile flash storage in that server, and wire it in such a way through software that it literally delivers additional memory. And that unlocks a huge bunch of efficiencies.
It literally gets you, at scale, the capacity of four to five additional data centers from one physical data center.

And just also related to this, you've done lots of research, and I want you to talk about the research, but when we were having our prep call ahead of this, you brought up a really fascinating anecdote about two analysts spending fifteen thousand dollars a day in API tokens each, annualized to five million dollars. I think the company CEO said that the output was worth more than that, but can you elaborate on this and tell everyone what we chatted about? Because I think that's such a fascinating case study.

Yeah. Just to recap, this is SemiAnalysis, very, very well known. Jensen gave them a big shout-out at his keynote a few weeks ago on stage for the InferenceMAX benchmark. One of their other analysts, not working on that particular benchmark, his name is Jeremy, and he's a data center analyst who analyzes satellite imagery. He goes on-site to these weird locations all over the world and figures out: is there power? Is there water? Do the local zoning laws and communities support these data centers? And he advises these large, Mag 7-style companies on whether this billion-dollar CapEx you're about to spend is worthwhile. So he's developed this amazing model. He showed it to me last week in New York, actually, where it's literally like a 3D interactive game where you can look around the globe and double-click on a particular country and region, and then find out exactly what data centers are there, what power they're using, what's in them, what kind of processors are in them.
You couldn't have done that individually before as an analyst. With Claude Code in this case, with leveraging it and with token maxing, absolutely, at fifteen thousand dollars a day, he was able to build this amazing model, and I get it now when Dylan, the founder of SemiAnalysis, says this is totally worth it, because I can see how they can sell that model multiple times, for multiple millions of dollars, to multiple customers.

So this is an outlier. I mean, I'm trying to gauge for everybody here, that's a lot of money.

It's a lot of money. It seems to us today like that's a polar endpoint.

Okay.

To Jonah's point earlier on, we've got to rethink... I'm not going to say rethink labor and jobs. Rethink tasks. Fundamentally, rethink the tasks that make up our jobs, and in many, many cases, yes, you can have a lot of tokens, fifteen thousand dollars a day of tokens, do the work of a team that would have been ten engineers in the past, and you will deliver something with an agent swarm in days that would have taken that team months. So at a task level, the ROI is undeniable.

Yeah. I think the common thread here is that it's about the usefulness of the output to justify the insane cost of the input. There is a ratio there.

Yeah. Exactly. But do you cosign on the fact that this is good for this particular company? Right? And what are you seeing in terms of cost?

For this particular company, yes. It sounds like it. I think the reality is that through the arc of history, companies have always spent tens of millions of dollars on things that they have felt are important and would drive high ROI, whether it's hiring an extremely expensive person who can deliver something or throwing a conference that costs tens of millions of dollars. It's not a foreign concept, the idea of doing this. I think we just now have a new, very productive thing to spend on.
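As a rough back-of-the-envelope check on the figures quoted in this exchange, the Python sketch below annualizes a daily token spend and computes the output-to-input ratio the panel alludes to. The function names, the day counts (365 calendar days, 250 working days), and the revenue figure in the usage example are illustrative assumptions, not numbers or methods from the session.

```python
def annualized_spend(daily_usd: float, days_per_year: int) -> float:
    """Annualize a daily token bill. days_per_year is an assumption the caller picks."""
    return daily_usd * days_per_year


def return_ratio(output_value_usd: float, token_cost_usd: float) -> float:
    """Value produced divided by token cost; above 1.0 means the spend paid for itself."""
    if token_cost_usd <= 0:
        raise ValueError("token_cost_usd must be positive")
    return output_value_usd / token_cost_usd


if __name__ == "__main__":
    # Daily spend range quoted on the panel: fifteen to twenty thousand dollars.
    for daily in (15_000, 20_000):
        calendar = annualized_spend(daily, 365)  # assumption: spend every calendar day
        working = annualized_spend(daily, 250)   # assumption: spend on working days only
        print(f"${daily:,}/day: ${calendar:,.0f} over 365 days, ${working:,.0f} over 250 days")

    # Hypothetical example: $12M of output value against a $5M annual token bill.
    print(f"return ratio: {return_ratio(12_000_000, 5_000_000):.1f}x")
```

Under these assumptions the annualized figure ranges from $3.75M to $7.3M, so the "about five million" quoted on stage falls inside it. The ratio only becomes meaningful once the output value is measured against a real business outcome rather than usage.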
Alright. So we know how much the price tags are, generally speaking. When you look at procurement, procurement is probably one of the least sexy departments in any company. Right? But they are very, very important. And Dan, you had a great case study too that you shared on the call around the speed at which procurement now has to move. And this also goes back to the human element of AI execution. So share with us what a typical procurement process looks like that most larger enterprises are going through now, the speed at which they have to move, and really how we get all companies there, but also what needs to happen for change management inside procurement.

Yeah. So, Hope, there's this amazing supply and demand imbalance in the market right now, which is driving some really interesting behavior, I'll say. And as a result of that, we have capacity that's soaked up. Right? At the moment we have roughly three, four, five customers for every one GPU that we have for sale, and we're not alone. All the other providers out there in the market have that same phenomenon. And so what happens is a company, let's say a Global 2000, has a procurement process that includes a legal review and multiple approvals in terms of finance and perhaps a business person. They've built a very disciplined procurement process, risk averse, financially motivated, right? And it makes sense for that. But in this time, as the supply and demand imbalance continues and the pace of AI changes, when people need to procure, they're going to have to move faster. And what we're seeing also is that smaller AI-native companies are capable of doing that, and even their funders are encouraging them to move quickly and be less concerned with risk, and buy hundreds of millions, billions of dollars' worth of capacity in days and hours versus weeks, months, years.
It's a risk question, the trade-off, that they can and want to. You were joking that everyone's just writing checks these days, so that aligns with procurement teams having to collapse the process and the timeline. Did you want to...

I would say the procuring of capital was happening in a similar fashion, which, again, if you look at the arc of history, has typically led to bad outcomes. I think here, the underlying product and demand are growing so fast that there's an argument for why it might be different. There's an interesting thing we've started to see, just on the procurement thread: in more developed enterprises that have swarms of agents within their tech stack calling on various tools to execute tasks, sometimes those agents come back to the human user and say, hang on. I tried calling this tool. I actually looked into the analytics of that tool. You've never used it before. Why are you paying a million dollars for this tool? And it's starting to have an opinion on application procurement, which is also enabling procurement to move faster. It's a trend that we're covering and thinking about quite strongly: how companies procure anything in a world where humans are no longer actually using the tools that are being procured.

Val, do you want to talk more about just unused capacity?

I think, just to build upon the previous comments here, the prevailing wisdom, the conviction I see amongst a lot of leaders, is about the opportunity cost. The cost of not investing rapidly, of not procuring rapidly, is just too high right now. There's enough conviction within leadership everywhere that the benefits, the ROIs, are not restricted to edge cases or outliers, but are very much in the middle of the bell curve.
And yeah, acting fast is a cliche, and hopefully not breaking too much, but clearly people are voting with their dollars and their wallets right now.

Something that a lot of companies are trying to figure out is how to deepen their competitive moat at a time when AI can just copy services, or companies can take Claude and build it inside of their own organizations with the right engineers. There's a discussion around how infrastructure could maybe now be a competitive moat, if you actually own some infrastructure or you have more control over that. Dan, I want to start with you just really quickly on this. How have you seen this, maybe in the way that you've organized your teams, but also in how customers have come to you? Have you seen more of a desire to get into infrastructure somehow, and what does that look like?

Yeah. So I think certain customers are trying and experimenting with adding AI factory-type capacity on their own, and those that have really, really deep pocketbooks are somewhat successful in doing that. Those that don't are, I think, struggling. And the reason that they're struggling is that the talent that's available, that has the know-how, is scarce. Just because you were able to build a data center ten or fifteen years ago and horizontally scale it with CPUs and networking and storage, the new paradigm and the new way of building is different. That skill set also isn't very transferable. It's going to take heavy investment in terms of training and changing the humans that are doing that. And so because of that, we're finding that customers are willing to partner with the companies that have those resources and the humans that can build those data centers.

Jonah, on this, how do you advise companies on this question?

The eternal challenging question is the search for moat, especially within companies at the application layer.
There was a period maybe a year ago where a lot of those companies were fine-tuning their own models or, in some cases, even training their own models. And that quickly changed in many domains. Legal is quite a good example: in legal AI, there were companies building their own legal AI models, which have now almost entirely been deprecated in favor of the frontier models that are very high performance at legal tasks. And so they're now looking to go a level deeper into the stack to find their moat, which is what brings them closer to infrastructure. We're seeing that trend, and we'll see it repeat itself across these domains over time. I think the question is partly... I agree with Dan. I mean, there's a massive talent gap, an enormous talent gap, at the infrastructure layer. Data center technology has moved on so fast that it has left many behind. And there's also the question that it is moving so fast that you don't want to be saddled with physical infrastructure that evolves so quickly that you end up putting forward a huge amount of upfront capital spend for something that doesn't give you an advantage and actually gives you a disadvantage in the long run. So it's a really tough equation for companies, but I think Dan is right that right now, those who can afford it and who can get the talent are finding significant advantage.

Go ahead.

Yeah. I was just going to say, the other moat opportunity is around experience and the experiential data that's captured. So if you've been in a retail market for twenty or thirty years and you have all of that data, that's a moat. So even for new companies, really picking your experience and capturing and focusing on that, I think, is probably something that AI is not going to be able to do. There's no compression algorithm for experience, as Andy Jassy from Amazon likes to say.

Yeah. We're at an event too, which can't be replicated. But, Val, did you want to add on to this?
I mean, you know, the question of on-prem now, and people going out and buying Mac minis and...

Yeah. It gets very granular, yeah, to the point where there's no such thing as one model. Right? There's no such thing as one style of inference, whether it's local or regional or big consolidated data centers. The difference I'm seeing with cloud, because I got in very early, about seventeen or eighteen years ago, is that, yes, cloud's aggregation of infrastructure is massive, but the infrastructure, to Dan's point, is so different now. It's so high performance. It's so parallel. One rack of CPUs back in the day was less than a thousand cores. One rack of GPUs is over a million and a half cores. So that skill set is just fundamentally... it's a different universe.

Fundamentally different.

But the other thing is planning ahead, right? The marginal cost of a new cloud user is almost nothing. You've precompiled the software, it's another record in a database, very scalable to run as a traditional SaaS business model. The reason the SaaSpocalypse is upon us is that people have realized the incremental cost of a new AI user is so different. It's like a whole other replicated cost for every user. There's no near-zero marginal cost anymore. And so planning for that, energy-wise and capacity-wise, is a different process, and I've got a personal hot take that we're going to see a lot of mid-market neoclouds and a lot of mid-market SaaS companies have to merge. Because if they can't cost-effectively, to Dan's point, produce the tokens, then they just won't have a gross margin business.

Yeah. I was just going to add that I think there's also the capital component. Companies have to make huge investments upfront to build out infrastructure, and there's a lot of change coming as well. We don't yet know the useful life for, you know, Vera Rubin or whatnot. Right? And so we're making a lot of predictions in that regard.
And so you're making some really big bets infrastructure-wise with your capital, with a lot of unknowns. And so, yeah. I know it's gonna be an individualized decision, but on the whole, if we're trying to get ahead and develop this moat now, would you advise companies to look more deeply at infrastructure as a competitive moat? I think that infrastructure, you certainly need it, right, to run AI. I think having some knowledge around using it is important. I do think that, you know, and I say this in an unbiased way, there's a business for cloud and AI cloud for a reason: because of the scarcity, because of the capital, and because all of those things take a lot of optimization. And if that's not your core business, then, yeah, leave it to the experts. Okay. Leave it to the experts. We're gonna have Q&A in just a couple of minutes, so if you have anything that you wanna say, you'll just have to kind of stand up, and I'll repeat the question for the room. The last question, and I don't mean it to be the least important by being the last, is on the responsibility side of all of our lives as citizens of this one Earth that we have until we get to Mars and the Moon. How are you thinking about the responsibility that you all have, and that the CEOs you're working with have, in building their businesses on such a resource-intensive process? Because we are, you know, super constrained. Is there a responsibility, or do we just kind of forget about it for now and let the next generation deal with this? Jonah, I'll start with you. It's a question we think about a lot, and I think every investor and leader should be thinking about the trade-offs that they're making for the progress that we're seeing. We think about it in two ways.
Firstly, as with all transformational technologies, electricity, the wheel, fire, these technologies themselves don't have values inherently. The value that comes from them is shaped by the people or systems that actually control what they're used for. And the same will be true of AI. And so we are trying to promote as much, I guess, impact maxing as we can: trying to empower leaders to focus on the highest-impact uses of their tokens, which is why I think about this usefulness-per-token idea. That's the first element. The second is that the energy constraints are real and serious. And, you know, I think you may know that the demand for gas plants in the US is up 5x in the last couple of years, largely to serve data center demand. That is a near-term phenomenon. What's actually quite interesting is that there is an enormous increase in demand for renewable power and battery-stored power right now, also to power data centers, because it is fundamentally cheaper. And the stat I love is that last year, over eighty percent of new renewable investment came from effectively the AI industry making forward orders for cheaper energy. And so we're sort of seeing a kick-start of cheaper, lower-impact energy sources, which hopefully will come online in the next few years. Val? Yeah. One of the things I love about WEKA is that, you know, it's very aligned. The efficiency story we've had, you know, performance efficiency, wasn't really trendy before AI. Now it's extremely trendy and important and relevant, but it's completely aligned with economic efficiency: the cash flow, the CapEx, the OpEx, and energy usage efficiency. And this enables literally more tokens per user-preference latency budget, for alignment on the quality and morality of an answer, as well as the cybersecurity and safety of an answer.
If you have fixed energy and a fixed latency budget and you can do more iterations, you literally have a higher-quality answer, a better answer, a more moral one, and an actually safe response for an agent that has the power to take actions on your behalf while you're sleeping. Sustainability is certainly on our mind as well. Every data center we build, we try to use as much renewable energy as possible. We try to do creative things like heating towns. With our data center in Finland, we actually heat the town next door. So it's certainly on our mind and it's our responsibility, but I do think that sustainability is a shared responsibility as well. There are the providers of services but also the consumers of the services. It's just like at home: you pay your electric bill and they have to give you the electricity. Yeah. But you have to turn the lights off also. And so I think there's a shared responsibility. Yeah. On an individual consumer level, I mean, I think maybe I should just ask Google and not Gemini, like, on the day. With the time remaining, I'd love to open it up to questions from the audience, if you've got one. First one's the hardest. Yeah. Yes. Go ahead. One of the interesting things, from the gentleman over on the far right, was hearing you describe a partnership between, like, the head of a people team and the head of, like, engineering. Curious if you could share more details or what sort of partnerships you see there. Absolutely. Yeah. So the question is around partnerships between people leaders and engineering leaders within companies and where that's been successful. I'll zoom in on one very specific example. One of our portfolio companies is a company called Gloat, and they build a kind of internal talent marketplace that helps match people within very large organizations to different pockets of that company.
So say you're in product, and there's a project going on in the finance team that you have a specific, relevant skill set for; you can go and work with them temporarily. And they started working with some of their largest customers. These are Fortune 10, Fortune 50 companies who were initiating one-off projects that might be collaborative. So, for example, there's an M&A transaction happening, and suddenly legal, finance, product, and maybe marketing need to work together. There's a project leader, and that person only has a three-week window to complete that project, and they know what their various tasks are. What they don't know is what the humans on that team could do, and which humans. They also don't know what AI tools they have at their disposal, tools that might exist across other teams or enterprise-wide, that they could use to orchestrate the work and perhaps reduce the human burden. And so, as an example, they're using a tool right now that basically says: tell us what your project is, tell us what tasks you'd break that project down into, and we will help allocate them to humans and then to AI agents and tools. It basically shows you which tasks are more applicable for humans and which are more applicable for agents. And that became quite quickly an HR problem, because it was about how we ensure that we have the right skills. Skills are, you know, fundamentally the bedrock of human talent, and then of the ability to complete the tasks that make up that project. They were able to complete that project in half the time they expected, and now it's become an HR initiative to roll this out across the organization for all projects, centered on this human-agent handoff and collaboration. Yeah. Go ahead. I have a question about the value realization that y'all talked about.
So where are we at in terms of the verification and attribution of that contractible outcome? Because the value is, of course, being created. But is it the value that meets the organizational intent? And how is it contractible? Is it commercial? Is it enforceable? Yeah. This is a very active area of research right now, and some of it is solved. The whole concept of reinforcement learning, particularly the modern version of it, is to train on something that has a verifiable outcome, so you can give the reinforcement learning algorithm a verifiable reward or penalty for getting it right or wrong. Source code, you know, there's a reason why it's so popular right now: programs either run or they don't. They either produce something that corresponds with a test result, or they don't, or they crash. So anything that's mathematical, anything that's algorithmic or programmatic, is turning out to be really, really good. When you have open-ended, you know, legal or HR issues, then it's very, very hard because there's nuance applied, and that's where active research is going on in this concept of evaluations and benchmarks, and in making a subjective and nuanced call as to whether a particular service level agreement has been met or not. That is active research. It is a very, very human-centric role right now to work with a lot of AI systems and agentic systems and determine whether a vague outcome is correct or not. Okay. Use a human system, but it's more on the business, like the NeuralMesh outcome. Right? Because you can respond to a ticket and you can route your complaint, or you can actually issue a return, which are two different outcomes. Right? So it's less human than an angel in that context. So is that research going on in that area? Yeah. And I'd say that... No, no, please. ServiceNow in particular has kinda led the charge here, and they've got pretty good workflows; they were very early.
I remember being impressed soon after ChatGPT, in, like, early 2023, by how effectively they were able to apply some of the early GPT technology towards categorizing tickets, classifying them, and ultimately automatically resolving them if they were simple documentation questions, or escalating them to actual, you know, escalation managers if they were more complex and nuanced. And that was a couple of years ago. So with the advent of agents and the ability to iterate through this a lot more, a lot more training data was collected. To Dan's point earlier on, the moats right now are not just around collecting more data. There's this concept that's pretty popular lately around collecting decision traces, like the who, what, when, where, and why of how a decision was made, not just the final outcome. Building a graph, a context graph, of that, and then letting your agents rip on it: that's really state of the art today. You wanna add to that? Just to add to your question, we're in this middle ground now on pricing at the application layer, where there is often a combination of a base kind of service fee and then sometimes, you know, an upside, outcome-based fee. The outcome-based fees, as Val said, work well in verifiable situations. What we are starting to see is that enterprises are getting much better at accurately predicting which outcomes actually matter. A year ago, it was a complete guess. And from companies I've been speaking to even just this week around this conference, there's a real, noticeable shift in that everyone now has a measurement. I don't think they know what the quantum of the measurement is that should trigger certain payouts, but they have a better idea of what the measurement actually is. Yeah. I was going to say, the two sort of continuums, and I think you touched on them both, were productivity. So that's the measure of value, right?
So a lot of customers can, you know, quantify how much savings they had, how much human time was saved, etcetera. So that's an output. And then the other piece that I'm often seeing is the decision-making criteria: how much better are your decisions because of the AI output? It's a great question. Yeah. Any other questions in the room? In the back? I've got one more for you, Val. Just going back, I wanted to hear more about the memory wall, because I think a lot of people don't think about capacity and evaluating the underlying infrastructure of the AI to understand where they're going to get the ROI. Can you speak more about that? With regards to the memory wall in particular? More about the memory wall, but also around the output, the ultimate output, the quality of it, you know, not just the speed. They're related. So the easiest way for a builder to think about the memory wall is: am I hitting a rate limit? I don't know any builder that's not hitting token rate limits right now. Right? We're always hitting a particular limit if we're on a subscription plan, or if we're on an API plan, we're hitting a crazy budget limit of thousands of dollars a day on token costs. So optimizing around rate limits is definitely one of the things you experience when you're hitting the memory wall. Partnering with GPU providers and inference providers that actually let you token max, get more tokens for a particular budget or rate limit, or contractually get more tokens outside of existing rate limits, that's one thing. But how you apply them is really, really important right now, because we learned this early on. There's this classic triad in AI, which is accuracy, latency, and cost, and you're typically trading off one to get the other two. When you can improve cost and latency, you're making better trade-offs, and that's one of the things we emphasize. But for me it's budgets.
You've got a power budget in a macro sense, and then you've definitely got latency budgets in your app. And you can deliver quick answers that are inaccurate, or eighty percent accurate, or you can take more time to actually go through more iterations and loops and evals to either verify or apply, you know, heuristics and figure out what the right answer is. But if you can't do that efficiently, then you're out of time. You're out of your latency budget or you're out of your cost budget. So it's really important to token max, to squeeze all the efficiency out of your inference, so that you do have the budgets for lots of quality iterations, lots of guardrail and safety iterations, and even before that, just lots of experiments to ultimately get good fine-tunes or good models to do this all with. Awesome. Any last questions before we wrap? I think, to sum it up, it sounds like: just be very thoughtful about the per-token use. Still move fast, procurement teams. And I think for you, it's about looking at the infrastructure and making sure that it works for you as a competitive moat down the line. So thank you all so much for joining us here, and we'll see you around HumanX. Thanks so much.

WEKA Chief AI Officer Val Bercovici joins investors and cloud leaders at HumanX to debate AI token costs, procurement strategy, infrastructure moats, and what "return on intelligence" really means.

Speakers:

Hope King - Founder, Macro Talk News
Val Bercovici - Chief AI Officer, WEKA
Dan Lawrence - General Manager, Americas, Nebius
Jonah Surkes - Director, Growth Equity Team, Generation Investment Management

Below is a transcript of the conversation, which has been lightly edited for clarity.

Transcript

00:00

Return on Intelligence: A New Framework for the AI-Native Economy

Hope King: Hi, everyone. It’s great to be with all of you. This is going to be a valuable session on the return on intelligence — or return on investment. We’re going to have a play on words here, but the mission of this session is really to talk about:

How do we pay for all of this? How do we pay for it now, how do we pay for it in the future, and is the word “investment” too limited to discuss purely in financial terms?

There’s obviously a lot more that you get in return for investing in AI today than just your bottom line. We’ll get into all of that.

Before we get started, I want to introduce myself. I’m Hope King. I’m the moderator and the founder of Macro Talk News. I went independent last year, so please subscribe on YouTube and find me on LinkedIn to connect. I’ll have each of my esteemed panelists introduce themselves.

Jonah Surkes: Thank you, Hope. My name is Jonah Surkes. I’m a director on the growth equity team at Generation Investment Management. We are fortunate enough to be an investor in WEKA as well as a number of other companies tackling this challenge. I spend a lot of my time thinking about AI efficiency and infrastructure at a higher level. We were founded by Vice President Al Gore, so I also spend a lot of time thinking about the climate impact of AI.

Val Bercovici: Val Bercovici, Chief AI Officer at WEKA. Previously, I helped co-create the Cloud Native Computing Foundation, bringing Kubernetes out of Google and open-sourcing it. I was also CTO at NetApp in the cloud era following their SolidFire acquisition.

Dan Lawrence: Thank you, Hope. My name is Dan Lawrence. I am the General Manager of Nebius for the Americas. Prior to joining Nebius, I was at Akamai and AWS (Amazon Web Services). Nebius, if you’re not familiar, is an AI cloud — we build massive GPU clusters and capabilities around the world.

01:59

How AI token costs are rising even as unit prices fall

Hope: Let’s start with the big question, which is cost. On the surface right now, the economics seem to be improving — models are getting cheaper, GPUs faster, token costs falling — but many organizations are still finding it hard to keep up with their AI bills. Val, I want to start with you. What is happening under the surface if all of these elements seem to be falling?

Val: There are a bunch of conflicting narratives out there right now. On one hand, the unit cost per token is definitely dropping quite dramatically, thanks to open-weight models from China and other innovations. On the other hand, AI itself has fundamentally changed since the ChatGPT Cambrian explosion. We were doing very small chat sessions, then we started adding PDFs, then agent turns: thousands of turns per session vs. one back and forth. Now we’re adding video and multimodal models. So the volume of tokens is exploding at the same time as some reductions in unit costs are happening.

The net effect, and we even saw this at NVIDIA GTC a couple of weeks ago in San Jose, is Jensen (Huang) talking conservatively about budgeting for tokens at the same rate as a full-time engineer’s salary. Let’s say $250,000. Don’t laugh; here in the Bay Area that’s a low number, but in other parts of the world it’s high. On the other end of the spectrum, and I think it is a polar end, is SemiAnalysis boasting that some of their top analysts are at about $15,000 to $20,000 in daily spend on tokens. Annualized, that’s about $5 million. And (SemiAnalysis CEO Dylan Patel) is proudly saying they’re worth it, that the output and the revenue they generate from this investment justify it. So really interesting data points already.
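
For readers who want to check the arithmetic behind the "about $5 million" figure, here is a minimal sketch. The day counts are assumptions (the panel does not say whether the figure is based on calendar days or working days), and `annualize` is a hypothetical helper name.

```python
# Annualizing the daily token spend quoted in the session.
# Assumption: the speaker did not state the day count, so both a
# calendar-day and a working-day basis are shown.

def annualize(daily_spend_usd: float, days_per_year: int) -> float:
    """Return yearly spend in USD for a constant daily spend."""
    return daily_spend_usd * days_per_year

LOW_DAILY, HIGH_DAILY = 15_000, 20_000  # daily range quoted in the talk
ENGINEER_SALARY = 250_000               # salary benchmark quoted in the talk

for label, days in [("365 calendar days", 365), ("250 working days", 250)]:
    low = annualize(LOW_DAILY, days)
    high = annualize(HIGH_DAILY, days)
    print(f"{label}: ${low:,.0f} to ${high:,.0f} per year, "
          f"or {low / ENGINEER_SALARY:.0f}x to {high / ENGINEER_SALARY:.0f}x the salary benchmark")
```

Under either assumption the result lands in the single-digit millions, so the round $5 million figure is consistent with the quoted daily range.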

Hope: I thought AI spend was supposed to replace human cost. Jonah, what’s going on? The demand sounds like it’s overtaking even the unit cost going down.

Jonah: We are in a funny moment where the measure of return is usage for many companies who are still not able to calculate whether that usage is turning into something productive. And when the measure of return is usage, and the measure of how well you’re doing your job is usage, and the measure of how much you might get paid is usage… you’re going to use as much as you can. There’s a human psychological desire to “token max” right now, which is a huge part of it.

The second thing I’d double-click on from what Val said: We’re seeing a huge change in the architecture of what AI is doing for us, and the requirement on token use is therefore far higher. What’s weird is that no one really knows why something should be 100,000 tokens for a task vs. 10,000 or a million. Sometimes there’s no real sense of that. It didn’t matter as much when it was a simple chat session and the difference was 20, 40 or 50 tokens. But now when it’s an AI agent running overnight and you wake up in the morning and it’s used 1.8 million tokens vs. 1,800, there’s a real misunderstanding of how useful that actually is.

Hope: Dan, who knows the answer? You can also look into the future with what you’re building at Nebius.

Dan: Certainly the price of tokens is on everyone’s mind, and as an infrastructure provider we’re trying to make it as palatable as possible. That can happen in a number of ways, depending on how horizontally or vertically integrated you are across the supply chain. If you’re building the AI factory, breaking ground, and designing the servers, there’s a cost advantage to doing that vs. co-locating a site and paying for hardware elsewhere. Think of all the margin across the full stack. So that’s one way to drive the economics down. The other is automation: orchestration, and spending money to develop software that will actually automate things, bring the cost down, and make the hardware more efficient as well.

06:07

How companies can manage AI spending and tie tokens to business outcomes

Hope: So there’s a lot that’s out of the control of the builders and founders in the room. What’s within their control? You can’t really pull back on demand because you have to move faster than your competitor. How would you advise companies today to manage cost internally when they do have to continue to ship?

Jonah: There are two things we tend to recommend. The first is trying to think about the usefulness of each token you spend and tying it closely to a business outcome rather than just to usage. That business outcome will be very different depending on the task, whether it’s customer support or code generation, but the point is to tie it really closely. The second, which I’m sure we’ll discuss more, is the set of architectural decisions that enterprises make: whether to vertically integrate and build their own AI factory, giving them more control, or to work with someone like WEKA to make their storage more efficient. Those are the two vectors we recommend attacking from.
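
One possible way to make "usefulness per token" concrete is to divide token spend by a count of business outcomes. The sketch below is illustrative only: the function name, the price, and the ticket count are hypothetical and do not come from the panel; only the 1.8 million token figure echoes the overnight-agent example mentioned earlier.

```python
# Cost per business outcome instead of raw usage.
# All numbers other than the 1.8M token count are hypothetical.

def cost_per_outcome(tokens_used: int, usd_per_million_tokens: float, outcomes: int) -> float:
    """Token spend in USD divided by the number of outcomes achieved."""
    if outcomes <= 0:
        raise ValueError("no outcomes recorded; spend cannot be attributed")
    spend_usd = tokens_used / 1_000_000 * usd_per_million_tokens
    return spend_usd / outcomes

HYPOTHETICAL_PRICE = 10.0  # USD per million tokens, assumed for illustration

# An overnight agent run that used 1.8M tokens and resolved 12 support tickets.
per_ticket = cost_per_outcome(1_800_000, HYPOTHETICAL_PRICE, outcomes=12)
print(f"${per_ticket:.2f} per resolved ticket")
```

Raising an error when no outcome is recorded is a deliberate choice in this sketch: usage that cannot be tied to any outcome is exactly the case the speaker warns about.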

Dan: If you rewind the clock a little bit, when cloud was taking off, companies took advantage of it, scaled, departments got involved, and costs were overrun. From that, people learned, and FinOps was born as a category. Companies hired people to put controls around the financial implications of using cloud. I think similarly from an AI perspective — token cost, AI cost — I fully expect that people inside organizations are going to have responsibility around governance and creating guardrails.

Hope: And they’re doing that now.

Val: Let me give a very specific answer as a follow-up, since there are builders in the audience. I love the reference to FinOps. What we need to do now, if we’re going to use Open Claude as a framework, is give your agent an AI FinOps skill. That means: look at OpenRouter, look at the category of models you’re interested in (typically coding agent-style models), and look at the pricing. OpenRouter is really transparent about not just the price per million input and output tokens but also cache-read pricing, where they’ve been at the forefront. Have your skill optimize for model providers that give you a much smaller gap between the input price and the cache-read price. As caching technologies mature, there almost shouldn’t need to be a difference between those two prices; they should very much converge. That’s a specific skill you can implement right now.
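
As a rough illustration of what such an AI FinOps skill might compute, the sketch below compares models on two numbers: the ratio of cache-read price to input price (the "gap" mentioned above) and a blended input cost at an assumed cache hit rate. Everything here is hypothetical: the model names and prices are made up, the 90% hit rate is an assumption, and no real pricing API is called. How to weigh the gap against the blended cost is left open.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical price sheet entry, in USD per million tokens."""
    name: str
    input_per_mtok: float       # uncached input tokens
    cache_read_per_mtok: float  # input tokens served from cache

    @property
    def cache_gap_ratio(self) -> float:
        """Cache-read price as a fraction of input price (1.0 means no gap)."""
        return self.cache_read_per_mtok / self.input_per_mtok

def blended_input_cost(price: ModelPrice, cache_hit_rate: float) -> float:
    """Expected USD per million input tokens at the given cache hit rate."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return (cache_hit_rate * price.cache_read_per_mtok
            + (1.0 - cache_hit_rate) * price.input_per_mtok)

# Made-up catalog; a real skill would load current prices from its provider.
CATALOG = [
    ModelPrice("model-a", input_per_mtok=3.00, cache_read_per_mtok=0.30),
    ModelPrice("model-b", input_per_mtok=1.00, cache_read_per_mtok=0.50),
    ModelPrice("model-c", input_per_mtok=2.00, cache_read_per_mtok=0.20),
]

ASSUMED_HIT_RATE = 0.9  # agent loops tend to resend long, repeated prefixes

ranked = sorted(CATALOG, key=lambda p: blended_input_cost(p, ASSUMED_HIT_RATE))
for p in ranked:
    print(f"{p.name}: blended ${blended_input_cost(p, ASSUMED_HIT_RATE):.2f}/Mtok, "
          f"cache-read is {p.cache_gap_ratio:.0%} of input price")
```

Reporting both numbers keeps the comparison honest: a small gap can come from a cheap input price or from an expensive cache-read price, and only the blended cost distinguishes the two.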

09:15

Who should own AI strategy inside your organization?

Hope: As Dan was talking about, there’s still a big human problem in executing AI. Not every company has a Chief AI Officer. There’s a lot of debate: who owns the workstream, who owns the strategy? What do you see working well inside the companies that are moving fastest? Is it the CFO, CTO, CIO, CEO?

Val: This moves so fast. I have a cyber background, so the way I apply the Chief AI Officer role (call it CAIO for short) is the CISO model, the four-letter C-level model, where you’re more of an advisor to various other C-levels who actually own the budgets and the workflows. I work very closely with our CIO, very closely with our CTO, and with a whole bunch of folks within WEKA. The job last year was to evangelize the usage of agents. The job this year is very much “token maxing.” It’s a consultative role, but you have to keep adapting because this industry moves at a pace I’ve never seen before.

Hope: I think the real question is who has the veto power? When a big decision needs to be made, who makes that call?

Val: Interestingly enough, I haven’t seen the veto exercised yet. In the spirit of transparency, we are one of the few companies our size that actually has an Anthropic enterprise license. We also have OpenAI, Google Workspace and Gemini, and we use open-weight models as well. So it’s less about a veto and more about experimenting, measuring the results, and doubling down on what’s working.

Jonah: This may be a more controversial take, but we’ve seen in some of our portfolio companies and enterprise partners that the Chief People Officer has a really interesting role to play. As you think about the role of agents in replacing human labor, the decision around human resource allocation actually becomes the central question. When you combine a Chief People Officer with a CTO, and maybe a CISO together in that triangle of power, it’s really interesting what we’ve seen companies get done.

Hope: But in that triangle of power, who’s ultimately writing the checks?

Jonah: What’s concerning right now is that everyone’s writing checks. When something happens negatively, I think the CISO will start to take more power.

Dan: I agree with both of you that having a leader or leaders responsible is a big lever. However, one of the other things I’m seeing in our customer base is a convergence of product management and engineering. For a long time, product managers were responsible for talking to customers and understanding requirements, and engineers were in the back office building and writing code. We’re seeing now that those two personas are converging. The fastest-moving companies with AI are ones where engineers are actually going out and talking to customers, helping define product, because things happen so rapidly with model changes and automated code that it’s a really powerful phenomenon.

13:05

What AI roles and workflows will be obsolete in two years?

Hope: Any other best practices? Val, anything to add?

Val: Just to follow up on what Jonah was saying, the convergence isn’t just at an individual level of product manager and engineering skill sets; it’s happening at a group level. I joke that it’s the revenge of the MBAs, or the fact that middle managers, who were very much considered unnecessary, are almost a critical path right now. You have these ambitious goals that you can set upon an agent swarm, but the skill in making that swarm productive and successful is understanding how to divide and conquer the tasks, how to supervise, how to measure, how to coordinate. That’s the state of the art in terms of running an agent swarm, and that is a middle management skill set.

Hope: That is a very contrarian take. Jonah?

Jonah: I think it’s right — right now. There is a question of whether it’s still right in one or two years. The premise it relies on is that the way things are currently done is the way they should be done once agents are doing them too. I think the reality is that many processes enterprises use will change fundamentally once humans are no longer in control. Trying to get agents to replicate the process that humans have been doing for 20 or 30 years may not actually be the right approach either.

Hope: I’m hearing there’s a lot of friction trying to train agents. That institutional knowledge deep within a senior manager is innate within them. What are we doing today that you think will be obsolete in two years?

Jonah: We’re seeing a forward-deployed engineer explosion right now, which could well be obsolete in two years. There’s effectively a process documentation period we’re going through. I don’t know that it needs to last forever. At some point, there is a self-learning capability of agents.

Val: From a planning perspective, the biggest fallacy you can make is that the state of the art will remain static. Exponential progress means these models are going to be incredibly powerful by the end of this year, and so the agent loops are going to be running on these models — if we even still need to run agent loops — in a way that’s just not the same as it is today. Even what a forward-deployed engineer does today, which is a really critical skill set right now, will either evolve or go away by end of year.

Dan: I think that as we mature with AI — certainly we’re in an augmented workplace now where AI is augmenting and helping productivity — as we automate, that will change. But I think the place where humans go next is verification. Workloads happen, changes happen, and ultimately agents may be verifiers as well, but as humans it’s our responsibility to verify what’s happening.

16:20

What is the AI memory wall and why does it affect your inference costs?

Hope: I want to go back to the technical side of things and start with you, Val. You’ve talked about something called the AI memory wall. For those in the audience who may not be deep in the infrastructure weeds, what do you mean by this, and why does it matter to this topic of return on intelligence?

Val: It’s pretty fundamental. If you talk to experts in infrastructure and inference, they’ll tell you that memory is the bottleneck. What they mean literally is this: Two to three years ago, we had a gentle balance in scientific and GPU computing between floating-point operations per second and memory capacity and bandwidth. AI, ChatGPT and reasoning models kind of broke that balance, and then agents and multimodal agents completely broke it.

So we’re in a situation right now that the stock markets and supply chains reflect: We physically don’t have enough materials in the world to fabricate the memory needed by agents at the pace required. The memory wall is this collision between how we actually want agents to run and the physical capacity we have for memory.

The good news is there are a lot of innovations at the software layer that let you reorient how you use a standard GPU server. For example, taking stranded non-volatile flash storage in that server and wiring it through software so that it literally delivers additional memory. That unlocks a huge range of efficiencies. It gets you, at scale, the capacity of four to five additional data centers from one physical data center.

18:10

How high token spenders are generating outsized ROI

Hope: You brought up a fascinating anecdote on our prep call about analysts spending $15,000 a day in API tokens, annualized to $5 million. Can you elaborate on this case study?

Val: This is SemiAnalysis, very well known. Jensen gave them a big shout-out at his keynote a few weeks ago for the InferenceMAX benchmark. One of their other analysts is a data center analyst who analyzes satellite imagery. He goes onsite to locations all over the world and figures out: Is there power? Is there water? Do the local zoning laws and communities support these data centers? He advises large companies on whether a $1 billion CapEx spend is worthwhile.

He’s developed this amazing model (he showed it to me last week in New York): it’s literally a 3D interactive globe where you can look around and double-click on a particular country and region, then find out exactly what data centers are there, what power they’re using, what kind of processors are in them. You couldn’t have done that individually before as an analyst. With Claude Code and token maxing at $15,000 a day, he was able to build this amazing model. I get it when Dylan, the founder of SemiAnalysis, says it’s totally worth it because I can see how they can sell that model multiple times for multiple millions of dollars to multiple customers.

Hope: That’s a lot of money. How should we contextualize this for everyone here?

Val: We’ve got to rethink tasks. Not jobs — tasks. Fundamentally rethink the tasks that make up our jobs. In many cases, yes, you can have $15,000 a day of tokens do the work of a team that would have been 10 engineers in the past, and you will deliver something with an agent swarm in days that would have taken that team months. So at a task level, the ROI is undeniable.

Jonah: The common thread is that it’s about the usefulness of the output to justify the cost of the input. There is a ratio there. Through the arc of history, companies have always spent tens of millions of dollars on things they felt were important and would drive high ROI — whether it’s hiring an extremely expensive person who can deliver something, or putting on a conference that costs tens of millions of dollars. It’s not a foreign concept. We just now have a very productive new thing to spend on.
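
As a rough illustration of the arithmetic behind that ratio, the sketch below annualizes the $15,000-a-day figure from the discussion and compares it with a value for the output. The function names and the revenue figure are hypothetical and chosen only for illustration. The panel did not give sales numbers.

```python
def annualized_spend(daily_usd: float, days_per_year: int = 365) -> float:
    """Scale a daily token bill to a yearly figure."""
    return daily_usd * days_per_year


def value_to_cost_ratio(output_value_usd: float, input_cost_usd: float) -> float:
    """Ratio of output value to input cost; above 1.0 means the spend paid for itself."""
    if input_cost_usd <= 0:
        raise ValueError("input cost must be positive")
    return output_value_usd / input_cost_usd


# From the discussion: $15,000 a day in API tokens.
yearly_cost = annualized_spend(15_000)  # 5,475,000 at 365 days, i.e. roughly $5 million

# HYPOTHETICAL: three sales at $3 million each (illustrative numbers, not from the panel).
hypothetical_revenue = 3 * 3_000_000
ratio = value_to_cost_ratio(hypothetical_revenue, yearly_cost)

print(f"annualized spend: ${yearly_cost:,.0f}")
print(f"value-to-cost ratio: {ratio:.2f}")
```

Under these assumed numbers the ratio comes out above 1.0. The point is that the judgment depends on the value of the output, not on the size of the token bill alone.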

21:20

How AI is forcing procurement teams to move faster

Hope: When you look at procurement — probably one of the least glamorous departments in any company, but very important — Dan, you had a great case study around the speed at which procurement now has to move. What does a typical procurement process look like for larger enterprises right now, and what needs to happen for change management inside procurement?

Dan: There’s an amazing supply-and-demand imbalance in the market right now, which is driving some really interesting behavior. We have capacity that gets soaked up immediately. At the moment we have roughly three, four, or five customers for every one GPU we have for sale — and we’re not alone. Every provider out there has that same phenomenon.

So what happens is: A Global 2000 company has a procurement process that includes legal review and multiple approvals from finance and a business owner. They’ve built a very disciplined, risk-averse, financially motivated process. But with the supply-and-demand imbalance continuing and the pace of AI changing, when companies need to procure, they’re going to have to move faster. What we’re seeing is that smaller AI-native companies are capable of doing that. Their funders are actually encouraging them to move quickly, be less concerned with risk, and commit to hundreds of millions or even billions of dollars of capacity in hours or days rather than weeks, months, or years.

Hope: It’s a risk question, a tradeoff. You were joking that everyone’s just writing checks these days, which aligns with procurement processes and timelines being collapsed.

Jonah: There’s an interesting thing we’ve started to see on the procurement thread: As more mature enterprises have swarms of agents within their tech stack calling on various tools to execute tasks, sometimes those agents come back to the human user and say, “Hold on — I tried calling this tool, I looked into the analytics, and you’ve never used it before. Why are you paying $1 million for it?” The agent is starting to have an opinion on application procurement. That’s enabling procurement to move faster and it raises a fascinating question: How do companies procure anything in a world where humans are no longer actually using the tools being procured?

Val: The prevailing conviction I see among leaders is that the opportunity cost of not investing rapidly is just too high right now. There’s enough conviction within leadership everywhere that the benefits and ROIs are not restricted to edge cases or outliers; they’re very much in the middle of the bell curve. People are clearly voting with their dollars right now.

25:14

Can infrastructure ownership become a competitive moat for AI companies?

Hope: A lot of companies are trying to figure out how to deepen their competitive moat at a time when AI can copy services or companies can take a model and build it inside their own organization. There’s a discussion around whether infrastructure could be a competitive moat if you actually own some of it or have more control. Dan, how have you seen customers approach this?

Dan: Certain customers are trying and experimenting with adding AI factory-type capacity on their own, and those with really deep pocketbooks are somewhat successful. Those that don’t want to do that are struggling, and the reason is the talent that has the know-how is scarce. Just because you were able to build a data center 10 or 15 years ago and horizontally scale it with CPUs, networking and storage… the new paradigm is different. That skill set isn’t very transferable. It’s going to take heavy investment in training and retraining. Because of that, we’re finding customers are willing to partner with companies that have those resources and the people who can actually build modern data centers.

Jonah: The eternally challenging question is the search for a moat, especially at the application layer. There was a period about a year ago where a lot of companies were fine-tuning their own models, or in some cases even training their own. Legal AI is a good example. Companies were building their own legal AI models, which have now almost all been deprecated in favor of frontier models that perform very well at legal tasks. Those companies are now looking to go a level deeper into the stack to find their moat, which is what brings them closer to infrastructure. We’ll see that pattern repeat across domains over time.

The question is partly one of talent. There’s a massive talent gap at the infrastructure layer. Data center technology has moved so fast that it has left many behind. And there’s also the question of pace. You don’t want to be saddled with physical infrastructure that evolves so quickly that you end up putting forward huge upfront capital for something that actually gives you a disadvantage in the long run. But those who can afford it and get the talent are finding significant advantage right now.

Dan: The other moat opportunity is around experience and experiential data that’s been captured. If you’ve been in a retail market for 20 or 30 years and you have all that data, that’s a moat. There’s no compression algorithm for experience, as Andy Jassy from Amazon likes to say.

Val: To get more granular: There’s no such thing as one model and no such thing as one style of inference, whether local, regional, or a big consolidated data center. The big difference I’m seeing compared to cloud, which I got into about 17 or 18 years ago, is that the infrastructure is fundamentally different now. It’s so high-performance, so parallel. One rack of CPUs back in the day had fewer than 1,000 cores. One rack of GPUs today has over 1.5 million. That skill set is a different universe.

The other thing is planning ahead. The marginal cost of a new cloud user is almost nothing. You’ve pre-compiled the software, it’s another record in a database, very scalable. The reason the SaaS business model is under strain is that people have realized the incremental cost of a new AI user is so different: It’s almost a fully replicated cost for every user, without the near-zero marginal cost of the traditional model. I have a personal take that we’re going to see a lot of mid-market neo-clouds and mid-market SaaS companies have to merge, because if they can’t cost-effectively produce tokens, they just won’t have a gross margin business.

Dan: I’d add that companies have to make huge investments upfront to build out infrastructure, and there’s a lot of change coming. We don’t yet know the useful life for some of this hardware. You’re making big bets with your capital against a lot of unknowns. If optimizing infrastructure isn’t your core business, leave it to the experts.

31:47

AI’s energy footprint and the shared responsibility of sustainable compute

Hope: The last question — and I don’t mean it to be least important just because it’s last — is on the responsibility side. How are you all thinking about the responsibility you have, and that the CEOs you work with have, around such a resource-intensive process? Is there a responsibility here, or do we just move forward and let the next generation deal with it? Jonah, I’ll start with you.

Jonah: It’s a question we think about a lot, and I think every investor and leader should be thinking about the trade-offs they’re making for the progress we’re seeing. We think about it in two ways.

First, as with all transformational technologies — electricity, the wheel, fire — these technologies don’t have values inherently. The value that comes from them is shaped by the people or systems that control what they’re used for. The same will be true of AI. So we’re trying to empower leaders who focus on the highest-impact uses of their tokens, which is why I think about the usefulness-per-token idea.

Second, the energy constraints are real and serious. The demand for gas plants in the United States has grown roughly 5x in the last couple of years, largely driven by data center demand. That’s a near-term phenomenon. What’s quite interesting is that there is also an enormous increase in demand for renewable power and battery-stored power to run data centers because it’s fundamentally cheaper. The stat I love is that last year, over 80% of new renewable investment came from effectively the AI industry making forward orders for cheaper energy. So we’re seeing a kickstart of lower-cost, lower-impact energy sources, which should hopefully come online in the next few years.

Val: One of the things I love about WEKA is that the efficiency story is completely aligned here. Performance efficiency wasn’t really trendy before AI. Now it’s extremely important and relevant, and it’s completely aligned with economic efficiency: cash flow, CapEx, OpEx, and energy usage. This efficiency enables more tokens within a fixed energy budget and a fixed latency budget, which in turn allows more iterations. So you literally get higher-quality answers and actually safer responses for an agent that has the power to do things on your behalf while you’re sleeping.

Dan: Sustainability is certainly on our mind with every data center we build. We try to use as much renewable energy as possible and do creative things like heating towns. Our data center in Finland actually heats the town next door. Sustainability is a shared responsibility between providers and consumers of services, just like at home: You pay for your electric bill and they have to give you the electricity, but you also have to turn the lights off.

35:37

Audience Q&A: Human-agent collaboration, value verification, and the memory wall

Hope: We have time for Q&A. Go ahead.

Audience Member 1: I’m curious if you could share more details about what the partnerships between a head of people and a head of engineering actually look like.

Jonah: One of our portfolio companies is called Gloat, and they build an internal talent marketplace that helps match people within very large organizations to different pockets of the company. If you’re in product and there’s a project in the finance team where your specific skill set would be relevant, you can go work with them temporarily.

They started working with Fortune 10 and Fortune 50 companies who were initiating one-off collaborative projects. For example: An M&A transaction happens, and suddenly legal, finance, product, and marketing need to work together. There’s a project leader with a three-week window who knows what the tasks are, but doesn’t know which humans in the team can do what, or which AI tools exist across the enterprise that could reduce the human burden.

They’re using a tool now that says, “Tell us what your project is, tell us what tasks you’d break it down into, and we’ll help allocate those to humans and AI agents, showing you which tasks are more applicable for humans and which for agents.” They completed this project in half the expected time, and now it’s become an HR initiative to roll this out across the organization for all projects, centered on human-agent handoff and collaboration.

Audience Member 2: Where are we in terms of verification and attribution of contractible outcomes? The value is being created, but is it the value that meets organizational intent? Is it commercial, is it enforceable?

Val: This is a very active area of research, and some of it is solved. The whole concept of reinforcement learning — particularly the modern version of it — is to train on something that has a verifiable outcome, so you can give the algorithm a verifiable reward or penalty. Source code is a great example of why coding is so popular right now: Programs either run or they don’t; they either produce something that corresponds with a test result or they crash. Anything mathematical, algorithmic or programmatic is turning out to work really well.

When you have open-ended legal or HR questions, it’s very hard because there’s nuance. That’s where active research is going on around evaluations and benchmarks, making a subjective call as to whether a particular service-level agreement has been met. That’s a very human-centric role right now.
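
The verifiable-reward idea for code can be sketched in a few lines. This is a minimal illustration, not how any particular lab trains its models: the function name `verifiable_reward` and the test cases are hypothetical, and a real system would run candidates in a sandbox instead of calling `exec` directly.

```python
from typing import Any, Sequence, Tuple


def verifiable_reward(
    candidate_src: str,
    func_name: str,
    cases: Sequence[Tuple[tuple, Any]],
) -> float:
    """Return 1.0 if the candidate program passes every test case, else 0.0.

    A crash, a syntax error, or a wrong answer all yield the same penalty,
    which is what makes the outcome checkable without human judgment.
    """
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)  # NOTE: unsandboxed, for illustration only
        func = namespace[func_name]
        for args, expected in cases:
            if func(*args) != expected:
                return 0.0
    except Exception:
        return 0.0
    return 1.0


# Hypothetical task: implement add(a, b).
cases = [((1, 2), 3), ((0, 0), 0)]
good = "def add(a, b):\n    return a + b"
wrong = "def add(a, b):\n    return a - b"
crashing = "def add(a, b):\n    return a / 0"

rewards = [verifiable_reward(src, "add", cases) for src in (good, wrong, crashing)]
print(rewards)
```

An open-ended legal or HR answer has no equivalent of `cases` to check against, which is why those domains still lean on human evaluation.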

Jonah: We’re in a middle ground at the application layer where there is often a combination of a base service fee and sometimes an upside outcome-based fee. What we are starting to see is that enterprises are getting much better at accurately predicting which outcomes actually matter. A year ago it was a complete guess. From companies I’ve been speaking with even just this week, there’s a real noticeable shift: Everyone now has a measurement. I don’t think they know what the quantum of that measurement should be that triggers certain payouts, but they have a better idea of what the actual measurement is.

Dan: The two continuums I’m seeing are productivity, where customers can quantify savings and how much human time was saved, and decision-making quality: How much better are your decisions because of the AI output?

Hope: I wanted to hear more about the memory wall: not just the concept, but how it affects the quality of output, not just the speed.

Val: They’re related. The easiest way for a builder to think about the memory wall is: Am I hitting a rate limit? I don’t know any builder who isn’t hitting token rate limits right now, whether it’s a subscription plan or an API budget of thousands of dollars a day. Optimizing around rate limits is definitely one of the things you experience when you’re hitting the memory wall.

Partnering with GPU and inference providers that let you get more tokens for a particular budget is one piece. But how you apply them is really important. There’s a classic triad in AI: accuracy, latency, and cost. And you’re typically trading off one to get the other two. When you can improve cost and latency simultaneously, you’re making better trade-offs.

For me it comes down to budgets. You’ve got a power budget at the macro level, and you have latency budgets in your app. You can deliver quick answers that are 80% accurate, or you can take more time to go through more iterations and evaluation loops to figure out the right answer. If you can’t do that efficiently, you run out of your latency budget or your cost budget. It’s really important to squeeze out, to token-max, all the efficiency from your inference so you have the budget for lots of quality iterations, lots of guardrail and safety iterations, and lots of experiments to ultimately get good models to do all of this with.
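
The budget argument can be made concrete with a small sketch. All names and numbers below are hypothetical and not from the panel. It assumes each refinement or evaluation pass has a fixed latency and a fixed cost, and that whichever budget runs out first caps the number of passes.

```python
def max_iterations(
    latency_budget_ms: int,
    cost_budget_cents: int,
    latency_per_iter_ms: int,
    cost_per_iter_cents: int,
) -> int:
    """Number of full passes that fit inside BOTH the latency and the cost budget."""
    if latency_per_iter_ms <= 0 or cost_per_iter_cents <= 0:
        raise ValueError("per-iteration latency and cost must be positive")
    by_latency = latency_budget_ms // latency_per_iter_ms
    by_cost = cost_budget_cents // cost_per_iter_cents
    return min(by_latency, by_cost)


# HYPOTHETICAL budgets: 10 seconds and 50 cents per request.
baseline = max_iterations(10_000, 50, latency_per_iter_ms=2_000, cost_per_iter_cents=10)

# Same budgets, but inference that is twice as fast and half the cost per pass.
efficient = max_iterations(10_000, 50, latency_per_iter_ms=1_000, cost_per_iter_cents=5)

# Improving only cost leaves latency as the binding constraint.
cost_only = max_iterations(10_000, 50, latency_per_iter_ms=2_000, cost_per_iter_cents=5)

print(baseline, efficient, cost_only)
```

With these assumed numbers, improving cost and latency together doubles the passes available for quality and guardrail checks, while improving only one leaves the other budget as the bottleneck.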

Hope: To sum it up: Be very thoughtful on per-token use. Still move fast, including your procurement teams. And think seriously about infrastructure as a potential competitive moat down the line.

Thank you all so much for joining us here, and we’ll see you around HumanX.

Like This Discussion? There’s More!

This conversation took place during HumanX 2026 in San Francisco. Val mentioned the AI memory wall — here’s the data behind how WEKA delivered 4.2x more tokens per GPU without adding hardware.

Read the Article Here
