
The Real Cost of AI: How Smart Companies Are Maximizing Token ROI

WEKA Chief AI Officer Val Bercovici joins investors and cloud leaders at HumanX to debate AI token costs, procurement strategy, infrastructure moats, and what "return on intelligence" really means.

Speakers:

  • Hope King - Founder, Macro Talk News
  • Val Bercovici - Chief AI Officer, WEKA
  • Dan Lawrence - General Manager, Americas, Nebius
  • Jonah Surkes - Director, Growth Equity Team, Generation Investment Management

Below is a transcript of the conversation, which has been lightly edited for clarity.

Transcript

00:00

Return on Intelligence: A New Framework for the AI-Native Economy

Hope King: Hi, everyone. It’s great to be with all of you. This is going to be a valuable session on the return on intelligence — or return on investment. We’re going to have a play on words here, but the mission of this session is really to talk about:

How do we pay for all of this? How do we pay for it now, how do we pay for it in the future, and is the word “investment” too limited to discuss purely in financial terms?

There’s obviously a lot more that you get in return for investing in AI today than just your bottom line. We’ll get into all of that.

Before we get started, I want to introduce myself. I’m Hope King. I’m the moderator and the founder of Macro Talk News. I went independent last year, so please subscribe on YouTube and find me on LinkedIn to connect. I’ll have each of my esteemed panelists introduce themselves.

Jonah Surkes: Thank you, Hope. My name is Jonah Surkes. I’m a director on the growth equity team at Generation Investment Management. We are fortunate enough to be an investor in WEKA as well as a number of other companies tackling this challenge. I spend a lot of my time thinking about AI efficiency and infrastructure at a higher level. We were founded by Vice President Al Gore, so I also spend a lot of time thinking about the climate impact of AI.

Val Bercovici: Val Bercovici, Chief AI Officer at WEKA. Previously, I helped co-create the Cloud Native Computing Foundation, bringing Kubernetes out of Google and open-sourcing it. I was also CTO at NetApp in the cloud era following their SolidFire acquisition.

Dan Lawrence: Thank you, Hope. My name is Dan Lawrence. I am the General Manager of Nebius for the Americas. Prior to joining Nebius, I was at Akamai and AWS (Amazon Web Services). Nebius, if you’re not familiar, is an AI cloud — we build massive GPU clusters and capabilities around the world.

01:59

How AI token costs are rising even as unit prices fall

Hope: Let’s start with the big question, which is cost. On the surface right now, the economics seem to be improving — models are getting cheaper, GPUs faster, token costs falling — but many organizations are still finding it hard to keep up with their AI bills. Val, I want to start with you. What is happening under the surface if all of these elements seem to be falling?

Val: There are a bunch of conflicting narratives out there right now. On one hand, the unit cost per token is definitely dropping quite dramatically, thanks to open-weight models from China and other innovations. On the other hand, AI itself has fundamentally changed since the ChatGPT Cambrian explosion. We were doing very small chat sessions, then we started adding PDFs, then agent turns: thousands of turns per session vs. one back-and-forth. Now we’re adding video and multimodal models. So the volume of tokens is exploding at the same time as unit costs are falling.

The net effect was visible at NVIDIA GTC a couple of weeks ago in San Jose, where Jensen (Huang) talked, conservatively, about budgeting for tokens at the same rate as a full-time engineer’s salary. Let’s say $250,000. Don’t laugh; here in the Bay Area that’s a low number, but in other parts of the world it’s high. At the other end of the spectrum, and I think it is the polar opposite, SemiAnalysis is boasting that some of its top analysts spend about $15,000 to $20,000 a day on tokens. Annualized, that’s about $5 million. And (SemiAnalysis CEO Dylan Patel) is proudly saying they’re worth it, that the output and the revenue they generate from this investment justify it. So there are really interesting data points already.
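The annualized figure depends on how many days of spend are counted, which the speakers don’t specify. A quick check of the quoted range under two assumed bases, 365 calendar days or roughly 250 working days, alongside the $250,000 salary benchmark:

```python
# Rough annualization of daily token spend, compared with one engineer's salary.
# Both day counts are assumptions; the panel does not say which basis it used.
CALENDAR_DAYS = 365
WORKING_DAYS = 250  # hypothetical: roughly 50 weeks x 5 days
ENGINEER_SALARY = 250_000  # the per-engineer token budget figure quoted on the panel


def annualize(daily_spend: int, days: int) -> int:
    """Annual spend if the daily rate is sustained for `days` days."""
    return daily_spend * days


if __name__ == "__main__":
    for daily in (15_000, 20_000):
        for label, days in (("calendar", CALENDAR_DAYS), ("working", WORKING_DAYS)):
            annual = annualize(daily, days)
            ratio = annual / ENGINEER_SALARY
            print(f"${daily:,}/day x {days} {label} days = ${annual:,} ({ratio:.1f}x one salary)")
```

Across both bases the range runs from $3.75 million to $7.3 million a year, so “about $5 million” falls inside it, at roughly 15x to 29x the salary benchmark.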

Hope: I thought AI spend was supposed to replace human cost. Jonah, what’s going on? The demand sounds like it’s overtaking even the unit cost going down.

Jonah: We are in a funny moment where the measure of return is usage for many companies who are still not able to calculate whether that usage is turning into something productive. And when the measure of return is usage, and the measure of how well you’re doing your job is usage, and the measure of how much you might get paid is usage… you’re going to use as much as you can. There’s a human psychological desire to “token max” right now, which is a huge part of it.

The second thing I’d double-click on from what Val said: We’re seeing a huge change in the architecture of what AI is doing for us, and the requirement on token use is therefore far higher. What’s weird is that no one really knows why a task should take 100,000 tokens rather than 10,000 or a million. Sometimes there’s no real sense of that. It didn’t matter as much when it was a simple chat session and the difference was 20, 40 or 50 tokens. But when an AI agent runs overnight and you wake up in the morning to find it used 1.8 million tokens vs. 1,800, there’s a real lack of understanding of how useful that actually was.

Hope: Dan, who knows the answer? You can also look into the future with what you’re building at Nebius.

Dan: Certainly the price of tokens is on everyone’s mind, and as an infrastructure provider we’re trying to make it as palatable as possible. That can happen in a number of ways. One is being horizontally or vertically integrated across the supply chain: If you’re building the AI factory, breaking ground yourself, and designing the servers, there’s a cost advantage vs. co-locating in someone else’s site and paying for hardware elsewhere. Think of all the margin across the full stack. The other is automation: Orchestration, and spending money to develop software that automates operations, brings the cost down, and makes the hardware efficient as well.

06:07

How companies can manage AI spending and tie tokens to business outcomes

Hope: So there’s a lot that’s out of the control of the builders and founders in the room. What’s within their control? You can’t really pull back on demand because you have to move faster than your competitor. How would you advise companies today to manage cost internally when they do have to continue to ship?

Jonah: There are two things we tend to recommend. The first is thinking about the usefulness of each token you spend and tying it closely to a business outcome rather than just to usage. That business outcome will be very different depending on the task, whether it’s customer support or code generation, but the tie should be close. The second, which I’m sure we’ll discuss more, is the set of architectural decisions enterprises make: whether to vertically integrate and build their own AI factory, giving them more control, or to work with someone like WEKA to make their storage more efficient. Those are the two vectors we recommend attacking from.
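Jonah’s first recommendation, tying spend to outcomes rather than usage, can start as a simple ratio: token spend divided by successful outcomes, per task type. A minimal sketch; the task names, token counts, outcome counts and per-million-token prices are all hypothetical, and deciding what counts as an outcome is a business call the code can’t make:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskUsage:
    """Token usage and successful outcomes for one task type (all values hypothetical)."""
    task: str
    input_tokens: int
    output_tokens: int
    outcomes: int  # e.g. tickets resolved or pull requests merged


def token_spend(u: TaskUsage, input_per_m: float, output_per_m: float) -> float:
    """Dollar spend given prices per million input and output tokens."""
    return (u.input_tokens * input_per_m + u.output_tokens * output_per_m) / 1_000_000


def cost_per_outcome(u: TaskUsage, input_per_m: float, output_per_m: float) -> float:
    """Dollars per successful outcome; infinite when the spend produced nothing useful."""
    spend = token_spend(u, input_per_m, output_per_m)
    return spend / u.outcomes if u.outcomes else float("inf")


if __name__ == "__main__":
    usage = [
        TaskUsage("customer_support", 40_000_000, 5_000_000, outcomes=1_500),
        TaskUsage("code_generation", 120_000_000, 30_000_000, outcomes=150),
        TaskUsage("overnight_agent", 1_800_000, 200_000, outcomes=0),
    ]
    for u in usage:
        # Hypothetical prices: $3 per 1M input tokens, $15 per 1M output tokens.
        print(f"{u.task}: ${cost_per_outcome(u, 3.0, 15.0):.2f} per outcome")
```

Ranking task types by cost per outcome, rather than by raw token counts, is one way to spot spend like the overnight run that produced nothing usable.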

Dan: If you rewind the clock a little bit, when cloud was taking off, companies took advantage of it, scaled, departments got involved, and costs were overrun. From that, people learned, and FinOps was born as a category. Companies hired people to put controls around the financial implications of using cloud. I think similarly from an AI perspective — token cost, AI cost — I fully expect that people inside organizations are going to have responsibility around governance and creating guardrails.

Hope: And they’re doing that now.

Val: Let me give a very specific answer as a follow-up, since there are builders in the audience. I love the reference to FinOps. What we need to do now, if we’re going to use OpenClaw as a framework, is give your agent an AI FinOps skill. That means: Look at OpenRouter, look at the category of models you’re interested in, typically coding-agent-style models, and look at the pricing. OpenRouter is really transparent, not just about the price per million input and output tokens but also about cache-read pricing, where they’ve been at the forefront. Have your skill optimize for model providers that give you a much closer gap between the input price and the cache-read price. As caching technologies mature, there almost shouldn’t need to be a difference between those two prices; they should very much converge. That’s a specific skill you can implement right now.
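One way such a skill could compare providers is to blend the full input price and the cache-read price by the share of input tokens served from cache, then rank by estimated cost for a typical session. A minimal sketch; the provider names and prices are made-up placeholders rather than real OpenRouter listings, and the cache hit rate is an assumption you would measure from your own sessions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices for one provider/model (placeholder values only)."""
    provider: str
    input_per_m: float       # $ per 1M uncached input tokens
    cache_read_per_m: float  # $ per 1M input tokens served from the prompt cache
    output_per_m: float      # $ per 1M output tokens


def effective_input_price(p: Pricing, cache_hit_rate: float) -> float:
    """Blended $ per 1M input tokens when `cache_hit_rate` of them are cache reads."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return cache_hit_rate * p.cache_read_per_m + (1.0 - cache_hit_rate) * p.input_per_m


def session_cost(p: Pricing, input_tokens: int, output_tokens: int, cache_hit_rate: float) -> float:
    """Estimated dollars for one session with the given token mix."""
    input_cost = input_tokens / 1_000_000 * effective_input_price(p, cache_hit_rate)
    output_cost = output_tokens / 1_000_000 * p.output_per_m
    return input_cost + output_cost


def rank_providers(prices, input_tokens, output_tokens, cache_hit_rate):
    """Cheapest-first list of (provider, estimated session cost)."""
    costs = [(p.provider, session_cost(p, input_tokens, output_tokens, cache_hit_rate)) for p in prices]
    return sorted(costs, key=lambda item: item[1])


if __name__ == "__main__":
    prices = [
        Pricing("provider_a", input_per_m=3.00, cache_read_per_m=0.30, output_per_m=15.00),
        Pricing("provider_b", input_per_m=2.00, cache_read_per_m=1.00, output_per_m=12.00),
    ]
    # Agent sessions re-send context each turn, so input dominates: 50M input, 1M output (hypothetical).
    for hit_rate in (0.0, 0.9):
        print(hit_rate, rank_providers(prices, 50_000_000, 1_000_000, hit_rate))
```

In this made-up example the ranking flips with the cache hit rate: provider_b is cheaper when nothing is cached, provider_a once 90% of input tokens are cache reads. That sensitivity is why the cache-read price deserves a place in the comparison alongside list input and output prices.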

09:15

Who should own AI strategy inside your organization?

Hope: As Dan was talking about, there’s still a big human problem in executing AI. Not every company has a Chief AI Officer. There’s a lot of debate: who owns the workstream, who owns the strategy? What do you see working well inside the companies that are moving fastest? Is it the CFO, CTO, CIO, CEO?

Val: This moves so fast. I have a cyber background, so the way I apply the Chief AI Officer role (call it CAIO for short) is the CISO model, the four-letter C-level model, where you’re more of an advisor to various other C-levels who actually own the budgets and the workflows. I work very closely with our CIO, very closely with our CTO, and with a whole bunch of folks within WEKA. The job last year was to evangelize the usage of agents. The job this year is very much “token maxing.” It’s a consultative role, but you have to keep adapting because this industry moves at a pace I’ve never seen before.

Hope: I think the real question is who has the veto power? When a big decision needs to be made, who makes that call?

Val: Interestingly enough, I haven’t seen the veto exercised yet. In the spirit of transparency, we are one of the few companies our size that actually has an Anthropic enterprise license. We also have OpenAI, Google Workspace and Gemini, and we use open-weight models as well. So it’s less about a veto and more about experimenting, measuring the results, and doubling down on what’s working.

Jonah: This may be a more controversial take, but we’ve seen in some of our portfolio companies and enterprise partners that the Chief People Officer has a really interesting role to play. As you think about the role of agents in replacing human labor, the decision around human resource allocation actually becomes the central question. When you combine a Chief People Officer with a CTO, and maybe a CISO together in that triangle of power, it’s really interesting what we’ve seen companies get done.

Hope: But in that triangle of power, who’s ultimately writing the checks?

Jonah: What’s concerning right now is that everyone’s writing checks. When something happens negatively, I think the CISO will start to take more power.

Dan: I agree with both of you that having a leader or leaders responsible is a big lever. However, one of the other things I’m seeing in our customer base is a convergence of product management and engineering. For a long time, product managers were responsible for talking to customers and understanding requirements, and engineers were in the back office building and writing code. We’re seeing now that those two personas are converging. At the fastest-moving companies with AI, engineers are actually going out, talking to customers and helping define product, because things happen so rapidly with model changes and automated code. It’s a really powerful phenomenon.

13:05

What AI roles and workflows will be obsolete in two years?

Hope: Any other best practices? Val, anything to add?

Val: Just to follow up on what Jonah was saying, the convergence isn’t just happening at the individual level, between product manager and engineering skill sets; it’s happening at a group level. I joke that it’s the revenge of the MBAs: Middle managers, who were very much considered unnecessary, are almost a critical path right now. You have these ambitious goals that you can set for an agent swarm, but the skill in making that swarm productive and successful is understanding how to divide and conquer the tasks, how to supervise, how to measure, how to coordinate. That’s the state of the art in terms of running an agent swarm, and that is a middle management skill set.

Hope: That is a very contrarian take. Jonah?

Jonah: I think it’s right — right now. There is a question of whether it’s still right in one or two years. The premise it relies on is that the way things are currently done is the way they should be done once agents are doing them too. I think the reality is that many processes enterprises use will change fundamentally once humans are no longer in control. Trying to get agents to replicate the process that humans have been doing for 20 or 30 years may not actually be the right approach either.

Hope: I’m hearing there’s a lot of friction in trying to train agents. The institutional knowledge a senior manager carries is innate to them. What are we doing today that you think will be obsolete in two years?

Jonah: We’re seeing a forward-deployed engineer explosion right now, which could well be obsolete in two years. There’s effectively a process documentation period we’re going through. I don’t know that it needs to last forever. At some point, there is a self-learning capability of agents.

Val: From a planning perspective, the biggest fallacy you can make is that the state of the art will remain static. Exponential progress means these models are going to be incredibly powerful by the end of this year, and so the agent loops are going to be running on these models — if we even still need to run agent loops — in a way that’s just not the same as it is today. Even what a forward-deployed engineer does today, which is a really critical skill set right now, will either evolve or go away by end of year.

Dan: Right now we’re in an augmented workplace, where AI is augmenting and helping productivity. As we mature with AI and automate more, that will change. But I think the place where humans go next is verification. Workloads run, changes happen, and ultimately agents may be verifiers as well, but as humans it’s our responsibility to verify what’s happening.

16:20

What is the AI memory wall and why does it affect your inference costs?

Hope: I want to go back to the technical side of things and start with you, Val. You’ve talked about something called the AI memory wall. For those in the audience who may not be deep in the infrastructure weeds, what do you mean by this, and why does it matter to this topic of return on intelligence?

Val: It’s pretty fundamental. If you talk to experts in infrastructure and inference, they’ll tell you that memory is the bottleneck. What they mean literally is this: Two to three years ago, we had a gentle balance in scientific and GPU computing between floating-point operations per second and memory capacity and bandwidth. AI, ChatGPT and reasoning models kind of broke that balance, and then agents and multimodal agents completely broke it.
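One concrete reason long agent sessions strain memory is the key-value (KV) cache a transformer keeps for every token in its context. For standard multi-head or grouped-query attention, its size is roughly 2 (keys and values) × layers × KV heads × head dimension × bytes per element × tokens. The model shape below is hypothetical, chosen only to show how the cache scales from a short chat to a long agent session:

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV-cache size in bytes for one sequence.

    The leading 2 accounts for storing both keys and values at every layer.
    bytes_per_elem=2 assumes 16-bit (FP16/BF16) cache entries.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens


GIB = 1024 ** 3

if __name__ == "__main__":
    # Hypothetical model shape: 80 layers, 8 KV heads of dimension 128.
    shape = dict(layers=80, kv_heads=8, head_dim=128)
    for label, tokens in (("short chat", 1_000), ("long agent session", 200_000)):
        size = kv_cache_bytes(tokens, **shape)
        print(f"{label}: {tokens:,} tokens -> {size / GIB:.2f} GiB per sequence")
```

With this shape each token costs 320 KiB, so a single 200,000-token session needs about 61 GiB of cache before any other concurrent session is served. Quantized caches or alternative attention schemes change the constants, not the linear growth with context length.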

So we’re in a situation right now that the stock markets and supply chains reflect: We physically don’t have enough materials in the world to fabricate the memory needed by agents at the pace required. The memory wall is the collision between the memory agents need to run and the physical memory capacity we actually have.

The good news is there are a lot of innovations at the software layer that let you reorient how you use a standard GPU server. For example, taking stranded non-volatile flash storage in that server and wiring it through software so that it literally delivers additional memory. That unlocks a huge range of efficiencies. It gets you, at scale, the capacity of four to five additional data centers from one physical data center.
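The panel doesn’t describe how WEKA’s software does this, so the following is only a generic illustration of the tiering idea, not their implementation: a toy cache that keeps recently used KV blocks in a small fast tier and demotes older ones to a larger slow tier instead of discarding them. Tier sizes and block IDs are hypothetical, and nothing here touches real GPU memory or flash devices:

```python
from collections import OrderedDict


class TwoTierKVCache:
    """Toy LRU cache with a small fast tier (think GPU memory) and a larger slow tier (think flash).

    Blocks evicted from the fast tier are demoted to the slow tier instead of being dropped,
    so a later request can reload them rather than recompute them. Capacities count blocks.
    """

    def __init__(self, fast_capacity: int, slow_capacity: int) -> None:
        self.fast = OrderedDict()
        self.slow = OrderedDict()
        self.fast_capacity = fast_capacity
        self.slow_capacity = slow_capacity
        self.recomputes = 0

    def get(self, block_id) -> str:
        """Return where the block was found: 'fast', 'slow', or 'recompute'."""
        if block_id in self.fast:
            self.fast.move_to_end(block_id)  # mark as most recently used
            return "fast"
        if block_id in self.slow:
            self._put_fast(block_id, self.slow.pop(block_id))  # promote back to the fast tier
            return "slow"
        self.recomputes += 1  # miss in both tiers: the prefill would have to be recomputed
        self._put_fast(block_id, f"kv-{block_id}")
        return "recompute"

    def _put_fast(self, block_id, data) -> None:
        self.fast[block_id] = data
        self.fast.move_to_end(block_id)
        while len(self.fast) > self.fast_capacity:
            old_id, old_data = self.fast.popitem(last=False)  # least recently used block
            self.slow[old_id] = old_data  # demote rather than discard
            while len(self.slow) > self.slow_capacity:
                self.slow.popitem(last=False)  # slow tier full: now the block really is dropped


if __name__ == "__main__":
    cache = TwoTierKVCache(fast_capacity=2, slow_capacity=10)
    for block in ["a", "b", "c", "a", "b"]:
        print(block, cache.get(block))
```

With a fast tier alone, the returning blocks `a` and `b` in that sequence would be recomputed; with the slow tier they are reloaded. Whether reloading from flash actually beats recomputing depends on storage bandwidth and latency, which this toy doesn’t model.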

18:10

How high token spenders are generating outsized ROI

Hope: You brought up a fascinating anecdote on our prep call about analysts spending $15,000 a day in API tokens, annualized to $5 million. Can you elaborate on this case study?

Val: This is SemiAnalysis, very well known. Jensen gave them a big shout-out at his keynote a few weeks ago for the InferenceMAX benchmark. One of their other analysts is a data center analyst who analyzes satellite imagery. He goes onsite to locations all over the world and figures out: Is there power? Is there water? Do the local zoning laws and communities support these data centers? He advises large companies on whether a $1 billion CapEx spend is worthwhile.

He’s developed this amazing model (he showed it to me last week in New York) that is literally a 3D interactive globe: You can look around, double-click on a particular country and region, and find out exactly what data centers are there, what power they’re using, and what kind of processors are in them. You couldn’t have done that individually as an analyst before. With Claude Code and token maxing at $15,000 a day, he was able to build this amazing model. I get it when Dylan, the founder of SemiAnalysis, says it’s totally worth it, because I can see how they can sell that model multiple times for multiple millions of dollars to multiple customers.

Hope: That’s a lot of money. How should we contextualize this for everyone here?

Val: We’ve got to rethink tasks. Not jobs — tasks. Fundamentally rethink the tasks that make up our jobs. In many cases, yes, you can have $15,000 a day of tokens do the work of a team that would have been 10 engineers in the past, and you will deliver something with an agent swarm in days that would have taken that team months. So at a task level, the ROI is undeniable.

Jonah: The common thread is that it’s about the usefulness of the output to justify the cost of the input. There is a ratio there. Through the arc of history, companies have always spent tens of millions of dollars on things they felt were important and would drive high ROI — whether it’s hiring an extremely expensive person who can deliver something, or putting on a conference that costs tens of millions of dollars. It’s not a foreign concept. We just now have a very productive new thing to spend on.

21:20

How AI is forcing procurement teams to move faster

Hope: When you look at procurement — probably one of the least glamorous departments in any company, but very important — Dan, you had a great case study around the speed at which procurement now has to move. What does a typical procurement process look like for larger enterprises right now, and what needs to happen for change management inside procurement?

Dan: There’s an amazing supply-and-demand imbalance in the market right now, which is driving some really interesting behavior. We have capacity that gets soaked up immediately. At the moment we have roughly three, four, or five customers for every one GPU we have for sale — and we’re not alone. Every provider out there has that same phenomenon.

So what happens is: a Global 2000 company has a procurement process that includes legal review and multiple approvals from finance and a business person. They’ve built a very disciplined, risk-averse, financially motivated process. But with the supply-and-demand imbalance continuing and the pace of AI changing, when companies need to procure, they’re going to have to move faster. What we’re seeing is that smaller AI-native companies are capable of doing that — their funders are actually encouraging them to move quickly, be less concerned with risk, and commit to hundreds of millions or even billions of dollars of capacity in days and hours vs. weeks, months, or years.

Hope: It’s a risk question, a tradeoff. You were joking that everyone’s just writing checks these days, which fits with procurement processes and timelines being compressed.

Jonah: There’s an interesting thing we’ve started to see on the procurement thread: As more mature enterprises have swarms of agents within their tech stack calling on various tools to execute tasks, sometimes those agents come back to the human user and say, “Hold on — I tried calling this tool, I looked into the analytics, and you’ve never used it before. Why are you paying $1 million for it?” The agent is starting to have an opinion on application procurement. That’s enabling procurement to move faster and it raises a fascinating question: How do companies procure anything in a world where humans are no longer actually using the tools being procured?
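A minimal sketch of the kind of check an agent could run before questioning a license: compare each tool’s annual cost with the last time any user or agent actually called it. Tool names, costs, dates and the 90-day window are hypothetical, and in practice the usage data would come from your own logs or analytics:

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple


def flag_unused_tools(
    annual_costs: Dict[str, int],
    last_used: Dict[str, date],
    today: date,
    window_days: int = 90,
) -> List[Tuple[str, int]]:
    """Return (tool, annual cost) for tools with no recorded use in the window, costliest first."""
    cutoff = today - timedelta(days=window_days)
    flagged = []
    for tool, cost in annual_costs.items():
        used: Optional[date] = last_used.get(tool)
        # Flag tools that were never used, or last used before the look-back window started.
        if used is None or used < cutoff:
            flagged.append((tool, cost))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    costs = {"crm_suite": 1_000_000, "doc_search": 50_000, "bi_dashboard": 200_000}  # hypothetical
    usage = {"doc_search": date(2026, 3, 1), "bi_dashboard": date(2025, 6, 1)}      # hypothetical
    print(flag_unused_tools(costs, usage, today=date(2026, 3, 20)))
```

The output is a prompt for a human conversation, not a cancellation list: a tool may be used outside whatever logs the agent can see.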

Val: The prevailing conviction I see among leaders is that the opportunity cost of not investing rapidly is just too high right now. There’s enough conviction within leadership everywhere that the benefits and ROIs are not restricted to edge cases or outliers; they’re very much in the middle of the bell curve. People are clearly voting with their dollars right now.

25:14

Can infrastructure ownership become a competitive moat for AI companies?

Hope: A lot of companies are trying to figure out how to deepen their competitive moat at a time when AI can copy services or companies can take a model and build it inside their own organization. There’s a discussion around whether infrastructure could be a competitive moat if you actually own some of it or have more control. Dan, how have you seen customers approach this?

Dan: Certain customers are experimenting with adding AI factory-type capacity on their own, and those with really deep pocketbooks are somewhat successful. Those without deep pockets are struggling, and the reason is that the talent with the know-how is scarce. Just because you were able to build a data center 10 or 15 years ago and scale it horizontally with CPUs, networking and storage doesn’t mean you can build one today; the new paradigm is different. That skill set isn’t very transferable. It’s going to take heavy investment in training and retraining. Because of that, we’re finding customers are willing to partner with companies that have those resources and the people who can actually build modern data centers.

Jonah: The eternal challenging question is the search for moat, especially at the application layer. There was a period about a year ago where a lot of companies were fine-tuning their own models, or in some cases even training their own. Legal AI is a good example. Companies were building their own legal AI models, which have now almost all been deprecated in favor of frontier models that have very high performance at legal tasks. Those companies are now looking to go a level deeper into the stack to find their moat, which is what brings them closer to infrastructure. We’ll see that pattern repeat across domains over time.

The question is partly one of talent. There’s a massive talent gap at the infrastructure layer. Data center technology has moved so fast that it has left many behind. And there’s also the question of pace. Because the technology evolves so quickly, you don’t want to be saddled with physical infrastructure, putting up huge upfront capital for something that actually gives you a disadvantage in the long run. But those who can afford it and get the talent are finding significant advantage right now.

Dan: The other moat opportunity is around experience and experiential data that’s been captured. If you’ve been in a retail market for 20 or 30 years and you have all that data, that’s a moat. There’s no compression algorithm for experience, as Andy Jassy from Amazon likes to say.

Val: To get more granular: There’s no such thing as one model and no such thing as one style of inference, whether local, regional, or a big consolidated data center. The big difference I’m seeing compared to cloud, which I got into about 17 or 18 years ago, is that the infrastructure is fundamentally different now. It’s so high-performance, so parallel. One rack of CPUs back in the day had fewer than 1,000 cores. One rack of GPUs today has over 1.5 million. That skill set is a different universe.

The other thing is planning ahead. The marginal cost of a new cloud user is almost nothing: You’ve pre-compiled the software, it’s another record in a database, very scalable. The reason the SaaS business model is under strain is that people have realized the incremental cost of a new AI user is so different. It’s almost a fully replicated cost for every user, rather than the near-zero marginal cost of traditional software. I have a personal take that we’re going to see a lot of mid-market neo-clouds and mid-market SaaS companies have to merge, because if they can’t cost-effectively produce tokens, they just won’t have a gross margin business.
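The gross-margin argument can be made concrete per user. In the hypothetical numbers below, a traditional SaaS seat costs almost nothing to serve, while an AI seat carries an inference cost that scales with each user’s token consumption; the subscription price, token volume and cost per million tokens are illustrative assumptions, not figures from the panel:

```python
def gross_margin(revenue_per_user: float, cost_per_user: float) -> float:
    """Gross margin as a fraction of revenue (negative means each user loses money)."""
    return (revenue_per_user - cost_per_user) / revenue_per_user


def inference_cost_per_user(tokens_per_month: int, cost_per_m_tokens: float) -> float:
    """Monthly cost of producing one user's tokens at a given cost per million tokens."""
    return tokens_per_month / 1_000_000 * cost_per_m_tokens


if __name__ == "__main__":
    price = 50.0  # hypothetical $/user/month subscription
    print("traditional SaaS:", gross_margin(price, 2.0))  # hypothetical near-zero serving cost
    for cost_per_m in (1.50, 3.00):  # hypothetical cost to produce 1M tokens
        cost = inference_cost_per_user(20_000_000, cost_per_m)  # hypothetical 20M tokens/user/month
        print(f"AI product at ${cost_per_m:.2f}/M tokens:", gross_margin(price, cost))
```

Here the traditional seat runs at a 96% gross margin, the AI seat at 40% when tokens cost $1.50 per million to produce, and at -20% when they cost $3.00, which is the sense in which token production cost decides whether there is a gross margin business at all.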

Dan: I’d add that companies have to make huge investments upfront to build out infrastructure, and there’s a lot of change coming. We don’t yet know the useful life for some of this hardware. You’re making big bets with your capital against a lot of unknowns. If optimizing infrastructure isn’t your core business, leave it to the experts.

31:47

AI’s energy footprint and the shared responsibility of sustainable compute

Hope: The last question — and I don’t mean it to be least important just because it’s last — is on the responsibility side. How are you all thinking about the responsibility you have, and that the CEOs you work with have, around such a resource-intensive process? Is there a responsibility here, or do we just move forward and let the next generation deal with it? Jonah, I’ll start with you.

Jonah: It’s a question we think about a lot, and I think every investor and leader should be thinking about the trade-offs they’re making for the progress we’re seeing. We think about it in two ways.

First, as with all transformational technologies — electricity, the wheel, fire — these technologies don’t have values inherently. The value that comes from them is shaped by the people or systems that control what they’re used for. The same will be true of AI. So we’re trying to empower leaders who focus on the highest-impact uses of their tokens, which is why I think about the usefulness-per-token idea.

Second, the energy constraints are real and serious. Demand for gas plants in the United States has grown roughly 5x in the last couple of years, largely driven by data centers. That’s a near-term phenomenon. What’s quite interesting is that there is also an enormous increase in demand for renewable and battery-stored power for data centers, because it’s fundamentally cheaper. The stat I love is that last year, over 80% of new renewable investment came, effectively, from the AI industry making forward orders for cheaper energy. So we’re seeing a kickstart of lower-cost, lower-impact energy sources, which should hopefully come online in the next few years.

Val: One of the things I love about WEKA is that the efficiency story is completely aligned here. Performance efficiency wasn’t really trendy before AI. Now it’s extremely important and relevant, and it’s completely aligned with economic efficiency: cash flow, CapEx, OpEx, and energy usage. This efficiency enables more tokens within a fixed energy budget and a fixed latency budget, which in turn allows more iterations. So you literally get higher-quality answers and actually safer responses for an agent that has the power to do things on your behalf while you’re sleeping.

Dan: Sustainability is certainly on our mind with every data center we build. We try to use as much renewable energy as possible and do creative things like heating towns. Our data center in Finland actually heats the town next door. Sustainability is a shared responsibility between providers and consumers of services, just like at home: You pay for your electric bill and they have to give you the electricity, but you also have to turn the lights off.

35:37

Audience Q&A: Human-agent collaboration, value verification, and the memory wall

Hope: We have time for Q&A. Go ahead.

Audience Member 1: I’m curious if you could share more details about what the partnerships between a head of people and a head of engineering actually look like.

Jonah: One of our portfolio companies is called Gloat, and they build an internal talent marketplace that helps match people within very large organizations to different pockets of the company. If you’re in product and there’s a project in the finance team where your specific skill set would be relevant, you can go work with them temporarily.

They started working with Fortune 10 and Fortune 50 companies who were initiating one-off collaborative projects. For example: An M&A transaction happens, and suddenly legal, finance, product, and marketing need to work together. There’s a project leader with a three-week window who knows what the tasks are, but doesn’t know which humans in the team can do what, or which AI tools exist across the enterprise that could reduce the human burden.

They’re using a tool now that says, “Tell us what your project is, tell us what tasks you’d break it down into, and we’ll help allocate those to humans and AI agents, showing you which tasks are more applicable for humans and which for agents.” They completed this project in half the expected time, and now it’s become an HR initiative to roll this out across the organization for all projects, centered on human-agent handoff and collaboration.

Audience Member 2: Where are we in terms of verification and attribution of contractible outcomes? The value is being created, but is it the value that meets organizational intent? Is it commercial, is it enforceable?

Val: This is a very active area of research, and some of it is solved. The whole concept of reinforcement learning — particularly the modern version of it — is to train on something that has a verifiable outcome, so you can give the algorithm a verifiable reward or penalty. Source code is a great example of why coding is so popular right now: Programs either run or they don’t; they either produce something that corresponds with a test result or they crash. Anything mathematical, algorithmic or programmatic is turning out to work really well.
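The verifiable-reward idea for code can be shown in a few lines: run a candidate function against test cases and give a reward of 1.0 only if every case passes, 0.0 if any fails or the code crashes. This is a toy sketch; real reinforcement-learning pipelines run generated code in a sandbox rather than calling it in-process, and often use partial credit. The candidate functions and tests are hypothetical:

```python
from typing import Callable, Iterable, Tuple


def verifiable_reward(candidate: Callable, test_cases: Iterable[Tuple[tuple, object]]) -> float:
    """Binary reward: 1.0 if `candidate` returns the expected value for every case, else 0.0.

    A crash counts as a failure: the program either runs and matches the tests or it doesn't.
    """
    try:
        for args, expected in test_cases:
            if candidate(*args) != expected:
                return 0.0
    except Exception:
        return 0.0
    return 1.0


# Hypothetical model-generated candidates for an `add(a, b)` task.
def correct_add(a, b):
    return a + b


def buggy_add(a, b):
    return a - b


def crashing_add(a, b):
    raise RuntimeError("generated code crashed")


TESTS = [((1, 2), 3), ((0, 0), 0), ((-4, 4), 0)]

if __name__ == "__main__":
    for fn in (correct_add, buggy_add, crashing_add):
        print(fn.__name__, verifiable_reward(fn, TESTS))
```

Open-ended legal or HR outputs have no equivalent of `TESTS`, which is exactly the gap the evaluation and benchmark research mentioned next is trying to fill.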

When you have open-ended legal or HR questions, it’s very hard because there’s nuance. That’s where active research is going on around evaluations and benchmarks, making a subjective call as to whether a particular service-level agreement has been met. That’s a very human-centric role right now.

Jonah: We’re in a middle ground at the application layer where there is often a combination of a base service fee and sometimes an upside outcome-based fee. What we are starting to see is that enterprises are getting much better at accurately predicting which outcomes actually matter. A year ago it was a complete guess. From companies I’ve been speaking with even just this week, there’s a real noticeable shift: Everyone now has a measurement. I don’t think they know what the quantum of that measurement should be that triggers certain payouts, but they have a better idea of what the actual measurement is.

Dan: The two continuums I’m seeing are productivity, where customers can quantify savings and how much human time was saved, and decision-making quality: How much better are your decisions because of the AI output?

Hope: I wanted to hear more about the memory wall — not just the concept, but around the quality of output, not just the speed.

Val: They’re related. The easiest way for a builder to think about the memory wall is: Am I hitting a rate limit? I don’t know any builder who isn’t hitting token rate limits right now, whether it’s a subscription plan or an API budget of thousands of dollars a day. Optimizing around rate limits is definitely one of the things you experience when you’re hitting the memory wall.
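When a builder does hit a rate limit, the usual client-side mitigation is to retry with capped exponential backoff and random jitter rather than hammering the endpoint. It doesn’t raise the limit, but it keeps agents from wasting turns on rejected calls. A minimal sketch; `RateLimitError` is a hypothetical stand-in for whatever error or status your provider returns, and the delay defaults are illustrative:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit response (for example HTTP 429)."""


def call_with_backoff(
    request: Callable[[], T],
    max_retries: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call `request`, retrying on RateLimitError with capped exponential backoff and full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: let the caller decide what to do
            cap = min(max_delay_s, base_delay_s * 2 ** attempt)  # 1s, 2s, 4s, ... up to the cap
            sleep(cap * rng())  # full jitter: wait a random fraction of the capped delay
    raise AssertionError("unreachable when max_retries >= 0")
```

Injecting `sleep` and `rng` keeps the function testable; in production you would leave the defaults. Jitter spreads retries from many concurrent agents so they don’t all hit the limit again at the same instant.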

Partnering with GPU and inference providers that let you get more tokens for a particular budget is one piece. But how you apply them is really important. There’s a classic triad in AI: accuracy, latency, and cost. And you’re typically trading off one to get the other two. When you can improve cost and latency simultaneously, you’re making better trade-offs.

For me it comes down to budgets. You’ve got a power budget at the macro level, and you have latency budgets in your app. You can deliver quick answers that are 80% accurate, or you can take more time to go through more iterations and evaluation loops to figure out the right answer. If you can’t do that efficiently, you run out of your latency budget or your cost budget. It’s really important to squeeze out, to token-max all the efficiency from your inference so you have the budget for lots of quality iterations, lots of guardrail and safety iterations, and lots of experiments to ultimately get good models to do all of this with.
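Val’s latency-budget point can be sketched as a loop that keeps refining an answer while the evaluator scores it below a target and another pass still fits in the budget. A minimal sketch; `generate` and `evaluate` are hypothetical stand-ins for model and evaluator calls, and time is simulated with a fixed cost per pass so the result is deterministic:

```python
from typing import Callable, Optional, Tuple


def refine_within_budget(
    generate: Callable[[int, Optional[object]], object],
    evaluate: Callable[[object], int],
    target_score: int,
    latency_budget_s: float,
    seconds_per_iteration: float,
) -> Tuple[Optional[object], int, int]:
    """Iterate generate -> evaluate until the target score is reached or the budget runs out.

    Scores are assumed to be non-negative integers. Returns (best_answer, best_score, iterations_used).
    """
    elapsed = 0.0
    best_answer, best_score = None, -1
    iterations = 0
    # Only start another pass if it fits in the remaining latency budget.
    while elapsed + seconds_per_iteration <= latency_budget_s:
        answer = generate(iterations, best_answer)
        elapsed += seconds_per_iteration
        score = evaluate(answer)
        if score > best_score:
            best_answer, best_score = answer, score
        iterations += 1
        if best_score >= target_score:
            break
    return best_answer, best_score, iterations


# Hypothetical stand-ins: each pass improves the answer's score by 10 points.
def toy_generate(iteration, previous_best):
    return iteration


def toy_evaluate(answer):
    return 60 + 10 * answer


if __name__ == "__main__":
    for per_pass in (1.0, 0.5):
        print(per_pass, refine_within_budget(toy_generate, toy_evaluate, 90, 2.0, per_pass))
```

With the same 2-second budget, halving the time per pass doubles the passes that fit, which in this toy is the difference between stopping at a score of 70 and reaching the target of 90.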

Hope: To sum it up: Be very thoughtful on per-token use. Still move fast, including your procurement teams. And think seriously about infrastructure as a potential competitive moat down the line.

Thank you all so much for joining us here, and we’ll see you around HumanX.

This conversation took place during HumanX 2026 in San Francisco.