
Can AI Survive Its Own Energy Appetite? | Deep Geeks Ep. 1

AI infrastructure is scaling at an unprecedented pace — but at what energy cost? In the debut episode of Deep Geeks, host Dr. Serena Huang sits down with Daria Mukhortova, Head of Sustainability at Nebius, and Val Bercovici, Chief AI Officer at WEKA, to unpack the real relationship between AI performance and energy efficiency. From data center heat recovery to token warehousing, this conversation goes deep on the engineering decisions that separate sustainable AI from wasteful AI.

Transcript

Meet the Speakers:

Dr. Serena Huang (Host) — Data and AI strategist with 10+ years of Fortune 100 leadership across GE, Kraft Heinz, and PayPal. Author of The Inclusion Equation (Wiley, 2025) and founder of Data with Serena.

Daria Mukhortova — Head of Sustainability at Nebius, where she embeds efficiency-first principles across Nebius’s AI infrastructure stack — from custom server design to closed-loop cooling systems.

Val Bercovici — Chief AI Officer at WEKA. Former CTO at NetApp/SolidFire with patents in AI agent smart contracts and streaming data integrity. At WEKA, Val drives product strategy around high-performance storage for accelerated compute and AI inference.

00:00

Why Sustainability Is an Engineering Problem, Not an Afterthought

Serena: Dasha, tell us about your work at Nebius and why sustainability sits at the center of an AI infrastructure company.

Dasha: My key task from the very beginning was to ensure that we figure out not just what Nebius is and what it builds, but also how it builds it. This translates into a few principles that the team and I agreed on early on. First, we don’t treat sustainability as something that comes after; we treat it as a principle. For us, sustainability is a synonym for efficiency and for reliability, which makes it understandable as an engineering approach.

Serena: And Val, you’re approaching this from a different angle. As Chief AI Officer at WEKA, what does energy efficiency mean when you think about how AI systems are actually built?

Val: Storage for accelerated compute for AI is very different from storage for traditional computing. It’s a deep technical challenge of performance efficiency, not just capacity efficiency. Performance efficiency requires very sophisticated engineering. You can say that general performance equals revenue, and performance efficiency equals profit. As Dasha was saying, the incentives here are really well aligned. It’s not just altruistic: it aligns with us as a business trying to help our customers extract maximum profit out of these unprecedented capital and operational expenditures.

03:12

How Nebius Turns Data Center Waste Heat Into Community Energy

Serena: Let’s dive into something concrete. Nebius built data center infrastructure near Helsinki, and that wasn’t an accident. Walk us through the thinking — specifically the heat recovery piece, because the idea of recycling waste heat into usable energy flips this whole conversation.

Dasha: Our data center in Finland, about 30 minutes south of Helsinki, is what we internally call our playground for innovations that we then roll out across other sites. What’s interesting about this site is how it’s engineered to be not just a consumer of energy, but a contributor to the energy system. The heat recovery is built into the cooling system cycle.

First, it’s a great economic contribution to the local community. By reusing server heat — which is essentially free — you reduce the cost of producing heat for the municipality. In 2025, households spent 10% less on heating because they were able to leverage that free server heat as a resource. In terms of numbers, we’re recovering around 20–30% of electricity consumption as heat on an annual basis and giving it back to the local energy system. When we think about infrastructure as an interplay of solutions that can be efficient on their own but also connected with the local energy system, that’s the key.

Serena: I’m going to keep that story in my back pocket. I spend a lot of time with AI skeptics, and one of the top reasons people resist AI is energy consumption. You just illustrated a very different approach that can translate into greater good for the whole community.

06:40

What Energy-Efficient Infrastructure Unlocks for AI Customers

Serena: Val, from your vantage point, when customers come to WEKA, what does this energy efficiency actually unlock for them?

Val: If you’re striving for capacity efficiency and performance efficiency in the storage world, it has a direct impact on your bottom line and CapEx. Since late 2025 and into 2026, we’re seeing the rise of agents. If we go from chat to reasoning, that’s an order of magnitude more energy consumption. From reasoning to agents, it’s another 10x, meaning agent workloads are roughly 100x more compute-intensive than chat.

The performance efficiency benefits get amplified 100x between WEKA and Nebius right now. Every ounce of inefficiency is not just bad for the environment, it’s bad for business, and it restricts agility. You’re seeing leading voices — the CEOs of OpenAI and Anthropic — publicly talk about balancing cash flows to survive as these unprecedented-scale startups. Performance efficiency is a big part of that.
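As a rough illustration of the arithmetic Val describes, the sketch below compounds the two order-of-magnitude steps and shows why a fixed share of wasted compute costs 100x more in absolute terms at agent scale. The 10x multipliers come from the conversation; the function names and the 20% inefficiency figure are hypothetical, chosen only for the example.

```python
# Order-of-magnitude multipliers quoted in the conversation (not measurements).
CHAT_TO_REASONING = 10
REASONING_TO_AGENTS = 10


def relative_compute(stage: str) -> int:
    """Compute demand of a workload type relative to plain chat (chat = 1)."""
    multipliers = {
        "chat": 1,
        "reasoning": CHAT_TO_REASONING,
        "agents": CHAT_TO_REASONING * REASONING_TO_AGENTS,
    }
    return multipliers[stage]


def wasted_compute(stage: str, inefficiency: float) -> float:
    """Absolute waste, in chat-equivalent units, for a given wasted fraction."""
    return relative_compute(stage) * inefficiency


if __name__ == "__main__":
    # Hypothetical 20% inefficiency at each stage.
    for stage in ("chat", "reasoning", "agents"):
        print(stage, relative_compute(stage), wasted_compute(stage, 0.2))
```

The wasted fraction stays the same at every stage, but the absolute waste scales with the workload multiplier.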

08:37

Tokens Per Watt, PUE, and the Metrics That Actually Matter for AI Efficiency

Serena: What should we be measuring? I’ve heard terms like microwatts per token and PUE — help us understand the benchmarks that matter.

Val: I’ll start with transparency. Token pricing is a very public metric, very competitive. Nebius and WEKA do a lot of work optimizing token pricing. But with regard to this discussion, I’ll let Dasha focus on PUE. We can focus on tokens per watt.

Very often we measure tokens per second, sometimes tokens per GPU, and GPUs and other accelerator types are diversifying. But tokens per watt is a really key metric. Inefficient inference — which is where we are today — is like a logistics system with no warehouses. Amazon isn’t legendary for their factories; they’re legendary for their warehouses and delivery logistics. In the inference world right now, there are no warehouses — just factories with lots of inefficiencies.

Introducing the concept of token warehouses lets you stop wasting factory output and optimize delivery of tokens to users. It shows up directly in tokens per watt. Inefficient implementations can consume as much energy per chat or agent session as an individual household uses in a single day. Efficient token warehousing and efficient memory for tokens reduce that by 80% or more. That’s an active project we have with Nebius right now.
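To make the metric concrete, here is a minimal sketch of how tokens per watt could be computed, assuming it is taken as token throughput divided by average power draw (which is numerically the same as tokens per joule). The function names and the sample figures are illustrative assumptions, not values from WEKA or Nebius. The second helper shows what an 80% energy reduction for the same output would mean for the metric.

```python
def tokens_per_watt(tokens_per_second: float, power_watts: float) -> float:
    """Tokens per second per watt, numerically equal to tokens per joule."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return tokens_per_second / power_watts


def gain_from_energy_reduction(reduction: float) -> float:
    """Improvement factor in tokens per watt when the same tokens
    are produced with (1 - reduction) of the original energy."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


if __name__ == "__main__":
    # Hypothetical system: 12,000 tokens/s at an average draw of 6,000 W.
    print(tokens_per_watt(12_000, 6_000))
    # The "80% or more" reduction mentioned above, applied to the metric.
    print(gain_from_energy_reduction(0.8))
```

Under these assumptions, cutting energy per session by 80% corresponds to about a fivefold improvement in tokens per watt.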

Dasha: PUE — power usage effectiveness — measures how efficiently energy is delivered to IT equipment. However, it doesn’t tell you how efficiently the energy is translated into AI outputs. It’s important to have a broader look at efficiency. What Val was talking about should definitely be part of the picture.

There’s good understanding of PUE in the industry, and people tend to compare providers based on it, but it doesn’t tell the full story. It’s important to also measure how much energy per workload or per token has been used. Another level of depth is differentiating between goodput and throughput — how much useful output is produced per watt of energy entering the system.
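The point that PUE does not tell the full story can be sketched in a few lines. PUE is total facility energy divided by the energy delivered to IT equipment; the per-token figure below multiplies IT energy per token by PUE to get facility energy per token. The two sites and all their numbers are hypothetical.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("it_equipment_kwh must be positive")
    return total_facility_kwh / it_equipment_kwh


def facility_joules_per_token(it_joules_per_token: float, pue_value: float) -> float:
    """Energy drawn at the facility level for each token produced."""
    return it_joules_per_token * pue_value


if __name__ == "__main__":
    # Hypothetical site A: better PUE, less efficient inference stack.
    site_a = facility_joules_per_token(it_joules_per_token=4.0, pue_value=pue(110, 100))
    # Hypothetical site B: worse PUE, more efficient inference stack.
    site_b = facility_joules_per_token(it_joules_per_token=2.0, pue_value=pue(130, 100))
    print(site_a, site_b)
```

In this made-up comparison the site with the better PUE still spends more energy per token, which is why a per-workload or per-token measure is needed alongside PUE.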

12:57

Why Software Optimization Matters More Than Hardware Alone

Dasha: This brings me to how we treat infrastructure. The discussion is largely focused on hardware — chips, servers, cooling systems. But the big question is about software, because software plays an important role in orchestrating all of that hardware and allocating workloads in a way that leverages all available capacity and keeps servers from staying idle. Idling is actually a killer to efficiency. This discussion needs to account for all layers, from how you build a chip to what your software is doing and how it orchestrates workloads.

Val: Goodput literally shows that you can have a very busy system that isn’t producing useful output, or that’s producing it at a very slow rate. If you measure throughput or general utilization, you miss this. If you focus on goodput, meaning actual useful output at the performance and efficiency level you need, you want utilization directed at that output rather than a system that is inefficiently busy. Final tokens per second is more important than general utilization of the infrastructure.
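The distinction between throughput and goodput can be sketched as follows. This is a simplified model that assumes "useful" tokens are those actually delivered to users, as opposed to tokens that were recomputed or discarded; the class, field names, and sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MeasurementWindow:
    tokens_generated: int   # everything the accelerators produced
    tokens_useful: int      # tokens actually delivered as final output
    seconds: float          # length of the measurement window
    avg_power_watts: float  # average power draw over the window


def throughput(w: MeasurementWindow) -> float:
    """All generated tokens per second, useful or not."""
    return w.tokens_generated / w.seconds


def goodput(w: MeasurementWindow) -> float:
    """Only useful tokens per second."""
    return w.tokens_useful / w.seconds


def goodput_per_watt(w: MeasurementWindow) -> float:
    """Useful tokens per second per watt of average power."""
    return goodput(w) / w.avg_power_watts


if __name__ == "__main__":
    # Hypothetical busy system: 60% of generated tokens are wasted work.
    window = MeasurementWindow(1_000_000, 400_000, 100.0, 10_000.0)
    print(throughput(window), goodput(window), goodput_per_watt(window))
```

A system like the one in the example looks fully utilized by throughput, yet most of its energy goes to output nobody receives.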

Dasha: This is something Nebius decided to invest in from the very beginning — a software layer on top of the hardware stack. For example, the software layer can track failures in workloads and fix them seamlessly so the workload doesn’t have to retrain. Retraining means drawing twice as many resources.

Autoscaling is critical for matching the bursty consumption that AI workloads are characterized by with the right-sized cluster — ensuring GPUs aren’t overprovisioned and sitting idle. Idling draws significant power, especially with GPUs. Autoscaling at the software layer solves this.
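The right-sizing idea behind autoscaling can be illustrated with a minimal sketch. It is not Nebius's autoscaler: the sizing rule, the target utilization, the cluster limits, and the idle power figure are all assumptions made for the example.

```python
import math
from typing import Sequence


def gpus_needed(demand_tps: float, per_gpu_tps: float,
                target_utilization: float = 0.8,
                min_gpus: int = 1, max_gpus: int = 64) -> int:
    """GPUs required to serve demand while keeping each GPU at or below
    the target utilization, clamped to the allowed cluster size."""
    raw = math.ceil(demand_tps / (per_gpu_tps * target_utilization))
    return max(min_gpus, min(max_gpus, raw))


def idle_energy_kwh(provisioned: int, needed_per_hour: Sequence[int],
                    idle_watts_per_gpu: float) -> float:
    """Energy spent on idle GPUs when a fixed-size cluster serves a
    demand profile given as GPUs needed in each hour."""
    idle_gpu_hours = sum(max(0, provisioned - n) for n in needed_per_hour)
    return idle_gpu_hours * idle_watts_per_gpu / 1000.0


if __name__ == "__main__":
    # Hypothetical: 4,100 tokens/s of demand, 1,000 tokens/s per GPU.
    print(gpus_needed(4_100, 1_000))
    # Fixed 8-GPU cluster against a bursty four-hour profile,
    # with an assumed idle draw of 100 W per GPU.
    print(idle_energy_kwh(8, [2, 6, 8, 3], 100.0))
```

An autoscaler that tracks the hourly requirement would avoid most of the idle GPU-hours that the fixed-size cluster accumulates.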

Storage also has to be AI-tailored. There are different types of data — active storage versus cold storage. Training data needs active storage for faster access and fewer bottlenecks, which means less energy drawn. Historical logs are better allocated to storage that consumes near-zero power when idle. Managing and orchestrating workloads at runtime can bring significant efficiency gains on top of what infrastructure provides through cooling savings and reduced power draw.

16:49

How Memory Tiering Reduces GPU Waste in AI Inference

Val: Tiering memory is probably the hottest thing in AI right now for inference, to respond to agent demand. The ability to provide storage capacities with memory performance to offerings like the Nebius Token Factory is one of the most exciting things we’re working on together.

Today, the general practice for scaling inference starts from the fact that the memory that traditionally comes with GPUs is tightly coupled to those GPUs. As you need more memory for inference, which is fundamentally a memory problem, you have to overprovision GPUs just for the memory they bring along. During inference, those GPUs are largely idle. It’s a waste of capital resources and inefficient energy-wise.

If you decouple compute from memory for the first time in the AI era — GPUs from GPU memory tiers — you can rebalance the system and only provision the memory needed for inference without overprovisioning idle, wasteful GPUs. That’s the magic: software combined with hardware to balance the system, weed out inefficiencies, and yield the goodput we’re all chasing.
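The provisioning difference Val describes can be sketched with simple sizing arithmetic. This is a toy model under stated assumptions: it treats memory purely as capacity and ignores the bandwidth and latency differences between GPU memory and an external tier. The function names and the sample workload are hypothetical.

```python
import math
from typing import Tuple


def gpus_coupled(compute_gpus: int, memory_gb_needed: float,
                 memory_gb_per_gpu: float) -> int:
    """With memory tied to GPUs, provision enough GPUs to cover whichever
    is larger: the compute need or the memory need."""
    gpus_for_memory = math.ceil(memory_gb_needed / memory_gb_per_gpu)
    return max(compute_gpus, gpus_for_memory)


def gpus_decoupled(compute_gpus: int, memory_gb_needed: float,
                   memory_gb_per_gpu: float) -> Tuple[int, float]:
    """With a separate memory tier, provision GPUs for compute only and
    place the remaining memory demand on the external tier."""
    external_gb = max(0.0, memory_gb_needed - compute_gpus * memory_gb_per_gpu)
    return compute_gpus, external_gb


if __name__ == "__main__":
    # Hypothetical workload: compute fits on 4 GPUs, but 1,600 GB of
    # memory is needed and each GPU carries 80 GB.
    print(gpus_coupled(4, 1600.0, 80.0))
    print(gpus_decoupled(4, 1600.0, 80.0))
```

In the example, the coupled design buys 20 GPUs to obtain memory, 16 of which add no needed compute, while the decoupled design keeps 4 GPUs and moves 1,280 GB to the memory tier.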

18:09

How Much Energy Does AI Actually Consume? From Racks to Gigawatts

Serena: Let’s contextualize the scale, because these numbers are shocking. Val, how much energy does AI actually consume — from training to inference to keeping the lights on?

Val: We’re now in the era of GPUs which, instead of thousands of cores per rack, have millions of cores per rack. A fundamentally different level of parallelization. Energy consumption for a rack of GPUs is hundreds of kilowatts. With the latest generation processors, we’re heading toward a megawatt per rack.

In historical context, that’s almost insane. Many of the big announcements between frontier labs and GPU suppliers like NVIDIA, AMD, and Google aren’t in performance metrics or dollar values anymore — they’re announced in how many gigawatts of capacity they’ve agreed to provide each other. A gigawatt is typically what one nuclear power plant generates. We’re talking about multiple nuclear power plants now dedicated to large-scale AI data centers.
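To connect the per-rack figures to the gigawatt announcements, the sketch below estimates how many racks a given facility capacity could power. The rack power values follow the ranges Val mentions; the function name and the PUE overhead parameter are assumptions added for illustration.

```python
import math


def racks_supported(capacity_watts: float, rack_watts: float,
                    pue: float = 1.0) -> int:
    """Racks that fit within a facility power budget, where each rack's
    draw is scaled by PUE to account for cooling and other overhead."""
    if rack_watts <= 0 or pue < 1.0:
        raise ValueError("rack_watts must be positive and pue at least 1.0")
    return math.floor(capacity_watts / (rack_watts * pue))


if __name__ == "__main__":
    GIGAWATT = 1e9
    # Roughly today's dense racks: a few hundred kilowatts each.
    print(racks_supported(GIGAWATT, 200e3))
    # The "megawatt per rack" direction, with no overhead.
    print(racks_supported(GIGAWATT, 1e6))
    # Same racks with a hypothetical PUE of 1.25.
    print(racks_supported(GIGAWATT, 1e6, pue=1.25))
```

At a megawatt per rack, a gigawatt of capacity corresponds to about a thousand racks before overhead, and fewer once cooling and power delivery losses are counted.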

20:48

Common Misconceptions About AI’s Environmental Footprint

Serena: Dasha, what are the most common misconceptions you encounter?

Dasha: First, beyond energy consumption, there’s a big focus on water resources. The intuitive reaction is that data centers consume a lot of water, and it can be the case. But the question is: how is the cooling system designed? That’s what the industry should ask first.

At Nebius, we don’t rely on water intake. Even though we’re introducing liquid cooling, the system is closed-loop with no evaporative components. We use outside air through dry coolers to reduce the temperature of fluid that circulates within the same loop through millions of cycles. This design is possible because our servers are specifically engineered to be resilient and high-performing at temperatures up to 45 degrees Celsius, so we need less cooling power.

Second, what gets overlooked is how the model itself is designed. The model’s setup defines, to a big extent, how efficiently it operates on given hardware. Nebius Token Factory recently introduced a post-training optimization tool that tunes the model’s setup so it runs more efficiently on the hosting hardware — meaning post-training completes faster, without failures, and draws less power.

Third, there’s a big discussion around adding energy capacity — building facilities linked to data centers to supply power. The question I would ask is: how much of this new build will actually be translated into useful compute? Or is it adding capacity and then losing it to overheads? That question will become more important than how much capacity a company needs to add.

25:07

Why Efficiency Is the New Competitive Advantage in AI Infrastructure

Val: Systems efficiency of hardware and software together is critical. One transparency metric I hope becomes more prominent throughout 2026 is whether providers are using proper augmented memory technologies. Are you warehousing tokens or dropping them on the floor as you pump them out of your AI factories? It’s proven now that you get 75–90% efficiency gains by balancing systems with appropriate memory versus not doing it.

Dasha: For customers choosing an AI infrastructure provider, my advice is: challenge every provider on how they’re building their stack. Customers are driven by cost efficiency and reliability, but they should also ask how that’s achieved. In our case at Nebius, affordability is managed through efficiencies achieved throughout the stack, which gives us freedom to set favorable pricing, not arbitrarily, but because we can optimize our costs.

Our servers draw 20% less power than off-the-shelf solutions because we design them in-house. That translates into savings at the cooling system level and the PUE level. Last year in Finland, because of hardware and cooling system efficiency, we avoided 50 gigawatt-hours of electricity use. That saved power could run our Paris site for five to seven months, 24/7.
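For a sense of scale, the comparison between 50 gigawatt-hours and five to seven months of round-the-clock operation can be turned into an implied average power draw. This is back-of-the-envelope arithmetic on the figures quoted above, assuming an average month of 8,760 / 12 hours; the resulting megawatt values are derived here and are not numbers stated by Nebius.

```python
HOURS_PER_YEAR = 8_760  # 365 days * 24 hours


def implied_average_power_mw(energy_gwh: float, months: float) -> float:
    """Average power in MW that would consume `energy_gwh` over `months`
    of continuous operation, using an average-length month."""
    hours = months * HOURS_PER_YEAR / 12
    return energy_gwh * 1_000 / hours  # GWh -> MWh, then divide by hours


if __name__ == "__main__":
    avoided_gwh = 50.0
    for months in (5, 7):
        print(months, round(implied_average_power_mw(avoided_gwh, months), 1))
```

Under these assumptions, the avoided electricity corresponds to a continuous load on the order of 10 to 14 megawatts.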

This will be of growing interest to enterprise clients trying to understand what kind of energy and carbon footprint their workloads carry. As a provider, you should be able to report that to clients. If you can’t, that’s something to consider.

28:01

The Future of Sustainable AI: Closing Thoughts

Serena: As we close out, what’s one thing you want listeners to remember?

Val: This message is for providers and consumers alike: this is such a competitive industry at the forefront of science, technology, and engineering that market forces will drive efficiency as part of competitive offerings. You won’t be able to compete on token pricing without an efficient system — hardware, software, environmentals. I’m generally optimistic. It’s easy to be a doomer in this industry, but when you look at how you build systems to compete, efficiency is not optional. It was optional in the past for IT systems. Now it’s on the critical path — fundamental and essential for a competitive offering in AI training and especially inference.

Dasha: We all agree that AI brings benefits. It can accelerate drug discovery, disease research, and many other critical fields — which means the use of energy is worth it. What will define the future of AI is how well we engineer it. If we think about this as an ecosystem of decisions made at different levels with different players — from chip providers to infrastructure providers to software builders — and if we all work together, that’s how we achieve sustainable AI that produces maximum value while being mindful of the resources it uses.

Serena: Thanks for listening to Deep Geeks. A huge thank you to Dasha and Val for bringing such depth and honesty to a conversation the whole industry needs right now. If this episode made you think differently about how AI gets built or powered, share it with someone who needs to hear it. Find Deep Geeks on Spotify, YouTube, or wherever you get your podcasts. Until next time.
