The Future of Frontier Models And What They Will (And Won’t) Do Next
So, now, we're going to continue this afternoon program with a panel. You alright? Yeah, I am. On stage, but I see a lot of people coming in, some people standing up. So, if you are next to an empty chair, please hop over to your nearest neighbor to the middle so more people can sit, because people think it's very scary to just walk in front of everyone and then jump over some other seats to find an empty one. So please facilitate that. And then we have the panel on stage in a minute. There's still some empty space here in the middle, so if you're here in the middle and you're next to an empty seat, please move a little bit so the people in the back can also find a seat. So apparently the right side of the room doesn't really understand English, or do you? Like, move. More people need seats. It's hard to climb over the seats, so please move to the middle. Alright. Before I announce the panel, who's at World Summit AI for the ninth time? Just me? Alright. Who's here for the eighth time? Seventh? Sixth? Five? Four? Three? Ah, two? One. Wow. You know that AI has been around a little while longer than three years, right? Yeah. Oh, yeah. There's one thing. There are lots of new people in here. You can turn the volume down or up on the side of these headphones. You can also switch to a different channel, in which case you will hear a different room, which could be fun if you have a bore-out, but it's visible to everyone because the color changes. So if you see your neighbor becoming green or blue, you know they're not paying attention; they're listening to a different room. Still more people coming in. So here in the front, there are a couple of empty chairs; there in the back, in the middle, there's one still. Alright. Let's get started. The panel, if everyone is mic'd up by now, is going to be moderated by Lauren Vaccarello, the CMO of WEKA. WEKA, I don't know how to pronounce it, but I'm Dutch, so I will say WEKA. 
They'll be here on stage to discuss the frontier models, what's next for the most advanced AI systems. And I see a lot of people turning up to hear this story. So that's great. So can I please ask the panel and moderator to come to the stage? There we are. Big round of applause for this panel. Alright. Hello everybody. How is your day going? How's your afternoon? Good? Are we excited for a great panel? Yeah. Today we are going to be talking about frontier models, what's next for the most advanced AI systems. We are thrilled to have an incredibly esteemed panel with us today. Immediately to my left, I have Georgia Channing. She is the machine learning for science lead at Hugging Face. She works on enabling scientific discovery with AI and building tools for scientists in an open-source community. She's been building in biotech, fusion engineering, and materials discovery with Hugging Face. She has her PhD in computer science from the University of Oxford, where she worked on multi-agent methods and distributed training. To her left, we have Marzieh Fadaee. She's the head of Cohere Labs, where she leads research on fundamental problems in artificial intelligence. Her work spans multilingual language models, data-efficient learning, model evaluation, and trustworthy AI, with a focus on building systems that are robust, inclusive, and globally impactful. She holds her PhD from the University of Amsterdam, where she conducted foundational research on neural machine translation. And last but not least, we have Val Bercovici. Val is the chief AI officer at WEKA, where he helps AI builders advance their enterprise and agentic AI research and innovation. He has extensive experience in the infrastructure industry. He's been the CTO at NetApp and SolidFire. He also co-created the Cloud Native Computing Foundation, which is the home of Kubernetes. So incredible group of panelists with us today. 
First up, I wanna ask each of you: what emerging capabilities in frontier models excite you the most? Georgia, do you wanna kick us off? Yeah, sure. So as you mentioned, I have a bit of a science flavor. I'm gonna bring a science flavor to this as well. The long context that we're now able to achieve means that we can consume so much more scientific knowledge than we ever could before. So say you wanna synthesize a molecule, you need to go through hundreds of papers to find the information on how to synthesize it, and that's something that we're only just getting to. It's really cool. Awesome. Yeah. No, I think actually AI for science is a super interesting area. I'm very excited to also see a lot of more recent work, at Hugging Face and also new startups, on problems that really impact humanity maybe in a more direct way, like carbon capture for the environment. That is definitely something where I look forward to seeing how far the capabilities of these models go. I'm also very curious to see what capabilities we haven't thought about and what use cases we haven't really looked at so far that we will have in a few years. Like, when we look back at five years ago, when we talked about this technology and what it could achieve, some of the things that we do today seemed not really an option, or not possible. So in that sort of realm: what is still something that's just too hard to even think about working on? That is very exciting to me. So, alright. And then, Val, what do you think? So for me, the seasons of AI this year have been fascinating. We started the year transitioning from a lot of non-reasoning pretrained models to reasoning models in the spring. Then for me, it was the summer of coding agents and, you know, agents becoming really real, as I mentioned in the keynote a couple hours ago. 
To me, what's most exciting right now are the advances in reinforcement learning, the real fusion now of training and inference in reinforcement learning loops and episodes. And if you spend time with a lot of frontier lab people, they're seeing the direct path between reinforcement learning and AGI. So we don't know if that's really gonna happen or not. There might be a couple of exits along the way or side roads, but what's exciting to me is there's real material progress towards AGI. Oh, that's amazing. And as each of you said, how advanced we've gotten even in the last year, in the last eighteen months, is completely incredible. I love the agentic advancements, what we're doing in science, what we're doing for carbon capture. Are there any other new applications or use cases that you're seeing that are possible today that weren't even possible a year ago? Marzieh, what do you think? I think many math and code capabilities that we see today, on more standardized tasks like games or math competitions like the Olympiads. That is one area that I think has been a little bit surprising, maybe. But there are also just, like, more things that maybe we got used to very quickly. Like, we now expect the models to handle, like, long conversations, context switches, talking about different topics, completely different areas, and the models are doing quite well there. And this is something that's kind of also related to long context, but also just really building all-around general models that are capable of knowing essentially what is important: what you have to keep in your long-term memory, what you have to retrieve from your short-term memory. This is something that is very exciting that we can do for the most part now, and I'm hoping this will also unlock new use cases of using this technology. That's great. And then, a slightly different question for you, Georgia. 
As we're thinking about here's all the possibility, what do you think is holding frontier models back right now? That is a great question. I think about it more in the sense of what is holding us back from achievement. Yes. And I think that that's different from what is holding frontier models back, because, as Marzieh was talking about, okay, a lot of the industry is focused on making really general models. But actually, most of the time you don't need models that know a ton about art history and also how to code, particularly for business use cases. Right? You have really limited tasks that you wanna work on. And I think that making models compact enough to be cost-efficient, but also deliver for those specific tasks, is what is gonna sort of hold back advancements with AI. And then from the science perspective, I think that really it's just that we haven't figured out the formatting. We have a ton of data, but we haven't figured out how to serve it to models yet. So I think there are actually a lot of different problems going on in different areas, and that's what's holding back, you know, sort of the realization of what we want AGI to be. I would love to dig into these a little bit more, these ideas of small models versus large models and when you're doing custom models. Before we get there, you said something that I wanna double-click on, this idea of cost. We all know running these models is extremely costly, from a business perspective, from an end-user perspective. And then, Val, I know you work with some of the most innovative AI companies on the planet. What are you seeing as some of their challenges around cost? Yeah. It's very much like the Uber scenario. If you remember taking Uber about ten years ago, it was very cheap. We found out it's highly subsidized. Right? That's why they were able to disrupt a lot of taxi industries everywhere, and it seemed cheaper. Today, we all are afraid of surge pricing. Right? 
It's like the worst thing when we try and get an Uber. We're kind of in the surge-pricing era of AI inference right now, where the more you start to use particular agents, the more tokens they consume, and you run out. Like, instead of just even being able to pay for more, we're such a capacity-constrained industry that you literally hit your rate limit or you're throttled, and you kind of, like, literally pause. You're in the flow and you have to pause artificially because of supply constraints. So there's a real challenge in the industry right now. We see it particularly because of the fact that in order to scale, you've gotta scale in a very homogeneous way, where if you need more memory, you gotta add more GPUs just to get the memory and underutilize the GPUs. If you need more GPUs, you know, then you're basically tagging along stranded memory for the ride. We haven't disaggregated the hardware the way, fortunately, we're disaggregating the software from a prefill/decode perspective, which we can get to later. So if I can even summarize some of what you're saying: even if I wanted to have more advancement, even if I wanted to run more models, whether large or small, I might reach a point where the fundamental guts, the infrastructure, is just going to block me from being able to work on carbon capture, work on scientific development. Yeah. And I can clear up a misconception about small models too. Everyone thinks small models are automatically cheaper, automatically, you know, gonna save you money. Not really. You end up using more tokens. You know, the more efficient these small models are, again, Jevons paradox kicks in, and you end up consuming more tokens because a small model is more efficient. So you're back to the same problem, even though these are the problems we want. Right? We're advancing the science. We have more capability, and we gotta figure out how to afford it all. So... Completely. 
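Val's point that a cheaper-per-token small model can still produce a similar bill can be made concrete with a toy calculation. All the prices and token counts below are made-up assumptions for illustration, not real benchmark or pricing figures:

```python
# Toy illustration of the small-model cost misconception: a lower
# per-token price can still rack up a comparable bill if the small
# model needs more tokens per task and, being cheap, gets used for far
# more tasks (Jevons paradox). All numbers are illustrative assumptions.

def monthly_cost(price_per_mtok, tokens_per_task, tasks_per_month):
    """Total monthly spend in dollars for one model and workload."""
    return price_per_mtok * (tokens_per_task * tasks_per_month) / 1_000_000

# Hypothetical large model: pricier per token, fewer tokens, used sparingly.
large = monthly_cost(price_per_mtok=10.0, tokens_per_task=2_000,
                     tasks_per_month=50_000)

# Hypothetical small model: 10x cheaper per token, but more tokens per
# task and 8x the usage because it is cheap enough to deploy everywhere.
small = monthly_cost(price_per_mtok=1.0, tokens_per_task=5_000,
                     tasks_per_month=400_000)

print(f"large model bill: ${large:,.0f}")  # $1,000
print(f"small model bill: ${small:,.0f}")  # $2,000
```

Under these assumed numbers the "cheap" small model actually costs more per month, which is exactly the back-to-the-same-problem effect described above.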
And then, starting to dig into small models, and when you'd want a small model versus a large model or a custom model: how should we think about that? And Marzieh, are there any other misconceptions with small models? Anything that you're seeing? Yeah. I think there was a period of time when everyone was racing to scale; like, the scaling laws were great. The more parameters, the more data we threw at these models, the better they became. But I think more recently, we are revisiting how the learning is actually happening. It's very clear scale helps: with the setup right, with more parameters, you have more capacity to learn the different capabilities that you need. But it's not always necessarily required, depending on what you want your model to be, whether you want, like, a specialized model in an area or even a more general model. Like, we now see multilingual models that are quite good at a smaller size. So it really boils down to where we do this optimization: whether it's the easy part of increasing the size of everything or, in my opinion, the harder part of doing optimization on a smaller scale, whether it's about the quality of your data, or actually what data is useful to learn from or not, or the learning algorithms that you use, the optimization algorithms that you use. All of these have been shown, in the last year or two, to let you reach the capabilities of models ten times, a hundred times bigger from a year before, with much smaller sizes. And there are a lot of interesting research questions there, sometimes counterintuitive research questions. One particular project we worked on recently was on training a language model that is specialized in code, so programming languages. 
And the general idea is that we should have really good quality, high-quality code data to train these models. And how we define high-quality code data is code data that passes all unit tests, so, like, code that is correct. What we actually saw in that project was that if we relaxed the passing threshold, so the code really didn't have to be perfect, it was fine if it still failed a few unit tests, that was more useful for the model to learn from. It helped the model generalize to unseen cases. It also allowed harder problems to get in. So when we think about the data space and the optimization that we can do there, there are still a lot of interesting open questions, where you will be able to train a smaller model that has the capability of a bigger model that has seen a lot of, let's say, just noisy Internet data on a larger scale. And I do think it's important for the developer community as well, because smaller models are also much easier and more accessible for everyone to use. Then, Georgia, I know we've talked a little bit not just about large and small, but custom models. So now I use ChatGPT. I used it to plan my trip to Chile. Planned an incredible trip to South America for me. Can I just use ChatGPT to cure cancer or do some scientific development? Great question. It's a little bit tougher than that, for probably a couple reasons. One thing that I think is actually really interesting is that even in that example where you are gonna plan a trip to Chile, unlike with coding, say, where there's a verifiable reward, there's not a clear answer to what is a good trip to Chile, right? And so maybe you looked at it, you've looked at some other sites, right, and so you can evaluate that. But for evaluating cures to cancer, the verification pipeline might be, like, five hundred million dollars and six months of work. 
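The filtering idea Marzieh describes, keeping code samples that pass only a fraction of their unit tests instead of demanding a perfect pass rate, can be sketched as a simple data filter. The function, field names, and sample records here are illustrative, not Cohere's actual pipeline:

```python
# Sketch of pass-rate-based filtering of code training data. Each
# candidate sample records how many of its unit tests it passed;
# demanding perfection (threshold=1.0) keeps only easy, fully correct
# code, while a relaxed threshold lets imperfect-but-harder problems
# into the training set.

def filter_by_pass_rate(samples, threshold):
    """Keep samples whose unit-test pass fraction meets the threshold."""
    kept = []
    for s in samples:
        pass_rate = s["tests_passed"] / s["tests_total"]
        if pass_rate >= threshold:
            kept.append(s)
    return kept

corpus = [
    {"id": "easy_sort",  "tests_passed": 10, "tests_total": 10},  # perfect
    {"id": "hard_graph", "tests_passed": 7,  "tests_total": 10},  # imperfect
    {"id": "broken_io",  "tests_passed": 1,  "tests_total": 10},  # mostly wrong
]

strict = filter_by_pass_rate(corpus, threshold=1.0)   # only perfect code
relaxed = filter_by_pass_rate(corpus, threshold=0.7)  # harder problems get in
print([s["id"] for s in strict])   # ['easy_sort']
print([s["id"] for s in relaxed])  # ['easy_sort', 'hard_graph']
```

The relaxed threshold admits the harder, imperfect sample while still excluding mostly-wrong code, which is the trade-off the panel describes.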
And so already that is a really significant jump and is really not in the same class of problems. And then beyond that, fundamentally most science data is really high-dimensional. And so, like, we all work with transformers almost all of the time, sequence-to-sequence models. That works really well for text and even for proteins, right, where you also have a sequence of amino acids. But when you're thinking about cancer data, you're often talking about whole-slide images. So that's probably an image that's a gigabyte by itself for a very small sample of your skin. And every single pixel in that image has a multidimensional embedding of, like, the genes that are in that cell, whether or not they're cancerous, other information like that. How do you tokenize that? And I don't think anybody knows the answer to that question. Oh, that is incredible. And just very different ways to approach problems like that. And where would you like to see custom model development go in the scientific community? Honestly, I think the barrier there, though I was just highlighting a sort of technical question, is not a technical one. I think it's mostly a social problem. I think that it has been really difficult for people from the machine learning community and from the domain sciences community to get together and to really collaborate. And a lot of that is because in the machine learning community we're often like, what do we optimize? What's my loss? And when you go talk to a materials scientist, they're like, well, I'm interested in this property of grafting, and it would be cool if we understood this. Like, those two things are not inherently compatible at all. And so lots of stuff ends up not being done. And so I think actually the fundamental issue that we have right now is a social one there, rather than a technical one. 
There's so much more stuff that we can do with transformers that we have not been able to do for reasons that have nothing to do with the technology. Interesting. And then, Marzieh, this reminds me a little bit of what we talked about with building more open AI and building more diversity in AI, even with incorporating more languages. I know you're doing a lot of work at Cohere on how you bring in multi-language support. Where do you see the importance of that? How do you see that advancing models? Yeah. I mean, I was worried a little bit when you mentioned building more open AI. No, open AI. Open, period, AI. Yeah. Yeah, so that's a very good question. And I think, actually, for a room like this, actually this panel, the importance of working on multiple languages is not something I think we need to justify. We all come from different backgrounds, speaking different languages. I think in this room everyone speaks at least two languages. And everyone can agree that when you speak different languages, there are concepts and specific nuances to each language that you might not be able to literally transfer and translate between them. So really, the diversity of each language and how each of them captures the human experience is something that, I also believe, is where this technology started from. When you look at the history of the transformer, and before that attention models, they all were really developed to use neural networks in the field of machine translation, because it's very challenging, because it's very difficult to capture meaning in a language and also transfer meaning between languages. So with this context of how language and multilinguality are important, I think the reason that we need to work on this is definitely there. There are challenges. It's not an easy problem at all. 
There are practical challenges when we think about, like, data for many of these languages, and how represented this data is, versus how much of it is, like, a translation of existing English data. And then there are the societal challenges of how humans would actually interact with these multilingual models in their own languages, maybe from the safety side, from the bias side; there might be different requirements and different sensitivities in different languages. So there's a lot of complexity there. And at the end of the day, it is also kind of a multi-objective optimization problem. So there are multiple things that you want to learn at the same time. And when you boil it down to that, I think, at least for the ML crowd, it's more clear how to address that and, like, figure it out. So, yeah, a big part of what we have done in our research at Cohere Labs has been on multilingual models. And I would say, even more importantly, multilingual evaluation. That is another thing that might be, to your question about what is holding us back, holding this technology back: good evaluation. That is something that we definitely need. Yeah. How do you think the role of good evaluation fits in this? I think the thing is, you have no idea if you're making progress unless you have good evals. And evals are kind of unsexy, so not that many people wanna work on them, but they're actually the building blocks of all progress. There's a great site for people who wanna check it out, called Artificial Analysis, that has some really fun evals, including IQ points versus cost for different models that are out there. But, yeah, there is no progress without good evals. So I think you bring up a really good point there, like IQ points versus cost and how we are measuring the efficacy or success of models today. 
I don't think it always makes the most sense that you're judging on, well, this is how many tokens and what is the quality of it. I think there needs to be a fundamental rethink and evolution of how we measure the success of this. Yeah. Unfortunately, I'm not an evals expert. Maybe you're the person to talk to. I mean, what we actually do see nowadays is there's kind of a disconnect between the benchmarks that we have and the vibe-checking of the model, the, like, real-world capabilities of the models. And the disconnect actually partially comes from how the evals become saturated or contaminated very quickly. We have been working on creating benchmarks in our research lab, but more recently, I feel like that might not be the way to go, because, also very quickly, everyone can train their models on that benchmark, intentionally or unintentionally. And each benchmark might capture one specific specialty and expertise, but really, like, this overall sense of what model we think is doing better, what feels better, that is not really captured here. And that is a big question. I think we should just rethink the framework of evaluation: should it be just one score, or, like, a leaderboard, or, now that this technology is so good that it's catching up with every benchmark right away, what are other ways that we can test it out? I couldn't agree with you more. In some ways, I look at this as: we don't even know how much some of this costs, and how can you understand whether it is good or bad without even understanding costs? And how would you run a business without knowing this is what success looks like, and this is what my customer needs, and these are the fundamental costs? And we are in not even day one of AI if we think about this. Val, I know you work with so many businesses right now thinking about the success and efficacy of AI. What are you seeing? 
How do you think we should be looking at the success of, and evaluating, the frontier models that we're using? Yeah. It's such an interesting topic, because I'm a huge fan of Artificial Analysis as well. They're finally starting to benchmark multi-turn, longer-context conversations, but even their data from the past year, eighteen months, is outstanding in terms of not just the quality and the IQ portion of the models; they've added in cost. They measure the amount of reasoning tokens each model uses, if it's a reasoning model, to generate some of the benchmark results. As we all know, a lot of the traditional benchmarks we relied on are really saturated right now. So I'm also a big fan of, like, the ARC Prize and the ARC-AGI benchmarks. ARC-AGI-1 was cool because it proved you could literally reason out of distribution, outside of the training dataset, but at pretty enormous cost. People were guessing, you know, was it sixteen hours to complete some of these? Was it over a million dollars to actually complete, you know, or saturate ARC-AGI-1? ARC-AGI-2 is largely unsaturated right now, so that's interesting. And they're factoring in efficiency now; it's a core part of the result, not just, you know, a detail. And honestly, I have to go back and look at what ARC-AGI-3 is all about specifically, but that's a trend, you know, to the point that it's important to have quality and intelligence, but if it's not practical, it's just not gonna get used. Yeah. So some of the techniques that are coming out now, if we have time to discuss them, I think are interesting, in terms of NVIDIA leading the way and saying, you know what? Yes. Our general-purpose GPUs are amazing at training. They're surprisingly good at inference. 
But inference now at scale is so important that they preannounced this Rubin CPX processor, which is just for the prefill part of inference, showing you that you really do need to disaggregate prefill and decode. You do need to really optimize your infrastructure to be able to afford all of the, you know, ambitious, aspirational goals we have for AI at scale. Yep. And I think a lot about this idea of commercially viable AI. And to the point that you made earlier, the early days of Uber were, in many ways, subsidized, and this is the advent of where AI is going to go. I think about how, months from now maybe, but probably years from now, this will become more commercially viable. The question, really, for everybody, is: what fundamental trade-offs are we navigating to get to this idea of commercially viable, but also viable for society? What are the roles, and what are the trade-offs we have to navigate, between model size, efficiency, cost, capacity, success? Yes. A simple one, because I had the pleasure of actually doing a keynote a few months ago with one of the lead analysts from SemiAnalysis, Dan Nishball. And he has a nice way of putting it. He's written some of the really classic reports on SemiAnalysis.com; if you haven't read them, eighty percent of them are free, so they're definitely worth a read. And he wrote one of the earlier reports that said there's a classic trade-off between throughput and latency for users. And he uses a brilliant analogy: you can have a private jet to get a few people somewhere really, really fast. But if you put people on a bus, it's way more efficient, except that bus is gonna get to the destination way, way slower. And we still struggle with that right now, in terms of really being able to have really, really low latency, but for an affordable, broad community of users, and ideally in the same inference batch and so forth. So that's one of the fundamental trade-offs we still make. Yeah. 
I would add to that: just trying things out to understand them a little bit better first. A particular example there is how reasoning models are now super popular and very impressive. But there has also been more recent research on how much of the reasoning trace is actually useful, or how much the model actually needs to get to the answer. There have been papers showing that, essentially, randomly removing, like, the bottom half of the reasoning trace, just removing half of that, the model still would do great. How much of it is for the benefit of the human who is looking at it, to interpret the steps and how the model got to the answer? And how much of it is actually needed to get to the answer? And this is the trade-off of cost, right? Like, these reasoning traces are just long inference-time spend. So the one thing that I think is important is that with every new way we find to use these models even a little bit better, we should also study it in a systematic, scientific way. So really study what it is about this particular way that we are now using this model, or now training these models, that is helping it do that, and how we can maybe do that more efficiently, how we can do that more effectively also. So playing with all of these trade-offs, that's something that I'm hoping we are doing, but we should do a little bit better. Yeah. I thought that was super interesting. I think also with the reasoning traces, part of the reason people were motivated to do that, and you can feel free to disagree here, is so that you could do, particularly in math and coding, sort of stepwise corrections, right? And that makes a lot of sense when you're training a model, but it's not clear that you need that at inference. And so then, could you have it in training and cut it at inference? Yes. Yeah. Exactly. Yeah. 
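The trace-pruning result Marzieh mentions, dropping a large chunk of a chain of thought while keeping the answer, can be sketched mechanically. This is a toy illustration of the idea only, not a reproduction of any specific paper's method; the step strings are invented:

```python
# Toy sketch of pruning a reasoning trace before the final answer.
# The research Marzieh alludes to suggests much of a long chain of
# thought can be dropped with little effect on answer quality; here we
# demonstrate only the mechanical step of removing the latter half of
# the reasoning steps while preserving the final answer.

def prune_trace(reasoning_steps, answer, keep_fraction=0.5):
    """Keep the first `keep_fraction` of reasoning steps, plus the answer."""
    n_keep = max(1, int(len(reasoning_steps) * keep_fraction))
    return reasoning_steps[:n_keep] + [answer]

trace = [
    "Step 1: restate the problem",
    "Step 2: try small cases",
    "Step 3: notice a pattern",
    "Step 4: re-derive the pattern",    # often redundant
    "Step 5: double-check arithmetic",  # often redundant
    "Step 6: restate the conclusion",   # often redundant
]
pruned = prune_trace(trace, "Answer: 42", keep_fraction=0.5)
print(pruned)
# ['Step 1: restate the problem', 'Step 2: try small cases',
#  'Step 3: notice a pattern', 'Answer: 42']
```

Every pruned step is inference-time tokens not generated, which is exactly the cost trade-off being discussed; whether quality survives the pruning is the empirical question the cited papers study.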
And then I think from a business perspective, it's important for people to think about when they need AI. There have also been papers about how AI can lead to lots of work inefficiency. Yep. And I think that it's worthwhile to think, before you go to ChatGPT and ask it to do the, you know, the presentation that you don't wanna make, whether or not that's a good thing to have AI doing for you. That would also benefit the environment. I think there's also a place for small models here, particularly if you have an agent in your email. Sort of banal responses to very common work interactions do not need GPT-4 or GPT-5. You could probably have Qwen at, like, half a billion parameters do that for you just as well. Yeah, and I think that, as prices rise for models that are actually really expensive to run, it's gonna become much more about matching the correct model to the correct task. I think that's such a great point. And it is both the cost trade-off and the output trade-off. And then you mentioned the energy side of this, and how power-hungry, in so many ways, AI is. And then, sort of back to Val: what are you seeing in terms of power consumption? Are there ways that we can make AI not just commercially viable, but also sustainable? Yeah. It's crazy when you actually do the math. Just putting, you know, two PDFs into a medium-sized prompt now, whether it's a ChatGPT session or whether it's just the first turn of hundreds of turns in an agent session, just the prefill for doing that consumes the entire usage of a household for a day. So, like, more than twenty kilowatt-hours to just start, you know, a chat session, or certainly to start an agent session. 
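Georgia's point about matching the model to the task, routing banal email replies to a tiny model and reserving frontier models for hard work, can be sketched as a trivial router. The model names and the keyword heuristic are illustrative assumptions; a real router might use a classifier or a cheap model's own confidence score:

```python
# Toy router matching task difficulty to model size: routine-looking
# email requests go to a hypothetical tiny local model, everything else
# to a hypothetical large hosted model. The keyword heuristic and the
# model names are illustrative assumptions, not a production design.

SMALL_MODEL = "qwen-0.5b"       # hypothetical half-billion-parameter model
LARGE_MODEL = "frontier-model"  # hypothetical large hosted model

BANAL_HINTS = ("thanks", "schedule", "confirm", "meeting", "received")

def route(task: str) -> str:
    """Send routine-looking requests to the small model, the rest to the large one."""
    text = task.lower()
    if any(hint in text for hint in BANAL_HINTS):
        return SMALL_MODEL
    return LARGE_MODEL

print(route("Please confirm the meeting for Tuesday"))       # qwen-0.5b
print(route("Propose a synthesis plan for this molecule"))   # frontier-model
```

The design choice here is that misrouting a banal task upward only wastes money, while misrouting a hard task downward degrades quality, so real routers tend to err toward the large model when unsure.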
And the fact that, again, we run out of memory, I spoke about the memory wall earlier, so quickly in a parallel, concurrent, multi-sub-agent task environment; we run out of memory so quickly that we're re-prefilling again after minutes. And, you know, every few minutes per agent subtask, pretty much with all the subtasks in parallel all the time, that GPU is redundantly re-prefilling the early part of context over and over again. We're running these AI factories as if it was before the Model T moment, before we had assembly lines. And we have to get way more efficient in our token pipelines, just so we're not wasting energy unnecessarily before we actually consume it for productive, valuable things. So, and Georgia, how are you thinking about the energy consumption challenge? Yeah. I think at Hugging Face, we really focus on small models. It's come up a bunch of times in this discussion. That's where we really invest and put our energy. And part of that is also to enable a much broader community to use AI. I think otherwise, you know, hopefully with AI for science, we discover great methods for carbon capture. And maybe we should also use AI less, which is not a very good tagline for the World AI Summit, but yeah. But it is, to what we said earlier, use AI smartly. And once you know that, Val, to your point, uploading a couple of PDFs is the equivalent of, you know, the energy to run your house, do I need to ask ChatGPT how to get to the World AI Summit? Or can I just do a super quick Google search? Or actually interact with a person and ask them how to do it? Crazy thought. Right? Weird. It's very weird. I know this is about the next step for the most advanced AI systems, but also human interaction might be a good thing from time to time. Thinking about human interaction, Marzieh, I'd love to talk with you a little bit about open source and why we should look at open-source AI models. Yeah. 
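The redundant re-prefill problem Val describes is what prefix (KV) caching addresses: if the early part of the context is unchanged between agent turns, its attention state can be reused instead of recomputed. Below is a toy sketch of just the bookkeeping; the cache stores a placeholder string instead of real KV tensors, and "prefill cost" is counted in whitespace-split tokens:

```python
# Toy sketch of prefix caching to avoid redundant re-prefill. Real
# serving systems cache attention KV blocks keyed by the token prefix;
# here we key a fake "KV state" by a hash of the prefix text and count
# how many tokens of prefill compute we avoid on a cache hit.

import hashlib

class PrefixCache:
    def __init__(self):
        self.cache = {}           # prefix hash -> fake KV state
        self.tokens_computed = 0  # prefill work actually performed

    def prefill(self, prefix: str, new_turn: str) -> str:
        key = hashlib.sha256(prefix.encode()).hexdigest()
        if key not in self.cache:
            # Cache miss: pay for prefilling the whole shared prefix.
            self.tokens_computed += len(prefix.split())
            self.cache[key] = f"kv({key[:8]})"
        # The new turn's tokens always have to be prefilled.
        self.tokens_computed += len(new_turn.split())
        return self.cache[key]

system_prompt = "you are a helpful agent " * 100  # long shared prefix (500 tokens)
pc = PrefixCache()
pc.prefill(system_prompt, "summarize file A")
first_cost = pc.tokens_computed                   # 503 tokens of prefill
pc.prefill(system_prompt, "summarize file B")     # prefix KV state is reused
second_cost = pc.tokens_computed - first_cost     # only 3 new tokens
print(first_cost, second_cost)                    # 503 3
```

The caveat Val raises still applies: the cached state lives in scarce GPU (or offloaded) memory, so under memory pressure entries get evicted and the expensive re-prefill comes back.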
Marzieh: This technology has been built on open research. What open source enables is everyone building on top of each other, learning from each other’s mistakes and successes, and adding to that. It’s also a really great way to provide transparency: when you share the details of your work, whether it’s your models or papers on your methods, that transparency helps others reproduce, replicate, and check it. When something is open, it’s less likely that the details of how it’s designed serve the benefit of only a small group. And at the end of the day, it helps advance this technology.

Hugging Face is a great example of really advocating for open datasets and open models, and we have partnered with them a lot over the last couple of years. We have also released our models, the Aya models and the Command models, on Hugging Face as open weights, and we have seen how that helps people pick them up and build their own things, including use cases and follow-ups we wouldn’t have predicted.

Lauren: That’s great. I like a lot of what you’re talking about and doing at Cohere, where it’s almost AI for good: this is why we have multilingual support, this is why we lean into open source. Truly democratizing AI means it isn’t split between the haves and have-nots. How can you make this more accessible for everyone so that we can do more good in the world?

Marzieh: That is, in my mind, the ideal scenario: if we end up actually improving everyone’s life with this technology. There have been a few times in human history when that has happened; the Internet is a good example.
It really elevated connection globally, and a lot of positive things came out of that. With this technology, we can make it accessible for everyone and improve some part of their lives or their work, without, hopefully, destroying the planet.

Lauren: Val, what advice do you have for the people in the room who are looking at how to make AI not just better for the planet, but commercially viable, cost-effective, and less power-hungry? Are there things we should really be thinking about?

Val: Actually, building on the open-source theme, there’s a lot we can learn. I can’t imagine this industry without open source, personally. And just to be pedantic, there’s a lot of controversy among people in the open-source AI community that most of the models are not fully open: the weights are open, and some of the recipes are open, but a lot of the datasets aren’t. So we need to do a much better job of creating financial and social incentives to share the data as well, particularly in the fields we talked about. DeepSeek is an example. I think DeepSeek actually educated some of the big commercial labs early on about techniques like KV-cache offloading, discounted input tokens, and their open infra index on GitHub. They publish so prolifically on GitHub; they’ve contributed so much. One of the ways the audience can benefit is engaging, even if you just consume the great papers that are published alongside an open model. You can look at the theory, look at the practice, and play with it yourself on your own local PC with a llama.cpp-style server. Or, if you really want to contribute, there’s what has become probably the most popular community in the open-source AI world, under the Linux Foundation: the vLLM community.
It’s a very popular inference server, and I see a lot of innovation happening there: around reinforcement learning, certainly around inference at scale, and around improving training. Some of the latest things they published just the other week are around actually being able to modify the weights and do continual learning. So that’s one of the greatest tips right now: even the closed labs publish a lot, but the open-source model labs are by far the best place to learn, and not just the science, but the engineering and the application of AI.

Lauren: We are unfortunately heading toward a close for the panel, but we have a few more questions. I love looking forward: we all get to be pundits, decide what we think the future will look like, and have no accountability for whether we’re correct. But you are honestly some of the brightest minds in AI right now, so I genuinely think you’ll have so much insight for the people here. Frontier models will change what’s possible. They will reshape economies; they are reshaping our lives, industries, and knowledge work right now. What do you think the future of AI is going to look like in one year? In three years? Who should go first?

Val: I’ll start, because it’s a fun, slightly controversial take. I’m old enough to remember when the Internet first started, around Y2K and even before that. It was a big thing in Silicon Valley to pitch VCs and say, “I’m an Internet startup.” That meant something twenty, twenty-five years ago. If you describe your company as an Internet company today, it’s meaningless; what does that even mean? At the pace of progress and acceleration in AI, I predict that within three years, if you say “I’m an AI company” or “an AI startup,” it’ll be meaningless. What do you mean?
Are you applying it to health care? To security? To multilingual use cases? To science, or just entertainment and social media? You’re going to have to assume AI is everywhere, that it will be some level of affordable, or at least that there will be market pricing that lets you choose the right models and the right infrastructure, and that you’re actually solving real problems, whether they’re business problems, health care problems, maybe even geopolitical problems. The ubiquity of AI, I think, will happen faster than we can predict today.

Lauren: Awesome. I don’t think that’s too controversial, though.

Val: Inside the industry, it is. We think there’ll be growth and fascination forever.

Marzieh: For me, next year is easier to predict. I am very excited about coordination and collaboration. We are now past the first stage of developing models that are really good, and now, with multi-agent scenarios, these models can coordinate and collaborate to solve even more complex problems. We have seen that start to happen, and I think there are a lot of interesting questions and problems to study there for the next year or two. I would also like to think that in five years, or longer, we will have this technology in places we never thought it would be, in a more positive way: things that are just so out of reach right now, because that’s usually how these innovations happen. We can build incrementally over the years, but at some point you also have to, to your point, explore without any one objective that you are optimizing for.
And I like to think that in a few years, that exploration will land us in a place we cannot really picture from where we are now.

Lauren: Georgia, what about you?

Georgia: That was a great vision for the future; a hundred percent, and also everything that exists beyond our imagination. Something we haven’t talked about that much, and a bit more concrete, is what I’m specifically looking forward to: hopefully a real decrease in the manual labor people have to do. That is AI, but it’s also robotics. Huge headway is being made there, but so far the AI reasoning community and the AI-for-X community have been pretty separate from the AI-for-robotics community, which, maybe you’ve seen the videos, has been very focused on folding clothes. A very difficult task; absolutely no hate. But I’m really optimistic that in, say, three years, something crazy happens, like fifty percent less manual labor being done by humans in America.

Lauren: I love all of those answers, and I think about the work each of your companies is doing and how it is building for the future. On the science side, there very well could be a world where we are curing cancer at an accelerated rate because of AI, which is absolutely incredible. The work that you’re doing, Marzieh, with Cohere, bringing in more diversity from multiple languages, is not only going to make our AI smarter because it learns from different perspectives; can it also build us a better world? And Val, the work you’re doing at WEKA to increase token throughput: I know you’ve recently run a benchmark where you more than quadrupled token throughput on the exact same infrastructure.
That is going to improve our costs and our energy consumption. If we can make this cost-effective and less power-hungry, we can build the viability to actually make the world a more diverse and inclusive place, to cure cancer, to give us all this better future. And then who knows what we will do when our lives involve more time for creativity, more time for thought, and hopefully fewer backaches along the way. One quick rapid-fire final question, in less than thirty seconds each; it’s my hard question none of you knows: What’s the end game for AI? What does this look like?

Georgia: Hopefully a world where people are happy. I think it’s really easy to get lost in the idea of what progress is without any clear goal. So bring back values; that’s my takeaway.

Lauren: I like that.

Marzieh: Improving human life.

Lauren: I like that too.

Val: To Georgia’s point earlier, the end game is something we’re not imagining yet, something we’re not seeing yet. Going back twenty, twenty-five years, if you had said you were going to get into a stranger’s car and let that stranger drive you somewhere, or rent your couch to a stranger in your own house and share your bathroom, you would have been considered completely crazy. That’s what the Internet really enabled: these really creative use cases. We haven’t seen that kind of creativity yet. Most of what we do is skeuomorphic; that’s an old Apple term for doing something old, but better. The end game is about doing brand-new things you haven’t imagined yet.

Lauren: I love it. Thank you so much for joining our panel, and thank you, everyone, for being here. Please thank our panelists.
Speakers:
- Lauren Vaccarello, Chief Marketing Officer at WEKA
- Georgia Channing, AI for Science Team Lead at Hugging Face
- Marzieh Fadaee, Head of Cohere Labs
- Val Bercovici, Chief AI Officer at WEKA
Below is a transcript of the conversation, which has been lightly edited for clarity.
Transcript
Setting the Stage: What’s Next for Frontier Models
Lauren Vaccarello: Today we are going to be talking about frontier models: What’s next for the most advanced AI systems. We are thrilled to have an incredibly esteemed panel with us today.
Immediately to my left is Georgia Channing. She is the machine learning for science lead at Hugging Face. She works on enabling scientific discovery with AI and building tools for scientists in an open-source community. She’s been building in biotech, fusion engineering and materials discovery with Hugging Face. She has her Ph.D. in computer science from the University of Oxford, where she worked on multi-agent methods and distributed training.
To her left, we have Marzieh Fadaee. She’s the head of Cohere Labs, where she leads research on fundamental problems in artificial intelligence. Her work spans multilingual language models, data-efficient learning, model evaluation, and trustworthy AI, with a focus on building systems that are robust, inclusive, and globally impactful. She holds her Ph.D. from the University of Amsterdam, where she conducted foundational research on neural machine translation.
And last but not least, we have Val Bercovici. Val is the chief AI officer at WEKA, where he helps AI builders advance their enterprise and agentic AI research and innovation. He has extensive experience in the infrastructure industry. He’s been the CTO at NetApp and SolidFire. He also co-created the Cloud Native Computing Foundation, which is the home of Kubernetes. Incredible group of panelists with us today.
What Emerging AI Capabilities Are Most Exciting Right Now?
Lauren: First up, I want to ask each of you: What emerging capabilities in frontier models excite you the most? Georgia, do you want to kick us off?
Georgia Channing: Sure. As you mentioned, I have a bit of a science flavor, and I’m going to bring that here as well. The long context that we’re now able to achieve means we can consume so much more scientific knowledge than we ever could before. Say you want to synthesize a molecule. You need to go through hundreds of papers to find the information on how to synthesize it, and that’s something we’re only just getting to. It’s really cool.
Marzieh Fadaee: I think AI for science is a super interesting area that I’m very excited to see. A lot of more recent works — like Hugging Face and also new startups — are now working on problems that really impact humanity in a more direct way, like carbon capture and the environment. That is definitely something I look forward to seeing, where the capabilities go with these models. I’m also very curious to see what capabilities we haven’t thought about and what use cases we haven’t really looked at so far that we will have in a few years. When we look back at five years ago and what this technology could achieve, some things we do today were not really an option or not possible. So that realm — what is still something that’s just too hard to even think about — that is very exciting to me.
Val Bercovici: For me, the seasons of AI this year have been fascinating. We started the year transitioning from a lot of non-reasoning pretrained models to reasoning models in the spring. Then it was like the summer of coding agents, with agents becoming really real, as I mentioned in the keynote a couple hours ago. What’s most exciting to me right now are the advances in reinforcement learning, the real fusion of training and inference in reinforcement learning loops and episodes. If you spend time with a lot of frontier lab people, they’re seeing the direct path between reinforcement learning and AGI. We don’t know if that’s really going to happen, and there might be a couple of exits along the way, but there’s real material progress toward AGI.
New AI Use Cases That Weren't Possible a Year Ago
Lauren: As each of you has said, how advanced we’ve gotten even in the last year, in the last 18 months, is completely incredible. Are there any other new applications or use cases you’re seeing that are possible today that weren’t even possible a year ago? Marzieh, what do you think?
Marzieh: I think many math and code capabilities we see today — the more standardized ways of evaluating through games or math competitions like Olympiads — that is one area I think has been a little bit surprising. But there’s also just more things that maybe we got used to very quickly. We now expect the models to have long conversations, context switch, talk about completely different topics and areas, and the models are doing quite well there. This is related to long context, but also just really building all-around general models that are capable of knowing what is important — what you have to use from long memory, what you have to retrieve from short memory. This is something that is very exciting that we can do for the most part now, and I’m hoping this will also unlock new use cases for this technology.
What Is Holding Frontier AI Models Back?
Lauren: Slightly different question for you, Georgia. As we’re thinking about all this possibility, what do you think is holding frontier models back right now?
Georgia: I think about it more in the sense of what is holding us back from achievement. And I think that’s different from what is holding frontier models back. As Marzieh was talking about, a lot of the industry is focused on making really general models. But actually, most of the time you don’t need models that know a ton about art history and also how to code, particularly for business use cases. You have really limited tasks that you want to work on. I think making models compact enough to be cost-efficient, but also able to deliver for those specific tasks, is what is going to hold back advancements in AI. And then from the science perspective, I think it’s really just that we haven’t figured out how to format the data. We have a ton of data, but we haven’t figured out how to serve that to models yet. There’s actually a lot of different problems going on in different areas, and that’s what’s holding back the realization of what we want AGI to be.
The Real Cost of Running AI: Inference, Tokens, and Infrastructure
Lauren: You said something I want to double-click on, this idea of cost. We all know running these models is extremely costly from a business perspective and from an end-user perspective. Val, I know you work with some of the most innovative AI companies on the planet. What are you seeing as some of their challenges around cost?
Val: It’s very much like the Uber scenario. If you remember taking Uber about 10 years ago, it was very cheap. We found out it was highly subsidized. That’s why they were able to disrupt a lot of taxi industries everywhere. Today, we’re all afraid of surge pricing. We’re kind of in the surge pricing era of AI inference right now, where the more you start to use particular agents, the more tokens they consume and you run out. Instead of just being able to pay for more, we’re such a capacity-constrained industry that you literally hit your rate limit or you’re throttled and you have to pause artificially because of supply constraints.
We see it particularly because in order to scale, you’ve got to scale in a very homogeneous way — where if you need more memory, you’ve got to add more GPUs just to get the memory and underutilize the GPUs. If you need more GPUs, you’re basically tagging along stranded memory for the ride. We haven’t disaggregated the hardware the way we’re, fortunately, disaggregating the software from a prefill-decode perspective.
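Val’s point about stranded memory is easy to see with a back-of-the-envelope KV-cache calculation. The model shape below is a rough 70B-class configuration chosen purely for illustration, not a figure from WEKA or any specific deployment:

```python
# Back-of-the-envelope KV-cache sizing: why long contexts exhaust GPU
# memory long before the GPU's compute is saturated. Shapes are a
# rough 70B-class illustration (80 layers, 8 KV heads via GQA,
# head_dim 128), not vendor figures.

def kv_cache_bytes(layers, kv_heads, head_dim, context_len, bytes_per_val=2):
    # 2x for the key and the value tensor; fp16 = 2 bytes per element
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_val

per_token = kv_cache_bytes(80, 8, 128, 1)
print(f"KV cache per token: {per_token / 1024:.0f} KiB")      # 320 KiB

one_session = kv_cache_bytes(80, 8, 128, 128_000)
print(f"one 128k-token session: {one_session / 1e9:.1f} GB")  # 41.9 GB

# 16 concurrent agent sub-tasks at 128k context each, vs. 80 GB of HBM:
print(f"16 sessions: {16 * one_session / 1e9:.0f} GB")        # 671 GB
```

At these illustrative sizes, a single long-context session already fills half of an 80 GB accelerator, which is why adding GPUs purely for their memory, while leaving their compute underutilized, becomes the scaling pattern he describes.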
Lauren: So even if I wanted more advancement, even if I wanted to run more large or small models, I might reach a point where the fundamental infrastructure is just going to block me from being able to work on carbon capture or scientific development?
Val: And I can clear up a misconception about small models too. Everyone thinks small models are automatically cheaper, automatically going to save you money. Not really. You end up using more tokens. As these small models get more efficient, Jevons’ paradox kicks in and you end up consuming more tokens. So you’re back to the same problem, even though these are the problems we want: we’re advancing the science, we have more capability, and we’ve got to figure out how to afford it all.
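Jevons’ paradox here is just arithmetic. The prices and usage volumes below are invented for illustration, not real model pricing, but they show how a 20x cheaper token can still produce a bigger bill once agents route more work through the model:

```python
# Jevons' paradox in token terms: a cheaper-per-token small model can
# still raise total spend once it invites more usage. Prices and
# volumes are invented for illustration only.

def monthly_spend(price_per_mtok, tokens_per_task, tasks):
    """Cost in dollars, given a price per million tokens."""
    return price_per_mtok * tokens_per_task * tasks / 1e6

big = monthly_spend(price_per_mtok=10.0, tokens_per_task=20_000, tasks=1_000)

# 20x cheaper per token, but the low price invites 20x more automated
# tasks, plus chattier multi-step agent loops (1.5x tokens per task):
small = monthly_spend(price_per_mtok=0.5, tokens_per_task=30_000, tasks=20_000)

print(f"large model: ${big:,.0f}/month")   # $200/month
print(f"small model: ${small:,.0f}/month") # $300/month
```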
Small Models vs. Large Models: Misconceptions and Trade-Offs
Lauren: Going to small models, when would you want a small model vs. a large model or a custom model? Marzieh, are there any other misconceptions with small models?
Marzieh: There was a period of time that everyone was racing to scale. The scaling law was great, and the more parameters and the more data we threw at these models, the better they became. But more recently, we are revisiting how the learning is actually happening, because it’s very clear scale helps. With the right setup and more parameters, you have more capacity to learn different capabilities. But it’s not always necessarily required, depending on whether you want a specialized model in an area or even a more general model. We now see multilingual models that are quite good in a smaller size.
So it really boils down to where we do this optimization, whether it’s about the easy part of increasing the size of everything, or, in my opinion, the harder part of doing optimization on a smaller scale, whether it’s about the quality of your data, what data is useful for learning from, or the learning and optimization algorithms you use. These have all been shown in the last year or two: How you can reach the capabilities of models 10 times, 100 times bigger from a year before, with much smaller sizes.
One particular project we worked on recently was training a language model specialized in code. The general idea is that we should have really high-quality code data to train these models — code that passes all unit tests, so code that is right. What we actually saw was that if we relaxed the passing threshold — so the code really didn’t have to be perfect, it was fine if it still failed a few unit tests — that was actually more useful for the model to learn from. It helped the model generalize to unseen cases and also allowed harder problems to get in. So there are a lot of still-interesting open questions in the data space where you’ll be able to train a smaller model that can have the capability of a bigger model that has seen a lot of noisy internet data at a larger scale.
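The filtering idea Marzieh describes can be sketched as a simple pass-rate threshold over candidate training samples. The function and data below are hypothetical stand-ins, not the pipeline Cohere actually used:

```python
# Hypothetical sketch of the relaxed-threshold idea: keep code samples
# whose unit-test pass rate clears a bar below 100%, so imperfect
# solutions to harder problems stay in the training mix.

def filter_training_code(samples, min_pass_rate=0.75):
    """samples: iterable of (code, tests_passed, tests_total)."""
    kept = []
    for code, passed, total in samples:
        if total > 0 and passed / total >= min_pass_rate:
            kept.append(code)
    return kept

samples = [
    ("solution_a", 10, 10),  # perfect: kept by strict and relaxed filters
    ("solution_b", 8, 10),   # fails 2 tests: a strict filter drops it,
                             # the relaxed one keeps it
    ("solution_c", 3, 10),   # mostly wrong: still dropped
]
print(filter_training_code(samples))  # ['solution_a', 'solution_b']
```

Setting `min_pass_rate=1.0` recovers the strict keep-only-perfect-code filter she contrasts against.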
I also think it’s important for the developer community, because smaller models are much easier and more accessible for everyone to use.
Can You Use ChatGPT to Cure Cancer? Custom Models for Scientific Discovery
Lauren: Georgia, I know we’ve talked about not just large and small models, but custom models. I use ChatGPT. I used it to plan my trip to Chile, and it planned an incredible trip to South America for me. Can I just use ChatGPT to cure cancer or do some scientific development?
Georgia: It’s a little bit tougher than that, for a couple of reasons. One thing that’s actually really interesting is even in that example — planning a trip to Chile — unlike with coding, where there’s a verifiable reward, there’s not a clear answer to what is a good trip to Chile. Maybe you looked at some other sites, so you can evaluate that. But for evaluating cures to cancer, the verification pipeline might be $500 million and six months of work. That’s already a really significant jump, not in the same class of problems.
And beyond that, fundamentally most science data is really high-dimensional. We almost all work with transformers — sequence-to-sequence models — which works really well for text and even for proteins, where you also have a sequence of amino acids. But when you’re thinking about cancer data, you’re often talking about whole-slide images: Probably an image that’s a gigabyte by itself for a very small sample of your skin, where every single pixel has a multidimensional embedding of the genes that are in that cell, whether or not they’re cancerous, and other information like that. How do you tokenize that? I don’t think anybody knows the answer to that question.
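For intuition on the scale problem Georgia raises, one common (and lossy) workaround is to slice a slide into fixed-size patches and treat each patch as a token, ViT-style. This is only a sketch of that general idea, not a claim about how pathology models are actually built:

```python
import numpy as np

# Slice a (toy) whole-slide image into fixed-size patches and flatten
# each patch into one "token", ViT-style. A sketch of the general
# workaround, not how pathology foundation models actually tokenize.

def patchify(slide, patch=256):
    """slide: (H, W, C) array -> (n_patches, patch * patch * C)."""
    h, w, c = slide.shape
    h, w = h - h % patch, w - w % patch            # crop to a multiple
    grid = slide[:h, :w].reshape(h // patch, patch, w // patch, patch, c)
    return grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)

# A tiny stand-in; a real slide is gigapixel-scale with far richer
# per-pixel annotations than 3 color channels.
slide = np.zeros((1024, 2048, 3), dtype=np.uint8)
tokens = patchify(slide)
print(tokens.shape)  # (32, 196608): a 4x8 grid of 256x256x3 patches
```

Even this tiny stand-in slide yields tokens of nearly 200,000 raw values each; a real gigapixel slide with per-pixel genomic embeddings is orders of magnitude larger, which is exactly her point about no one knowing the right tokenization.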
Lauren: Where would you like to see custom model development go in the scientific community?
Georgia: Honestly, I think the barrier there — though I was just highlighting a technical question — is not actually a technical one. I think it’s mostly a social problem. It has been really difficult for people from the machine learning community and from the domain sciences community to get together and really collaborate. A lot of that is because in the machine learning community we’re often asking, “What do we optimize? What’s my loss?” And when you go talk to a material scientist who’s interested in a particular property and says it would be cool if we understood this — those two things are not inherently compatible at all. So lots of stuff ends up not being done. I think the fundamental issue we have right now is a social one, rather than a technical one. There’s so much more we can do with transformers that we have not been able to do for reasons that have nothing to do with the technology.
Why Multilingual AI Models Matter for a More Inclusive World
Lauren: Marzieh, this reminds me of what we talked about with building more open AI and building more diversity in AI, even with incorporating more languages. I know you’re doing a lot of work at Cohere on how you bring in multilingual support. Where do you see the importance of that? How do you see that advancing models?
Marzieh: The importance of working on multiple languages, I don’t think we need to justify it in a room like this. We all come from different backgrounds, speaking different languages. I think everyone here speaks at least two languages.
And everyone can agree that when you speak different languages, there are concepts and specific nuances to each language that you might not be able to literally transfer and translate between them. The diversity of each language and how each of them captures the human experience is also where this technology started from. If you look at the history of the transformer, and before that attention models, they were all really trying to use neural networks in the field of machine translation, because it’s very challenging to capture meaning in a language and also transfer meaning in a language.
There are practical challenges: The data for many of these languages and how represented that data is, versus how much it’s a translation of existing English data. And then there are societal challenges of how humans would interact with these multilingual models in their own languages: from the safety side, from the bias side, and how requirements and sensitivities may be different in different languages. At the end of the day, it is also a multi-objective optimization problem. There are multiple things you want to learn at the same time.
A big part of what we’ve done at Cohere Labs has been on multilingual models, and I would say — even more importantly — multilingual evaluation. That is something that, to your question about what is holding this technology back, good evaluation is something we definitely need.
How Should We Evaluate AI Models? The Problem with Benchmarks
Lauren: How do you think the role of good evaluation fits in?
Georgia: You have no idea if you’re making progress unless you have good evals. And evals are kind of unsexy, so not that many people want to work on them, but they’re actually the building blocks of all progress. There’s a great site for people who want to check it out called Artificial Analysis, which has some really interesting evals including IQ points versus cost for different models. There is no progress without good evals.
Lauren: I think you bring up a really good point about IQ points vs. cost and how we are measuring the efficacy or success of models today. I don’t think it always makes the most sense that you’re judging on how many tokens and what is the quality of it. I think there needs to be a fundamental rethink and evolution of how we measure the success of this.
Marzieh: What we actually see nowadays is a disconnect between the benchmarks that we have and the real-world capabilities of the models — the “vibe checking” of the model. The disconnect partially comes from how the evals become saturated or contaminated very quickly. We have been working on creating benchmarks in our research lab, but more recently I feel like that might not be the way to go, because very quickly everyone can train their models on that benchmark, intentionally or unintentionally. Each benchmark might capture one specific specialty and expertise, but this overall sense of what model is doing better — what it feels like — is not really captured. I think just rethinking the framework of evaluation: Should it be just one score or a leaderboard? And what are other ways, now that this technology is so good that it’s catching up with every benchmark right away, that we can test it out?
Lauren: I couldn’t agree with you more. In some ways, we don’t even know how much some of this costs, and how can you understand if it’s good or bad without even understanding costs? And how would you run a business without knowing this is what success looks like, and this is what my customer needs, and this is the fundamental costs? We are not even on day one of AI if we think about this. Val, I know you work with so many businesses right now thinking about the success and efficacy of AI. What are you seeing? How do you think we should be looking at the success and evaluating the frontier models that we’re using?
Val: It’s such an interesting topic. Artificial Analysis is finally starting to benchmark multi-turn, longer-context conversations. Even their data from the past year and 18 months is outstanding — not just on quality and the IQ portion, but they’ve added cost. They measure the amount of reasoning tokens each model uses if it’s a reasoning model. A lot of the traditional benchmarks we relied on are really saturated right now. I’m also a big fan of the ARC Prize and the ARC-AGI benchmarks. ARC-AGI-1 was cool because it proved you could literally reason out of distribution, outside of the training dataset, but at pretty enormous cost. People were guessing: Was it 16 hours to complete some of these? Was it over $1 million to saturate ARC-AGI-1? ARC-AGI-2 is largely unsaturated right now because they’re factoring in efficiency as a core part of the result. It’s important to have quality and intelligence, but if it’s not practical, it’s just not going to get used.
Some of the techniques that are coming out now — NVIDIA is leading the way and saying our general purpose GPUs are amazing at training and surprisingly good at inference, but inference now at scale is so important that they preannounced the Rubin CPX processor, which is just for the prefill part of inference, showing you that you really do need to disaggregate prefill and decode. You do need to really optimize your infrastructure to be able to afford all of the ambitious aspirational goals we have for AI at scale.
Lauren: I think a lot about this idea of commercially viable AI. To the point you made earlier about the early days of Uber being subsidized — this is the advent of where AI is going to go. For everybody here, what fundamental trade-offs are we navigating to get to this idea of commercially viable, but also viable for society? What are the trade-offs we have to navigate from model size, efficiency, cost, capacity, success?
Val: There’s a classic trade-off between throughput and latency for users. You can have a private jet to get a few people somewhere really, really fast. But if you put people on a bus, it’s way more efficient — but that bus is going to get to the destination way, way slower. We still struggle with that right now in terms of having really low latency, but for an affordable broad community of users, and ideally in the same batch inference. So that’s one of the fundamental trade-offs we still make.
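Val’s jet-versus-bus trade-off can be put in numbers with a toy batching model. The timing constants below are illustrative assumptions, not measurements of any real serving stack:

```python
# Toy model of the batching trade-off: one decode step serves the whole
# batch, and step time grows with batch size. Timing constants are
# illustrative assumptions, not measurements.

def serving_profile(batch_size, base_step_ms=20.0, per_seq_ms=1.5):
    step_ms = base_step_ms + per_seq_ms * batch_size
    per_user_tok_s = 1000.0 / step_ms            # what each user feels
    total_tok_s = per_user_tok_s * batch_size    # what the GPU delivers
    return per_user_tok_s, total_tok_s

for bs in (1, 8, 64):
    user, total = serving_profile(bs)
    print(f"batch {bs:3d}: {user:5.1f} tok/s per user, {total:6.1f} tok/s total")
```

Aggregate throughput climbs with batch size while each individual user’s tokens per second falls, which is the tension between affordability and latency he describes.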
Marzieh: I would add to that, just trying things out to understand them a little bit better first. Reasoning models are super popular and very impressive, but there has been more recent research on how much of the reasoning trace is actually useful, or does the model actually need it to get to the answer? There have been papers showing just essentially randomly removing the bottom half of the reasoning trace — the model would still do great. How much of it is for the benefit of the human who is looking at it to interpret the steps and how the model got to the answer? And how much of it is actually needed to get to the answer? These reasoning traces are just long inference-time compute that we are spending. One thing I think is important is that, with every new way we find that we can use these models even a little bit better, we should do that in a systematic, scientific way — really studying what it is about this particular way that we’re training or using these models that is helping, and how we can do that more efficiently and more effectively.
Georgia: I think also part of the reason people were motivated to do reasoning tracing was so that you could do, particularly in math and coding, step-wise corrections. That makes a lot of sense when you’re training a model, but it’s not clear that you need this at inference. Could you have it in training in a way that would cut it at inference?
Marzieh: Exactly.
Georgia: And I think from a business perspective, it’s important for people to think about when they need AI. There have been papers about how AI can lead to a lot of work inefficiency. It’s worthwhile to think, before you go and ask ChatGPT to do the presentation you don’t want to make, whether that’s a good thing to have AI doing for you. That would also benefit the environment. I think there’s also a place for small models here, particularly if you have an agent in your email. Banal responses to very common work interactions do not need GPT-4 or GPT-5. You could probably have a Qwen half-billion-parameter model do that just as well. As prices rise for models that are genuinely expensive to run, it’s going to become much more about matching the correct model to the correct task.
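The match-the-model-to-the-task idea can be sketched as a simple router: cheap, routine-looking requests go to a small model, everything else to a frontier model. The model names, cost table, and routing heuristic below are all illustrative assumptions, not a real product's API.

```python
# Sketch of model routing: send routine requests to a small model, reserve
# the large model for hard ones. Names and costs are hypothetical.

MODEL_COSTS = {"small-0.5b": 0.0001, "frontier": 0.01}  # assumed $/request

ROUTINE_MARKERS = ("thanks", "confirm", "schedule", "received", "acknowledge")

def route(prompt: str) -> str:
    """Route short, routine-looking prompts to the small model."""
    text = prompt.lower()
    if len(prompt) < 200 and any(m in text for m in ROUTINE_MARKERS):
        return "small-0.5b"
    return "frontier"

requests = [
    "Thanks, confirming receipt of the invoice.",
    "Draft a 10-slide strategy deck comparing our EU go-to-market options.",
]
for r in requests:
    model = route(r)
    print(f"{model:10s} (${MODEL_COSTS[model]}) <- {r[:40]}")
```

In practice the routing decision would itself come from a classifier or a cheap scoring pass rather than keyword matching, but the economics are the same: two orders of magnitude in per-request cost is exactly the gap Georgia is pointing at.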
AI Energy Consumption: Making Artificial Intelligence Sustainable
Lauren: You mentioned the energy side of this, how power-hungry AI is. Val, what are you seeing in terms of power consumption? Are there ways we can make AI not just commercially viable, but also sustainable?
Val: It’s crazy when you actually do the math. Just putting two PDFs into a medium-sized prompt — whether it’s a ChatGPT session or the first turn of hundreds of turns in an agent session — the prefill alone for doing that can consume the entire energy usage of a household for a day: more than 20 kWh just to start a chat session, or certainly an agent session. And because we run out of memory so quickly in a parallel, concurrent, multi-sub-agent task environment, we’re re-prefilling again after minutes. Every few minutes per agent subtask, that GPU is redundantly re-prefilling the early part of the context over and over again. We’re running these AI factories as if it were before the Model T moment, before we had assembly lines. We have to get way more efficient in our token pipelines so we’re not wasting energy unnecessarily before we actually consume it for productive, valuable things.
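The shape of Val's argument is a multiplier: whatever one prefill costs, redundant re-prefilling multiplies it by the re-prefill rate, and KV-cache reuse removes that multiplier. Here is the back-of-envelope arithmetic; the context size, per-token energy figure, and re-prefill rate are assumptions for illustration, not measured numbers.

```python
# Back-of-envelope: redundant re-prefill multiplies energy cost.
# Every input number below is an assumption, not a measurement.

context_tokens = 200_000          # e.g. two PDFs plus history (assumed)
joules_per_prefill_token = 0.35   # assumed blended GPU energy per token
reprefills_per_hour = 12          # "every few minutes", per Val

one_prefill_kwh = context_tokens * joules_per_prefill_token / 3.6e6
hourly_waste_kwh = one_prefill_kwh * reprefills_per_hour

print(f"one prefill:  {one_prefill_kwh:.3f} kWh")
print(f"hourly waste: {hourly_waste_kwh:.3f} kWh (without KV-cache reuse)")
```

Whatever the true per-token figure, the structure holds: re-prefilling an unchanged context every few minutes scales the cost by an order of magnitude per hour per agent, which is exactly the "assembly line" inefficiency Val is describing.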
Lauren: Georgia, how are you thinking about the energy consumption challenge?
Georgia: At Hugging Face, we really focus on small models — that’s where we really invest and put our energy. Part of that is also to enable a much broader community to use AI. I think otherwise, hopefully with AI for science, we discover great methods for carbon capture. And maybe we should also use AI less (which is not a very good tagline for the World Summit AI).
Lauren: But it is, going back to what we said earlier: use AI smartly. Once you know that uploading a couple of PDFs is the equivalent of the energy to run your house, do I need to ask ChatGPT how to get to the World Summit AI? Or can I just do a super-quick Google search, or actually interact with a person and ask them? I know this is about the next step for the most advanced AI systems, but human interaction might be a good thing from time to time.
The Case for Open Source AI: Transparency, Access, and Innovation
Lauren: Marzieh, I’d love to talk about open source and why we should look at open source AI models.
Marzieh: This technology has been built on open research. What open source enables is everyone building on top of each other, learning from each other’s mistakes and successes and adding to that. It’s also a really great way to create transparency when you share the details of your work, whether that’s your models or papers on your methods. Transparency also helps you reproduce, replicate, and check: when something is open, it’s less likely to be designed for the benefit of a small group. At the end of the day, it just helps advance this technology. Hugging Face is a great example of really advocating for open datasets and open models. We have partnered with them a lot over the last couple of years. We’ve released our Aya models and the Command models on Hugging Face with open weights, and we have seen how that helps people pick this up and build things we wouldn’t even have predicted as a use case or follow-up for a particular project.
Lauren: I like a lot of what you are talking about and doing at Cohere, where it is almost AI for good. This is why you have multilingual support, this is why you lean into open source: truly democratizing AI means it isn’t split between haves and have-nots. How can you make this more accessible for everyone so we can do more good in the world?
Marzieh: I think that’s an ideal scenario if we end up actually improving everyone’s life with this technology. There have been a few times in human history when that has happened. The Internet is a good example; it really elevated connection globally and a lot of positive things came out of that. With this technology, we can make it accessible for everyone and improve some parts of their lives or their work without also hopefully destroying the planet.
Val: Building on the open source theme, I can’t imagine this industry without open source. Just to be pedantic, there’s a lot of controversy in the open source AI community because most of the models are not fully open. The weights are open and some of the recipes are open, but a lot of the datasets aren’t. We need to do a much better job of creating financial and social incentives to share the data as well as the models, particularly in certain fields.
DeepSeek is an example. It actually educated some of the big commercial labs early on about techniques like KV cache offloading, and it shares its open infrastructure index on GitHub. They publish so prolifically and have contributed so much. One way the audience can benefit is by engaging: even if you just consume the great papers published alongside an open model, you can look at the theory, look at the practice, and play with it yourself on your own local PC with a Llama-type server. Or if you really want to contribute to what has become probably the most popular community in the open source AI world — the vLLM community under the Linux Foundation, a very popular inference server — there’s a lot of innovation happening there around reinforcement learning, inference at scale, and improving training. The closed labs publish a lot, but the open source model labs are by far the best place to learn not just the science, but the engineering and the application of AI.
The Future of AI: Predictions for One Year, Three Years, and Beyond
Lauren: Frontier models will change what’s possible. They will reshape economies. They’re already reshaping our lives, industries, and knowledge work. What do you think the future of AI is going to look like in one year? Three years? Who should go first?
Val: I’ll start because it’s kind of a fun, slightly controversial take. I’m old enough to remember when the Internet first started, around Y2K or even before that. It was a big thing in Silicon Valley to go pitch VCs and say, “I’m an Internet startup.” And it meant something 20, 25 years ago. If you describe your company as an Internet company today, it’s meaningless — what does that even mean? I think at the pace of progress and acceleration in AI, within three years I predict if you say you’re an AI company or an AI startup, it’ll be meaningless. Like, are you applying it for healthcare? Are you applying it for security, for multilingual use cases, for science, or just for entertainment or social media? You’re going to have to basically assume AI is everywhere, that it will be at some level affordable, or at least there’ll be market pricing that lets you choose the right models and the right infrastructure properly, and that you’re actually solving real problems — whether they’re business problems, healthcare problems, or even geopolitical problems. The ubiquity of AI, I think, will happen faster than we can predict today.
Marzieh: For me, next year is easier to predict. I’m very excited about coordination and collaboration. I think we are now past the first stage of developing models that are really good, and now with multi-agent scenarios and how these models can coordinate and collaborate together to solve even more complex problems — that is something we’ve seen happening now, and there are a lot of interesting questions and problems to study for the next year or two. And I like to think that maybe in five years or longer in the future, we will have this technology in places we never thought it was going to be in a more positive way. Things that are so out of reach right now. These innovations, you can build them incrementally over the years, but at some point you also have to explore without any objective, to not just have one objective you want to optimize for. I like to think that in a few years, this exploration will land us in a place we cannot really think of right now.
Georgia: I think something we haven’t talked about much — and something more concrete — is the hopefully significant decrease in manual labor that people will be doing. That’s AI, but it’s also robotics. Huge headway is being made there, but so far the AI reasoning community has been pretty separate from the AI-for-robotics community, which has been — you may have seen the videos — very focused on folding clothes. It’s a very difficult task, absolutely no hate. But I’m really optimistic that in three years, something like 50% less manual labor will be done by humans. Something like that.
Lauren: I love all those answers. I think about the work each of your companies is doing and how it is building for the future. Georgia, with the work you’re doing on the science side, there could well be a world where we are curing cancer at an accelerated rate because of AI. Marzieh, the work you’re doing with Cohere, bringing in more diversity across languages, is not only going to make our AI smarter, it’s going to help it learn from different perspectives. And Val, the work you’re doing at WEKA to increase token throughput — I know you recently ran a benchmark achieving more than 4x token throughput on the exact same infrastructure — is going to improve our costs and energy consumption. If we can make this cost-effective and less power-hungry, we build the viability to actually make the world a more diverse and inclusive place, to cure cancer, to give us all a better future.
Rapid Fire: What Is the End Game for Artificial Intelligence?
Lauren: Quick rapid-fire final question. Less than 30 seconds each. It’s going to be my hard question that none of you know in advance.
What’s the end game for AI?
Georgia: Hopefully a world where people are happy. I think it’s really easy to get lost in the idea of what progress is without any clear goal. So, bring back values. That’s my takeaway.
Marzieh: Improving human life. Yeah. I like that.
Val: To Georgia’s point earlier, the end game is something we’re not imagining yet, something we’re not seeing yet. Going back 20, 25 years, if you had said you were going to get into a stranger’s car and let that stranger drive you somewhere, or rent your couch to a stranger and share your bathroom, you would have been considered completely crazy. That’s what the Internet really enabled: those really creative use cases. We haven’t seen that kind of creativity with AI yet. Right now it’s skeuomorphic, an old Apple term for doing something old, just better. The end game is doing brand-new things you haven’t imagined yet.
Lauren: I love it. Thank you so much for joining our panel. Thank you everyone for being here. Please thank our panelists.