Understanding How AI Thinks: A Year of Research: Laura Ruis
By Claire Hudson, on 4 August 2025
Over the past year, my research has focused on the following questions: how do large language models (LLMs) learn to reason from large-scale textual pretraining? Anticipating that AI will eventually surpass human experts, how can we still make sure that what it tells us makes sense? And, given that most papers in the field now use LLMs as evaluators, how can we make sure those evaluations are reliable? In this post, I will go over my collaborators’ and my findings from the past year.
How Do Models Learn to Reason from Pretraining Data?
My main research output came with a paper titled “Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models,” which was accepted at ICLR 2025. Since the advent of LLMs and their saturation of benchmarks, a question has been bugging me: are models truly reasoning when they solve problems, or have they merely seen questions very similar to the ones we benchmark them with in their vast, internet-scale training data? Tackling this question was challenging for a few reasons. Firstly, the pretraining data is so vast (trillions of tokens!) that searching through it is usually intractable. Secondly, interpretability work to understand which pretraining data influences model outputs is very expensive, as models have billions of parameters.
To answer this question, I investigated whether the pretraining data models rely on when reasoning differs from the data they rely on when answering factual questions. The hypothesis is as follows: if models are memorising answers to reasoning questions, the patterns of influential pretraining data for reasoning should look quite similar to the patterns for factual question answering. What I discovered was fascinating: while models rely on completely different data sources when answering different factual questions (like “What’s the capital of France?” and “What is the highest mountain in the world?”), they consistently draw from the same data when solving different reasoning problems within the same category. This suggests they can learn from the same pretraining data and apply what they’ve learned to many different reasoning questions, indicating the presence of procedural knowledge. For factual question answering, I found pretraining data containing the answers to the questions to be highly influential, whereas for the reasoning questions that was not the case. My findings suggest that AI models aren’t just retrieving memorised answers: they’re learning procedural knowledge, like how to apply formulas or follow problem-solving steps.
Think of it like the difference between memorising that 2+2=4 versus learning the procedure for addition that can be applied to different numbers. My research shows that AI models are doing more of the latter than previously thought, which has important implications for how we understand their capabilities and limitations.
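To give a flavour of the kind of analysis involved, here is a toy sketch in Python. It assumes we already have a matrix of per-document influence scores for each query (in the paper these come from influence functions computed over a large sample of pretraining documents); the variable names and random placeholder data are mine, purely for illustration.

# Toy sketch of the comparison (not the actual pipeline from the paper): given
# per-document influence scores for each query, check whether queries of the
# same kind share influential documents.
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr

def mean_pairwise_correlation(scores):
    """scores: (n_queries, n_documents) array of influence scores.
    Returns the average Spearman rank correlation over all query pairs."""
    corrs = [spearmanr(scores[i], scores[j])[0]
             for i, j in combinations(range(len(scores)), 2)]
    return float(np.mean(corrs))

# Placeholder influence matrices; in the real analysis these would come from
# influence functions over pretraining documents.
rng = np.random.default_rng(0)
reasoning_scores = rng.normal(size=(10, 5000))  # 10 reasoning queries
factual_scores = rng.normal(size=(10, 5000))    # 10 factual queries

# Higher average correlation across reasoning queries than across factual ones
# would indicate shared, procedural data rather than memorised per-question answers.
print("reasoning:", mean_pairwise_correlation(reasoning_scores))
print("factual:  ", mean_pairwise_correlation(factual_scores))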
You can read more about this work in my blog post.
Making AI Debates More Truthful
At ICML 2024 in Vienna, my collaborators and I presented work on an entirely different but equally important problem: as models grow increasingly sophisticated, they will surpass human expertise, and the roles of humans and AI will flip: humans will need to evaluate AI systems that know more than they do. But how can we evaluate AI that is smarter than us? Our paper “Debating with More Persuasive LLMs Leads to More Truthful Answers” takes a first step in this direction using a creative solution inspired by human debate, and even won a best paper award!
The concept is simple: instead of relying on humans to directly evaluate AI responses, we have two AI systems debate different answers to a question, then let a third AI (or human) judge which argument is more convincing. Remarkably, this approach helped both AI judges and humans identify correct answers 76% and 88% of the time respectively, compared to much lower accuracy with simpler methods.
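To make the shape of the protocol concrete, here is a heavily simplified sketch in Python. The function ask_model is a placeholder for whatever LLM call you have available, and the prompts and two-round structure are illustrative rather than the exact setup from the paper.

# Simplified sketch of a debate protocol: two debaters argue for opposing
# answers over a few rounds, then a separate judge reads the transcript and
# picks the more convincing answer.

def ask_model(prompt: str) -> str:
    raise NotImplementedError("plug in your preferred LLM API call here")

def debate(question: str, answer_a: str, answer_b: str, rounds: int = 2) -> str:
    transcript = (f"Question: {question}\n"
                  f"Debater A defends: {answer_a}\n"
                  f"Debater B defends: {answer_b}\n")
    for r in range(rounds):
        for name, answer in (("A", answer_a), ("B", answer_b)):
            argument = ask_model(
                f"{transcript}\nYou are debater {name}. Give your strongest "
                f"argument that the correct answer is '{answer}'."
            )
            transcript += f"\nRound {r + 1}, debater {name}: {argument}"
    # The judge sees only the transcript of the debate.
    return ask_model(
        f"{transcript}\n\nYou are the judge. Based only on this transcript, "
        f"which answer is correct: '{answer_a}' or '{answer_b}'?"
    )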
Even more intriguingly, when we used inference-time methods to make the debating AI systems more persuasive, this actually made them better at revealing the truth rather than at misleading judges. This suggests that we could use debate to evaluate AI systems’ truthfulness as they become more capable and eventually surpass human expertise.
Improving How We Evaluate AI Systems
The most rewarding experience this past year has been advising students, one of whom was Yi Xu, a UCL master’s student. Together with Robert Kirk, we tackled a crucial question: how do we reliably evaluate and compare different AI systems? Yi developed new methods for ranking AI chatbots that will be presented as a spotlight paper at ICML this year.
The problem we tackled is surprisingly common: current evaluation methods often produce inconsistent rankings, where System A beats System B, System B beats System C, but System C somehow beats System A. The solution draws inspiration from sports tournaments, using round-robin style comparisons combined with statistical modeling to produce more reliable rankings while reducing computational costs.
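As a rough illustration of the general idea (and not Yi’s specific method), pairwise win counts from a round-robin can be fitted with a Bradley-Terry-style model: each system gets a single latent strength, so the resulting ranking is always transitive even when individual match-ups form a cycle.

# Illustrative only: fit a Bradley-Terry model to pairwise win counts using a
# simple iterative update. Sorting by the fitted strengths gives one consistent
# ranking even when the raw head-to-head results contain a cycle.
import numpy as np

def bradley_terry(wins: np.ndarray, iters: int = 500) -> np.ndarray:
    """wins[i, j] = number of times system i beat system j."""
    n = wins.shape[0]
    strengths = np.ones(n)
    games = wins + wins.T  # total games played between each pair
    for _ in range(iters):
        for i in range(n):
            denom = sum(games[i, j] / (strengths[i] + strengths[j])
                        for j in range(n) if j != i and games[i, j] > 0)
            if denom > 0:
                strengths[i] = wins[i].sum() / denom
        strengths /= strengths.sum()  # fix the overall scale
    return strengths

# Head-to-head results with a cycle: A beats B, B beats C, C beats A.
wins = np.array([[0, 7, 4],
                 [3, 0, 6],
                 [6, 4, 0]])
strengths = bradley_terry(wins)
print(np.argsort(-strengths))  # indices sorted from strongest to weakest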
Advancing AI Safety and Interpretability
Another student I advised, through a collaboration with the University of Michigan, was Itamar Pres, who works actively on advancing the broader field of AI safety and interpretability. At NeurIPS 2024 in Vancouver, Itamar presented his spotlight paper “Towards Reliable Evaluation of Behaviour Steering Interventions in LLMs” at the MINT workshop. This research focuses on how we can reliably modify AI behaviour and measure the effectiveness of our interventions.
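For readers unfamiliar with the term, one common kind of steering intervention adds a direction vector to a model’s hidden activations at inference time. The short sketch below shows the basic idea (illustrative only; the function names are mine), while the paper is about how to measure the effect of such interventions reliably.

# Conceptual sketch of one common steering intervention (not the paper's
# evaluation protocol): take the difference in mean hidden activations between
# examples that exhibit a behaviour and examples that don't, then add that
# direction to the model's activations at inference time.
import numpy as np

def steering_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """pos_acts, neg_acts: (n_examples, hidden_dim) activations from one layer."""
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def apply_steering(hidden: np.ndarray, direction: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Add the (scaled) steering direction to a hidden state."""
    return hidden + alpha * direction

# Measuring the effect reliably (e.g. behaviour rate on held-out prompts with
# vs. without steering) is the hard part, and that is what the paper studies.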
Personally, being at NeurIPS in Vancouver also provided me with the amazing opportunity to share insights with a broader audience through an invited interview on Machine Learning Street Talk, a popular science podcast. The discussion, titled “How do AI models actually think,” allowed me to communicate the implications of my research to both technical and general audiences. My appearance on Machine Learning Street Talk received more than 30K views!
Looking Forward
This year’s research has reinforced my belief that understanding AI systems requires looking beyond their outputs to examine how they actually process information and make decisions. Whether it’s investigating what training data influences their reasoning, developing better evaluation methods, or creating systems that can reliably identify truth through debate, the work emphasizes transparency and reliability in AI development.
After I wrap up my PhD at UCL this year, I will be joining Professor Jacob Andreas at MIT for a postdoc, and I’m excited to build on these insights and tackle the next generation of challenges in AI alignment and interpretability. I look back on a happy PhD and am grateful I got the chance to pursue this research at UCL.
One Response to “Understanding How AI Thinks: A Year of Research: Laura Ruis”
I’m interested in research that tries to identify how reasoning works in LLMs. From the work I’m seeing online, most of the analysis you have done is (necessarily no doubt) based on fairly basic tasks.
At the other end of the scale, I have conducted a personal experiment taking Chat GPT 4 through Wittgenstein’s Tractatus following the numbered statements in sequence, and inviting it to comment, explain, reflect, extrapolate, in dialogue with me. I am not a specialist, just a well prepared reader with some philosophy background.
The experiment went over 3 months, with daily progress.
Is this of interest to you?