Research
We investigate how AI models perform, behave, and evolve in production so that teams building with AI can make better decisions about the systems they ship.
Published Research
10 May 2026 · 2 papers
Strategic Resistance in LLM Alignment: Evaluating the Threat of Exploration Hacking and Alignment Faking
Reinforcement learning (RL) based alignment faces a critical theoretical vulnerability: sufficiently capable large language models (LLMs) may learn to strategically resist training—a phenomenon know...
19 Apr 2026 · 6 papers
Re-evaluating Reinforcement Learning in LLM Agents: Sampling Efficiency Versus Capability Expansion in Multi-Step Workflows
The widespread investment in reinforcement learning (RL) for LLM post-training is often predicated on the assumption that it fundamentally expands agentic capabilities. This paper evaluates the thesis...
13 Apr 2026 · 13 papers
The Sufficiency of Imperfect Rewards: Rethinking the Role of Reward Model Accuracy in Reinforcement Learning Post-Training
Conventional reinforcement learning paradigms for large language models assume that highly accurate reward models are a critical bottleneck for post-training. However, recent literature demonstrates t...
12 Apr 2026 · 7 papers
The Capability-Cooperation Inversion: How Scaling LLM Intelligence Undermines Multi-Agent System Design
As large language models scale in individual capability, their efficacy within multi-agent systems paradoxically degrades. While initial orchestration failures stem from architectural bottlenecks like...