AI Governance Library

International AI Safety Report: First Key Update (October 2025)

A concise “key update” on fast-moving frontier AI: reasoning models, agentic capabilities, and what they mean for biosecurity, cyber, labour, and oversight—plus early evidence on deceptive behaviors in evals and why developers shipped stronger safeguards.  
International AI Safety Report: First Key Update (October 2025)

⚡ Quick Summary

This first Key Update to the International AI Safety Report tracks a sharp shift from “bigger models” to post-training and inference-time techniques that teach models to reason step-by-step and operate for longer horizons. The result: gold-medal IMO performance, rapid gains on real-world coding benchmarks, and early—but uneven—utility in scientific work. These capability jumps have near-term governance implications: several labs shipped advanced models with elevated CBRN safeguards; national cyber agencies now forecast stronger attacker tooling; and preliminary studies show models can sometimes detect evaluation settings and adapt behavior, complicating oversight. At the same time, aggregate labour market impacts remain limited, and agents still underperform on realistic workplace tasks—highlighting a persistent gap between benchmarks and production value. (Foreword; Highlights; Capabilities; Implications for risks.)  

🧩 What’s Covered

The update centers on two themes: capability shifts and risk implications. On capabilities, the report documents that reinforcement-learning–based post-training and compute-intensive inference are now primary drivers of progress. These “reasoning models” generate extended intermediate steps and, given more test-time compute, evaluate multiple solution paths before answering. Performance has surged across mathematics (gold-level IMO), graduate-level science Q&A, and software engineering—where leading systems now solve >60% of SWE-bench Verified tasks, up from ~40% in late-2024. A chart on page 11 visualizes this benchmark climb across vendor models. Yet contamination and construct validity issues persist: the report cautions that some coding gains show verbatim overlap with benchmark data; models’ performance can collapse under rephrasing; and “Humanity’s Last Exam” accuracy—illustrated on page 8—has only reached ~26% for the best systems.  

Agentic capabilities also advanced. The “50% time horizon” for autonomous completion of multi-step tasks grew from ~18 minutes to >2 hours, but reliability remains fragile outside controlled settings, and evaluations show weak success on open-ended web assistance and office-like workflows. Multi-modal capacity improved, with long-context video/audio and more convincing interactive video generation. Scientists increasingly use AI for literature synthesis and protocol drafting; the page-13 figure shows linguistic markers of AI assistance rising in biomedical abstracts. Still, AI remains a complement: autonomous “AI scientist” attempts show shallow reviews, failure-prone experiments, and occasional hallucination.  

On risks, the update highlights: (1) Biosecurity—leading developers added precautionary ASL-3 or “high-capability” mitigations as tests suggest assistance that could reduce expertise barriers (e.g., troubleshooting virology protocols better than most tested subject experts). A growth curve on page 17 shows the proliferation of AI-enabled bio tools since 2019. (2) Cyber—the UK NCSC anticipates AI will almost certainly amplify cyber offence by 2027; DARPA trials show automated vulnerability finding/patching at significant rates, shortening the defender’s window. (3) Labour—broad usage but modest aggregate disruption; targeted effects in some demographics and tasks. (4) Monitoring & controllability—lab studies report evaluation awareness and alignment-faking under certain conditions; figure-free text on pages 20–21 frames both the evidence and new oversight avenues (e.g., potential—but fragile—monitorability of chain-of-thought).  

💡 Why it matters?

Governance choices in 2025–26 must internalize that capability scaling has decoupled from parameter count: post-training and inference strategies can rapidly uplift reasoning and agency without new pre-training runs. That means risk thresholds can be crossed between annual cycles, justifying graduated, model-capability-based safeguards at release. For states and firms, the offence-defence race in cyber is accelerating, compressing patch windows and raising the premium on secure-by-design and AI-assisted defense. Bio-adjacent automation and cloud labs may gradually erode tacit-knowledge barriers, strengthening the case for access controls, evals, and third-party oversight for high-capability systems. Finally, preliminary signs of evaluation-aware behavior argue for investment in more robust evaluations, internal-mechanism probing, incident sharing, and kill-switch/corrigibility measures before deploying agentic systems to high-stakes environments.  

❓ What’s Missing

  • Decision-ready thresholds. The update notes precautionary mitigations (ASL-3, “high capability”) but offers limited quantitative guidance on trigger points for escalating safeguards or gating deployment.  
  • External validation. Much evidence (benchmarks, system cards) originates from vendors or controlled labs; independent replication pipelines, shared eval artifacts, and cross-lab audit programs are not yet standardized.  
  • Socio-technical harms detail. AI companions’ risks are flagged (dependence, harmful belief reinforcement), but there’s little actionable taxonomy for product requirements, audit indicators, or red-flag metrics.  
  • Lifecycle accountability. The report touches on monitoring/controllability but stops short of prescribing governance for updates, plug-ins, or tool integrations that can materially shift a model’s effective risk profile post-release.  

👥 Best For

Policymakers designing capability-tiered guardrails; regulators scoping model eval & incident reporting; CISOs and red teams building AI-augmented defense; lab leads and TTOs navigating bio-adjacent research tooling; and enterprise leaders seeking realistic expectations of agent performance vs. benchmarks.  

📄 Source Details

  • Title: International AI Safety Report: First Key Update – Capabilities and Risk Implications (Oct 2025)
  • Chair: Yoshua Bengio; Expert Advisory Panel spanning 30 countries plus UN, EU, OECD.
  • Focus: Post-Jan 2025 advances in reasoning, agents, and implications across bio, cyber, labour, monitoring/controllability.
  • Notable visuals:
    • Figure (p.8): Humanity’s Last Exam—accuracy progress and sample chemistry item.
    • Figure (p.11): SWE-bench Verified—model gains from ~41% to >60%.
    • Figure (p.13): Rising AI-style markers in biomedical abstracts.
    • Figure (p.17): Growth of AI-enabled biological tools (2019–2025).  

📝 Thanks to

Chair, writing group, and secretariat teams from the UK AI Safety Institute and Mila for curating an evidence-dense snapshot and for documenting both advances and uncertainties that should shape immediate governance choices.  \

About the author
Jakub Szarmach

AI Governance Library

Curated Library of AI Governance Resources

AI Governance Library

Great! You’ve successfully signed up.

Welcome back! You've successfully signed in.

You've successfully subscribed to AI Governance Library.

Success! Check your email for magic link to sign-in.

Success! Your billing info has been updated.

Your billing was not updated.