⚡ Quick Summary
The AI Safety Index (Summer 2025), released by the Future of Life Institute (FLI), evaluates seven frontier AI developers on how responsibly they develop and deploy general-purpose AI systems. Unlike previous self-reporting exercises, the Index scores companies on 33 indicators grouped into six domains, based on publicly verifiable evidence. The labs evaluated are Anthropic, OpenAI, Google DeepMind, Meta, xAI, Zhipu AI, and DeepSeek. No company scored higher than a C+ overall. The standout message is sobering: not even the best-performing labs have adequate existential safety plans in place, and only three of the seven demonstrate meaningful testing for dangerous capabilities.
🧩 What’s Covered
Scope & Purpose:
- Developed in response to heightened AI risk awareness following the UK AI Safety Summit
- Designed for public transparency and corporate accountability
- Focuses on institutional safeguards, not capabilities
Who Was Evaluated:
- Anthropic, OpenAI, Google DeepMind, Meta, xAI, Zhipu AI, DeepSeek
- DeepSeek included for the first time, reflecting China’s growing influence
- Safe Superintelligence Inc. excluded for not having deployed frontier models
Structure:
- 33 indicators across 6 domains:
  - Risk Assessment
  - Current Harms
  - Safety Frameworks
  - Existential Safety
  - Governance & Accountability
  - Information Sharing
Scoring & Ranking:
- Grades range from A to F
- Only Anthropic scored above a C overall (C+, 2.64), followed by OpenAI (C, 2.10) and Google DeepMind (C–, 1.76); a sketch of how letter grades roll up into such numeric scores follows this list
- Meta and xAI received Ds; Zhipu AI and DeepSeek received failing grades
- No company scored higher than a D in Existential Safety
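This summary does not spell out the Index's aggregation arithmetic, but a minimal sketch of how equally weighted letter grades could roll up into domain and overall scores looks like the snippet below. The GPA-style point mapping, helper names, and sample grades are illustrative assumptions, not values taken from the FLI report.

```python
# Minimal sketch (assumptions, not FLI's documented method): equally weighted
# letter grades averaged within each domain, then averaged across domains.

GRADE_POINTS = {
    "A": 4.0, "A-": 3.67,
    "B+": 3.33, "B": 3.0, "B-": 2.67,
    "C+": 2.33, "C": 2.0, "C-": 1.67,
    "D+": 1.33, "D": 1.0, "F": 0.0,
}

def domain_score(indicator_grades: list[str]) -> float:
    """Average the indicator grades within one domain (equal weights)."""
    return sum(GRADE_POINTS[g] for g in indicator_grades) / len(indicator_grades)

def overall_score(domains: dict[str, list[str]]) -> float:
    """Average the per-domain scores into a single overall number."""
    return sum(domain_score(grades) for grades in domains.values()) / len(domains)

# Hypothetical lab graded on two of the six domains (made-up grades).
example = {
    "Risk Assessment": ["B-", "C+", "C"],
    "Existential Safety": ["D", "F"],
}
print(round(overall_score(example), 2))  # 1.42 for these made-up grades
```

Because every indicator carries the same weight in this sketch, a very weak domain can be offset by stronger ones, which is exactly the concern raised under "What's Missing" below.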
Key Evaluation Methods:
- Focus on implemented practices, not declarations
- Evidence collected from March to June 2025 via public sources and a targeted company survey
- Independent expert panel reviewed assessments
Highlights:
- Anthropic led the index by publishing evaluations, sharing test results, and engaging external reviewers
- OpenAI granted detailed external evaluation access (e.g., to METR, Apollo Research, and the UK/US Safety Institutes) but scored low on whistleblowing transparency
- Meta and xAI lacked basic transparency and risk testing evidence
- Zhipu AI and DeepSeek failed across nearly all categories, though the Index notes cultural and regulatory differences in China
💡 Why it matters?
This Index shifts the conversation from what companies say about safety to what they actually do. It introduces comparative accountability and surfaces gaps—especially around existential risk planning and external validation. In a landscape with no global regulatory baseline, this tool gives policymakers and watchdogs leverage to push for real safeguards—not just PR statements. The Index also clarifies which companies are moving beyond performative safety.
❓ What’s Missing
- No evaluation of lab intent or alignment strategies beyond observable outputs
- Heavy focus on English-language sources and Western norms may miss culturally specific practices
- No weighting of indicators: all are scored equally, which might underplay critical gaps (see the sketch after this list)
- No direct access to source data for some high-stakes claims
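To make the weighting point concrete, here is a small, purely hypothetical illustration: with equal weights, two labs with opposite strengths tie, while weighting a critical domain such as Existential Safety more heavily separates them. The labs, scores, and weights are invented for illustration only.

```python
# Hypothetical illustration of the weighting critique: equal weights can mask a
# critical gap that a safety-weighted average would expose. All numbers invented.

def weighted_score(domain_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-domain scores."""
    total = sum(weights[d] for d in domain_scores)
    return sum(domain_scores[d] * weights[d] for d in domain_scores) / total

lab_a = {"Current Harms": 3.0, "Existential Safety": 1.0}  # strong on harms, weak on x-risk
lab_b = {"Current Harms": 1.0, "Existential Safety": 3.0}  # the reverse

equal_weights = {"Current Harms": 1.0, "Existential Safety": 1.0}
safety_heavy = {"Current Harms": 1.0, "Existential Safety": 3.0}

print(weighted_score(lab_a, equal_weights), weighted_score(lab_b, equal_weights))  # 2.0 2.0 (a tie)
print(weighted_score(lab_a, safety_heavy), weighted_score(lab_b, safety_heavy))    # 1.5 2.5 (gap exposed)
```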
👥 Best For
- Regulators setting frontier AI safety baselines
- Civil society groups tracking lab behavior
- Policy researchers analyzing corporate accountability gaps
- Journalists looking for hard comparisons
- AI developers benchmarking safety practices
📄 Source Details
Title: AI Safety Index – Summer 2025
Published by: Future of Life Institute (FLI)
Release Date: July 2025
Pages: 101
Indicators: 33
Domains: 6
Labs Evaluated: 7
📝 Thanks to
The team at FLI for producing the most evidence-based, hard-scored frontier AI safety evaluation to date—and for pushing past vague pledges. Thanks also to reviewers from academia, civil society, and technical safety communities who shaped the methodology.