⚡ Quick Summary
The AI Safety Index (Summer 2025), released by the Future of Life Institute (FLI), evaluates seven frontier AI developers on how responsibly they develop and deploy general-purpose AI systems. Unlike previous self-reporting exercises, the Index scores companies on 33 indicators grouped into six domains, based on publicly verifiable evidence. The labs evaluated are Anthropic, OpenAI, Google DeepMind, Meta, xAI, Zhipu AI, and DeepSeek. No company scored higher than a C+ overall. The standout message is sobering: not even the best-performing labs have adequate existential safety plans in place, and only three of the seven demonstrate meaningful testing for dangerous capabilities.
🧩 What’s Covered
Scope & Purpose:
- Developed in response to heightened AI risk awareness following the UK AI Safety Summit
- Designed for public transparency and corporate accountability
- Focuses on institutional safeguards, not capabilities
Who Was Evaluated:
- Anthropic, OpenAI, Google DeepMind, Meta, xAI, Zhipu AI, DeepSeek
- DeepSeek included for the first time, reflecting China’s growing influence
- Safe Superintelligence Inc. excluded for not having deployed frontier models
Structure:
- 33 indicators across 6 domains:
  - Risk Assessment
  - Current Harms
  - Safety Frameworks
  - Existential Safety
  - Governance & Accountability
  - Information Sharing
Scoring & Ranking:
- Grades range from A to F
- Only Anthropic scored above a C overall (C+, 2.64), followed by OpenAI (C, 2.10) and Google DeepMind (C–, 1.76); a sketch of how letter grades roll up into such numeric scores follows this list
- Meta and xAI received Ds; Zhipu AI and DeepSeek received failing grades
- No company scored higher than a D in Existential Safety
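This summary does not spell out the Index's aggregation arithmetic, but a minimal sketch of how equally weighted letter grades could roll up into domain and overall scores looks like the snippet below. The GPA-style point mapping, helper names, and sample grades are illustrative assumptions, not values taken from the FLI report.

```python
# Minimal sketch (assumptions, not FLI's documented method): equally weighted
# letter grades averaged within each domain, then averaged across domains.

GRADE_POINTS = {
    "A": 4.0, "A-": 3.67,
    "B+": 3.33, "B": 3.0, "B-": 2.67,
    "C+": 2.33, "C": 2.0, "C-": 1.67,
    "D+": 1.33, "D": 1.0, "F": 0.0,
}

def domain_score(indicator_grades: list[str]) -> float:
    """Average the indicator grades within one domain (equal weights)."""
    return sum(GRADE_POINTS[g] for g in indicator_grades) / len(indicator_grades)

def overall_score(domains: dict[str, list[str]]) -> float:
    """Average the per-domain scores into a single overall number."""
    return sum(domain_score(grades) for grades in domains.values()) / len(domains)

# Hypothetical lab graded on two of the six domains (made-up grades).
example = {
    "Risk Assessment": ["B-", "C+", "C"],
    "Existential Safety": ["D", "F"],
}
print(round(overall_score(example), 2))  # 1.42 for these made-up grades
```

Because every indicator carries the same weight in this sketch, a very weak domain can be offset by stronger ones, which is exactly the concern raised under "What's Missing" below.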
Key Evaluation Methods:
- Focus on implemented practices, not declarations
- Evidence collected from March to June 2025 via public sources and a targeted company survey
- Independent expert panel reviewed assessments
Highlights:
- Anthropic led the index by publishing evaluations, sharing test results, and engaging external reviewers
- OpenAI granted detailed external evaluation access (e.g., to METR, Apollo Research, and the UK/US Safety Institutes) but scored low on whistleblowing transparency
- Meta and xAI lacked basic transparency and risk testing evidence
- Zhipu AI and DeepSeek failed across nearly all categories, though the Index notes cultural and regulatory differences in China
💡 Why it matters?
This Index shifts the conversation from what companies say about safety to what they actually do. It introduces comparative accountability and surfaces gaps—especially around existential risk planning and external validation. In a landscape with no global regulatory baseline, this tool gives policymakers and watchdogs leverage to push for real safeguards—not just PR statements. The Index also clarifies which companies are moving beyond performative safety.
❓ What’s Missing
- No evaluation of lab intent or alignment strategies beyond observable outputs
- Heavy focus on English-language sources and Western norms may miss culturally specific practices
- No weighting of indicators: all are scored equally, which might underplay critical gaps (see the sketch after this list)
- No direct access to source data for some high-stakes claims
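To make the weighting point concrete, here is a small, purely hypothetical illustration: with equal weights, two labs with opposite strengths tie, while weighting a critical domain such as Existential Safety more heavily separates them. The labs, scores, and weights are invented for illustration only.

```python
# Hypothetical illustration of the weighting critique: equal weights can mask a
# critical gap that a safety-weighted average would expose. All numbers invented.

def weighted_score(domain_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-domain scores."""
    total = sum(weights[d] for d in domain_scores)
    return sum(domain_scores[d] * weights[d] for d in domain_scores) / total

lab_a = {"Current Harms": 3.0, "Existential Safety": 1.0}  # strong on harms, weak on x-risk
lab_b = {"Current Harms": 1.0, "Existential Safety": 3.0}  # the reverse

equal_weights = {"Current Harms": 1.0, "Existential Safety": 1.0}
safety_heavy = {"Current Harms": 1.0, "Existential Safety": 3.0}

print(weighted_score(lab_a, equal_weights), weighted_score(lab_b, equal_weights))  # 2.0 2.0 (a tie)
print(weighted_score(lab_a, safety_heavy), weighted_score(lab_b, safety_heavy))    # 1.5 2.5 (gap exposed)
```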
👥 Best For
- Regulators setting frontier AI safety baselines
- Civil society groups tracking lab behavior
- Policy researchers analyzing corporate accountability gaps
- Journalists looking for hard comparisons
- AI developers benchmarking safety practices
📄 Source Details
Title: AI Safety Index – Summer 2025
Published by: Future of Life Institute (FLI)
Release Date: July 2025
Pages: 101
Indicators: 33
Domains: 6
Labs Evaluated: 7
📝 Thanks to
The team at FLI for producing the most evidence-based, hard-scored frontier AI safety evaluation to date—and for pushing past vague pledges. Thanks also to reviewers from academia, civil society, and technical safety communities who shaped the methodology.