What’s Covered
The classification pipeline is designed around two main tasks per incident:
- Task 1: Summarize the incident and classify it into causal and domain categories using the MIT Risk Taxonomy.
- Task 2: Independently rate the severity of harm across 10 impact categories (e.g., discrimination, privacy infringement, labor harm), applying the Center for Security and Emerging Technology (CSET) harm ratings.
Each result comes with:
- A confidence score for transparency.
- A text explanation to make the model’s reasoning traceable.
Additionally, the system maps each incident to one of the EU AI Act’s risk categories (unacceptable, high, limited, minimal risk), providing multi-framework labeling.
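To make that output shape concrete, here is a minimal Python sketch of what a single classified-incident record could look like. The class and field names are illustrative assumptions for this review, not Mylius's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class EUAIActRisk(str, Enum):
    """EU AI Act risk tiers the pipeline maps each incident onto."""
    UNACCEPTABLE = "unacceptable"
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"


@dataclass
class TaskResult:
    """Output of one task: labels plus the transparency fields described above."""
    labels: dict        # Task 1: causal/domain categories; Task 2: per-category harm ratings
    confidence: float   # confidence score for transparency (assumed 0.0-1.0 scale)
    explanation: str    # free-text rationale making the model's reasoning traceable


@dataclass
class ClassifiedIncident:
    """One AIID incident after both tasks have run."""
    incident_id: int
    summary: str
    taxonomy: TaskResult         # Task 1 (MIT Risk Taxonomy categories)
    harm_ratings: TaskResult     # Task 2 (CSET harm ratings across impact categories)
    eu_ai_act_risk: EUAIActRisk  # EU AI Act mapping for multi-framework labeling
```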
Technical highlights:
- Built on top of the AI Incident Database (AIID).
- Powered by the Claude Sonnet model through OpenRouter.
- Integrates automated quality controls: a cross-checking function reruns task evaluations to flag low-confidence classifications.
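As a rough sketch of how such a cross-check could be wired up, using OpenRouter's OpenAI-compatible endpoint: the model slug, JSON output format, and 0.7 confidence threshold below are placeholders of mine, not details from the write-up.

```python
import json
import os

from openai import OpenAI  # OpenRouter exposes an OpenAI-compatible API

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

MODEL = "anthropic/claude-3.5-sonnet"  # placeholder slug; the post only says "Claude Sonnet"
CONFIDENCE_THRESHOLD = 0.7             # assumed cutoff for flagging, not from the write-up


def run_task(prompt: str) -> dict:
    """Run one classification task and parse the JSON the model is asked to return."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "Respond only with JSON: {labels, confidence, explanation}."},
            {"role": "user", "content": prompt},
        ],
    )
    return json.loads(response.choices[0].message.content)


def cross_check(prompt: str) -> dict:
    """Rerun the task and flag the classification when runs disagree or confidence is low."""
    first, second = run_task(prompt), run_task(prompt)
    needs_review = (
        first["labels"] != second["labels"]
        or min(first["confidence"], second["confidence"]) < CONFIDENCE_THRESHOLD
    )
    return {"result": first, "recheck": second, "needs_review": needs_review}
```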
The project also provides an interactive dashboard, allowing users to filter and explore incidents by risk type, severity, and domain.
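The write-up does not detail the dashboard's internals; as a minimal illustration of the kind of filtering it enables, assuming a flat export of classified incidents with hypothetical column names:

```python
import pandas as pd

# Hypothetical export of the classified incidents; column names are assumptions.
incidents = pd.read_csv("classified_incidents.csv")

# e.g. high-risk (EU AI Act) privacy-domain incidents with severe harm ratings
subset = incidents[
    (incidents["eu_ai_act_risk"] == "high")
    & (incidents["domain"] == "privacy")
    & (incidents["harm_severity"] >= 3)
]
print(subset[["incident_id", "summary"]].head())
```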
According to Mylius’s own evaluation, the classification system shows about 80% consistency with expert human ratings—a solid result given the inherent difficulty of qualitative harm assessments.
In his blog post, Mylius stresses that the goal is not perfect judgment, but scalability and auditability. The system is intended to support reclassification as taxonomies mature and community standards evolve.
💡 Why It Matters
The AI governance field needs scalable, structured ways to make sense of incidents before they spiral into noise.
This project demonstrates how LLMs can be responsibly used to amplify expert capacity, rather than replace it—classifying and highlighting critical trends across hundreds (soon thousands) of reported cases.
It also shows how technical infrastructure can bridge multiple taxonomic frameworks—something essential for governments, researchers, and companies navigating the messy, fragmented AI safety landscape.
Importantly, it sets a transparent, repeatable baseline for future AI safety incident monitoring systems.
What’s Missing
The tool is a strong foundation, but limitations remain:
- Misclassifications still happen, especially when incidents are thinly described.
- Multi-incident reports (those describing several failures) are hard for the current pipeline to parse accurately.
- No end-to-end human review layer yet, which would be necessary if the system were deployed in high-stakes contexts.
- Broader context fields like resolutions, organizational responses, and recurrence patterns aren’t yet captured.
Mylius himself notes that in a future version, better retrieval, richer metadata, and more robust LLM chains could close these gaps.
Best For
- AI governance researchers building taxonomic frameworks.
- Regulators and policymakers designing AI incident reporting systems.
- Risk management and compliance teams monitoring AI safety internally.
- Developers working on scalable, auditable AI oversight infrastructure.
Source Details
- Title: Scalable AI Incident Classification
- Author: Simon Mylius
- Published by: AI Incident Database (AIID) Project, 2025
- Link: Full write-up and code access
- Credentials: Winter Fellow at the Centre for the Governance of AI (GovAI); developer of open-source AI incident monitoring tools.