COMPAS Criminal Recidivism AI Performs No Better Than Untrained People

Severity
High

Dartmouth researchers found that the COMPAS recidivism prediction algorithm, widely deployed in US courts and used to inform criminal sentencing, performed no better than untrained people recruited online.

Category
Bias
Industry
Legal
Status
Ongoing
Date Occurred
Date Reported
Jan 17, 2018
Jurisdiction
US
AI Provider
Other/Unknown
Model
COMPAS
Application Type
other
Harm Type
legal
People Affected
120,000
Human Review in Place
Yes
Litigation Filed
No
criminal_justice · bias · recidivism · accuracy · validation · racial_bias · sentencing · compas

Full Description

In January 2018, researchers at Dartmouth College published a study in Science Advances that fundamentally challenged the effectiveness of AI-driven criminal recidivism prediction tools used throughout the US justice system. The study, led by Julia Dressel and Hany Farid, examined the widely deployed Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) algorithm developed by Northpointe (later Equivant).

The researchers systematically compared COMPAS predictions with assessments made by untrained volunteers recruited through Amazon Mechanical Turk. They gave 400 online participants basic information about each defendant (age, sex, criminal history, and current charge), the same core data points that inform COMPAS assessments. These untrained individuals predicted recidivism with the same 65% accuracy as the COMPAS algorithm.

The study analyzed over 7,000 defendants from Broward County, Florida, examining both prediction accuracy and demographic bias. COMPAS exhibited significant racial bias, incorrectly flagging Black defendants as high risk at nearly twice the rate of white defendants (a 45% versus 23% false positive rate). When the researchers built a simple two-factor model using only a defendant's age and number of prior convictions, it matched COMPAS's accuracy while remaining completely transparent in its methodology.

The implications extended far beyond academic interest. COMPAS and similar risk assessment tools were being used in sentencing decisions, parole hearings, and bail determinations across multiple states, and the researchers estimated that such tools influenced decisions affecting over one million defendants annually. The findings suggested that courts were relying on proprietary algorithms that offered no meaningful improvement over basic demographic factors or the judgments of untrained individuals, while introducing systematic racial bias into criminal justice outcomes.
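The two-factor result is straightforward to reproduce in spirit. The sketch below is a minimal illustration, not the authors' code: it assumes a local copy of the public Broward County dataset released by ProPublica (with its age, priors_count, and two_year_recid columns; the file path is a placeholder) and fits a logistic classifier on age and prior convictions alone, the two features the study found sufficient to match COMPAS.

```python
# Minimal sketch of the study's two-factor baseline: predict two-year
# recidivism from age and prior-conviction count only. Column names
# follow the public ProPublica Broward County dataset; the file path
# is a placeholder assumption.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("compas-scores-two-years.csv")  # assumed local copy

X = df[["age", "priors_count"]]
y = df["two_year_recid"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression().fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Two-factor baseline accuracy: {accuracy:.3f}")
```

Any comparably simple, fully transparent model would serve the same purpose: a baseline against which the added value of a proprietary tool can be measured.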

Root Cause

The COMPAS algorithm relied on demographic and socioeconomic factors that correlate with race and class rather than on genuinely predictive indicators of criminal behavior. The model's complexity provided no accuracy benefit over simple heuristics or the judgments of untrained people.

Mitigation Analysis

This incident highlights the need for rigorous algorithmic auditing and validation testing against simple baselines before deployment in high-stakes decisions. Requiring transparency in algorithmic decision-making, regular bias testing across demographic groups, and comparative accuracy studies against human judgment could prevent deployment of ineffective but seemingly sophisticated systems.
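As a concrete illustration of what per-group bias testing can look like, the sketch below computes accuracy and false positive rate for each demographic group, the disparity at the center of this incident. The arrays here are hypothetical toy data, purely to show the interface; a real audit would use actual predictions and outcomes and would add confidence intervals and additional fairness metrics.

```python
# Sketch of a per-group bias audit: compare accuracy and false positive
# rate (non-recidivists wrongly flagged high risk) across groups.
import numpy as np

def false_positive_rate(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of actual negatives that were predicted positive."""
    negatives = y_true == 0
    if negatives.sum() == 0:
        return float("nan")
    return float((y_pred[negatives] == 1).mean())

def audit_by_group(y_true: np.ndarray, y_pred: np.ndarray, groups: np.ndarray) -> None:
    """Print accuracy and FPR separately for each demographic group."""
    for g in np.unique(groups):
        mask = groups == g
        acc = float((y_true[mask] == y_pred[mask]).mean())
        fpr = false_positive_rate(y_true[mask], y_pred[mask])
        print(f"{g}: accuracy={acc:.3f}, false_positive_rate={fpr:.3f}")

# Toy example with random data:
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
y_pred = rng.integers(0, 2, 1000)
groups = rng.choice(["group_a", "group_b"], 1000)
audit_by_group(y_true, y_pred, groups)
```

A large gap in false positive rates between groups, like the 45% versus 23% gap the study reported, is exactly the kind of finding such an audit should surface before deployment.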

Lessons Learned

The incident demonstrates the critical importance of validating AI systems against simple baselines before deployment in high-stakes decisions. Algorithmic complexity does not guarantee superior performance, and proprietary systems can perpetuate bias while providing false confidence in their accuracy.

Sources

The accuracy, fairness, and limits of predicting recidivism
Science Advances · Jan 17, 2018 · academic paper