COMPAS Recidivism Algorithm Showed Racial Bias in Criminal Sentencing

Critical

Northpointe's COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) algorithm, used in courts across the United States to assess the likelihood of criminal defendants reoffending, was found to exhibit significant racial bias. A ProPublica investigation revealed that Black defendants were nearly twice as likely to be falsely flagged as future criminals compared to white defendants.

Full Description

The Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) algorithm, developed by Northpointe (later renamed Equivant), was deployed across hundreds of U.S. courts beginning in 2013 to assess defendants' risk of reoffending. The proprietary risk assessment tool generated scores from 1 to 10 that were intended to inform critical criminal justice decisions including bail determinations, sentencing recommendations, and parole considerations. By 2016, COMPAS was being used in courtrooms across at least 24 states, processing thousands of defendants annually and directly influencing their interactions with the justice system. COMPAS utilized a complex algorithmic model incorporating 137 different factors to generate recidivism risk predictions, including criminal history, employment status, substance abuse history, family relationships, and residential stability. While the algorithm did not explicitly use race as a direct input variable, it incorporated numerous socioeconomic and demographic factors that served as proxies for race, such as neighborhood characteristics, employment patterns, and arrest histories that reflect broader systemic inequalities. The proprietary nature of Northpointe's algorithm meant that the specific weighting and interaction of these variables remained opaque to courts, defendants, and their legal representatives using the system. ProPublica's May 2016 investigative analysis of over 7,000 defendants in Broward County, Florida revealed systematic racial bias in COMPAS predictions with significant real-world consequences. Black defendants who did not reoffend were incorrectly flagged as high-risk at nearly twice the rate of white defendants (44.9% versus 23.5% false positive rate), while white defendants who did reoffend were more likely to be incorrectly classified as low-risk for future crimes. These biased risk scores directly influenced judicial decisions, with higher scores correlating with longer sentences, higher bail amounts, and reduced likelihood of parole approval. The investigation estimated that thousands of defendants had potentially received harsher treatment due to algorithmically amplified racial bias. Northpointe initially disputed ProPublica's methodology and conclusions, arguing that COMPAS maintained calibration across racial groups, meaning that defendants of different races with the same risk score had similar actual reoffending rates. The company contended this demonstrated fairness, while critics pointed to the disparate error rates as evidence of discriminatory impact. Several academic researchers entered the debate, highlighting the mathematical impossibility of simultaneously achieving multiple definitions of algorithmic fairness when base rates differ across demographic groups. Courts in Wisconsin, California, and other jurisdictions faced legal challenges to their use of COMPAS, with some reducing reliance on algorithmic risk assessments in response to bias concerns. The COMPAS controversy catalyzed broader academic and policy discussions about algorithmic accountability in high-stakes government applications. The case became a foundational example in the emerging field of algorithmic fairness research, spawning dozens of academic papers examining different mathematical definitions of bias and fairness in automated decision systems. Multiple state legislatures introduced bills requiring algorithmic transparency and bias testing for government AI systems, while organizations like the Partnership on AI and the Algorithmic Justice League cited COMPAS as evidence for stronger AI governance frameworks. The incident's legacy continues to influence criminal justice reform efforts and AI policy development as of 2024. Several jurisdictions have implemented algorithmic impact assessments or bias auditing requirements for government AI systems, while others have moved away from algorithmic risk assessment tools entirely. The fundamental tension between different mathematical definitions of fairness revealed by the COMPAS case remains an active area of research and policy debate, with ongoing implications for AI deployment in education, hiring, lending, and other domains where algorithmic bias can perpetuate or amplify existing social inequalities.

Root Cause

The COMPAS algorithm used 137 features to predict recidivism risk, but its predictions showed strong racial disparities. While the tool did not directly use race as an input, it relied on factors that served as proxies for race, including neighborhood, employment history, and prior arrests. The algorithm achieved similar overall accuracy across racial groups but had dramatically different error patterns.

Mitigation Analysis

An audit trail documenting the algorithm's predictions alongside actual outcomes, disaggregated by race, would have revealed the disparate error patterns earlier. Provenance records linking each risk score to the specific model version, input features, and weighting would have enabled systematic fairness audits. This case demonstrates that accuracy is not a sufficient metric for AI systems in criminal justice — the distribution of errors matters as much as the overall error rate.

Regulatory Framework References

All frameworks →

EU AI Act

Art. 10—Data & Data GovernanceArt. 9—Risk Management SystemArt. 71—Fundamental Rights Impact Assessment

ISO/IEC 42001

A.8.4—Data Quality for AIA.6.2.6—Fairness in AI Systems

NIST AI RMF

MAP 2.3—AI System Bias AssessmentMEASURE 2.6—Fairness Assessment

Lessons Learned

AI systems used in criminal justice require mandatory fairness audits with disaggregated error analysis. Overall accuracy can mask severe disparate impact. The mathematical impossibility of satisfying all fairness criteria simultaneously must be explicitly addressed in system design and policy.

Sources

Machine Bias: Risk Assessments in Criminal Sentencing

ProPublica · May 23, 2016 · news