
AI Essay Grading Systems Systematically Penalize Non-Native English Speakers

Severity
High

Research revealed that AI essay grading systems such as e-rater systematically assigned lower scores to essays by non-native English speakers despite equivalent content quality, affecting standardized test outcomes for international students.

Category
Bias
Industry
Education
Status
Ongoing
Date Occurred
Jan 1, 2019
Date Reported
Apr 15, 2019
Jurisdiction
US
AI Provider
Other/Unknown
Model
e-rater
Application Type
embedded
Harm Type
operational
People Affected
200,000
Human Review in Place
Yes
Litigation Filed
No
Tags
educational_bias, automated_scoring, language_discrimination, standardized_testing, ESL_students

Full Description

Multiple AI-powered essay grading systems used in high-stakes standardized testing, including Educational Testing Service's e-rater and Vantage Learning's IntelliMetric, were found to exhibit systematic bias against non-native English speakers. Academic research published in 2019 demonstrated that these systems consistently assigned lower scores to essays written by non-native speakers, even when the content quality, argumentation, and ideas were equivalent to those of native speakers.

The bias manifested through the algorithms' heavy weighting of linguistic features such as sentence complexity, vocabulary sophistication, and grammatical structures that favor native English writing patterns. Non-native speakers, despite demonstrating strong content knowledge and critical thinking skills, were penalized for using simpler sentence structures, more common vocabulary, or slight grammatical variations that are typical of second-language acquisition patterns.

The impact was particularly significant for international students taking standardized tests such as the GRE, TOEFL, and state assessment exams, where AI grading was increasingly being used to supplement or replace human scorers. Research indicated that approximately 200,000 test-takers annually could be affected by these scoring disparities, with potential consequences for college admissions, scholarship opportunities, and academic placement decisions.

Educational Testing Service and other testing companies initially defended their systems, arguing that language proficiency was a legitimate component of writing assessment. However, researchers demonstrated that the bias persisted even when controlling for overall English proficiency levels, suggesting the algorithms were not accurately measuring writing quality but rather privileging specific linguistic styles associated with native speakers.

The controversy intensified debates about the appropriate role of AI in educational assessment, particularly for diverse student populations. Critics argued that automated scoring systems risked perpetuating educational inequalities by systematically disadvantaging students from non-English speaking backgrounds, while proponents emphasized the need for consistent and scalable assessment methods in an era of increasing test volume.
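To illustrate the kind of controlled analysis the researchers describe (testing whether non-native status predicts scores even after accounting for proficiency), the sketch below regresses scores on both variables. The data is synthetic and purely illustrative; it does not reproduce the published study's data or methodology.

# Minimal sketch: does non-native status predict scores after controlling
# for proficiency? All data below is synthetic and illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000

# Synthetic cohort: proficiency on a 0-100 scale, half non-native writers.
proficiency = rng.normal(70, 10, n)
is_non_native = rng.integers(0, 2, n)

# Simulated scores with a built-in penalty for non-native writers that is
# independent of proficiency -- the pattern the research reported.
score = 0.05 * proficiency - 0.4 * is_non_native + rng.normal(0, 0.5, n)

X = sm.add_constant(np.column_stack([proficiency, is_non_native]))
model = sm.OLS(score, X).fit()

# A significant negative coefficient on is_non_native (index 2) indicates a
# scoring gap that proficiency alone does not explain.
print(model.params)   # [const, proficiency, is_non_native]
print(model.pvalues)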

Root Cause

AI grading algorithms were trained primarily on writing samples from native English speakers and weighted linguistic features such as sentence complexity and vocabulary sophistication over content quality, creating systematic bias against the linguistic patterns typical of non-native writers.
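The sketch below makes this failure mode concrete with a deliberately simplified, hypothetical surface-feature scorer: it rewards long sentences and uncommon vocabulary and ignores content entirely, so an essay written in the shorter, common-word style typical of second-language writers scores lower even when it makes the same argument. The features, weights, and word list are invented for illustration; e-rater's actual model is proprietary and far more complex.

# Hypothetical surface-feature scorer, illustrating the root cause described
# above. Weights and features are invented; they do not reflect e-rater.
COMMON_WORDS = {"the", "a", "is", "are", "and", "of", "to", "in", "that",
                "this", "it", "was", "good", "very", "important", "because"}

def surface_score(essay: str) -> float:
    sentences = [s for s in essay.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    words = essay.lower().replace(".", " ").split()
    avg_sentence_len = len(words) / max(len(sentences), 1)
    rare_ratio = sum(w not in COMMON_WORDS for w in words) / max(len(words), 1)
    # Heavy weight on complexity and vocabulary, none on content quality:
    # exactly the imbalance the research identified.
    return 0.1 * avg_sentence_len + 5.0 * rare_ratio

# Two essays making the same argument; the second uses the simpler sentence
# structures and more common vocabulary typical of second-language writing.
native_style = ("Standardized assessment, notwithstanding its considerable "
                "administrative convenience, frequently obscures substantive "
                "disparities in pedagogical opportunity.")
learner_style = ("Standardized tests are easy to give. But they can hide "
                 "big differences in what students get to learn.")

print(surface_score(native_style))   # higher, despite equivalent content
print(surface_score(learner_style))  # lower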

Mitigation Analysis

Bias could have been reduced through more diverse training data that included non-native speaker samples, bias testing across demographic groups, and content-focused scoring rubrics that de-emphasize linguistic complexity. Regular algorithmic auditing would help surface systematic bias patterns before deployment, and human review of large machine-human score discrepancies would catch residual bias in operational use.
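A minimal sketch of the kind of pre-deployment audit suggested here: compare machine scores against human reference scores per demographic group and flag any group whose average machine-human gap exceeds a tolerance. The group labels, tolerance value, and sample data are assumptions for the sketch, not established auditing standards.

# Illustrative audit: flag demographic groups whose machine scores diverge
# systematically from human reference scores. Threshold and labels are
# assumptions for this sketch.
from statistics import mean

def audit_score_gaps(records, tolerance=0.25):
    """records: iterable of (group, human_score, machine_score) tuples."""
    by_group = {}
    for group, human, machine in records:
        by_group.setdefault(group, []).append(machine - human)
    flagged = {}
    for group, gaps in by_group.items():
        avg_gap = mean(gaps)
        if abs(avg_gap) > tolerance:
            flagged[group] = round(avg_gap, 3)
    return flagged  # groups whose machine scores systematically diverge

sample = [
    ("native", 4.0, 4.1), ("native", 3.5, 3.4), ("native", 4.5, 4.6),
    ("non_native", 4.0, 3.4), ("non_native", 3.5, 3.0), ("non_native", 4.5, 4.0),
]
print(audit_score_gaps(sample))  # {'non_native': -0.533}

Run against a representative validation set for each deployment context, a check like this would have surfaced the non-native scoring gap before scores reached admissions decisions.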

Lessons Learned

The incident highlights the critical importance of bias testing across diverse demographic groups before deploying AI systems in high-stakes educational contexts, and demonstrates how seemingly objective automated scoring can perpetuate systemic inequalities.