
Amazon AI Recruiting Tool Showed Systematic Gender Bias

Major

Amazon developed an internal AI recruiting tool that evaluated job applicants by scoring resumes. The system taught itself to penalize resumes containing indicators of female gender, systematically downranking women for technical roles. Amazon scrapped the tool after discovering the bias.

Category
Bias
Industry
HR / Recruiting
Status
Resolved
Date Occurred
Jan 1, 2014
Date Reported
Oct 10, 2018
Jurisdiction
US
AI Provider
Other/Unknown
Model
Custom ML Model
Application Type
API Integration
Harm Type
Discriminatory
Human Review in Place
No
Litigation Filed
No
resume_screening · gender_bias · hiring

Full Description

Starting in 2014, Amazon's machine learning team in Edinburgh developed an automated recruiting tool designed to streamline hiring by scoring job applicants' resumes on a scale of one to five stars. The system was intended to act as a recommendation engine, helping hiring managers identify the most promising candidates from the large volume of applications Amazon received. It was developed over several years, by multiple engineers and data scientists, as part of Amazon's broader effort to apply artificial intelligence across its business operations, with the goal of a more efficient and objective hiring process.

The system was trained on resumes submitted to Amazon over the preceding ten years, roughly 2004 to 2014. That historical data reflected the heavily male-dominated composition of Amazon's technical workforce and of the broader tech industry during that period. In learning the patterns associated with past successful hires, the algorithm inadvertently encoded the gender bias present in those historical hiring decisions. The system systematically penalized resumes containing explicit female indicators: it downgraded candidates who mentioned activities such as "women's chess club" and discriminated against graduates of two all-women's colleges.

As a result, qualified female candidates were systematically ranked lower than their male counterparts for technical positions, particularly software engineering roles. Amazon has not disclosed the exact number of candidates affected, but the tool potentially impacted hundreds or thousands of female applicants to technical positions during the period when the system was being tested and refined.
The bias extended beyond obvious gender markers: the AI likely learned to identify and penalize subtler linguistic or experiential indicators associated with female candidates. Had the tool been fully deployed, this systematic discrimination could have perpetuated and amplified existing gender disparities in Amazon's technical workforce.

Amazon's engineering team recognized the bias and attempted fixes, editing the algorithm to neutralize its response to specific gendered terms like "women's." However, the team could not guarantee that the system would not identify and exploit other indirect proxies for gender embedded in the resume data. Recognizing the fundamental flaws in the approach and the impossibility of ensuring fair outcomes, Amazon scrapped the entire project in 2017. The company never used the tool as the primary or sole basis for hiring decisions, though it had been used experimentally to supplement traditional recruiting processes during its development.

The incident gained widespread public attention when Reuters reported on it in October 2018, making it one of the most prominent and frequently cited examples of algorithmic bias in employment practices. The revelation sparked significant discussion among policymakers, civil rights advocates, and technology companies about the risks of using AI in hiring decisions without adequate bias testing and oversight, and it raised awareness of how historical discrimination can be perpetuated and amplified through machine learning systems trained on biased datasets. The case became a catalyst for regulatory action and industry policy change: it directly influenced the development of New York City's Local Law 144, which requires companies to conduct bias audits of automated employment decision tools before deployment.
The incident also prompted other major technology companies to examine their own AI systems for similar biases and led to increased investment in fairness research and bias detection methodologies across the industry. Amazon's experience demonstrated the critical importance of diverse training data and rigorous bias testing in AI development, particularly for applications with significant social and economic impact.

Root Cause

The model was trained on 10 years of historical hiring data at Amazon, which reflected the male-dominated composition of the tech industry. The AI learned to penalize any signals associated with female candidates, including the word "women's" (as in "women's chess club") and attendance at all-women's colleges.
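The mechanism above can be illustrated with a toy example (not Amazon's actual model, whose details were never published): when historical hires skew male, any token that co-occurs with female candidates also co-occurs with fewer past positive outcomes, so a model trained to predict "hired" learns a negative weight for it. Here the hire-rate difference with vs. without a token stands in for a learned weight; the resume texts and numbers are invented for illustration.

```python
# Toy illustration of proxy bias: a token correlated with an underrepresented
# group picks up a negative signal purely from skewed historical outcomes.

def token_signal(resumes, token):
    """Difference in historical hire rate with vs. without `token`.

    resumes: list of (text, hired) pairs.
    A negative value means the token would be penalized by a model
    trained to reproduce these historical decisions.
    """
    with_token = [hired for text, hired in resumes if token in text]
    without_token = [hired for text, hired in resumes if token not in text]
    rate = lambda xs: sum(xs) / len(xs)
    return rate(with_token) - rate(without_token)

# Hypothetical history: hires are mostly drawn from the majority group,
# so "women's" correlates with fewer past hires through no fault of
# the candidates themselves.
history = (
    [("java developer", True)] * 70
    + [("java developer", False)] * 30
    + [("java developer women's chess club", True)] * 5
    + [("java developer women's chess club", False)] * 15
)

print(token_signal(history, "women's"))  # 5/20 - 70/100 = -0.45
```

This is why Amazon's attempt to neutralize specific terms could not succeed: any other token distributed unevenly across groups in the training data acquires the same kind of signal.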

Mitigation Analysis

Provenance tracking would not have prevented the underlying bias, but an audit trail documenting the training data composition, model decisions, and output patterns would have enabled earlier detection. Systematic logging of AI recommendations alongside demographic data would have revealed the discriminatory pattern. This case demonstrates that AI systems used in employment decisions require ongoing bias audits and output monitoring, not just input controls.
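The output monitoring described above can be sketched concretely. One standard check in US employment practice is the adverse impact ratio (the "four-fifths rule"): the selection rate of the least-selected group divided by that of the most-selected group, with values below 0.8 conventionally flagged. The log format, field names, and numbers below are illustrative assumptions, not Amazon's actual data.

```python
# Minimal sketch of a bias audit over logged screening recommendations
# paired with demographic data, as the analysis above suggests.

from collections import Counter

def adverse_impact_ratio(records, group_key="gender", outcome_key="advanced"):
    """Return (ratio, per-group selection rates) for logged decisions.

    records: iterable of dicts like {"gender": "F", "advanced": True}.
    A ratio below 0.8 is the conventional red flag for adverse impact.
    """
    totals, selected = Counter(), Counter()
    for record in records:
        group = record[group_key]
        totals[group] += 1
        selected[group] += bool(record[outcome_key])
    rates = {g: selected[g] / totals[g] for g in totals}
    return min(rates.values()) / max(rates.values()), rates

# Hypothetical audit log of the tool's recommendations:
log = (
    [{"gender": "M", "advanced": True}] * 40
    + [{"gender": "M", "advanced": False}] * 60
    + [{"gender": "F", "advanced": True}] * 15
    + [{"gender": "F", "advanced": False}] * 85
)

ratio, rates = adverse_impact_ratio(log)
print(round(ratio, 2), rates)  # 0.38 — far below the 0.8 threshold
```

Run periodically against production logs, a check like this would have surfaced the disparity long before a journalist did; the point of the mitigation analysis is that detection requires logging outcomes alongside demographics, not just controlling inputs.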

Lessons Learned

AI systems trained on historical data inherit the biases present in that data. Employment-related AI requires mandatory bias audits and cannot be validated solely on accuracy metrics. Attempting to patch bias after the fact is insufficient — fairness must be designed into the system from the ground up.