Google AI Dermatology Tool Shows Racial Bias in Skin Condition Detection
Severity
Medium
Google's AI dermatology tool demonstrated significant accuracy disparities across skin tones due to biased training data, highlighting the critical need for diverse datasets in medical AI applications.
Category
Bias
Industry
Healthcare
Status
Resolved
Date Occurred
May 1, 2021
Date Reported
May 18, 2021
Jurisdiction
US
AI Provider
Google
Model
DermAssist AI
Application Type
API integration
Harm Type
Operational
Human Review in Place
Yes
Litigation Filed
No
Regulatory Body
FDA
medical_ai, algorithmic_bias, healthcare_disparities, dermatology, skin_cancer_detection, FDA_regulation, diverse_datasets
Full Description
Google developed an AI dermatology tool called DermAssist in collaboration with dermatologists to help identify potential skin conditions from smartphone photos. The tool was designed to assist both healthcare professionals and consumers in early detection of skin cancers and other dermatological conditions. The AI system was trained on a large dataset of dermatological images to recognize patterns associated with various skin conditions.
In May 2021, researchers published findings in Nature Medicine documenting significant performance disparities in Google's dermatology AI across different skin tones. The study revealed that the AI tool's accuracy dropped substantially when analyzing images of darker skin, particularly for individuals with Fitzpatrick skin types IV, V, and VI. The tool showed reduced sensitivity and specificity for detecting skin cancers and inflammatory conditions on darker skin compared to lighter skin tones.
The bias stemmed from the training dataset composition, which contained a disproportionate number of images featuring lighter skin tones. This data imbalance caused the machine learning model to optimize for features more commonly present in lighter skin, leading to systematic underperformance on darker skin. The researchers found that conditions like melanoma, which can appear differently on darker skin, were particularly affected by this bias.
Google acknowledged the limitations and began working to address the bias through expanded data collection efforts focused on underrepresented skin types. The company partnered with dermatologists specializing in skin of color and implemented additional validation processes. The FDA also initiated a review of AI-based dermatology tools, including DermAssist, examining their performance across diverse populations as part of their regulatory evaluation process.
This incident highlighted broader concerns about algorithmic bias in medical AI systems and the potential for perpetuating healthcare disparities. The findings prompted discussions within the medical AI community about the importance of diverse training data and equitable algorithm development practices to ensure fair healthcare outcomes across all patient populations.
Root Cause
Training dataset was heavily skewed toward lighter skin tones, with limited representation of darker skin types (Fitzpatrick skin types IV-VI), causing the AI model to learn features predominantly associated with lighter skin conditions.
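A dataset skew like this can be surfaced before training with a simple composition audit. The sketch below is a minimal, hypothetical example (field names and the under-representation threshold are assumptions, not Google's actual pipeline): it tallies images per Fitzpatrick skin type and flags any group holding less than half of an even one-sixth share.

```python
from collections import Counter

def audit_skin_type_distribution(records):
    """Count images per Fitzpatrick skin type (I-VI) and flag groups
    holding less than half of an even (1/6) share of the dataset."""
    counts = Counter(r["fitzpatrick_type"] for r in records)
    total = sum(counts.values())
    report = {}
    for stype in ["I", "II", "III", "IV", "V", "VI"]:
        share = counts.get(stype, 0) / total if total else 0.0
        report[stype] = {
            "count": counts.get(stype, 0),
            "share": round(share, 3),
            # Threshold is illustrative: flag anything below half of 1/6.
            "underrepresented": share < (1 / 6) / 2,
        }
    return report

# Hypothetical toy dataset skewed toward lighter skin tones.
dataset = ([{"fitzpatrick_type": "II"}] * 70
           + [{"fitzpatrick_type": "III"}] * 20
           + [{"fitzpatrick_type": "V"}] * 10)
print(audit_skin_type_distribution(dataset))
```

Run on the toy data above, types IV and VI are flagged as under-represented, mirroring the Fitzpatrick IV-VI gap described in the root cause.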
Mitigation Analysis
This bias could have been significantly reduced through diverse training data collection ensuring representative samples across all skin types, algorithmic bias testing during development, and fairness metrics evaluation across demographic groups. Post-deployment monitoring with stratified performance metrics would have detected the disparity earlier.
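The stratified monitoring described above can be as simple as computing sensitivity (true-positive rate) per subgroup rather than in aggregate. This is a minimal sketch, not the actual evaluation used for DermAssist; the group labels and data are invented for illustration:

```python
def stratified_sensitivity(y_true, y_pred, groups):
    """Sensitivity (true-positive rate) per subgroup.
    A large gap between subgroups is the disparity signal that
    aggregate accuracy alone would hide."""
    stats = {}
    for g in set(groups):
        tp = fn = 0
        for t, p, grp in zip(y_true, y_pred, groups):
            if grp != g or t != 1:
                continue  # only positives within this subgroup count
            if p == 1:
                tp += 1
            else:
                fn += 1
        stats[g] = tp / (tp + fn) if (tp + fn) else None
    return stats

# Toy example: the model misses more positives in group "IV-VI".
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0]
groups = ["I-III", "I-III", "I-III", "IV-VI", "IV-VI", "IV-VI", "I-III", "IV-VI"]
print(stratified_sensitivity(y_true, y_pred, groups))
```

Here the aggregate sensitivity looks acceptable (4 of 6 positives caught), but stratifying reveals 100% sensitivity for lighter skin types versus 33% for darker ones, which is exactly the kind of gap post-deployment monitoring should alarm on.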
Lessons Learned
This incident demonstrates the critical importance of diverse, representative training data in medical AI systems and the need for systematic bias testing across demographic groups during development and deployment.
Sources
AI dermatology research paper on bias
Nature Medicine · May 18, 2021 · academic paper
AI dermatology tools show bias against darker skin
STAT News · May 18, 2021 · news
Google's response to dermatology AI bias concerns
Google · May 19, 2021 · company statement