
Babylon Health AI Misdiagnosed Medical Conditions as Non-Urgent in NHS Service

Severity
High

Babylon Health's AI triage system in the NHS GP at Hand service incorrectly classified serious medical conditions as non-urgent. A BBC investigation revealed systematic failures that could have delayed critical care for patients.

Category
Medical Error
Industry
Healthcare
Status
Resolved
Date Occurred
Jan 1, 2019
Date Reported
Oct 14, 2019
Jurisdiction
UK
AI Provider
Other/Unknown
Application Type
Chatbot
Harm Type
Physical
People Affected
100,000
Human Review in Place
No
Litigation Filed
No
Regulatory Body
Care Quality Commission
medical_ai, nhs, triage, patient_safety, healthcare_ai, babylon_health, misdiagnosis, uk_healthcare

Full Description

Babylon Health operated an AI-powered medical triage service called GP at Hand, which was integrated into the UK's National Health Service (NHS) starting in 2017. The service used artificial intelligence to assess patient symptoms through a chatbot interface and to determine the urgency of medical conditions before routing patients to appropriate care pathways. By 2019, the service was serving over 100,000 NHS patients in London.

In October 2019, a BBC investigation revealed serious flaws in Babylon Health's AI system. The investigation found that the AI consistently misclassified potentially dangerous medical conditions as non-urgent, including symptoms that could indicate heart attacks, strokes, and other life-threatening emergencies. The BBC tested the system with various symptom combinations and found multiple instances where conditions requiring immediate emergency care were classified as routine or non-urgent matters.

The investigation highlighted specific cases where the AI failed to recognize red-flag symptoms. For example, chest pain combined with other cardiac symptoms was sometimes categorized as a minor issue rather than a potential heart attack requiring emergency intervention; similarly, neurological symptoms that could indicate a stroke were downgraded to non-urgent categories. These failures occurred despite Babylon's claims that its AI was equivalent to or better than human doctors in diagnostic accuracy.

Following the BBC investigation and subsequent regulatory scrutiny from the Care Quality Commission, Babylon Health faced significant criticism from medical professionals and patient-safety advocates. The company was forced to review and modify its algorithms, though the damage to its reputation was substantial. The incident contributed to growing skepticism about AI in healthcare and highlighted the risks of deploying insufficiently tested systems in clinical settings. Babylon Health eventually collapsed in 2023, with its assets sold to creditors for a fraction of its former valuation of over $2 billion.

Root Cause

The AI triage algorithm lacked sufficient training data for edge cases and complex symptom presentations, leading to systematic misclassification of serious conditions as non-urgent. The system's confidence thresholds were likely set too aggressively, causing it to downplay symptoms that required immediate medical attention.
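
To make that failure mode concrete, here is a minimal Python sketch of how an overly aggressive confidence threshold produces an unsafe default. Babylon's actual algorithm is proprietary, so every name, score, and the 0.9 cutoff below are hypothetical, invented purely to illustrate the pattern: when the urgent label requires near-certainty, every ambiguous presentation silently falls through to the non-urgent pathway.

```python
# Hypothetical illustration of the threshold failure mode described above.
# The names, probabilities, and 0.9 cutoff are invented for this sketch and
# do not reflect Babylon Health's actual (proprietary) system.

URGENT_CONFIDENCE_CUTOFF = 0.9  # set too aggressively: "urgent" needs near-certainty

def triage(urgency_scores: dict[str, float]) -> str:
    """Return a triage label from per-class model confidences."""
    if urgency_scores.get("emergency", 0.0) >= URGENT_CONFIDENCE_CUTOFF:
        return "emergency"
    # Anything the model is not near-certain about falls through to routine
    # care -- the unsafe default that downgrades red-flag presentations.
    return "non-urgent"

# Chest pain with radiating arm pain is genuinely ambiguous, so the model is
# only 70% confident it is an emergency -- and the patient is sent home.
print(triage({"emergency": 0.7, "non-urgent": 0.3}))  # -> "non-urgent"
```

Note how the error is structural rather than random: the safe behavior (escalation) is the one gated behind a high confidence bar, so uncertainty is systematically resolved in the less safe direction.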

Mitigation Analysis

A mandatory human clinician review step for all high-risk symptom combinations could have caught dangerous misclassifications. Enhanced model testing with adversarial medical scenarios and comprehensive validation against established triage protocols would have revealed these gaps. Real-time monitoring of patient outcomes and feedback loops from emergency department visits could have identified systematic errors earlier.
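
As an illustration of the first safeguard, a hard red-flag gate can force clinician review of high-risk symptom combinations before any model output is trusted. This is only a sketch: the symptom names and rule set below are hypothetical and are not drawn from any clinical triage standard.

```python
# Hypothetical sketch of a mandatory human-review gate. The red-flag rules
# are illustrative only and are not an actual clinical triage protocol.

RED_FLAG_COMBINATIONS = [
    {"chest pain", "arm pain"},           # possible cardiac event
    {"chest pain", "shortness of breath"},
    {"facial droop", "slurred speech"},   # possible stroke
]

def needs_human_review(symptoms: set[str]) -> bool:
    """True if any red-flag combination is present in the reported symptoms."""
    return any(combo <= symptoms for combo in RED_FLAG_COMBINATIONS)

def safe_triage(symptoms: set[str], ai_label: str) -> str:
    # The gate runs before the model output is trusted: red-flag cases are
    # always escalated to a clinician, whatever the AI classified them as.
    if needs_human_review(symptoms):
        return "escalate: clinician review"
    return ai_label

print(safe_triage({"chest pain", "arm pain", "nausea"}, "non-urgent"))
# -> "escalate: clinician review"
```

The design point is that the rule-based gate sits outside the learned model, so a misclassification by the AI cannot suppress escalation of a known-dangerous presentation.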

Lessons Learned

The incident demonstrated that AI systems in healthcare require extensive validation against real-world clinical scenarios before deployment. It highlighted the critical importance of conservative safety margins when human life is at stake and the need for transparent testing protocols that can be independently verified by medical professionals.