
Epic Systems Sepsis Prediction AI Tool Shows 67% False Alert Rate in University of Michigan Study

High

Epic Systems' sepsis prediction AI tool deployed at University of Michigan showed a 67% false alert rate, with only 7% of predictions confirmed as sepsis, causing alert fatigue among clinicians.

Category
Medical Error
Industry
Healthcare
Status
Reported
Date Occurred
Apr 1, 2017
Date Reported
Jul 17, 2019
Jurisdiction
US
AI Provider
Other/Unknown
Model
Epic Sepsis Model
Application Type
embedded
Harm Type
operational
People Affected
38,000
Human Review in Place
Yes
Litigation Filed
No
sepsis, false_alerts, alert_fatigue, clinical_ai, epic_systems, hospital, patient_safety, false_positives

Full Description

Between April 2017 and April 2018, the University of Michigan Health System ran Epic Systems' sepsis prediction AI tool as part of its electronic health record system. The tool continuously analyzed patient data and alerted clinicians when a patient showed signs of developing sepsis, a potentially fatal condition that requires immediate intervention. The goal was to improve patient outcomes through earlier detection and treatment of this life-threatening condition.

Researchers at the University of Michigan evaluated the tool's performance during its first year of implementation, analyzing data from approximately 38,000 patient encounters. The study, published in JAMA Internal Medicine in July 2019, revealed significant performance problems. Only 33% of the sepsis alerts the system generated were true positives, a false alert rate of 67%: two-thirds of all alerts were incorrect. Even more concerning, only 7% of the AI's sepsis predictions were ultimately confirmed by clinical diagnosis. This extremely low positive predictive value meant that for every 100 alerts the system generated, only 7 corresponded to an actual case of sepsis.

The high volume of false alerts produced a phenomenon known as alert fatigue: clinicians became desensitized to the constant stream of warnings and began to ignore or dismiss alerts without proper evaluation. The excessive false alerts thus undermined the tool's intended purpose of improving patient safety. Rather than enhancing clinical decision-making, the unreliable predictions created operational burden and potentially dangerous situations in which clinicians might dismiss genuine alerts along with false ones.

The researchers noted that the alert fatigue problem was severe enough to cause missed opportunities for early sepsis intervention, potentially putting patients at greater risk than if no AI system had been deployed at all. The University of Michigan's findings highlighted broader issues with deploying AI tools in clinical settings without adequate validation and calibration. The study demonstrated that even well-intentioned AI systems from established healthcare technology vendors can produce unreliable results in real-world clinical environments, underscoring the need for rigorous pre-deployment testing and ongoing monitoring of AI performance in healthcare applications.
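The alert-quality figures above follow from simple ratios over alert counts. As a minimal sketch (using illustrative round numbers, not the study's raw counts), positive predictive value is the share of alerts that were real sepsis cases, and the false alert rate is its complement:

```python
def alert_metrics(true_alerts: int, false_alerts: int) -> dict:
    """Compute alert-quality ratios from confirmed vs. unconfirmed alerts."""
    total = true_alerts + false_alerts
    return {
        # Positive predictive value: fraction of alerts that were real cases.
        "ppv": true_alerts / total,
        # False alert rate: fraction of alerts that were noise.
        "false_alert_rate": false_alerts / total,
    }

# Illustrative: if 33 of 100 alerts are genuine, the false alert rate is 67%.
print(alert_metrics(33, 67))  # {'ppv': 0.33, 'false_alert_rate': 0.67}
```

At a PPV of 7%, the same arithmetic means a clinician must evaluate roughly 14 alerts to find one true sepsis case, which is the mechanism behind the alert fatigue the study describes.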

Root Cause

The machine learning model was inadequately trained or calibrated, resulting in a high false positive rate that generated unreliable sepsis alerts. The model lacked sufficient specificity to distinguish between actual sepsis cases and patients with similar symptoms or risk factors.

Mitigation Analysis

Enhanced model validation using diverse patient populations and longitudinal testing could have identified the high false positive rate before deployment. Implementing adjustable sensitivity thresholds, requiring human confirmation for alerts, and establishing feedback loops to continuously retrain the model based on clinician responses would have reduced alert fatigue and improved accuracy.
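One of the suggested mitigations, adjustable sensitivity thresholds, can be sketched as a simple calibration step on held-out validation data: raise the alert threshold until the resulting alerts meet a minimum PPV floor. This is a hypothetical illustration (the function, scores, and labels are invented, not Epic's implementation):

```python
def choose_threshold(scores, labels, min_ppv=0.25):
    """Return the lowest score threshold whose alerts meet the PPV floor.

    scores: model risk scores on validation encounters.
    labels: 1 if the encounter was a confirmed sepsis case, else 0.
    """
    for t in sorted(set(scores)):
        alerts = [(s, y) for s, y in zip(scores, labels) if s >= t]
        if not alerts:
            continue
        ppv = sum(y for _, y in alerts) / len(alerts)
        if ppv >= min_ppv:
            # Lowest qualifying threshold preserves the most sensitivity.
            return t
    return None

# Toy validation set: six encounters, two confirmed sepsis cases.
scores = [0.2, 0.4, 0.5, 0.7, 0.9, 0.95]
labels = [0,   0,   0,   1,   0,   1]
print(choose_threshold(scores, labels, min_ppv=0.5))  # 0.5
```

The trade-off this exposes is exactly the one the study surfaced: a lower threshold catches more cases but floods clinicians with false alerts, so the floor must be revisited as the patient population and clinician feedback evolve.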

Lessons Learned

The incident demonstrates the critical importance of extensive validation testing before deploying AI tools in clinical settings, as high false positive rates can create alert fatigue that undermines patient safety. Healthcare AI systems require continuous monitoring and recalibration to maintain accuracy in real-world clinical environments.