IBM Watson for Oncology Recommended Unsafe Cancer Treatments

Critical

IBM Watson for Oncology recommended dangerous cancer treatments, including chemotherapy for patients with severe bleeding, due to flawed training on hypothetical cases rather than real outcomes data.

Category
Medical Error
Industry
Healthcare
Status
Resolved
Date Occurred
Jan 1, 2016
Date Reported
Jul 25, 2018
Jurisdiction
International
AI Provider
Other/Unknown
Model
Watson for Oncology
Application Type
Embedded
Harm Type
Physical
Estimated Cost
$62,000,000
Human Review in Place
Yes
Litigation Filed
No
medical_ai · cancer_treatment · clinical_decision_support · ibm_watson · healthcare_deployment · international_healthcare

Full Description

IBM Watson for Oncology, launched in 2012 as a flagship AI healthcare product, was designed to analyze cancer patient data and recommend treatment options to oncologists worldwide. The system was developed in partnership with Memorial Sloan Kettering Cancer Center and marketed as a tool that could democratize access to expert cancer care globally. IBM invested heavily in the platform, positioning it as a breakthrough application of artificial intelligence in medicine.

Internal IBM documents obtained by STAT News in July 2018 revealed serious safety issues with Watson's treatment recommendations. In one documented case, the system recommended a chemotherapy drug for a patient with severe bleeding, even though the medication could exacerbate the bleeding and cause life-threatening complications. The documents showed that Watson's recommendations were frequently unsafe, inappropriate, or contrary to standard medical guidelines, with problems identified across multiple cancer types and patient scenarios.

The root cause of these dangerous recommendations traced back to Watson's training methodology. Rather than learning from real patient outcomes and evidence-based medicine, the system was trained primarily on hypothetical patient cases and the treatment preferences of a small group of oncologists at Memorial Sloan Kettering. As a result, Watson reflected institutional biases and subjective preferences rather than proven medical effectiveness, and its training data lacked the diversity and real-world validation necessary for safe clinical deployment.

International deployments of Watson for Oncology revealed additional problems when the system encountered different patient populations and medical practices. Hospitals in India and South Korea reported that Watson's recommendations often did not align with local treatment protocols or patient characteristics, and the system struggled to adapt to variations in patient demographics, healthcare infrastructure, and practice patterns outside the United States. These limitations became apparent as IBM marketed Watson aggressively worldwide without adequate localization or validation.

Following the STAT News investigation and mounting criticism from the medical community, IBM retreated from healthcare AI applications: it scaled back Watson Health operations, sold off health-related assets, and shifted focus away from clinical decision support tools. Multiple hospitals that had invested in Watson for Oncology discontinued its use, citing concerns about recommendation quality and clinical utility. The incident highlighted fundamental challenges in developing and deploying AI systems for high-stakes medical decisions.

Root Cause

The AI was trained primarily on hypothetical patient cases and the treatment preferences of a small group of oncologists at Memorial Sloan Kettering rather than on real patient outcome data. As a result, the system reflected institutional biases and lacked sufficient real-world clinical validation.
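Because the failure mode was institutional bias rather than a single bug, even a simple retrospective concordance check on real cases, stratified by deployment site, would likely have surfaced it: agreement with guideline-concordant treatment would look high at the training institution and drop sharply elsewhere. The sketch below is a minimal illustration of that idea; the field names, site labels, and regimen names are hypothetical assumptions, not taken from IBM's system.

```python
# Minimal sketch of per-site concordance checking. All field names,
# sites, and regimen labels are hypothetical illustrations.
from collections import defaultdict

def concordance_by_site(cases):
    """Fraction of cases, per deployment site, where the model's top
    recommendation matches the guideline-concordant treatment given."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for case in cases:
        totals[case["site"]] += 1
        if case["model_rec"] == case["guideline_rx"]:
            hits[case["site"]] += 1
    return {site: hits[site] / totals[site] for site in totals}

# Toy retrospective data: high agreement at the training institution
# but low agreement at an international site signals that the model
# learned local preferences rather than generalizable medicine.
cases = [
    {"site": "training_hospital", "model_rec": "regimen_A", "guideline_rx": "regimen_A"},
    {"site": "training_hospital", "model_rec": "regimen_B", "guideline_rx": "regimen_B"},
    {"site": "international_site", "model_rec": "regimen_A", "guideline_rx": "regimen_C"},
    {"site": "international_site", "model_rec": "regimen_A", "guideline_rx": "regimen_A"},
]
print(concordance_by_site(cases))
# {'training_hospital': 1.0, 'international_site': 0.5}
```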

Mitigation Analysis

More rigorous clinical validation with real patient outcome data rather than expert opinions could have prevented many errors. Independent medical review boards should validate AI recommendations before deployment. Continuous monitoring of recommendation patterns and patient outcomes could identify problematic suggestions. Training on diverse, international patient populations rather than single-institution preferences would improve generalizability.
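One concrete form of the monitoring described above is a hard contraindication filter that screens every AI recommendation against the patient's active conditions before a clinician ever sees it; a check of this kind would have blocked the documented bleeding case. Below is a minimal Python sketch under assumed data structures; the drug names and contraindication table are illustrative, not a clinical knowledge base.

```python
# Minimal sketch of a contraindication guardrail over AI treatment
# recommendations. The drug names and contraindication table are
# illustrative assumptions, not a real clinical knowledge base.

# Map each drug to patient conditions that make it unsafe.
CONTRAINDICATIONS = {
    "drug_x": {"severe_bleeding", "recent_major_surgery"},
    "drug_y": {"severe_renal_impairment"},
}

def screen_recommendation(drug, patient_conditions):
    """Return the contraindicated conditions blocking this drug,
    or an empty set if the recommendation passes the filter."""
    return CONTRAINDICATIONS.get(drug, set()) & set(patient_conditions)

# The documented failure mode: a bleeding patient offered a drug
# that can worsen bleeding should be blocked and routed to review.
flags = screen_recommendation("drug_x", {"severe_bleeding"})
if flags:
    print(f"BLOCKED: contraindicated due to {sorted(flags)}; escalate to human review")
else:
    print("Recommendation passed contraindication screen")
```

Such a filter is deliberately rule-based and conservative: it does not rank treatments, it only vetoes recommendations that violate hard safety constraints, which keeps the guardrail auditable by the independent review boards proposed above.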

Lessons Learned

The incident demonstrates that AI medical systems require rigorous validation with real patient outcome data rather than expert opinions alone. Training AI on diverse, global patient populations is essential for safe international deployment. Continuous monitoring and validation remain critical even after initial deployment.

Sources

IBM Watson for Oncology: What Went Wrong?
IEEE Spectrum · Jul 25, 2017 · news
IBM Had a Watson Problem in Healthcare
Wall Street Journal · Aug 15, 2019 · news