AI Tutoring Platforms Provided Incorrect Math and Science Answers to Students
Severity
Medium
Multiple AI tutoring platforms, including Khan Academy's Khanmigo and Chegg's AI tutor, provided incorrect answers to math and science problems. Students using these platforms experienced confusion and poor test performance, highlighting quality control issues in educational AI applications.
Category
Hallucination
Industry
Education
Status
Reported
Date Occurred
Jan 1, 2025
Date Reported
Jan 15, 2025
Jurisdiction
US
AI Provider
Other/Unknown
Application Type
chatbot
Harm Type
operational
Human Review in Place
No
Litigation Filed
No
Tags
education, tutoring, math, science, hallucination, student_harmed, tech, accuracy
Full Description
In early 2025, multiple AI-powered tutoring platforms faced scrutiny after research documented significant error rates in their responses to student questions in mathematics and science. The platforms affected included Khan Academy's Khanmigo, Chegg's AI tutoring service, and several other ed-tech companies that had rapidly deployed large language model-based tutoring systems to compete in the growing online education market.
Educators and researchers began documenting instances in which these AI tutors provided fundamentally incorrect answers to problems ranging from basic algebra to advanced calculus and chemistry. The errors were not limited to minor computational mistakes; they included conceptual misunderstandings and flawed reasoning that could mislead students about fundamental principles. Teachers reported receiving student work that reflected these incorrect methodologies, leading to classroom confusion and a need for remedial instruction.
The issue gained prominence when education researchers published studies quantifying error rates across different platforms and subject areas. The research revealed that while AI tutors excelled at providing quick responses and maintaining engaging conversations with students, they frequently failed at the core educational task of providing accurate information. Mathematics problems involving multi-step reasoning and science questions requiring application of principles to novel scenarios showed particularly high error rates.
The incident sparked broader debates about accountability in educational technology and the appropriate use of AI in learning environments. Education advocates argued that the rush to deploy AI tutoring systems prioritized engagement and cost reduction over educational accuracy and student outcomes. The platforms involved faced pressure from educators, parents, and school districts to implement stronger quality control measures and provide transparency about their AI systems' limitations and error rates.
Root Cause
AI models used by tutoring platforms generated mathematically or scientifically incorrect responses due to hallucination and training data limitations, particularly in complex problem-solving scenarios requiring step-by-step reasoning.
Mitigation Analysis
Subject-matter-expert review of AI responses before delivery to students could have prevented most errors. Implementing mathematical verification systems to check computational answers, and establishing feedback loops that let educators flag incorrect responses, would significantly reduce harm. Regular testing against standardized curriculum benchmarks and transparent error-rate reporting would improve platform reliability.
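The verification idea above can be illustrated with a minimal sketch. This is a hypothetical example, not any platform's actual implementation: before a tutor's numeric answer reaches the student, the system substitutes it back into the original equation and delivers it only if both sides agree; otherwise the response is escalated for human review. All function names here are illustrative assumptions.

```python
# Hypothetical answer-verification gate for an AI tutoring pipeline.
# Before a proposed numeric answer is shown to a student, substitute it
# back into the original equation and check that both sides agree.

def verify_answer(lhs, rhs, value, tol=1e-9):
    """Return True if lhs(x) equals rhs(x) at the proposed value of x."""
    return abs(lhs(value) - rhs(value)) <= tol

def gate_response(lhs, rhs, proposed_answer):
    """Deliver the answer only if it verifies; otherwise escalate."""
    if verify_answer(lhs, rhs, proposed_answer):
        return f"The answer is x = {proposed_answer}."
    return "Answer failed verification; routing to a human reviewer."

# Example: the equation 3x + 4 = 19, whose correct solution is x = 5.
lhs = lambda x: 3 * x + 4
rhs = lambda x: 19

print(gate_response(lhs, rhs, 5))  # verified answer is delivered
print(gate_response(lhs, rhs, 6))  # a hallucinated answer is caught
```

A check like this only covers problems with mechanically verifiable answers; conceptual explanations would still require expert review, which is why the mitigations above pair verification with educator feedback loops.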
Lessons Learned
The incident highlighted the critical importance of rigorous testing and validation for AI systems deployed in educational contexts, where accuracy directly affects student learning outcomes. It also demonstrated the need for clear accountability standards in ed-tech and the risks of prioritizing rapid deployment over educational quality.
Sources
AI Tutoring Platforms Face Accuracy Concerns as Errors Impact Student Learning
Education Week · Jan 15, 2025 · news
The AI Tutoring Accuracy Crisis: When Helpful Becomes Harmful
Chronicle of Higher Education · Jan 12, 2025 · news