
Samsung Engineers Leaked Proprietary Code via ChatGPT

High

Samsung semiconductor division engineers submitted proprietary source code, internal meeting notes, and hardware test data to ChatGPT on at least three separate occasions within 20 days. Samsung subsequently restricted employee use of generative AI tools and began developing an internal alternative.

Category
Privacy Leak
Industry
Technology
Status
Resolved
Date Occurred
Mar 1, 2023
Date Reported
Apr 2, 2023
Jurisdiction
International
AI Provider
OpenAI
Model
ChatGPT
Application Type
Chatbot
Harm Type
Operational
Human Review in Place
No
Litigation Filed
No

Full Description

On March 1, 2023, engineers in Samsung Electronics' semiconductor division began inadvertently leaking sensitive proprietary information through ChatGPT, OpenAI's conversational AI system. The incidents occurred within 20 days of Samsung first permitting employees to use the chatbot for work-related tasks. Three engineers submitted confidential company data to ChatGPT on separate occasions, seeking help with debugging code, analyzing performance metrics, and summarizing internal communications. The leaked information included semiconductor database source code, hardware test data related to yield and defect measurements, and confidential notes from internal meetings.

The technical failure stemmed from the engineers' lack of understanding of how ChatGPT processes and retains user inputs. Text submitted to ChatGPT can be incorporated into OpenAI's training datasets for future model improvements unless users specifically opt out. The submitted Samsung code and documentation may therefore have entered OpenAI's broader training corpus, creating the possibility that fragments of Samsung's proprietary information could surface in responses to other users' queries. This represents a fundamental data governance failure: sensitive intellectual property was exposed to a third-party system without proper security controls or data handling protocols.

The incident exposed Samsung to significant competitive and operational risks, as the leaked information contained core semiconductor design elements and manufacturing processes representing substantial research and development investments. While the exact financial impact remains undisclosed, semiconductor intellectual property typically involves billions of dollars in development costs and confers critical competitive advantages in the global chip market.
The breach potentially weakened Samsung's position relative to competitors such as TSMC, Intel, and other major semiconductor manufacturers, who could in theory access fragments of Samsung's proprietary methods through ChatGPT interactions. Beyond the immediate competitive concerns, the incident raised serious questions about Samsung's data security protocols and its employee training for emerging AI tools.

Samsung responded swiftly to contain the damage, banning ChatGPT and other generative AI tools company-wide immediately after discovering the breaches. The company began developing an internal AI alternative that would provide similar functionality while keeping complete control over data handling and security. Samsung also launched employee training programs on data security practices when interacting with external AI systems, updated corporate policies to explicitly prohibit submitting proprietary information to third-party AI services, and deployed enhanced monitoring to detect future violations.

The incident became a watershed moment for corporate AI governance, highlighting the urgent need for clear policies governing employee use of generative AI tools. Major corporations across industries began reassessing their AI usage policies, with many implementing similar restrictions or developing internal AI capabilities to avoid third-party data exposure. The case also fed broader debate about the default data retention practices of AI companies and whether opt-in rather than opt-out consent should be standard for enterprise users. Technology companies including Apple, Amazon, and JPMorgan Chase subsequently imposed their own restrictions on generative AI tools, citing similar concerns about inadvertent data disclosure.
The incident also accelerated regulatory scrutiny of AI data handling practices, particularly in jurisdictions with strict data protection laws. While no formal regulatory action was taken against Samsung or OpenAI, the case became frequently cited in policy discussions about AI governance and corporate data protection obligations. The breach demonstrated how rapidly emerging AI technologies can create new categories of operational risk that existing corporate security frameworks had not anticipated, leading to increased investment in AI-specific risk management capabilities across the technology sector.

Root Cause

Samsung semiconductor engineers pasted proprietary source code, internal meeting notes, and chip design data into ChatGPT for help with debugging and summarization. Those submissions may have been incorporated into OpenAI's training data, making fragments of the information potentially retrievable by other users.

Mitigation Analysis

Input-side provenance tracking (logging what data employees send to AI systems), combined with data loss prevention (DLP) controls, would have flagged sensitive code being pasted into external AI services. Output provenance is equally relevant: Samsung had no way to audit which AI-generated code suggestions were incorporated into production systems, or whether those suggestions were influenced by other companies' data.

Lessons Learned

AI data-leakage prevention requires monitoring inputs, not just outputs. Enterprise AI policies must be in place before tools are deployed, and internal AI alternatives may be necessary for organizations handling sensitive intellectual property.

Sources