ChatGPT Maker Suspects China’s Dirt Cheap DeepSeek AI Models Were Built Using OpenAI Data — and the Irony Is Not Lost on the Internet

Feb 21, 2025

OpenAI suspects that DeepSeek, a Chinese AI model far cheaper than its Western counterparts, may have been trained using OpenAI's data. DeepSeek's rapid rise in popularity had already sent shockwaves through the US tech industry, dragging down the share prices of major AI players. Nvidia, whose GPUs are crucial to AI model development, suffered the largest single-day loss of market value in Wall Street history as its shares fell 16.86%. Microsoft, Meta, Alphabet, and Dell also saw considerable declines.

DeepSeek's R1 model, based on the open-source DeepSeek-V3, boasts significantly lower training costs (estimated at $6 million) compared to Western models like ChatGPT. While this claim is disputed by some, it has raised concerns about the billions invested by American tech companies in AI, unsettling investors.

OpenAI and Microsoft are investigating whether DeepSeek violated OpenAI's terms of service by using its API for "distillation," a technique in which a smaller model is trained on the outputs of a larger one. OpenAI said that Chinese companies frequently attempt to replicate leading US AI models, and that it is committed to protecting its intellectual property through countermeasures and collaboration with the US government.

David Sacks, President Trump's AI czar, said there is evidence that DeepSeek used distillation to draw knowledge out of OpenAI's models. He expects leading AI companies to take steps to prevent such practices in the future.

The situation highlights a significant irony: OpenAI, itself accused of using copyrighted internet data to train ChatGPT, is now accusing DeepSeek of similar practices. The apparent hypocrisy has been widely noted, especially given OpenAI's earlier statement to the UK's House of Lords that it is impossible to train leading AI models without copyrighted material. OpenAI also faces ongoing lawsuits over its training data, including one from the New York Times alleging unlawful use of its content and another from 17 authors claiming "systematic theft." The legal landscape surrounding AI training data and copyright continues to evolve, particularly in light of a 2018 US Copyright Office ruling that AI-generated art is not copyrightable.

DeepSeek is accused of using distillation to train its competing model on OpenAI's models. Image credit: Andrey Rudakov/Bloomberg via Getty Images.
