DeepSeek AI: Not Affordable, Cost $1.6 Billion to Develop

May 18, 2025

DeepSeek, a prominent Chinese startup, has made waves in the AI industry with its latest chatbot, which introduces itself with a distinctive greeting: "Hi, I was created so you can ask anything and get an answer that might even surprise you." The technologies behind DeepSeek's AI models have made them competitive enough to contribute to one of NVIDIA's largest stock price drops.

The standout features of DeepSeek's AI include:

  • Multi-token Prediction (MTP): Unlike traditional models that predict one token at a time, DeepSeek's model is trained to predict several upcoming tokens at once, which improves both accuracy and efficiency.
  • Mixture of Experts (MoE): This architecture uses 256 expert neural networks, of which only eight are activated for each token, which speeds up AI training and improves performance.
  • Multi-head Latent Attention (MLA): This attention mechanism compresses what the model stores about earlier tokens into a compact latent representation, so it can keep attending to the key details of a sentence with less memory and a lower chance of missing important information.
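The Mixture of Experts idea above, 256 experts with eight active per token, can be illustrated with a small routing sketch. This is a toy example under stated assumptions, not DeepSeek's implementation: the softmax gate, the scalar "experts", and the names `route_token` and `moe_output` are all hypothetical and chosen only to show how top-k selection keeps most experts idle for any given token.

```python
import math
import random

NUM_EXPERTS = 256  # number of experts, as quoted in the article
TOP_K = 8          # experts activated per token, as quoted in the article


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Weights are the softmax probabilities of the selected experts,
    renormalized so that they sum to 1. (Assumed gating scheme.)
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


def moe_output(token_value, experts, router_logits):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](token_value) for i, w in route_token(router_logits))


# Toy setup: each "expert" just scales its input; real experts are feed-forward networks.
rng = random.Random(0)
experts = [lambda x, scale=rng.uniform(0.5, 1.5): scale * x for _ in range(NUM_EXPERTS)]
router_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_token(router_logits)
print(len(selected), "of", NUM_EXPERTS, "experts active")
print("output:", moe_output(1.0, experts, router_logits))
```

Only the eight selected experts are evaluated for the token, which is why compute per token stays far below what running all 256 experts would require.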

DeepSeek claims to have trained its powerful neural network, DeepSeek V3, for only $6 million using just 2,048 graphics processors. However, an investigation by SemiAnalysis points to a far more extensive infrastructure of approximately 50,000 Nvidia Hopper GPUs spread across several data centers. This includes 10,000 H800 units, 10,000 H100s, and additional H20 GPUs, used not only for AI training but also for research and financial modeling. The company's total investment in servers is estimated at about $1.6 billion, with operational expenses estimated at $944 million.

As a subsidiary of the Chinese hedge fund High-Flyer, DeepSeek operates independently, owning its data centers. This autonomy allows for faster innovation and implementation, as the company is self-funded and not bogged down by external bureaucratic processes. DeepSeek also attracts top talent from leading Chinese universities, with some researchers earning over $1.3 million annually.

The claimed $6 million training cost covers only GPU usage during pre-training. It does not include broader expenses such as research, model refinement, data processing, or infrastructure. Since its inception, DeepSeek has invested over $500 million in AI development, and its compact structure has helped it turn that spending into effective AI innovations.
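To put the headline figure in proportion, the numbers quoted in this article can be compared directly. The sketch below only does arithmetic on those quoted figures; the constant names and the `share` helper are illustrative, and the infrastructure values are SemiAnalysis estimates rather than audited costs.

```python
# Figures quoted in the article (US dollars unless noted).
CLAIMED_TRAINING_COST = 6_000_000    # DeepSeek's stated V3 pre-training cost
CLAIMED_GPUS = 2_048                 # GPUs in the stated training run
ESTIMATED_GPUS = 50_000              # SemiAnalysis estimate of the total fleet
SERVER_INVESTMENT = 1_600_000_000    # estimated investment in servers
OPERATING_COSTS = 944_000_000        # estimated operating expenses


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


cost_share = share(CLAIMED_TRAINING_COST, SERVER_INVESTMENT)
opex_share = share(CLAIMED_TRAINING_COST, OPERATING_COSTS)
gpu_share = share(CLAIMED_GPUS, ESTIMATED_GPUS)

print(f"Training claim vs. server investment: {cost_share:.3f}%")
print(f"Training claim vs. operating costs:   {opex_share:.3f}%")
print(f"Training GPUs vs. estimated fleet:    {gpu_share:.2f}%")
```

On these figures, the stated training cost is well under 1% of the estimated server investment, and the stated training run used roughly 4% of the estimated GPU fleet.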

DeepSeek's journey highlights how a well-funded, independent AI company can challenge industry giants. However, the company's success is attributable to substantial investments, technical breakthroughs, and a strong team, rather than to a "revolutionary budget." Even so, its costs remain notably lower than those of its competitors: DeepSeek reportedly spent $5 million on R1, compared with $100 million for ChatGPT-4o. That keeps DeepSeek a formidable player in the AI landscape.

Image: DeepSeek Test (ensigame.com)

Image: DeepSeek V3 (ensigame.com)

Image: DeepSeek (ensigame.com)
