Over the past two weeks, I have received messages and comments asking me to do a deep dive into the training costs of DeepSeek. The topic garnered so much interest that markets reacted, discussions exploded, and some even claimed the AI bubble had burst.

Now that the dust has settled, let’s take a step back and look at what the technical paper actually says.

🎯 Here's what we do know from the DeepSeek-V3 paper (dollar figures assume the $2 per H800 GPU-hour rental price stated in the paper):
šŸ‘‰ Pre-training: 2,664K H800 GPU hours (~$5.328M)
šŸ‘‰ Context Length Extension: 119K H800 GPU hours (~$0.238M)
šŸ‘‰ Post-Training (SFT + RLHF): 5K H800 GPU hours (~$0.01M)
šŸ‘‰ Total Training Cost: 2.788M GPU hours (~$5.576M)
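
For anyone who wants to check the arithmetic, here is a minimal sketch of how the dollar figures fall out of the reported GPU hours at the paper's $2/GPU-hour rate:

```python
# Reproducing the DeepSeek-V3 cost arithmetic from the reported GPU hours.
# The $2/hour H800 rental rate is the assumption stated in the paper itself.
RATE_PER_GPU_HOUR = 2.00  # USD per H800 GPU hour

gpu_hours = {
    "pre-training": 2_664_000,
    "context length extension": 119_000,
    "post-training (SFT + RLHF)": 5_000,
}

for stage, hours in gpu_hours.items():
    print(f"{stage}: {hours:,} GPU hours -> ${hours * RATE_PER_GPU_HOUR / 1e6:.3f}M")

total = sum(gpu_hours.values())
print(f"total: {total:,} GPU hours -> ${total * RATE_PER_GPU_HOUR / 1e6:.3f}M")
```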

The paper clearly states that these costs only cover the official training of DeepSeek-V3. It does not include the costs of prior research or ablation experiments on architectures, algorithms, and data. And that's a big gap, because those costs are tough to estimate: they depend on the number of experiments run, the dataset curation effort, and the model variations tested.

If you’re building anĀ LLM from scratch, without using pre-trained components, these costs can be massive.
1ļøāƒ£ Compute Needs – Training on raw, unoptimized data takesĀ hugeĀ GPU hours.
2ļøāƒ£ Longer Development Cycles – Teams must experiment with different architectures, optimizers, and data processing techniques.
3ļøāƒ£ No Transfer Learning – Without leveraging existing models, everything must be learned from the ground up, requiring more data and iterations.

That’s why training a next-gen model—one that pushes us closer toĀ AGI or ASI—demandsĀ huge budgets. If a paper doesn’t provide details on these steps, it’s tough to make anĀ apple-to-apple comparisonĀ with other models.

As a technologist, I'd be the happiest person to see a leading model trained at such a low cost. But to truly assess this claim, we need more data. Without it, we're just blindly following a headline.

Read the entire article here: