📌 China’s artificial intelligence lab didn’t just create cheap AI models – it exposed the inefficiencies of an industry-wide approach.
DeepSeek’s breakthrough showed how a small team looking to save money can rethink the way AI models are built. While tech giants such as OpenAI and Anthropic spend billions of dollars on computing power alone, DeepSeek is said to have achieved similar results by spending just over $5 million.
The company’s models rival OpenAI’s GPT-4o (OpenAI’s flagship LLM), OpenAI’s o1 (its best reasoning model currently available) and Anthropic’s Claude 3.5 Sonnet, meeting or exceeding them on many benchmarks, while full training reportedly took about 2.788 million H800 GPU hours. This is just a fraction of the hardware generally considered necessary.
The model is so capable and efficient that within days its app rose to the top of the iOS App Store charts, challenging OpenAI’s dominance.
Necessity is the mother of invention. The team pulled this off with techniques that American developers, flush with hardware, never had to consider – techniques that are still not mainstream today. Perhaps the most important: instead of full 32-bit precision, DeepSeek trained in 8-bit, cutting memory requirements by roughly 75%.
Specifically, they adopted 8-bit floating-point (FP8) training. FP8 training is still not widely understood; in the US, most training pipelines continue to run in FP16.
FP8 uses half as much memory and memory bandwidth as FP16. For large AI models with billions of parameters, this reduction is significant. DeepSeek had to make do with constrained hardware; OpenAI has never faced such limits.
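The arithmetic behind these savings is simple to check. The sketch below is illustrative only (it counts bytes for weight storage, not DeepSeek’s actual training stack); the 671-billion-parameter figure is the total parameter count reported later in this article.

```python
def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Memory (in GB) needed just to store the model weights at a given precision."""
    return num_params * bytes_per_param / 1e9

params = 671_000_000_000          # DeepSeek's reported total parameter count
fp32 = weight_memory_gb(params, 4)  # full 32-bit precision
fp16 = weight_memory_gb(params, 2)  # the common US practice
fp8  = weight_memory_gb(params, 1)  # DeepSeek's approach

# FP8 needs half the memory of FP16 and a quarter of FP32 (a 75% saving).
print(f"FP32: {fp32:.0f} GB, FP16: {fp16:.0f} GB, FP8: {fp8:.0f} GB")
```

In practice FP8 training also requires careful scaling to avoid overflow in the narrow 8-bit range, which is part of why it is considered hard to get right.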
On Monday, Chinese artificial intelligence company DeepSeek upended Wall Street’s favorite narrative, and the resulting risk-off selloff sent the bitcoin price below the $98,000 mark.
Researchers at the startup, which unveiled an open-source artificial intelligence model called DeepSeek R1, said the model matches OpenAI’s state-of-the-art reasoning systems. Thanks to a new training method, they claim, queries to DeepSeek R1 cost up to 98% less than on OpenAI’s flagship model.
DeepSeek also processes entire phrases at once, rather than individual words.
Another method used by the company is distillation, in which smaller models learn to reproduce the outputs of larger models without being trained on the same knowledge base. The result is a very efficient, accurate and competitive small model.
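The usual training signal in distillation is how far the small model’s predicted distribution sits from the large model’s “soft labels”, often measured with KL divergence. A minimal sketch with made-up numbers over a hypothetical three-word vocabulary (this is the general technique, not DeepSeek’s specific recipe):

```python
import math

def kl_divergence(teacher: list[float], student: list[float]) -> float:
    """KL(teacher || student): the gap distillation training tries to shrink."""
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)

# Hypothetical next-token distributions over a tiny 3-word vocabulary.
teacher      = [0.70, 0.20, 0.10]  # large model's soft labels
good_student = [0.68, 0.22, 0.10]  # closely imitates the teacher
bad_student  = [0.34, 0.33, 0.33]  # hasn't learned from the teacher yet

print(kl_divergence(teacher, good_student) < kl_divergence(teacher, bad_student))  # True
```

Because the teacher’s full distribution carries far more information than a single “correct” word, the student can reach strong accuracy with far less data and compute.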
The company also uses a technique called mixture of experts, which improves the model’s efficiency. While in traditional models all parameters are active for every input, DeepSeek’s system holds 671 billion parameters but activates only 37 billion at any given time. It’s like having a large team of experts but calling on only those needed for a particular problem.
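The routing step of a mixture-of-experts layer can be sketched in a few lines: a gate scores every expert for the current token, and only the top-k actually run. The expert names and scores below are hypothetical; real systems use many interchangeable experts per layer rather than labeled specialists.

```python
def route(gate_scores: dict[str, float], k: int) -> list[str]:
    """Pick the k highest-scoring experts; the rest stay inactive for this token."""
    return sorted(gate_scores, key=gate_scores.get, reverse=True)[:k]

# Hypothetical gate output for one token.
scores = {"math": 0.9, "code": 0.7, "poetry": 0.1, "law": 0.05}
active = route(scores, k=2)
print(active)  # ['math', 'code'] -- only these experts' parameters do work
```

Run only 2 of 4 experts and you compute roughly half the parameters per token; scale that ratio up and you get DeepSeek’s 37 billion active out of 671 billion total.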