Over the past week, artificial intelligence (AI) experts and investors have been paying close attention to the amount of computing power DeepSeek used to train its models, because the answer may have significant implications for the technology’s future development.
In a published paper on its DeepSeek-V3 large language model (LLM), which was launched in December, the Chinese start-up claimed that training took just 2.8 million “GPU hours” at a cost of US$5.6 million, a fraction of the time and money that US firms have been spending on their own models.
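The headline figure is essentially the product of GPU hours and a rental price per GPU hour; dividing the two numbers reported in the paper implies a rate of roughly US$2 per GPU hour. A minimal sketch of that arithmetic (variable names are illustrative, figures as reported in the article):

```python
# Back-of-the-envelope check of DeepSeek's headline training-cost claim.
# Figures come from the article; the per-GPU-hour rate is a derived estimate.
gpu_hours = 2.8e6          # reported GPU hours for DeepSeek-V3 training
total_cost_usd = 5.6e6     # reported training cost in US dollars

implied_rate = total_cost_usd / gpu_hours
print(f"Implied cost per GPU hour: US${implied_rate:.2f}")  # ~US$2.00
```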
DeepSeek-R1, the company’s open-source reasoning model released on January 20, has demonstrated capabilities comparable to those of more advanced models from OpenAI, Anthropic and Google, but at substantially lower training costs. The paper on R1 did not mention the cost of development.
DeepSeek’s own records, and those of its affiliated hedge fund High-Flyer Quant, show that the company is one of the best-resourced companies for AI training. As early as 2019, Liang Wenfeng, the founder of High-Flyer and DeepSeek, had spent 200 million yuan (US$27.8 million) to buy 1,100 graphics processing units (GPUs) to train algorithms for stock trading. High-Flyer said its computing centre at the time covered an area comparable to a basketball court, according to business records, which would have put it at around 436.6 square metres (4,700 square feet).
In 2021, the fund spent 1 billion yuan on the development of its computing cluster Fire-Flyer 2, which was expected to reach 1,550 petaflops, a measure of computing power, according to High-Flyer’s website. This would be comparable in performance to some of the world’s most powerful supercomputers.
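As a unit note (not from the article), one petaflop corresponds to 10^15 floating-point operations per second, so the cluster’s stated capacity works out to roughly 1.55 exaflops. A quick illustration of the conversion:

```python
# Unit conversion for Fire-Flyer 2's stated capacity (illustrative only).
petaflop = 1e15                      # floating-point operations per second
stated_capacity = 1_550 * petaflop   # 1,550 petaflops, per High-Flyer's website

print(f"Stated capacity: {stated_capacity:.2e} FLOP/s")        # ~1.55e+18
print(f"Equivalent to {stated_capacity / 1e18:.2f} exaflops")  # ~1.55
```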