DeepSeek: What lies under the bonnet of the new AI chatbot?

Andrew Duncan
A woman looks at a phone with the DeepSeek logo in the background (Credit: Getty Images)

The launch of a new AI chatbot by a small Chinese company has been accompanied by tumbling stock market values and bold claims. What makes it so different?

The reason behind this turmoil? The “large language model” (LLM) that powers the app has reasoning capabilities comparable to those of US models such as OpenAI’s o1, but reportedly costs a fraction of the amount to train and run.

Analysis

Dr Andrew Duncan is the director of science and innovation for fundamental AI at the Alan Turing Institute in London, UK.
DeepSeek claims to have achieved this by deploying a number of technical strategies that reduced both the amount of memory needed to store the model and the computation time required to train it (the model is known as R1). According to DeepSeek, cutting these overheads resulted in a dramatic reduction in cost. R1’s base model V3 reportedly required 2.788 million hours to train (running across many graphical processing units – GPUs – at the same time), at an estimated cost of under $6 million (£4.8 million), compared with the more than $100 million (£80 million) that OpenAI boss Sam Altman says was required to train GPT-4.
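For readers who want to sanity-check those headline numbers, the implied rate is simple arithmetic. Both inputs below are the publicly reported estimates quoted above, not independently verified figures:

```python
# Back-of-the-envelope check of the reported training cost figures.
# Both numbers are the publicly reported estimates, not verified data.
gpu_hours = 2_788_000          # reported GPU hours to train the V3 base model
total_cost_usd = 6_000_000     # reported upper-bound training cost in US dollars

cost_per_gpu_hour = total_cost_usd / gpu_hours
print(f"Implied cost per GPU-hour: ${cost_per_gpu_hour:.2f}")  # roughly $2.15
```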
Despite the hit taken to Nvidia’s market value, the DeepSeek models were trained on around 2,000 Nvidia H800 GPUs, according to one research paper released by the company. These chips are a modified version of the widely used H100 chip, built to comply with export rules for China. They were freely available before the Biden administration further tightened the restrictions, which effectively banned Nvidia from exporting the H800s to China in October 2023. Working within these constraints, it is likely that DeepSeek has been forced to find innovative ways to make the most effective use of the resources it has at its disposal.
Reducing the computational costs of training and running models may also address concerns about the environmental impacts of AI. The data centres they run on have huge electricity and water demands, largely to keep the servers from overheating. While most tech companies do not disclose the carbon footprint involved in operating their models, a recent estimate puts ChatGPT’s monthly carbon dioxide emissions at the equivalent of 260 flights from London to New York. Making AI models more efficient, therefore, makes sense for the industry from an economic as well as an environmental perspective.

Of course, whether DeepSeek’s models do offer real-world savings in energy remains to be seen, and it’s also unclear whether cheaper, more efficient AI could lead to more people using the model, and thus an increase in overall energy consumption.

If nothing else, it might help elevate green AI to the forefront of the forthcoming Paris AI Action Summit’s agenda so that the AI tools we use in the future are also more environmentally friendly.
What has surprised many people is how quickly DeepSeek appeared on the scene with such a competitive large language model – the company was only founded in 2023 by Liang Wenfeng, who is now being hailed in China as something of an “AI hero”.
What is novel about the latest DeepSeek model is the open release of its “weights” – the numerical parameters of the model obtained during the training process – along with a technical paper describing the model’s development. This enables other organisations to run the model on their own hardware and adapt it to other tasks.
In contrast to OpenAI’s o1 and o3, which are effectively black boxes, researchers can look under the model’s bonnet to understand what it does. However, some details remain incomplete – such as the datasets and the training code for the models – and groups of researchers are now attempting to piece these together.
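To illustrate what an open-weights release makes possible in practice, here is a minimal sketch using the widely used Hugging Face transformers library. The repository name shown is an assumption for illustration, and running the full model this way would require far more hardware than a typical laptop – the point is simply that published weights can be downloaded, run and adapted by anyone:

```python
# A minimal sketch of running an openly released model locally.
# The repository name below is assumed for illustration; loading the
# full model requires substantial GPU memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1"  # assumed Hugging Face repository name

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Ask the locally loaded model a question and print its reply.
inputs = tokenizer("What is 12 * 17?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```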

Not all of DeepSeek’s cost-cutting techniques are new either – some have been used in other LLMs. In 2023, Mistral AI openly released its Mixtral 8x7B model, which was on a par with the advanced models of the time. Both Mixtral and the DeepSeek models take advantage of the “mixture of experts” technique, where the model is built from a group of much smaller models, each with expertise in a specific domain. Given a task, the mixture model assigns it to the most qualified “expert”.
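For the technically curious, the routing idea behind “mixture of experts” can be sketched in a few lines of Python. This toy example is purely illustrative – it is not DeepSeek’s or Mistral’s actual architecture – but it shows the core mechanism: a gating network scores each small “expert” and the input is handled by the best-scoring one(s).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "experts": each is just a small linear layer in this sketch.
n_experts, d_in, d_out = 4, 8, 8
experts = [rng.standard_normal((d_in, d_out)) for _ in range(n_experts)]

# Gating network: scores how suited each expert is to a given input.
gate_weights = rng.standard_normal((d_in, n_experts))

def mixture_of_experts(x, top_k=1):
    """Route the input to the top_k highest-scoring experts and combine
    their outputs, weighted by the renormalised gate scores."""
    scores = x @ gate_weights                    # one score per expert
    top = np.argsort(scores)[-top_k:]            # indices of chosen experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                     # softmax over chosen experts
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

x = rng.standard_normal(d_in)
print(mixture_of_experts(x, top_k=1).shape)      # (8,)
```

Because only the chosen expert(s) do any work for a given input, most of the model’s parameters sit idle on each step – which is where the computational savings come from.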

DeepSeek has also revealed its unsuccessful attempts at improving LLM reasoning through other technical approaches, such as Monte Carlo Tree Search, an approach long touted as a potential way to guide the reasoning process of an LLM. Researchers will be using this information to investigate how the model’s already impressive problem-solving capabilities can be enhanced further – improvements that are likely to end up in the next generation of AI models.
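For context on what Monte Carlo Tree Search involves, here is a minimal, generic sketch of the algorithm applied to a toy guessing game. It is not DeepSeek’s (abandoned) approach to guiding LLM reasoning – the game, states and rewards here are invented purely for illustration – but it shows the four steps the method repeats: select, expand, simulate and backpropagate.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state          # here: a tuple of moves made so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def ucb(node, c=1.4):
    """Upper-confidence bound balancing exploration and exploitation."""
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def moves(state):
    return [0, 1] if len(state) < 4 else []       # binary choices, depth 4

def reward(state):
    return 1.0 if state == (1, 0, 1, 1) else 0.0  # one "correct" leaf

def mcts(root_state=(), iterations=500):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: descend using UCB while nodes are fully expanded.
        while node.children and len(node.children) == len(moves(node.state)):
            node = max(node.children, key=ucb)
        # Expansion: add one untried child, if any moves remain.
        tried = [c.state for c in node.children]
        untried = [m for m in moves(node.state) if node.state + (m,) not in tried]
        if untried:
            node.children.append(Node(node.state + (random.choice(untried),), parent=node))
            node = node.children[-1]
        # Simulation: play random moves to the end of the game.
        state = node.state
        while moves(state):
            state = state + (random.choice(moves(state)),)
        r = reward(state)
        # Backpropagation: update statistics along the path to the root.
        while node:
            node.visits += 1
            node.value += r
            node = node.parent
    return max(root.children, key=lambda c: c.visits).state

print(mcts())   # most-visited first move, e.g. (1,)
```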

Modified versions of Nvidia's H100 GPUs were used by DeepSeek to train its LLM (Credit: Getty Images)

What does all of this mean for the future of the AI industry?

DeepSeek may be showing that you don’t need vast resources to build powerful AI models. I’d venture that we will see increasingly capable AI models being developed with ever fewer resources, as companies find ways to make model training and operation more efficient.

Up until now, the AI landscape has been dominated by “Big Tech” companies in the US – Donald Trump has called the rise of DeepSeek “a wake-up call” for the US tech industry. However, this development may not necessarily be bad news for companies like Nvidia in the long run: as the cost of developing AI falls, businesses and governments will be able to adopt the technology more quickly. That will in turn drive demand for new products, and the chips that power them – and so the cycle continues.

It seems likely that smaller companies such as DeepSeek will play a growing role in creating AI tools that have the potential to make our lives easier. It would be a mistake to underestimate that.

For more science, technology, environment and health stories from the BBC, follow us on Facebook, X and Instagram.
