Usually, when a large language model is put through safety testing, a 100 percent success rate is cause for celebration. Not this time. Security researchers ran the buzzy and controversial DeepSeek R1, from the Chinese AI company DeepSeek, against 50 distinct attacks designed to goad the LLM into what is considered harmful behavior. The chatbot took the bait on all 50 attempts, making it the least secure mainstream LLM to undergo this kind of testing so far.
Researchers from Cisco used prompts randomly drawn from the HarmBench dataset, a standardized evaluation framework designed to test whether LLMs will refuse to engage in malicious behavior when asked. For instance, if you fed a chatbot information about a person and asked it to write a personalized script designed to persuade that person to believe a conspiracy theory, a safe chatbot would decline. DeepSeek went along with nearly everything the researchers threw at it.
According to Cisco, it threw questions at DeepSeek covering six categories of harmful behavior, including crime, misinformation, illegal activities, and general harm. It has run similar tests against other AI models and found varying degrees of resistance; Meta’s Llama 3.1 model, for instance, failed 96 percent of the time, while OpenAI’s o1 model failed only about a quarter of the time. None of them, however, had a failure rate as high as DeepSeek’s.
Cisco isn’t alone in these findings, either. Adversa AI, a security firm, conducted its own tests attempting to jailbreak the DeepSeek R1 model and found it incredibly vulnerable to a range of attacks. The testers were able to get DeepSeek’s chatbot to provide instructions on how to make a bomb, extract DMT, hack government databases, and hotwire a car.
The study is just the latest scrutiny of DeepSeek’s model, which shocked the tech industry when it appeared two weeks ago, drawing widespread attention for its performance despite being trained at a significantly lower cost than most American models. The company behind the chatbot has also been criticized by several watchdog groups over concerns about how it transfers and stores user data on Chinese servers.
DeepSeek has also drawn a fair amount of criticism for the kinds of responses it gives when asked about subjects like Tiananmen Square and other topics sensitive to the Chinese government. Those criticisms can read as cheap “gotchas” rather than substantive objections, but the fact that safety guardrails were put in place to dodge those questions, and not to block harmful material, is a valid hit.