
What is DeepSeek, and why does it matter?


Everyone seems to be talking about DeepSeek and its latest AI technologies. But what is DeepSeek? What has it produced? And why is everyone talking about it? This client update provides some basic facts about DeepSeek and identifies a few new issues and opportunities that may be relevant to corporate cybersecurity and AI adoption efforts.

Who is DeepSeek?

DeepSeek is an arm of a Chinese hedge fund known as “High-Flyer.”1 One of High-Flyer’s co-founders, Liang Wenfeng, founded DeepSeek to build generally applicable generative AI models. Its first model was released on November 2, 2023.2 But the models that gained it notoriety in the United States are its two most recent releases: V3, a general-purpose large language model (“LLM”), and R1, a “reasoning” model. According to DeepSeek’s benchmark scores, these new models provide strong performance across the board, approaching or exceeding US frontier models in many key areas. For example, on the GPQA Diamond benchmark, which tests performance on Ph.D.-level science questions, DeepSeek R1 achieved a score of 73.3%, close to the reported leading score of 77.3% for a US frontier model (humans with relevant credentials score about 75%).3

Why Does DeepSeek R1 Matter?

But it is not the performance of R1 alone that is making waves. DeepSeek is drawing attention mostly due to its reportedly very low development cost. DeepSeek claims that it trained these models on a compute budget of only about $5.6M,4 although this assertion remains contested. In contrast, the training costs for other leading frontier LLMs in 2024 were estimated to be on the order of $100M.5 If the numbers reported by DeepSeek are correct, cutting-edge AI development and deployment may be within the reach of many more organizations. But how is such a dramatic reduction in training costs even possible?

Fortunately, DeepSeek has open-sourced its models6, and provided numerous detailed technical reports describing those models.7 As a result, even if the costs reported by DeepSeek cannot be verified, the technology used by DeepSeek can be examined. In addition to incorporating a number of known optimizations for training, DeepSeek specifically points to three innovations that made this possible:

  • Improvements in Data Processing – DeepSeek V3 incorporates a technique called Multi-Head Latent Attention (MLA), which compresses the attention mechanism’s stored keys and values into a smaller latent representation, providing a much more efficient way to process large amounts of data during inference and using about half as much memory as comparable techniques.8 
  • Improvements to Model Architecture – DeepSeek V3 introduces a unique architecture for Mixture-of-Experts (“MoE”) models.9 In an MoE model, the network is divided into several segments referred to as “experts,” and during inference a routing mechanism selects only a subset of those experts to predict the next token.10 Because only a portion of the entire model needs to be computed for any given token, a substantial amount of computation is avoided on each forward pass of the model.11 
  • Improvements in Model Training – DeepSeek V3 uses a multi-token training objective to further improve model performance.12 In this technique, the model is trained to predict several upcoming tokens (segments of words) at each position, rather than only the next one.13 DeepSeek speculates that this approach allows the model to “pre-plan” a few tokens in advance, making its prediction of the next token more robust.14
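The Mixture-of-Experts routing described above can be illustrated with a minimal Python sketch. This is not DeepSeek's implementation; the expert count, top-k value, and random linear "experts" below are illustrative assumptions that show why skipping unselected experts saves computation:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # small feed-forward sub-networks ("experts")
TOP_K = 2         # experts actually evaluated per token
DIM = 16          # hidden dimension (toy-sized for illustration)

# Each "expert" is stood in for by a random linear layer.
expert_weights = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
router_weights = rng.standard_normal((DIM, NUM_EXPERTS))

def moe_forward(token: np.ndarray) -> np.ndarray:
    """Route one token through only TOP_K of the NUM_EXPERTS experts."""
    scores = token @ router_weights            # router scores every expert
    top = np.argsort(scores)[-TOP_K:]          # keep the best-scoring experts
    gates = np.exp(scores[top]) / np.exp(scores[top]).sum()  # normalize weights
    # Only TOP_K expert computations run; the other experts are skipped,
    # so each forward pass touches 2/8 of the expert parameters here.
    return sum(g * (token @ expert_weights[i]) for g, i in zip(gates, top))

out = moe_forward(rng.standard_normal(DIM))
```

In this toy setup, only 2 of 8 experts are computed per token, which is the source of the per-token compute savings the bullet above describes; production MoE models add load-balancing and batching concerns this sketch omits.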

DeepSeek’s technical reports also include a wealth of information on DeepSeek’s training pipeline, and numerous other optimizations that DeepSeek implemented to maximize the compute efficiency of training the model. 

In addition, DeepSeek’s R1 model also appears to be somewhat groundbreaking. R1 is a “reasoning” model that produces a chain-of-thought before arriving at an answer.15 The “breakthrough,” as it were, in R1 was that DeepSeek was able to produce a strong reasoning model with minimal complexity. As the report describes, the approach for R1 was to start with a “cold start” set of training examples to teach the model how to think, and then apply reinforcement learning techniques to the final answer only, rather than to intermediate reasoning steps.16 Using this technique, DeepSeek was able to achieve very high benchmark scores in fields such as science, coding, and mathematics.
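The "answer-only" reward idea can be sketched in a few lines of Python. The exact reward functions are described in the R1 report; the tag format and exact-match scoring below are simplified, illustrative assumptions:

```python
import re

def outcome_reward(completion: str, ground_truth: str) -> float:
    """Score only the final answer; the chain-of-thought is never graded.

    Assumes the model wraps its reasoning in <think>...</think> tags and then
    states a final answer -- an illustrative format, not DeepSeek's exact one.
    """
    # Strip out the intermediate reasoning entirely before scoring.
    answer = re.sub(r"<think>.*?</think>", "", completion, flags=re.DOTALL).strip()
    # Rule-based check: reward depends only on whether the final answer is correct.
    return 1.0 if answer == ground_truth else 0.0

# The reasoning can be long, short, or even flawed in places -- only the
# final answer affects the reward signal used for reinforcement learning.
r = outcome_reward("<think>2 + 2 is 4, so doubling gives 8</think>8", "8")
```

Because the reward ignores the intermediate steps, the model is free to discover whatever style of reasoning best produces correct answers, rather than being graded on each step of its thinking.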

Finally, DeepSeek’s achievements appear to validate the idea that open approaches to AI development may accelerate progress across the field.17 By releasing detailed research reports and providing the models for free, everyone can benefit from the innovations made by others – including smaller players in the AI marketplace. The previous assumption was that “big tech” incumbents and well-funded private companies would have a durable and large lead over smaller, more resource-constrained labs.

However, DeepSeek’s advancements have shown that smaller labs can compete with larger players by publicly sharing their own research – and benefiting from the research of others. This has the potential to drive more investment to smaller AI research labs, and spur those larger incumbents and startups to move more quickly – and possibly be more open about their own advancements. 

But this acceleration comes with risks. The AI arms race could reduce the opportunity for thorough safety testing and alignment before models are released, effectively shifting the risk of AI misuse from model providers to companies using and deploying those models. For example, while DeepSeek provided thorough details on how it made its models, the documentation is much lighter on its approach to model safety, and does not suggest that much adversarial testing has been done. Emerging third-party adversarial testing has shown that R1 does not have effective mitigations in place to prevent it from providing criminal advice, developing malware, or assisting in weapons development.18

Opportunities and Challenges with Using DeepSeek Models

While the DeepSeek V3 and R1 models are quite powerful, there are some additional complexities to using either of these models in a corporate setting. First, the official DeepSeek applications and developer API are hosted in China. As a result, using models directly from DeepSeek means sending corporate data to servers located in China. Those servers are then subject to Chinese law, including laws permitting access to that information by government officials. This is, of course, in addition to the IP, cybersecurity, and data privacy concerns that apply to all LLMs, including DeepSeek’s.

Additionally, as measured by benchmark performance, DeepSeek R1 is the strongest AI model available for free. The models can be used either on DeepSeek’s website or through its mobile applications at no cost. As of this writing, the DeepSeek iOS app was the most-downloaded application on the iOS App Store. This may create additional incentives for employees to use DeepSeek as a form of “dark IT” in their work. This is a similar problem to existing generally available AI applications, but amplified both by R1’s capabilities and by the fact that user data is stored in China and subject to Chinese law. 

However, because DeepSeek has open-sourced the models, those models can theoretically be run on corporate infrastructure directly, with appropriate legal and technical safeguards. DeepSeek has provided an entire family of V319 and R120  models for download, including the models themselves, and smaller models distilled from those base models. While the base models are still very large and require data-center-class hardware to operate, many of the smaller models can be run on much more modest hardware. Of course, as with all software, nothing should be deployed in a corporate environment without a thorough cybersecurity review.  If you are interested in local model adoption, please contact an author about how we can help in your evaluation of appropriate legal safeguards.

Conclusion

The innovations presented by DeepSeek should not generally be viewed as a sea change in AI development. Even the core “breakthroughs” that led to the DeepSeek R1 model are based on existing research, and many were already used in the DeepSeek V2 model. What makes DeepSeek seem so significant is its improvements in model efficiency, reducing the investment necessary to train and operate language models, apparently without forfeiting many capabilities in the process. As a result, the impact of DeepSeek will most likely be that advanced AI capabilities become available more broadly, at lower cost, and more quickly than many anticipated. However, with this increased performance come additional risks, as DeepSeek is subject to Chinese national law, and additional temptation for misuse given the models’ capabilities.

Additionally, there are still many unanswered questions regarding DeepSeek, including what data was used in training, how much the model cost to develop, and what additional risks may arise from using foreign-sourced AI technologies. Further, it is widely reported that the official DeepSeek apps are subject to considerable moderation to abide by the Chinese government's policy perspectives.21 We are actively monitoring these developments.


1Caiwei Chen, How a Top Chinese AI Model Overcame US Sanctions, MIT Tech. Review (Jan. 24, 2025) (available at https://www.technologyreview.com/2025/01/24/1110526/china-deepseek-top-ai-despite-sanctions/)
2https://github.com/deepseek-ai/DeepSeek-LLM
3DeepSeek, DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, Jan. 22, 2025 (available at https://arxiv.org/abs/2501.12948) (“DeepSeek R1 Report”).
4DeepSeek, DeepSeek-V3 Technical Report, December 27, 2024 (available at https://arxiv.org/abs/2412.19437) (“DeepSeek V3 Report”)
5Nestor Maslej et al., The AI Index 2024 Annual Report, Institute for Human-Centered AI, Stanford University, at 63 (April 2024) (available at https://aiindex.stanford.edu/report/)
6https://huggingface.co/deepseek-ai/DeepSeek-R1
7DeepSeek V3 Report, DeepSeek R1 Report
8DeepSeek V3 Report, at 7-8.
9DeepSeek V3 Report, at 8-10. 
10DeepSeek, DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models, at 4-5 (Jan. 11, 2024). 
11Id.
12DeepSeek V3 Report, at 10-11.
13Id.
14Id.
15DeepSeek R1 Report.
16DeepSeek R1 Report, 9-11.
17Cade Metz & Mike Isaac, Meta Gave Away Its A.I. Crown Jewels. DeepSeek Vindicated Its Strategy., N.Y. Times (Jan. 29, 2025) (available at https://www.nytimes.com/2025/01/29/technology/meta-deepseek-ai-open-source.html).
18KELA, DeepSeek R1 Exposed: Security Flaws in China’s AI Model, (Jan. 27, 2025) (available at https://kelacyber.com/blog/deepseek-r1-security-flaws).
19https://huggingface.co/deepseek-ai/DeepSeek-V3
20https://huggingface.co/deepseek-ai/DeepSeek-R1
21Eli Tan, First Impressions of DeepSeek’s AI Chatbot, N.Y. Times (Jan. 27, 2025) (available at https://www.nytimes.com/2025/01/27/technology/deepseek-ai-chatbot-first-impressions.html)

ABOUT BAKER BOTTS L.L.P.
Baker Botts is an international law firm whose lawyers practice throughout a network of offices around the globe. Based on our experience and knowledge of our clients' industries, we are recognized as a leading firm in the energy, technology and life sciences sectors. Since 1840, we have provided creative and effective legal solutions for our clients while demonstrating an unrelenting commitment to excellence. For more information, please visit bakerbotts.com.
