In the fast-paced world of artificial intelligence, the soaring costs of developing and deploying large language models (LLMs) have become a significant hurdle for researchers, startups, and independent developers.
As tech giants like OpenAI, Google, and Microsoft continue to dominate the field, the price tag for training state-of-the-art models keeps climbing, leaving innovation in the hands of a few deep-pocketed corporations. But what if this dynamic could change?
This is where DeepSeek comes in, marking a significant shift in the AI industry. Operating on a fraction of the budget of its heavyweight competitors, DeepSeek has proven that powerful LLMs can be trained and deployed efficiently, even on modest hardware.
By pioneering innovative approaches to model architecture, training methods, and hardware optimization, the company has made high-performance AI models accessible to a much broader audience.
This blog dives into how DeepSeek has unlocked the secrets of cost-effective AI development. We will explore their unique strategies for building and training models, as well as their clever use of hardware to maximize efficiency.
Beyond that, we’ll consider the wider implications of their success – how it could reshape the AI landscape, level the playing field for smaller players, and breathe new life into open-source innovation. With DeepSeek’s approach, we might just be seeing the dawn of a new era in AI, where innovative tools are no longer reserved for the tech elite.
The High-Cost Barrier of Modern LLMs
OpenAI has become a dominant provider of cloud-based LLM solutions, offering high-performing, scalable APIs that are private and secure, but the model structure, weights, and data used to train them remain a mystery to the public. The secrecy around popular foundation models makes AI research dependent on a few well-resourced tech companies.
Even for those who accept the closed nature of popular foundation models, using them for meaningful applications remains a challenge, since models such as OpenAI's o1 and o3 are quite expensive to fine-tune and deploy.
Despite the promise that openness in AI fosters accountability, the reality is that most foundational models operate in a black-box environment, where users must rely on corporate claims without meaningful oversight.
Giants like OpenAI and Microsoft have also faced numerous lawsuits over data scraping practices (that allegedly caused copyright infringement), raising significant concerns about their approach to data governance and making it increasingly difficult to trust these companies with user data.
DeepSeek Resisting Monopolization: Towards a Truly ‘Open’ Model
DeepSeek has disrupted the current AI landscape and sent shockwaves through the AI market, challenging the dominance of OpenAI's GPT models and Anthropic's Claude. Nvidia, a long-standing leader in AI hardware, saw its stock plummet by 17% in a single day, erasing $589 billion from the U.S. stock market.
Nvidia had long been one of the biggest beneficiaries of the AI race, since ever-bigger and more complex models kept driving up demand for the GPUs required to train them.
DeepSeek challenged this assumption: with roughly $6 million in reported training costs—a fraction of the $100 million-plus OpenAI is reported to have spent on GPT-4—and using less capable Nvidia GPUs, it produced a model that rivals those of far better-resourced industry leaders.
The US banned the sale of advanced Nvidia GPUs to China in 2022 to "tighten control over critical AI technology," but the strategy has not borne fruit, since DeepSeek was able to train its V3 model on the less capable GPUs available to it.
The question then becomes: How is DeepSeek’s approach so efficient?
Architectural Innovations: Doing More with Less
DeepSeek R1, the latest and greatest in DeepSeek's lineup, was created by building upon the base DeepSeek-V3 model. R1 is a MoE (Mixture-of-Experts) model with 671 billion parameters, of which only 37 billion are activated for each token. A token is a small piece of text, created by breaking a sentence down into smaller pieces.
This sparse activation makes the forward pass highly efficient. The model contains many specialized expert layers, but it does not activate all of them at once; a small router network chooses which experts to activate for each token, as the sketch below illustrates.
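To make sparse activation concrete, here is a minimal PyTorch sketch of a top-k routed MoE layer. The dimensions, expert count, and routing logic are illustrative assumptions chosen for readability, not DeepSeek's actual implementation (DeepSeek's real router also has to handle load balancing across experts).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Illustrative sparse MoE layer: a router picks the top-k experts per token,
    so only a small fraction of the parameters runs in each forward pass."""
    def __init__(self, d_model=512, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model),
                           nn.GELU(),
                           nn.Linear(4 * d_model, d_model))
             for _ in range(n_experts)]
        )
        self.router = nn.Linear(d_model, n_experts)  # scores every expert for every token
        self.k = k

    def forward(self, x):                       # x: (num_tokens, d_model)
        scores = self.router(x)                 # (num_tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the k selected experts run for each token; the rest stay idle.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```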
Models trained on next-token prediction (where a model simply predicts the next word when forming a sentence) are statistically powerful but sample-inefficient. Time is wasted processing low-impact tokens, and the localized process does not consider the global structure. For example, such a model might struggle to maintain coherence in an argument across multiple paragraphs.
On the other hand, DeepSeek V3 uses a Multi-token Prediction Architecture, which is a simple yet effective modification where LLMs predict n future tokens using n independent output heads (where n can be any positive integer) on top of a shared model trunk, reducing wasteful computations.
Multi-token trained models solve 12% more problems on HumanEval and 17% more on MBPP than next-token models. Using the Multi-token Prediction Architecture with n = 4, we see up to 3× faster inference due to self-speculative decoding.
Here, self-speculative decoding means the model drafts its next few tokens with the extra heads and then verifies them, only redoing the work when a draft turns out to be wrong. This speeds up generation because the model does not need a full forward pass for every single token. It is also possible to "squeeze" better performance out of an LLM on the same dataset using multi-token prediction.
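The snippet below is a simplified sketch of the multi-token prediction idea: n independent output heads sit on top of a shared trunk, and head i is trained to predict the token i positions further into the future. The sizes and head count are placeholders, not DeepSeek-V3's actual configuration.

```python
import torch.nn as nn

class MultiTokenHeads(nn.Module):
    """Sketch of multi-token prediction: n independent output heads share one trunk,
    so each forward pass is trained to predict the next n tokens instead of one."""
    def __init__(self, d_model=512, vocab_size=32000, n_future=4):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, vocab_size) for _ in range(n_future)]
        )

    def forward(self, trunk_hidden):            # (batch, seq, d_model) from the shared trunk
        # Head i produces logits for the token (i + 1) positions ahead.
        return [head(trunk_hidden) for head in self.heads]

# Training sketch: the loss sums cross-entropy over all n heads, each compared
# against the target sequence shifted by one more position than the last.
```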
The DeepSeek team also innovated by employing large-scale reinforcement learning (RL) without the traditional supervised fine-tuning (SFT) as a preliminary step, deviating from industry norms and achieving remarkable results. Research has shown that RL helps a model generalize and perform better with unseen data than a traditional SFT approach.
These findings are echoed by DeepSeek's team, who showed that with RL alone their model naturally developed reasoning behaviors. This meant the company could improve its model's accuracy by focusing only on challenges that provided immediate, measurable feedback, which saved on resources.
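As an illustration of what "immediate, measurable feedback" looks like, here is a hypothetical rule-based reward function in the spirit of the accuracy and format rewards DeepSeek describes; the tags and weights below are assumptions for the sketch, not DeepSeek's exact recipe.

```python
import re

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Illustrative verifiable reward: the model earns reward only when its output
    can be checked automatically, with no human labeling in the loop."""
    reward = 0.0
    # Format reward: the completion should wrap its reasoning in <think> tags.
    if re.search(r"<think>.*</think>", completion, re.DOTALL):
        reward += 0.1
    # Accuracy reward: extract the final answer and compare with the reference.
    # (A real checker would normalize math expressions rather than compare strings.)
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match and match.group(1).strip() == reference_answer.strip():
        reward += 1.0
    return reward
```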
Hardware Optimization: Redefining Infrastructure
DeepSeek lacked the latest high-end chips from Nvidia because of US export controls, forcing the team to improvise and focus on low-level optimization to make efficient use of the GPUs they did have.
The system recalculates certain math operations (like RMSNorm and MLA up-projections) during the back-propagation process (which is how neural networks learn from mistakes). Instead of saving the results of these calculations in memory, it recomputes them on the fly. This saves a lot of memory since there is less data to store, but it increases computation time because the system must redo the math every time.
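The snippet below sketches this recomputation trick using PyTorch's built-in gradient checkpointing. The block structure is illustrative rather than DeepSeek's code, but the memory-versus-compute trade-off is the same.

```python
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class RecomputedBlock(nn.Module):
    """Sketch of activation recomputation: intermediate results are not stored
    during the forward pass and are recomputed during back-propagation,
    trading a little extra compute for a large memory saving."""
    def __init__(self, d_model=512):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)   # DeepSeek recomputes cheap ops like RMSNorm
        self.ffn = nn.Linear(d_model, d_model)

    def forward(self, x):
        # checkpoint() discards the activations of this sub-graph and
        # recomputes them when gradients are needed.
        return checkpoint(lambda t: self.ffn(self.norm(t)), x, use_reentrant=False)
```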
They also use their DualPipe strategy, where the team deploys the first few layers and the last few layers of the model on the same PP rank (the position of a GPU in a pipeline). This means the same GPU handles both the "start" and "finish" of the model, while other GPUs handle the middle layers, which helps with efficiency and load balancing.
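A toy placement function illustrates this idea; the even chunking below is a simplification (DualPipe itself also overlaps forward and backward passes across the pipeline), but it shows how one rank can own both ends of the model.

```python
def assign_layers_to_ranks(num_layers: int, num_ranks: int) -> dict[int, list[int]]:
    """Toy illustration: rank 0 hosts both the first and the last chunk of layers,
    while the remaining ranks split the middle layers between them."""
    chunk = num_layers // (num_ranks + 1)        # one extra chunk so rank 0 gets two
    placement = {0: list(range(chunk)) + list(range(num_layers - chunk, num_layers))}
    middle = list(range(chunk, num_layers - chunk))
    per_rank = len(middle) // (num_ranks - 1)
    for i, rank in enumerate(range(1, num_ranks)):
        start = i * per_rank
        end = start + per_rank if rank < num_ranks - 1 else len(middle)
        placement[rank] = middle[start:end]
    return placement

# For example, assign_layers_to_ranks(num_layers=61, num_ranks=8) puts
# layers 0-5 and 55-60 on rank 0 and spreads layers 6-54 across ranks 1-7.
```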
Storing key-value pairs (a key part of LLM inferencing) takes a lot of memory. DeepSeek compresses the key and value vectors using a down-projection matrix, allowing the data to be compressed, stored, and unpacked with minimal loss of accuracy in a process called Low-Rank Key-Value (KV) Joint Compression. This means the KV cache takes up much less memory during inference, allowing DeepSeek to train and serve the model within a limited GPU memory budget.
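Here is a minimal sketch of the compression idea: a single down-projection produces a small latent vector that is the only thing cached, and lightweight up-projections reconstruct keys and values whenever attention needs them. The dimensions are placeholders, not DeepSeek-V3's actual sizes.

```python
import torch.nn as nn

class LowRankKVCache(nn.Module):
    """Sketch of low-rank KV joint compression: cache one small latent per token
    instead of full per-head keys and values."""
    def __init__(self, d_model=4096, d_latent=512, d_head=128, n_heads=32):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent, bias=False)           # compress
        self.up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # rebuild keys
        self.up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)  # rebuild values

    def compress(self, hidden):           # hidden: (batch, seq, d_model)
        # Only this small latent tensor needs to live in the KV cache.
        return self.down(hidden)          # (batch, seq, d_latent)

    def expand(self, latent):
        # Recover per-head keys and values on the fly when attention needs them.
        return self.up_k(latent), self.up_v(latent)
```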
Making Large Language Models More Accessible
Having access to open-source models that rival the most expensive ones in the market gives researchers, educators, and students the chance to learn and grow. They can figure out uses for the technology that might not have been thought of before.
Alongside R1, DeepSeek also released multiple distilled models based on the Llama and Qwen architectures, namely:
- Qwen2.5-Math-1.5B
- Qwen2.5-Math-7B
- Qwen2.5-14B
- Qwen2.5-32B
- Llama-3.1-8B
- Llama-3.3-70B-Instruct
In fact, using Ollama, anyone can try running these models locally with acceptable performance, even on laptops that do not have a GPU.
How to Run DeepSeek’s Distilled Models on Your Own Laptop?
- Step 1: Download Ollama for Windows from the official Ollama website
This will help us abstract out the technicalities of running the model and make our work easier.
- Step 2: Install the binary package you downloaded
- Step 3: Open Terminal from Windows Search
- Step 4: Once the window is open (and with Ollama running) type in:
ollama run deepseek-r1:1.5b
The first time this command is run, Ollama downloads the model specified (in our case, DeepSeek-R1-Distill-Qwen-1.5B)
- Step 5: Enjoy a secure, free, and open-source model with reasoning capabilities!
In our testing, we were able to infer DeepSeek-R1-Distill-Qwen-1.5B at 3–4 tokens per second on a 12th Gen Intel Core i5 machine with integrated graphics. Performance may vary depending on your system, but you can try out larger distillations if you have a dedicated GPU on your laptop.
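Once the model is running, you are not limited to the terminal: Ollama exposes a local HTTP API (on port 11434 by default), so you can script against the model from your own code. A minimal Python example:

```python
import requests

# Query the locally running model through Ollama's HTTP API.
# Assumes `ollama run deepseek-r1:1.5b` has already pulled the model.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:1.5b",
        "prompt": "Explain what a Mixture-of-Experts model is in two sentences.",
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["response"])
```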
Case Studies: DeepSeek in Action
The following examples show some of the things a high-performance LLM can be used for while running locally (i.e., no APIs and no money spent).
OpenAI’s nightmare: Deepseek R1 on a Raspberry Pi
In this video, Jeff discusses the impact of DeepSeek R1 and shows how it can be run on a Raspberry Pi despite its resource-intensive nature. The ability to run high-performing LLMs on budget hardware may be the new AI optimization race.
Use RAG to chat with PDFs using DeepSeek, LangChain, and Streamlit
Here, we see Nariman employing a more advanced approach, building a local RAG chatbot where user data never reaches the cloud. PDFs are read, chunked, and stored in a vector database. The app then runs a similarity search and retrieves the chunks most relevant to the user query, which are fed to a DeepSeek 14B distill that formulates a coherent answer.
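For readers who want the gist of the pipeline without the full LangChain and Streamlit stack, here is a stripped-down sketch that talks directly to Ollama's local API. The model and embedding names are assumptions (any locally pulled models will do), and a real app would use a proper vector database rather than in-memory lists.

```python
import numpy as np
import requests

OLLAMA = "http://localhost:11434"

def embed(text: str) -> np.ndarray:
    # Embed text with a local embedding model served by Ollama.
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    return np.array(r.json()["embedding"])

def answer(question: str, chunks: list[str], top_k: int = 3) -> str:
    # 1. Similarity search: rank the stored PDF chunks against the user query.
    q = embed(question)
    chunk_vecs = [embed(c) for c in chunks]
    sims = [float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
            for v in chunk_vecs]
    best = [c for _, c in sorted(zip(sims, chunks), reverse=True)[:top_k]]
    context = "\n\n".join(best)
    # 2. Feed the most relevant chunks to the local DeepSeek distill for the answer.
    prompt = ("Answer the question using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    r = requests.post(f"{OLLAMA}/api/generate",
                      json={"model": "deepseek-r1:14b", "prompt": prompt,
                            "stream": False})
    return r.json()["response"]
```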
Potential Issues: Data Handling, Privacy, and Bias
As a China-based company, DeepSeek operates under a regulatory environment that raises questions about data privacy and government oversight. Critics worry that user interactions with DeepSeek models could be subject to monitoring or logging, given China’s stringent data laws.
However, this concern is mainly relevant when using the DeepSeek API for inference or training. If the models are run locally, the chance that a back door has somehow been added is vanishingly small.
Another thing to note is that like any other AI model, DeepSeek’s offerings aren’t immune to ethical and bias-related challenges based on the datasets they are trained on. Regulatory pressures might lead to built-in content filtering or censorship, potentially limiting discussions on sensitive topics.
The Future: What This Means for AI Accessibility?
Democratizing LLMs: Empowering Startups, Researchers, and Indie Developers
DeepSeek’s open-source approach is a game-changer for accessibility. By making high-performing LLMs available to those without deep pockets, they’re leveling the playing field. This could lead to:
- Startups building AI-driven solutions without being shackled to costly API subscriptions from OpenAI or Google.
- Researchers and universities experimenting with cutting-edge AI without blowing their budgets.
- Indie developers creating AI-powered applications without worrying about vendor lock-in, fostering greater innovation and independence.
DeepSeek's success could spark a broader shift toward cost-efficient AI development in the open-source community. If their techniques—like MoE, multi-token prediction, and RL without SFT—prove scalable, we can expect to see more research into efficient architectures and techniques that minimize reliance on expensive GPUs, ideally within the open-source ecosystem.
This can help decentralize AI innovation and foster a more collaborative, community-driven approach.
Industry Shifts: Could This Disrupt the Dominance of Well-Funded AI Labs?
While DeepSeek’s innovations challenge the notion that only billion-dollar companies can build state-of-the-art AI, there are still significant hurdles to widespread disruption:
- Compute access remains a barrier: Even with optimizations, training top-tier models requires thousands of GPUs, which most smaller labs can’t afford.
- Data is still king: Companies like OpenAI and Google have access to massive proprietary datasets, giving them a significant edge in training superior models.
- Cloud AI will likely dominate enterprise adoption: Many businesses prefer ready-to-use AI services over the hassle of setting up their own infrastructure, meaning proprietary models will probably remain the go-to for commercial applications.
DeepSeek’s story isn’t just about building better models—it’s about reimagining who gets to build them. And that could change everything.