DeepSeek's Reasoning Models Are Getting Smarter and Cheaper: Here's What Changed
DeepSeek has quietly released two major upgrades to its reasoning models, with the latest versions showing significant improvements in complex problem-solving while maintaining remarkably low training costs. The Chinese AI lab's DeepSeek-R1-0528, released in May 2025, represents a substantial leap forward from the original R1 model, pushing reasoning and inference capabilities further through advanced post-training optimizations. These updates signal that open-source reasoning models are rapidly closing the gap with proprietary alternatives from companies like OpenAI.
What Makes DeepSeek's Reasoning Models Different From Regular AI Chatbots?
Most AI chatbots give you an answer immediately. DeepSeek's reasoning models work differently. Instead of jumping straight to a conclusion, they generate a step-by-step chain of thought, showing their work before delivering the final answer. This approach makes them especially powerful for tasks that require logical thinking and complex problem-solving.
The original DeepSeek-R1 was trained using large-scale reinforcement learning, a technique where the model learned reasoning patterns entirely on its own through trial and error, rather than relying on structured instruction. While this approach produced remarkable results, it came with trade-offs. The model occasionally struggled with endless repetition, poor readability, and even language mixing.
To address these issues, DeepSeek developed an improved version using a more sophisticated multi-stage training pipeline. This included incorporating thousands of "cold-start" data points to fine-tune the underlying base model before applying reinforcement learning. The result was a model that kept the reasoning power of the original while significantly improving accuracy, readability, and coherence.
How Do You Actually Use These Models for Real Work?
If you're considering deploying DeepSeek's reasoning models, here are practical steps to get the best results:
- Prompt Structure: Avoid system prompts and include all instructions directly in the user prompt to ensure the model processes your request correctly.
- Math Problems: Add a directive like "Please reason step by step, and put your final answer within \boxed{}." to guide the model toward structured solutions.
- Encourage Reasoning: If the model skips its reasoning process, instruct it to begin its response with "<think>\n" in your prompt to force thorough reasoning output.
The newer DeepSeek-R1-0528 version supports system prompts and no longer needs the "<think>\n" prefix trick to trigger its reasoning process.
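The guidelines above can be sketched as a small helper that assembles a request in the OpenAI-style chat format DeepSeek's API accepts (a minimal illustration; the helper name and the example task are my own, not from DeepSeek's documentation):

```python
def build_r1_messages(task: str, is_math: bool = False) -> list:
    """Assemble a chat request for the original DeepSeek-R1:
    no system message, every instruction inside the user turn."""
    prompt = task
    if is_math:
        # Directive recommended for math problems
        prompt += ("\nPlease reason step by step, and put your final "
                   "answer within \\boxed{}.")
    # Original R1: a single user message, no system role
    return [{"role": "user", "content": prompt}]

messages = build_r1_messages("Solve 3x + 5 = 20 for x.", is_math=True)
```

With R1-0528, a system message could simply be prepended to the list, since that version supports system prompts directly.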
How Does the Cost Compare to Other Frontier AI Models?
The economics of DeepSeek's models are striking. The original R1 was trained for approximately $294,000, primarily using NVIDIA H800 graphics processing units (GPUs). That figure sits on top of the roughly $6 million spent to develop the underlying V3-Base model that both R1 and the newer R1-0528 are built upon. For context, training GPT-4 is estimated to have cost between $50 million and $100 million.
DeepSeek-V3, the general-purpose model released in December 2024, required only 2.788 million H800 GPU hours, translating to around $5.6 million in training costs. These numbers represent a dramatic shift in AI development economics, suggesting that efficiency in model architecture and training methodology can sharply reduce the resources needed to build competitive systems.
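The arithmetic behind the V3 figure is straightforward: reported GPU hours multiplied by an assumed rental rate of about $2 per H800 GPU-hour (the rate DeepSeek itself used in its cost estimate, to my understanding):

```python
gpu_hours = 2_788_000      # H800 GPU-hours reported for DeepSeek-V3
rate_per_hour = 2.0        # assumed market rental price, USD per GPU-hour
cost_usd = gpu_hours * rate_per_hour
print(f"${cost_usd / 1e6:.2f} million")  # → $5.58 million
```

Note this covers compute for the final training run only, not research, ablations, or staff costs.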
What's the Difference Between DeepSeek-V3 and DeepSeek-R1?
DeepSeek offers two distinct model families, each optimized for different purposes. Understanding which to use depends on your specific needs:
- DeepSeek-V3 Use Cases: Best for content creation, writing, translation, and general question-answering where you need direct answers without extensive reasoning steps.
- DeepSeek-R1 Use Cases: Designed for complex mathematics, coding challenges, scientific reasoning, and multi-step planning for agent workflows where step-by-step thinking is valuable.
- Response Style: V3 provides direct answers (for example, "The answer is 42"), while R1 explains its reasoning process (for example, "First, calculate X, then Y, so the answer is 42").
Both models use the same Mixture-of-Experts architecture with 671 billion total parameters, but only 37 billion parameters are activated for each token. This design allows the models to be powerful while remaining efficient, as they activate only the relevant "experts" needed for each specific task.
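The sparse-activation idea can be illustrated with a toy top-k gate (a schematic sketch in plain Python, not DeepSeek's actual router, which scores experts per token inside every MoE layer):

```python
import heapq

def top_k_route(gate_scores, k):
    """Return indices of the k experts with the highest gating scores;
    only those experts run for the current token."""
    return heapq.nlargest(k, range(len(gate_scores)),
                          key=gate_scores.__getitem__)

# Toy sizes: 8 experts, 2 active per token (the real model is far larger)
scores = [0.1, 0.7, 0.05, 0.9, 0.2, 0.3, 0.02, 0.4]
active = top_k_route(scores, k=2)   # → [3, 1]

# At DeepSeek's scale the same principle means only ~37B of the
# 671B total parameters do work for any given token (about 5.5%).
```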
How Has DeepSeek Improved Its Models Recently?
DeepSeek has been actively updating its lineup. In March 2025, the company released DeepSeek-V3-0324, which uses the same base model as the original V3 but incorporates lessons learned from the reinforcement learning techniques used in R1. This update improved reasoning performance, coding skills, and tool-use capabilities. In DeepSeek's own mathematics and coding evaluations, V3-0324 even outperforms GPT-4.5.
The May 2025 release of DeepSeek-R1-0528 represents the most significant upgrade to the reasoning model line. While built on the same V3 base model, this version leverages more compute and advanced post-training optimizations to push reasoning and inference capabilities further. The model now reasons at greater depth, spending more tokens per question on average, and its benchmark results improve significantly over the original R1.
What sets DeepSeek apart is its commitment to transparency. R1 is thought to be the first major large language model (LLM) to undergo the peer-review process, with its research published in Nature. This marks a rare moment of openness in large-scale AI research, where the training methodology and results are subject to independent scrutiny rather than kept proprietary.
For developers and organizations evaluating AI models, DeepSeek's reasoning models represent a significant shift in what's possible with open-source systems. The combination of strong performance on complex reasoning tasks, dramatically lower training costs, and transparent research methodology suggests that the landscape of AI development is becoming more competitive and accessible beyond the largest technology companies.