
How DeepSeek is Pushing the Boundaries of AI Development
How do you train an AI model to think more like a human? That’s the challenge DeepSeek is tackling with its latest models, which push the boundaries of reasoning and reinforcement learning.
In a recent paper reading, SallyAnn DeLucia, a product manager at Arize, and Nick Luzio, a solutions engineer, broke down the key insights from DeepSeek’s research and dove into why DeepSeek has been dominating headlines for its significant breakthrough in inference speed over other models. What’s next for AI (and open source)? From training strategies to real-world performance, here’s what you need to know.
Summary
AI Industry Updates: The Bigger Picture
Before diving into DeepSeek, we covered some of the latest trends shaping the AI landscape:
- OpenAI’s Deep Research – A new autonomous research agent, powered by OpenAI’s upcoming o3 reasoning model, is on the horizon.
- DeepSeek’s Training Costs – Initial estimates suggested DeepSeek’s training costs were around $5 million, but new data is challenging that figure.
- AI Spending Surge – Amazon, Google, Meta, and Microsoft are collectively committing a staggering $300 billion to AI infrastructure.
- AI Goes Mainstream – AI-generated content is appearing in Super Bowl ads, art, and music, sparking debates about authenticity and ethics.
What is DeepSeek?
DeepSeek’s mission is clear: improve AI reasoning through reinforcement learning. Their latest models, DeepSeek-R1 and DeepSeek-R1-Zero, explore how RL can be used to refine reasoning without the need for traditional supervised fine-tuning.
DeepSeek Model Evolution:
- DeepSeek-R1-Zero – Trained purely through reinforcement learning, with no supervised fine-tuning. It showcased impressive reasoning abilities but struggled with language consistency.
- DeepSeek-R1 – Introduced supervised fine-tuning before RL, significantly improving language coherence while maintaining strong reasoning performance.
- Competitive Performance – DeepSeek-R1 performs on par with OpenAI’s o1 model, even surpassing it on some math-related tasks.
Why does reasoning matter? It’s a critical step toward Artificial General Intelligence (AGI) and plays a key role in AI agent capabilities.
How DeepSeek’s Models Work
DeepSeek-R1-Zero was trained through reinforcement learning alone. Instead of relying on human annotations, the model received rule-based rewards for:
- Accuracy – Correct answers received higher scores.
- Formatting – The model was guided toward structured reasoning, leading to the emergence of “thinking brackets” (reasoning wrapped in `<think>...</think>` tags), a structured way to show its work; a minimal reward sketch follows this list.
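To make these rewards concrete, here is a minimal sketch of rule-based scoring in the spirit of the paper: an exact-match accuracy check plus a format check for the thinking brackets. The tag template follows the paper’s description, but the function names and the 0.5 weighting are illustrative assumptions, not DeepSeek’s actual implementation.

```python
import re

# Completions are expected to show work in <think> tags, then give a final <answer>.
TEMPLATE = re.compile(r"<think>(.+?)</think>\s*<answer>(.+?)</answer>", re.DOTALL)

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the think/answer template, else 0.0."""
    return 1.0 if TEMPLATE.search(completion) else 0.0

def accuracy_reward(completion: str, gold: str) -> float:
    """1.0 if the extracted final answer matches the known-correct answer."""
    match = TEMPLATE.search(completion)
    return 1.0 if match and match.group(2).strip() == gold.strip() else 0.0

def total_reward(completion: str, gold: str) -> float:
    # Assumed weighting: accuracy dominates, formatting nudges structure.
    return accuracy_reward(completion, gold) + 0.5 * format_reward(completion)

sample = "<think>7 * 6 = 42</think> <answer>42</answer>"
print(total_reward(sample, "42"))  # 1.5: correct and well-formatted
```

Because both signals can be checked by a program, no reward model or human labeler is needed during RL, which is what lets this approach scale.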
One fascinating discovery: DeepSeek-R1-Zero self-corrected its reasoning process, showing “aha moments” where it adjusted its thought process mid-generation—a behavior not seen in earlier models.
To address R1-Zero’s language inconsistency, DeepSeek-R1 introduced:
- Supervised fine-tuning – The model was trained on high-quality, human-annotated data before reinforcement learning.
- Improved readability – Ensured responses were formatted clearly and remained in one language (a sketch of this two-stage pipeline follows this list).
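The paper’s RL stage uses GRPO (Group Relative Policy Optimization), and DeepSeek’s training code isn’t public, but the stage ordering can be sketched with an open-source stack such as Hugging Face TRL, which ships both SFT and GRPO trainers. The base model, toy datasets, and reward function below are placeholder assumptions, not DeepSeek’s setup.

```python
# Hypothetical two-stage pipeline: supervised fine-tuning first, then RL.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer, GRPOConfig, GRPOTrainer

# Stage 1: SFT on a small, high-quality "cold start" set (toy example here).
sft_data = Dataset.from_dict(
    {"text": ["<think>2 + 2 = 4</think> <answer>4</answer>"]}
)
SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # stand-in base model, not DeepSeek's
    train_dataset=sft_data,
    args=SFTConfig(output_dir="sft-checkpoint"),
).train()

# Stage 2: GRPO with a rule-based reward (see the earlier reward sketch).
def format_reward(completions, **kwargs):
    return [1.0 if "<think>" in c and "</think>" in c else 0.0 for c in completions]

rl_data = Dataset.from_dict({"prompt": ["What is 2 + 2? Show your reasoning."]})
GRPOTrainer(
    model="sft-checkpoint",  # RL starts from the SFT checkpoint, not the raw base
    reward_funcs=format_reward,
    train_dataset=rl_data,
    args=GRPOConfig(output_dir="rl-checkpoint"),
).train()
```

The key design point is the ordering: a small amount of supervised data anchors the model’s language and formatting before RL optimizes reasoning, which is what fixed R1-Zero’s readability issues.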
While DeepSeek-R1 is more readable, it traded off some benchmark performance where R1-Zero’s mixed-language approach had an advantage.
To make DeepSeek more accessible, the team distilled the massive R1 model into smaller, more efficient versions.
- Size Reduction – The original 671-billion-parameter model was distilled into versions as small as 1.5 billion parameters.
- Local Demo – A Streamlit app demonstrated DeepSeek running locally on an M3 Mac, showcasing its ability to solve math problems with detailed reasoning (a minimal sketch of such an app follows this list).
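The demo’s source wasn’t shown, but a comparable local chat app takes only a few lines with Streamlit and Ollama, which hosts the distilled model under the tag `deepseek-r1:1.5b`. The app structure below is an assumption about how such a demo might look, not the original code.

```python
# Minimal local chat app. Save as app.py and run with: streamlit run app.py
# Prerequisites: `pip install streamlit ollama` and `ollama pull deepseek-r1:1.5b`
import ollama
import streamlit as st

st.title("DeepSeek-R1 1.5B (distilled), running locally")

prompt = st.chat_input("Ask a math question")
if prompt:
    with st.chat_message("user"):
        st.write(prompt)
    response = ollama.chat(
        model="deepseek-r1:1.5b",
        messages=[{"role": "user", "content": prompt}],
    )
    with st.chat_message("assistant"):
        # The distilled model emits its reasoning inside <think> tags
        # before the final answer, so you see the "work" as well.
        st.write(response["message"]["content"])
```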
Latency and Performance:
- The 14-billion-parameter model was powerful but slow.
- The 1.5-billion-parameter model was faster and more efficient, making it ideal for local deployment where speed and resource efficiency matter (a quick way to measure this yourself is sketched below).
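If you want to quantify the difference on your own hardware, Ollama reports per-response generation metadata (`eval_count` tokens generated over `eval_duration` nanoseconds) that makes a rough tokens-per-second comparison easy. The model tags below assume both sizes have been pulled locally.

```python
# Rough tokens-per-second comparison across locally pulled model sizes.
import ollama

def tokens_per_second(model: str, prompt: str) -> float:
    resp = ollama.generate(model=model, prompt=prompt)
    # eval_count = tokens generated; eval_duration is in nanoseconds.
    return resp["eval_count"] / (resp["eval_duration"] / 1e9)

prompt = "What is 17 * 24? Think step by step."
for model in ("deepseek-r1:1.5b", "deepseek-r1:14b"):
    print(f"{model}: {tokens_per_second(model, prompt):.1f} tok/s")
```

Expect single-prompt numbers to vary with context length and machine load; averaging over several prompts gives a fairer comparison.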
Why Run AI Locally?
- Privacy – No need to send data to the cloud.
- Customization – Fine-tune models for specific tasks.
- Hardware Constraints – Ensures models fit within available system resources.
Use Cases and Considerations
DeepSeek’s models are still in the early stages of real-world adoption, but potential applications include:
- Enterprise AI – Some companies are testing DeepSeek models for production use.
- Prompt Engineering – Switching models may require refining prompts for optimal performance.
- Privacy-Focused AI – DeepSeek’s models offer an alternative for those who need full control over their AI stack.
- Traditional ML Tasks – The 1.5-billion-parameter model works well for lightweight tasks like sentiment analysis and topic modeling (see the sketch after this list).
- AI Agents & Tool Use – While the smaller models aren’t ideal for tool-calling, larger models might be better suited.
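As an illustration of the traditional-ML use case, here is a hedged sketch that prompts the local distilled model to do zero-shot sentiment classification. The prompt wording and label parsing are assumptions about one reasonable way to use it, not a documented recipe.

```python
# Zero-shot sentiment classification with the locally served distilled model.
import ollama

LABELS = ("positive", "negative", "neutral")

def classify_sentiment(text: str) -> str:
    resp = ollama.chat(
        model="deepseek-r1:1.5b",  # assumes the model has been pulled locally
        messages=[{
            "role": "user",
            "content": f"Classify the sentiment of this review with exactly one word "
                       f"({', '.join(LABELS)}): {text}",
        }],
    )
    content = resp["message"]["content"]
    # The model reasons inside <think> tags first; parse the text after them.
    answer = content.split("</think>")[-1].strip().lower()
    return next((label for label in LABELS if label in answer), "unknown")

print(classify_sentiment("Battery life is fantastic and setup took two minutes."))
```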
Final Thoughts: What Makes DeepSeek Unique?
Beyond performance, DeepSeek models have a distinct personality—offering a different experience compared to traditional cloud-based LLMs. Whether you need a reasoning powerhouse or a lightweight local model, DeepSeek’s innovative approach to reinforcement learning is pushing the boundaries of AI development.