AM-Thinking-v1


AM-Thinking-v1 is a 32-billion-parameter open-source LLM that achieves strong reasoning performance by combining supervised fine-tuning and reinforcement learning in its training. Starting from a 32B base model (Qwen2.5-32B), the developers first applied reasoning-focused supervised fine-tuning, training the model on curated datasets of complex math and coding problems paired with solutions. They then ran a reinforcement learning phase (akin to RLHF) to further sharpen the model's step-by-step reasoning. The resulting model has reported state-of-the-art results among open models of similar size on benchmarks such as AIME (math) and LiveCodeBench (coding). Notably, it rivals much larger models on these reasoning tasks, suggesting that mid-scale models (≈30B parameters) can excel when trained with a specialized, reasoning-centric pipeline. AM-Thinking-v1 shows how targeted fine-tuning followed by RL can give an open model advanced problem-solving skills (paper).
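To make the two-stage idea concrete, here is a deliberately tiny sketch of "supervised fine-tuning, then reinforcement learning" on a toy softmax policy that picks one of four candidate answers. It is not the AM-Thinking-v1 training code: the policy, the 0/1 correctness reward, the learning rates, and every function name (`sft_step`, `rl_step`, `train`) are assumptions chosen for illustration only.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sft_step(logits, target, lr=0.5):
    """One supervised step: gradient ascent on log p(target).

    For a softmax policy, d/d logit_i of log p(target) = onehot_i - p_i.
    """
    probs = softmax(logits)
    return [
        l + lr * ((1.0 if i == target else 0.0) - p)
        for i, (l, p) in enumerate(zip(logits, probs))
    ]


def rl_step(logits, reward_fn, rng, lr=0.5):
    """One REINFORCE step with the expected reward as a baseline."""
    probs = softmax(logits)
    action = rng.choices(range(len(logits)), weights=probs)[0]
    baseline = sum(p * reward_fn(i) for i, p in enumerate(probs))
    advantage = reward_fn(action) - baseline
    return [
        l + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (l, p) in enumerate(zip(logits, probs))
    ]


def train(num_answers=4, correct=2, sft_steps=5, rl_steps=200, seed=0):
    """Run SFT, then RL; return p(correct) at start, after SFT, after RL."""
    rng = random.Random(seed)
    logits = [0.0] * num_answers  # stands in for the pretrained base model
    p_start = softmax(logits)[correct]

    # Stage 1: imitate the reference solution (assumed to be available).
    for _ in range(sft_steps):
        logits = sft_step(logits, correct)
    p_sft = softmax(logits)[correct]

    # Stage 2: sample answers and reinforce the ones that score well.
    # The 0/1 correctness reward is a hypothetical stand-in for the reward signal.
    def reward_fn(answer):
        return 1.0 if answer == correct else 0.0

    for _ in range(rl_steps):
        logits = rl_step(logits, reward_fn, rng)
    p_rl = softmax(logits)[correct]
    return p_start, p_sft, p_rl
```

Calling `train()` returns the probability of the correct answer at three points: before training, after the supervised stage, and after the RL stage. In this toy setting the supervised stage moves the policy toward the reference answer, and the RL stage then keeps improving it using only the reward signal, which mirrors the ordering of the pipeline described above at a vastly smaller scale.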
