[Figure: diagram comparing Chain of Thought (CoT) and Chain of Continuous Thought (Coconut)]

Training Large Language Models to Reason in Continuous Latent Space

Sarah Welsh

Contributor

LLMs have traditionally been restricted to reasoning in the “language space,” where chain-of-thought (CoT) is used to solve complex reasoning problems. But a new paper argues that language space may not always be the best place for reasoning. In this week’s paper read, we covered an exciting new technique from a team at Meta called Chain of Continuous Thought, also known as “Coconut.” The paper, “Training Large Language Models to Reason in a Continuous Latent Space,” explores the potential of allowing LLMs to reason in an unrestricted latent space instead of being constrained by natural language tokens.


Discussion Summary

We broke down a novel technique introduced by Meta researchers in the paper “Training Large Language Models to Reason in a Continuous Latent Space.” The authors introduce Chain of Continuous Thought, also known as “Coconut.” Their research seeks to improve how large language models (LLMs) handle complex reasoning tasks by drawing inspiration from human cognition. The technique takes a step away from the traditional reliance on language-based reasoning, mirroring how humans often solve problems without verbalizing every thought. As the paper notes, “Neuroimaging studies have consistently shown that the language network…remains largely inactive during various reasoning tasks.”

To understand Coconut, let’s first get on the same page about Chain of Thought (CoT).

What is Chain of Thought (CoT)?

CoT involves prompting LLMs to reason step by step before arriving at a conclusion. This structured approach has been shown to improve accuracy by encouraging the model to deliberate more thoroughly.

Chain of thought can be implemented in two ways:

  1. Prompting: Providing the model with examples of step-by-step reasoning.
  2. Training: Teaching the model to consistently respond using this technique.

As the paper explains, “A prevalent approach, known as chain-of-thought (CoT) reasoning…involves prompting or training LLMs to generate solutions step-by-step using natural language.”
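
To make the prompting approach concrete, here is a minimal sketch of a few-shot CoT prompt. The example problems and wording are our own illustration, not taken from the paper:

```python
# A minimal few-shot chain-of-thought prompt (illustrative wording, not from the paper).
# The worked example shows the model *how* to reason step by step before answering.
cot_prompt = """\
Q: A farmer has 12 cows, buys 5 more, then sells 3. How many cows are left?
A: Let's think step by step.
   Start with 12 cows. Buying 5 gives 12 + 5 = 17. Selling 3 gives 17 - 3 = 14.
   The answer is 14.

Q: A train travels 60 km in the first hour and 45 km in the second hour. How far does it travel in total?
A: Let's think step by step.
"""
# Sending `cot_prompt` to any instruction-tuned LLM encourages it to emit its
# intermediate reasoning in natural language before stating the final answer.
```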

What is Chain of Continuous Thought (Coconut)?

Coconut builds on the CoT framework but fundamentally changes how models represent their reasoning. Instead of producing text tokens for each reasoning step, Coconut keeps these steps in the model’s internal state—a “continuous thought.” This method enhances efficiency by allowing the model to process thoughts without the overhead of converting them into language. According to the authors, “This frees the reasoning from being within the language space.”

How it Works

Here’s a simplified breakdown of Coconut’s process (a minimal code sketch of this loop follows the list):

  1. Latent Mode: The model operates in a non-verbal “thinking” state, represented by its internal (hidden) states.
  2. Language Mode: After reasoning in latent mode, the model switches to language mode to produce a human-readable response.
  3. Switching Modes: The transition from latent to language mode is controlled either by a trained binary classifier or by simply using a fixed number of latent thoughts. Both approaches perform comparably, according to the researchers.
  4. Training: A multi-stage curriculum gradually replaces language reasoning steps with latent thoughts. The loss is computed only on the remaining language tokens, so the latent thoughts learn to facilitate subsequent reasoning rather than simply compress the removed steps.
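
To make the latent/language split concrete, here is a hypothetical sketch of Coconut-style inference, assuming a Hugging Face-style causal LM. The function name `coconut_generate` and its arguments are our own, not the authors’ code. In latent mode, the last hidden state is appended directly to the input embeddings instead of being decoded into a token; in language mode, the model decodes normally.

```python
import torch

@torch.no_grad()
def coconut_generate(model, tokenizer, prompt, num_latent_thoughts=3, max_new_tokens=64):
    """Hypothetical sketch of Coconut-style inference with a Hugging Face causal LM."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    embed = model.get_input_embeddings()            # maps token ids -> embedding vectors
    inputs_embeds = embed(input_ids)

    # Latent mode: run a fixed number of "continuous thoughts". The final hidden
    # state is fed back as the next input embedding, skipping token decoding entirely.
    for _ in range(num_latent_thoughts):
        out = model(inputs_embeds=inputs_embeds, output_hidden_states=True)
        last_hidden = out.hidden_states[-1][:, -1:, :]   # hidden state at the last position
        inputs_embeds = torch.cat([inputs_embeds, last_hidden], dim=1)

    # Language mode: switch back to ordinary greedy decoding for a readable answer.
    generated = []
    for _ in range(max_new_tokens):
        out = model(inputs_embeds=inputs_embeds)
        next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        if next_id.item() == tokenizer.eos_token_id:
            break
        generated.append(next_id.item())
        inputs_embeds = torch.cat([inputs_embeds, embed(next_id)], dim=1)

    return tokenizer.decode(generated)
```

This sketch hard-codes a fixed number of latent thoughts (the second option in step 3 above); the paper also explores training a binary classifier to decide when to exit latent mode.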

Comparing Coconut to Other Methods

The authors compare Coconut against several reasoning techniques:

  • Base CoT: Traditional step-by-step reasoning in natural language.
  • No-CoT: Models respond without structured reasoning.
  • Internalized CoT (iCoT): CoT reasoning baked into the model through training, so it answers directly without generating reasoning tokens.
  • Pause Token: Inserts special filler tokens between the question and the answer, giving the model extra computation before it responds, without verbalized reasoning.

Coconut performs comparably to or better than these methods in many cases, with added efficiency. As the paper highlights, “Coconut…even surpasses language-based CoT methods, while generating significantly fewer tokens during inference.” Coconut’s advantage grows with problem complexity, and it benefits from using at least three latent “thoughts” to improve performance.

A Practical Example

The paper includes an example involving made-up words and logical relationships. While traditional CoT sometimes produces hallucinated answers, Coconut’s latent reasoning enables more accurate responses.

In one scenario, Coconut analyzes multiple potential outcomes in parallel, gradually narrowing possibilities. This resembles a breadth-first search (BFS) approach, allowing it to encode diverse reasoning paths before converging on a solution. The researchers note: “The model maintains significant diversity in its reasoning paths…transitioning from parallel exploration to more focused reasoning.”

Final Thoughts

Coconut introduces an exciting direction for reasoning in LLMs. While it doesn’t outperform all methods in every scenario, its efficiency and adaptability to complex problems make it a promising alternative to standard CoT approaches.