Tree of Thoughts (ToT) is a reasoning framework that lets a language model explore multiple intermediate reasoning steps as a search tree, evaluating and backtracking among branches. It generalizes chain-of-thought from a single linear path to deliberate search.
What is Tree of Thoughts (ToT)?
Tree of Thoughts (ToT) is a prompting and reasoning framework that frames problem solving as a search over a tree of intermediate steps, where each node is a thought, a coherent partial solution. Introduced by Yao and colleagues in 2023, it lets a language model generate several candidate next steps, evaluate how promising each is, and explore the most promising branches while backtracking from dead ends.
Where chain-of-thought produces one linear sequence of reasoning, Tree of Thoughts maintains many partial solutions at once and applies classic search to decide which to expand. This turns reasoning into a deliberate process of exploration and self-evaluation rather than a single forward pass.
- A thought is a coherent intermediate reasoning step, a node in the tree.
- The model generates candidate steps, evaluates them, then expands the best.
- Introduced by Yao et al., 2023.
How it differs from chain-of-thought
Chain-of-thought (CoT) prompts a model to produce a single reasoning trace from problem to answer. If an early step is wrong, the whole chain typically fails because there is no mechanism to reconsider. Self-consistency improves on this by sampling many independent chains and voting, but each chain is still linear and there is no exploration within a chain.
Tree of Thoughts adds explicit branching and backtracking. At each step the model proposes several alternative thoughts, scores them, and keeps the strongest, so it can abandon a poor branch and try another. This makes ToT suited to problems that need planning, lookahead, or trial and error, where committing to one path early is risky.
- CoT: one linear path, no backtracking.
- Self-consistency: many linear paths, voted at the end, still no in-path exploration.
- ToT: branching tree with evaluation and backtracking.
Search and evaluation
Tree of Thoughts navigates the tree with standard search algorithms, most commonly breadth-first search (BFS) or depth-first search (DFS). BFS keeps a fixed number of the best partial solutions at each level, while DFS pursues one branch deeply and backtracks when it stops looking promising.
The model also acts as its own evaluator. At each node it judges candidate thoughts, for example rating them as sure, maybe, or impossible toward solving the problem, and the search uses these self-evaluations to prune and prioritize. This pairing of generation and evaluation by the same model is what lets ToT search without an external solver.
- Search strategies: typically BFS or DFS over the thought tree.
- The model self-evaluates candidate thoughts to prune and prioritize.
- Branching factor and search depth are tunable per task.
When to use it
Tree of Thoughts shines on tasks that require search, planning, or exploration rather than recall. The original paper reported a large gap on the Game of 24, a math puzzle, where GPT-4 with chain-of-thought solved about 4 percent of cases while Tree of Thoughts reached about 74 percent. It was also evaluated on tasks such as creative writing and crosswords.
The cost is many more model calls per problem, since each node requires generation and evaluation across multiple branches. For simple questions, chain-of-thought or even direct prompting is cheaper and adequate. Tree of Thoughts is worth its overhead mainly when a problem genuinely benefits from exploring and pruning alternatives.
- Strong on search and planning tasks like Game of 24 and constrained puzzles.
- Yao et al. reported roughly 4 percent (CoT) versus 74 percent (ToT) on Game of 24.
- High inference cost makes it overkill for simple questions.
Key takeaways
- Tree of Thoughts frames reasoning as a search tree of intermediate thoughts with branching and backtracking.
- It generalizes chain-of-thought from a single linear path to deliberate exploration of alternatives.
- The model self-evaluates candidate thoughts, and search proceeds via BFS or DFS.
- On the Game of 24, the paper reported roughly 74 percent success versus about 4 percent for chain-of-thought.
- Its strength on planning tasks comes at the cost of many more model calls per problem.
Frequently asked questions
Related terms
Related reading
Sources
Put the idea into practice
MemX is an AI memory app built on these ideas: store anything, skip the folders, and find it again by asking in plain English.
Try MemX Free