LLM Agent Architectures: ReAct, Planning, and What Comes Next

The word "agent" gets used loosely in most product announcements, but underneath the marketing there are genuinely different architectural patterns for how an LLM-based system can reason and act. The two most established ones — ReAct and Planning — make meaningfully different bets about when to think and when to act. Understanding that difference matters if you are building systems rather than just using them.

ReAct: reasoning and acting interleaved

ReAct, introduced in a 2022 paper by Yao et al., is the pattern most widely deployed today. The name is a contraction of "Reasoning" and "Acting", and the core idea is exactly what that implies: the agent alternates between producing a reasoning trace and taking an action, using the observation from each action to inform the next step of reasoning.

The loop looks like this in practice: the agent thinks about what it knows, decides on an action (call a tool, run a search, read a file), receives the result, thinks again, decides on another action, and continues until it concludes it has an answer. The reasoning trace is not hidden internal state — it is explicitly produced as text, which means the chain of thought is visible and can be inspected.
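The loop described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `react_loop`, the `Policy` signature, and the `"finish"` action name are assumptions made for the example, and `scripted_policy` stands in for a real model call.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: the policy stands in for the LLM and returns
# (thought, action_name, action_input). The action "finish" ends the loop.
Policy = Callable[[List[str]], Tuple[str, str, str]]
Tool = Callable[[str], str]


def react_loop(policy: Policy, tools: Dict[str, Tool], question: str,
               max_steps: int = 8) -> Tuple[str, List[str]]:
    """Alternate thought -> action -> observation until the policy finishes."""
    trace: List[str] = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = policy(trace)
        trace.append(f"Thought: {thought}")  # reasoning is kept as visible text
        if action == "finish":
            return arg, trace
        tool = tools.get(action)
        # A failed or unknown tool call becomes an observation that the next
        # reasoning step can react to, instead of crashing the loop.
        if tool is None:
            observation = f"error: unknown tool {action!r}"
        else:
            try:
                observation = tool(arg)
            except Exception as exc:
                observation = f"error: {exc}"
        trace.append(f"Action: {action}[{arg}]")
        trace.append(f"Observation: {observation}")
    return "no answer within step budget", trace


def scripted_policy(trace: List[str]) -> Tuple[str, str, str]:
    """Toy stand-in for a model: look something up once, then answer from it."""
    prefix = "Observation: "
    observations = [line for line in trace if line.startswith(prefix)]
    if not observations:
        return "I need to look this up.", "lookup", "capital of France"
    return "The lookup answered it.", "finish", observations[-1][len(prefix):]


def lookup(query: str) -> str:
    """Toy tool backed by a hard-coded table."""
    return {"capital of France": "Paris"}.get(query, "no result")


answer, trace = react_loop(scripted_policy, {"lookup": lookup},
                           "What is the capital of France?")
print("\n".join(trace))
```

The returned `trace` is the inspectable chain of thought: every thought, action, and observation is an ordinary string that can be logged or reviewed.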

The strength of this pattern is its responsiveness. Because each reasoning step has access to the actual result of the previous action, the agent can adapt mid-task. If the first search returns nothing useful, it can reason about why and try a different query. If a tool call fails, it can reason about the failure and try an alternative. The plan is never fully committed to upfront, which makes the system resilient to surprises.

The weakness is the same thing seen from the other direction. Because the plan is implicit and emerges step by step, ReAct agents can wander. They can get stuck in loops, repeating similar actions with slightly different parameters without making real progress. They can also hallucinate reasoning steps that sound locally coherent but lead to increasingly wrong conclusions. Without a global view of the task, there is nothing to prevent the agent from solving a subtask that turns out not to matter, or missing a constraint that was stated at the start.
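One common mitigation for the looping failure is a guard that notices near-duplicate actions. The sketch below is one possible approach under simple assumptions: `LoopGuard` is a hypothetical helper, and its normalization (ignore case and word order) is deliberately crude.

```python
from collections import Counter
from typing import Tuple


class LoopGuard:
    """Flags an agent that keeps issuing near-identical actions."""

    def __init__(self, max_repeats: int = 2) -> None:
        self.max_repeats = max_repeats
        self.seen: Counter = Counter()

    @staticmethod
    def _normalize(action: str, arg: str) -> Tuple[str, str]:
        # Assumption for this sketch: case and word order do not matter, so
        # "python docs" and "Docs Python" count as the same query.
        return action, " ".join(sorted(arg.lower().split()))

    def allow(self, action: str, arg: str) -> bool:
        """Record the action; return False once it has repeated too often."""
        key = self._normalize(action, arg)
        self.seen[key] += 1
        return self.seen[key] <= self.max_repeats
```

A caller would check `guard.allow(action, arg)` before each tool call and, on `False`, either stop or feed a "you are repeating yourself" observation back into the reasoning step. A guard like this catches repetition only; it does nothing about a subtask that is novel but irrelevant.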

Planning: think first, act later

Planning-based architectures separate the two concerns more explicitly. Before any action is taken, the agent produces a full plan: a decomposition of the task into subtasks, ideally with dependencies, priorities, and expected outputs. Only after the plan is assembled does execution begin.

The appeal is structural clarity. A plan gives you a representation of the task that can be inspected, criticized, and corrected before anything irreversible happens. It also makes parallelism possible — if two subtasks are independent, they can be executed simultaneously rather than sequentially. For complex, multi-step workflows where the overall shape of the problem is well understood, planning can produce better outcomes more efficiently than a reactive loop.
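To make the parallelism point concrete, here is a small sketch that treats a plan as a dependency graph and groups subtasks into batches that could run side by side. It uses the standard-library `graphlib` module (Python 3.9+); the subtask names and the `parallel_batches` helper are invented for illustration.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def parallel_batches(plan: Dict[str, Set[str]]) -> List[List[str]]:
    """Group subtasks into batches; each batch depends only on earlier ones."""
    sorter = TopologicalSorter(plan)
    sorter.prepare()  # raises graphlib.CycleError if the plan is circular
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every subtask whose deps are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical plan: each subtask maps to the set of subtasks it depends on.
plan = {
    "fetch_schema": set(),
    "fetch_sample_rows": set(),
    "write_query": {"fetch_schema", "fetch_sample_rows"},
    "run_query": {"write_query"},
}

for batch in parallel_batches(plan):
    print(batch)
```

Because the plan exists as data before anything runs, it can be reviewed, rejected for a cycle, or scheduled in parallel, none of which is possible when the plan only exists implicitly in a reactive loop.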

The weakness is brittleness. A plan is only as good as the assumptions it was built on, and those assumptions are made before any action has been taken or any observation received. If the first step of execution reveals that the problem was not what it looked like, which is common in real codebases, real APIs, and real data, the plan can become outdated before it is half-executed. A ReAct agent would have adapted by then. A planner has to either abandon the plan and replan from scratch, or continue executing a plan that is no longer valid. Replanning is expensive, and deciding when to trigger it is itself a hard problem.
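One way to frame the replanning decision is to attach an explicit validity check to each step and cap the number of replans. The sketch below assumes exactly that; `Step`, `Replanner`, and `execute_with_replanning` are hypothetical names, and in a real system the `still_valid` check is the hard part that this example simply takes as given.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    run: Callable[[], str]              # performs the action, returns an observation
    still_valid: Callable[[str], bool]  # does the observation match what the plan assumed?


# Hypothetical replanner: (failed step, its observation, remaining steps) -> new steps
Replanner = Callable[[Step, str, List[Step]], List[Step]]


def execute_with_replanning(steps: List[Step], replan: Replanner,
                            max_replans: int = 2) -> List[str]:
    """Run steps in order; replace the rest of the plan when an assumption breaks."""
    log: List[str] = []
    queue = list(steps)
    replans = 0
    while queue:
        step = queue.pop(0)
        observation = step.run()
        log.append(f"{step.name}: {observation}")
        if step.still_valid(observation):
            continue
        if replans >= max_replans:
            # Replanning is expensive, so the budget is bounded.
            log.append("abort: replan budget exhausted")
            break
        replans += 1
        log.append(f"replan after {step.name}")
        queue = replan(step, observation, queue)
    return log
```

The design choice worth noticing is that the trigger is local (one step's observation contradicts one assumption) while the response is global (the whole remaining queue is replaced). Cheaper variants would patch only the affected steps, at the cost of more bookkeeping.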

There are also planning architectures that use separate LLM instances for planning and execution — a planner that decomposes and a worker that executes. This can improve quality on each individual concern, but it adds coordination complexity and creates a new failure mode: the worker doing exactly what the plan says, faithfully, when the plan was wrong.

Where they differ, in practice

The difference is really a question of when you commit. ReAct commits one action at a time, paying for that flexibility with the risk of losing the thread. Planning commits to a structure upfront, paying for that structure with the risk of being wrong at the foundation.

In my experience, ReAct handles tasks better when the environment is unpredictable — when tool outputs are variable, when the task description is somewhat ambiguous, or when the path to a solution is not obvious in advance. Planning handles tasks better when the problem is well-understood, when the subtasks are largely independent, and when the cost of mid-task correction is high.

Neither is always the right choice, and the more interesting systems increasingly combine both: a planning pass to establish structure, followed by ReAct-style execution within each subtask.
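The combined pattern has a simple outer shape: a planner produces subtasks, and each subtask is handed to its own reactive agent along with the results so far. The sketch below shows only that shape; `run_hybrid`, `Planner`, and `SubtaskAgent` are hypothetical names, and both callables are toy stand-ins for model-backed components.

```python
from typing import Callable, Dict, List

Planner = Callable[[str], List[str]]                 # task -> ordered subtasks
SubtaskAgent = Callable[[str, Dict[str, str]], str]  # (subtask, prior results) -> result


def run_hybrid(task: str, planner: Planner,
               subtask_agent: SubtaskAgent) -> Dict[str, str]:
    """Plan once up front, then solve each subtask reactively."""
    results: Dict[str, str] = {}
    for subtask in planner(task):
        # Each subtask gets a copy of earlier results as context, so the inner
        # agent can adapt to what actually happened rather than what was planned.
        results[subtask] = subtask_agent(subtask, dict(results))
    return results


def toy_planner(task: str) -> List[str]:
    return ["find file", "summarize file"]


def toy_agent(subtask: str, prior: Dict[str, str]) -> str:
    return f"{subtask} done with {len(prior)} prior results"


print(run_hybrid("summarize the config", toy_planner, toy_agent))
```

The plan bounds how far any one reactive loop can wander, and the reactive loop absorbs surprises that would otherwise invalidate the plan.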

What else is out there

The field is moving quickly enough that ReAct and Planning already feel like the established tier rather than the frontier. A few other patterns are worth knowing.

Reflexion adds a self-critique loop: after execution, the agent reflects on what went wrong, produces a verbal summary of the failure, and uses that summary as additional context in the next attempt. It is a way of introducing learning within a session without modifying model weights. The limitation is that the reflection is itself generated by the same model that made the original mistake, which puts a ceiling on how useful it can be.
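A rough sketch of that retry-with-reflection loop follows. It is a simplification written for this article, not the paper's implementation: `reflexion_loop` and its three callables are assumed interfaces, and the toy functions replace what would be model calls and a real evaluator.

```python
from typing import Callable, List, Optional, Tuple

Attempt = Callable[[str, List[str]], str]  # (task, reflections so far) -> candidate
Evaluate = Callable[[str], Optional[str]]  # candidate -> failure message, None on success
Reflect = Callable[[str, str], str]        # (candidate, failure) -> verbal lesson


def reflexion_loop(task: str, attempt: Attempt, evaluate: Evaluate,
                   reflect: Reflect, max_trials: int = 3
                   ) -> Tuple[Optional[str], List[str]]:
    """Retry a task, carrying verbal lessons from failures into later attempts."""
    reflections: List[str] = []
    for _ in range(max_trials):
        candidate = attempt(task, reflections)
        failure = evaluate(candidate)
        if failure is None:
            return candidate, reflections
        # The lesson is plain text added to context; no weights are updated.
        reflections.append(reflect(candidate, failure))
    return None, reflections


def toy_attempt(task: str, reflections: List[str]) -> str:
    return "five" if any("word" in r for r in reflections) else "5"


def toy_evaluate(candidate: str) -> Optional[str]:
    return None if candidate.isalpha() else "expected a word, got digits"


def toy_reflect(candidate: str, failure: str) -> str:
    return f"Answer {candidate!r} failed: {failure}. Use a word next time."


result, lessons = reflexion_loop("write 2 + 3 as a word", toy_attempt,
                                 toy_evaluate, toy_reflect)
```

In this toy the evaluator is an external check, which is the favorable case. When `evaluate` and `reflect` are both the same model that produced the mistake, the ceiling described above applies.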

Multi-agent systems split the work across multiple specialized agents that communicate, debate, or review each other's outputs. AutoGen and CrewAI are among the most visible frameworks. The theory is that specialization improves quality and that peer review catches errors a single agent would miss. The practice is that coordination overhead is substantial and the failure modes multiply with each agent added.

Tree-of-Thought explores several reasoning paths rather than committing to one, scoring partial paths and expanding only the most promising branches. It is closer to classical search than to single-pass generation, and it can be effective when the space of reasonable approaches is small and intermediate states can be evaluated. It is also expensive, since it requires generating and evaluating many candidate traces.
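The search framing is easiest to see in code. The sketch below is a beam-style breadth-first variant over a toy problem, assuming that partial paths can be expanded and scored by ordinary functions; in a real system both `expand` and `score` would be model calls, and the names here are made up for the example.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")


def tree_of_thought(root: S, expand: Callable[[S], List[S]],
                    score: Callable[[S], float],
                    beam_width: int = 2, depth: int = 3) -> S:
    """Expand several partial paths per level, keeping only the best few."""
    frontier: List[S] = [root]
    for _ in range(depth):
        candidates = [child for node in frontier for child in expand(node)]
        if not candidates:
            break
        # Cost grows with beam_width * branching factor at every level.
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(frontier, key=score)


# Toy problem: pick three numbers from {1, 2, 3} whose sum is as close to 7 as possible.
TARGET = 7


def expand(path: Tuple[int, ...]) -> List[Tuple[int, ...]]:
    return [path + (n,) for n in (1, 2, 3)]


def score(path: Tuple[int, ...]) -> float:
    return -abs(TARGET - sum(path))


best = tree_of_thought((), expand, score)
print(best, sum(best))
```

Every level calls `expand` on each surviving path and `score` on each child, which is where the expense comes from when those calls are LLM invocations.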

Memory-augmented agents attach external storage — episodic logs, vector databases, structured knowledge — to give the agent access to context that does not fit in a single prompt. This is less an architecture in itself and more an enhancement that can be layered onto any of the above. Done well, it is one of the more practical improvements to agent reliability for long-running tasks.
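As a minimal illustration of the idea, the sketch below stores notes and retrieves the ones that share the most words with a query. Keyword overlap is a stand-in chosen to keep the example dependency-free; a production system would more likely use embeddings and a vector store. `EpisodicMemory` is a hypothetical class.

```python
from typing import List, Set, Tuple


class EpisodicMemory:
    """Append-only note store with naive keyword-overlap retrieval."""

    def __init__(self) -> None:
        self._entries: List[Tuple[Set[str], str]] = []

    @staticmethod
    def _tokens(text: str) -> Set[str]:
        return set(text.lower().split())

    def add(self, note: str) -> None:
        self._entries.append((self._tokens(note), note))

    def recall(self, query: str, k: int = 2) -> List[str]:
        """Return up to k notes sharing at least one word with the query."""
        q = self._tokens(query)
        scored = [(len(q & tokens), note) for tokens, note in self._entries]
        # Stable sort: among equal scores, older notes come first.
        ranked = sorted((s for s in scored if s[0] > 0),
                        key=lambda s: s[0], reverse=True)
        return [note for _, note in ranked[:k]]


memory = EpisodicMemory()
memory.add("deploy script lives in tools/deploy.sh")
memory.add("staging database is read-only")
memory.add("deploy requires VPN access")
print(memory.recall("staging database", k=1))
```

The recalled notes would be prepended to the prompt for the next step, which is why this layers onto any of the architectures above without changing their control flow.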

The honest summary

ReAct is the default for a reason: it is simple, transparent, and handles ambiguity reasonably well. Planning is the right addition when a task is complex enough to benefit from decomposition and stable enough for the decomposition to remain valid. The more advanced patterns — reflection, multi-agent collaboration, tree search — address real limitations but each introduces its own costs in complexity, latency, or failure surface.

The underlying constraint that none of these architectures fully escapes is the quality of the model's reasoning at each individual step. Better architecture helps a good model work more reliably. It does not rescue a model that reasons poorly in the domain at hand. That remains the most important variable, and the one least under the architect's control.