LLM agents, while capable of handling multi-step tasks like web browsing or software bug fixing, often struggle to learn from and reuse their experiences. Traditional memory systems either store raw logs or rigid workflows, which can be brittle and ignore valuable insights from failures. To address this, Google Research introduces ReasoningBank, an innovative AI agent memory framework that transforms an agent’s interaction traces—both successes and failures—into reusable, high-level reasoning strategies.
The Challenge: Inefficient Learning from Experience

LLM agents excel at tackling complex tasks but falter at accumulating and reusing experience: without durable memory, each new task is effectively approached from scratch. Conventional memory systems typically store raw logs or success-only workflows, which are inflexible and overlook crucial signals from failures. This hinders agents from improving over time and from adapting to new tasks or environments.
ReasoningBank: A Novel Approach to Agent Memory
ReasoningBank reframes memory as compact, human-readable strategy items, making knowledge easier to transfer across tasks and domains. Each experience is distilled into a memory item comprising a title, a one-line description, and content containing actionable principles or heuristics. Retrieval is embedding-based: for a new task, the top-k most relevant items are injected as system-level guidance; after execution, new items are extracted and consolidated back into the memory bank.
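To make the data model concrete, here is a minimal sketch in Python. The `MemoryItem` class and the hashed bag-of-words `embed` function are illustrative stand-ins, not the paper’s implementation; a real system would call a sentence-embedding model instead.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MemoryItem:
    title: str        # short handle for the strategy
    description: str  # one-line summary
    content: str      # actionable principle or heuristic

def embed(text: str, dim: int = 256) -> np.ndarray:
    # Stand-in embedding (hashed bag of words); swap in a real
    # sentence-embedding model in practice.
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve(query: str, bank: list[MemoryItem], k: int = 5) -> list[MemoryItem]:
    # Rank items by similarity between the task query and each item's
    # title + description, then keep the top-k.
    q = embed(query)
    scored = sorted(bank, key=lambda m: -float(q @ embed(m.title + " " + m.description)))
    return scored[:k]
```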
The loop is intentionally simple—retrieve, inject, judge, distill, append—ensuring that improvements can be attributed to the abstraction of strategies rather than complex memory management. This approach enables agents to self-evolve, learning from their experiences and improving their decision-making capabilities over time.
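A sketch of that loop, reusing the `MemoryItem` and `retrieve` stand-ins above; `run_agent`, `judge`, and `distill` are hypothetical callables (judging can be done without ground-truth labels, e.g. by an LLM-as-judge):

```python
from typing import Callable

def reasoning_bank_step(
    task: str,
    bank: list[MemoryItem],
    run_agent: Callable[[str, str], str],   # (task, guidance) -> trajectory
    judge: Callable[[str, str], bool],      # (task, trajectory) -> succeeded?
    distill: Callable[[str, str, bool], list[MemoryItem]],
) -> None:
    # Retrieve: top-k strategy items relevant to this task.
    relevant = retrieve(task, bank, k=5)
    # Inject: distilled strategies become system-level guidance.
    guidance = "\n".join(f"- {m.title}: {m.content}" for m in relevant)
    trajectory = run_agent(task, guidance)
    # Judge: label the rollout as success or failure.
    succeeded = judge(task, trajectory)
    # Distill: extract new strategy items; failed rollouts yield
    # negative constraints instead of being discarded.
    new_items = distill(task, trajectory, succeeded)
    # Append: consolidate the new items back into the memory bank.
    bank.extend(new_items)
```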
Why ReasoningBank’s Strategies Transfer Well
ReasoningBank’s strategy items encode reasoning patterns and negative constraints, not website-specific DOM steps. For instance, an item might advise, “prefer account pages for user-specific data” or “avoid infinite scroll traps.” Failures are not ignored but converted into negative constraints like “do not rely on search when the site disables indexing.” By encoding these patterns and constraints, ReasoningBank prevents repeated mistakes and promotes more informed decision-making.
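As data, two such items might look like the following. The wording follows the examples above; the structure reuses the hypothetical `MemoryItem` sketch, and the field values are illustrative.

```python
# A positive strategy distilled from a successful trajectory.
positive = MemoryItem(
    title="Prefer account pages for user-specific data",
    description="User-scoped facts usually live behind the account section.",
    content="When a task asks for user-specific data, navigate to the account "
            "or profile pages before resorting to site-wide search.",
)

# A negative constraint distilled from a failure; kept, not discarded.
negative = MemoryItem(
    title="Do not rely on search when the site disables indexing",
    description="On-site search previously returned nothing useful here.",
    content="If search is disabled or empty, fall back to category navigation "
            "instead of re-issuing search queries.",
)
```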
Memory-Aware Test-Time Scaling (MaTTS): Enhancing Learning
The researchers also propose memory-aware test-time scaling (MaTTS), which couples test-time scaling with ReasoningBank to further accelerate learning. MaTTS comes in two flavors, sketched in code after the list:
1. **Parallel MaTTS**: Generate multiple rollouts in parallel, then self-contrast them to refine strategy memory.
2. **Sequential MaTTS**: Iteratively self-refine a single trajectory, mining intermediate notes as memory signals.
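Here is how the two flavors might look in code, continuing the sketches above; `self_contrast` and `self_refine` are hypothetical helpers standing in for the LLM-driven contrast and refinement steps:

```python
from typing import Callable

def parallel_matts(
    task: str,
    guidance: str,
    run_agent: Callable[[str, str], str],
    self_contrast: Callable[[list[str]], list[MemoryItem]],
    n: int = 4,
) -> list[MemoryItem]:
    # Generate n rollouts under the same memory guidance, then contrast
    # them to separate strategies that generalize from one-off luck.
    rollouts = [run_agent(task, guidance) for _ in range(n)]
    return self_contrast(rollouts)

def sequential_matts(
    task: str,
    guidance: str,
    run_agent: Callable[[str, str], str],
    self_refine: Callable[[str, str], tuple[str, list[MemoryItem]]],
    rounds: int = 3,
) -> list[MemoryItem]:
    # Iteratively refine a single trajectory; the intermediate notes
    # produced at each round are themselves mined as memory signals.
    trajectory = run_agent(task, guidance)
    mined: list[MemoryItem] = []
    for _ in range(rounds):
        trajectory, notes = self_refine(task, trajectory)
        mined.extend(notes)
    return mined
```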
The synergy between MaTTS and ReasoningBank is two-way: richer exploration produces better memory, and better memory steers exploration toward promising branches. Empirically, MaTTS yields stronger, more monotonic gains than vanilla best-of-N without memory.
Evaluating the Proposed Frameworks
The effectiveness of ReasoningBank and MaTTS is evident in their performance improvements:
- **Effectiveness**: The combination of ReasoningBank and MaTTS improves task success rates by up to 34.2% relative to no-memory baselines and outperforms prior memory designs that reuse raw traces or success-only routines.
- **Efficiency**: Interaction steps drop by 16% overall, with the largest reductions on successful trials, indicating fewer redundant actions rather than premature aborts.
Integration into the Agent Stack
ReasoningBank is designed as a plug-in memory layer for interactive agents that already use ReAct-style decision loops or best-of-N test-time scaling. It amplifies verifiers and planners by injecting distilled lessons at the prompt/system level. On web tasks, it complements BrowserGym/WebArena/Mind2Web; on software tasks, it layers atop SWE-Bench-Verified setups.
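In practice, a plug-in memory layer can be as small as one extra block in the prompt. Below is a minimal sketch for a ReAct-style loop, assuming a generic `llm` completion callable and an `execute` environment step (both hypothetical), plus the `retrieve` stand-in above:

```python
from typing import Callable

def react_with_memory(
    task: str,
    bank: list[MemoryItem],
    llm: Callable[[str], str],       # any chat/completions endpoint
    execute: Callable[[str], str],   # environment step: action -> observation
    max_steps: int = 20,
) -> str:
    # The memory layer touches only the prompt; the ReAct loop
    # (think -> act -> observe) is otherwise unchanged.
    lessons = "\n".join(f"- {m.title}: {m.content}" for m in retrieve(task, bank))
    prompt = f"Distilled lessons from past tasks:\n{lessons}\n\nTask: {task}\n"
    for _ in range(max_steps):
        step = llm(prompt)               # model emits a thought and an action
        if step.startswith("FINISH"):    # illustrative stop convention
            return step
        observation = execute(step)
        prompt += f"{step}\nObservation: {observation}\n"
    return "max steps reached"
```

Because the integration point is only the prompt, the same layer can sit atop best-of-N scaling or other planners without changing the agent’s action space.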
In conclusion, ReasoningBank and MaTTS offer promising avenues for enhancing LLM agents’ ability to learn from and reuse their experiences, ultimately leading to improved performance and adaptability. To explore these frameworks further, check out the paper and the code on the project’s GitHub page.