Chapter 2 of [[ai-ml/nvidia-certs/ncp-aai/cognition-planning-and-memory/Understanding-the-planning-of-LLM-agents-A-survey|Understanding the planning of LLM agents: A survey]]

Abstract

This chapter establishes a comprehensive taxonomy for planning mechanisms within Large Language Model (LLM) agents, categorizing current research into five primary interconnected directions. The central technical contribution is the formalization of planning strategies—specifically Task Decomposition, Multi-plan Selection, External Planner-Aided Planning, Reflection and Refinement, and Memory-augmented Planning—along with their associated algorithmic formulations. These classifications provide a structured reference for understanding how LLMs handle complexity, efficiency, and failure recovery, marking a progression from monolithic planning to modular, auxiliary-enhanced approaches.

Key Concepts

  • Task Decomposition: This methodology adopts the “divide and conquer” principle to reduce the difficulty of planning for complicated, multi-step, real-life tasks. It decomposes a complex task into several sub-tasks and then plans for each sub-task sequentially. The process is illustrated as a hierarchy in which a Goal maps to an LLM Agent, which in turn produces Sub-plans for specific sub-goals.
  • Decomposition-First Strategy: A specific category within task decomposition where the system first decomposes the task into sub-goals and then plans for each sub-goal successively, keeping the “decompose” step distinct from the “sub-plan” step. Representative works implementing this strategy include HuggingGPT, Plan-and-Solve, and ProgPrompt.
  • Multi-plan Selection: This direction focuses on leading the LLM to “think” more by generating various alternative plans for a single task. A task-related search algorithm is subsequently employed to select the optimal plan to execute from the generated candidates. The approach relies on search strategies, such as tree search algorithms, to evaluate the generated plans.
  • External Planner-Aided Planning: This methodology employs an external planner to elevate the planning procedure, addressing the inefficiency and infeasibility of plans generated solely by the language model. In this architecture, the LLM primarily plays the role of formalizing the task, while the external planner handles plan generation. This division of labor mitigates the limitations of the LLM’s planning capability.
  • Reflection and Refinement: This methodology emphasizes improving planning ability through an iterative process of reflection and refinement. It encourages the LLM to reflect on failures encountered during execution and then refine the plan accordingly. This creates a feedback loop where the plan generation is conditioned on past reflection outcomes.
  • Memory-augmented Planning: This kind of approach enhances planning with an extra memory module, in which valuable information is stored, such as commonsense knowledge, past experiences, and domain-specific knowledge. The information is retrieved when planning, serving as auxiliary signals to guide the agent. This retrieval mechanism supplements the model’s context with persistent data.
  • Interleaved Decomposition: Illustrated in the taxonomic figures, this is an alternative manner to Decomposition-First where the decision-making and execution are interleaved. Unlike the strict hierarchy of decomposition-first, this method alternates between decomposition steps and planning steps based on the current state. The text contrasts this with Decomposition-First to highlight different structural approaches to the same problem.
  • Multimodal Agent Control: Exemplified by HuggingGPT, this concept applies the decomposition principle to multimodal tasks where the LLM acts as a controller. It utilizes various multimodal models from the Huggingface Hub to construct an intelligent agent capable of handling image generation, classification, and video annotation. The LLM decomposes these complex interactions into manageable sub-tasks for specialized models.
  • Interconnectedness of Directions: The five research directions are characterized as interconnected rather than mutually exclusive. Often, advanced systems involve the concurrent adoption of multiple techniques, combining decomposition with memory or reflection. This flexibility allows for hybrid architectures that leverage the strengths of different planning paradigms.
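The contrast between decomposition-first and interleaved decomposition can be sketched in code. This is a minimal illustration, not the survey's implementation: `llm_decompose` and `llm_plan` are hypothetical stand-ins for LLM calls, and the interleaving rule (decompose until a sub-goal is atomic, then plan it) is a simplifying assumption.

```python
def llm_decompose(goal):
    # Hypothetical stub: an LLM would emit sub-goals for `goal`.
    return [f"{goal}/sub{i}" for i in range(1, 3)]

def llm_plan(sub_goal, state):
    # Hypothetical stub: an LLM would emit a sub-plan given the current state.
    return f"plan({sub_goal})"

def decomposition_first(goal):
    """Decompose fully first, then plan each sub-goal successively."""
    sub_goals = llm_decompose(goal)          # step 1: decompose
    state = []
    for g in sub_goals:                      # step 2: plan sub-goals in order
        state.append(llm_plan(g, state))
    return state

def interleaved(goal, max_steps=4):
    """Alternate decomposition and planning based on the current state."""
    state, frontier = [], [goal]
    for _ in range(max_steps):
        if not frontier:
            break
        g = frontier.pop(0)
        # Toy rule: top-level goals get decomposed, sub-goals get planned.
        if "/" not in g:
            frontier.extend(llm_decompose(g))
        else:
            state.append(llm_plan(g, state))
    return state
```

Both variants reach the same sub-plans here; the difference is that the interleaved version decides whether to decompose or plan at each step, which lets it react to intermediate state.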

Key Equations and Algorithms

  • Task Decomposition Formulation: $g_1, \dots, g_n = \text{decompose}(E, g; \Theta)$ followed by $p_i = \text{sub-plan}(E, g_i; \Theta)$. This expression defines the two crucial steps of the method: firstly, decomposing the complex task $g$ relative to environment $E$ into sub-tasks $g_1, \dots, g_n$, and secondly, planning a sub-plan $p_i$ for each sub-task $g_i$. $\Theta$ represents the model parameters and $\text{sub-plan}(\cdot)$ denotes the planning policy.
  • Multi-plan Selection Formulation: $P = \{p_1, \dots, p_n\} = \text{plan}(E, g; \Theta)$; $p^* = \text{select}(E, g, P; \mathcal{F})$. This algorithm describes the process where alternative plans $P$ are generated using the LLM, and a search strategy $\mathcal{F}$ is used to select the optimal plan $p^*$. The search strategies can include specific algorithms like tree search.
  • External Planner Formulation: $h = \text{formalize}(E, g; \Theta)$; $p = \text{plan}(E, g, h; \Phi)$. Here $\Phi$ denotes the external planner module and $h$ represents the formalized information provided by the LLM. The LLM handles formalization while the external module generates the final executable plan $p$.
  • Reflection and Refinement Loop: $p_0 = \text{plan}(E, g; \Theta)$; $r_i = \text{reflect}(E, g, p_i; \Theta)$; $p_{i+1} = \text{refine}(E, g, p_i, r_i; \Theta)$. This iterative procedure outlines how an initial plan $p_0$ is generated, reflected upon to produce a reflection $r_i$, and then refined into a new plan $p_{i+1}$. It formalizes the feedback mechanism where past plans and reflections condition future plan generation.
  • Memory-augmented Planning Formulation: $m = \text{retrieve}(E, g; M)$; $p = \text{plan}(E, g, m; \Theta)$. Here, $M$ represents the memory module from which information $m$ is retrieved based on the environment $E$ and goal $g$. The planning function incorporates the retrieved memory $m$ as auxiliary input alongside the standard environment and goal contexts.
  • Decomposition-First Workflow: $g \xrightarrow{\text{decompose}} g_1, \dots, g_n \xrightarrow{\text{sub-plan}} p_1, \dots, p_n$. This procedural flow illustrates the specific ordering of the Decomposition-First category: the decomposition of the goal into sub-goals occurs before the planning for the sub-goals begins, as opposed to an interleaved approach.
  • Representative Method Formulation (Table 1): The taxonomy table provides specific mappings for works such as CoT, ReAct, HuggingGPT under Task Decomposition. These methods adhere to the general decomposition formulation where the LLM’s task is explicitly “Task decomposition” and “Subtask planning”.
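The reflection-and-refinement loop above can be made concrete with a short sketch. This is a toy illustration of the iterative formulation only, not the survey's method: `plan`, `reflect`, and `refine` are hypothetical stand-ins for LLM calls, and `execute` simulates environment feedback that succeeds only after two refinements.

```python
def plan(env, goal):
    # Hypothetical stub for p0 = plan(E, g; Θ).
    return {"steps": [goal], "version": 0}

def execute(env, p):
    # Simulated environment feedback: succeeds once refined twice.
    return p["version"] >= 2

def reflect(env, goal, p):
    # Hypothetical stub for r_i = reflect(E, g, p_i; Θ).
    return f"failure at version {p['version']}"

def refine(env, goal, p, r):
    # Hypothetical stub for p_{i+1} = refine(E, g, p_i, r_i; Θ).
    return {"steps": p["steps"] + [r], "version": p["version"] + 1}

def reflective_planning(env, goal, max_iters=5):
    p = plan(env, goal)
    for _ in range(max_iters):
        if execute(env, p):          # stop once the plan succeeds
            return p
        r = reflect(env, goal, p)    # reflect on the failure
        p = refine(env, goal, p, r)  # condition the next plan on it
    return p
```

The loop structure, plan, execute, reflect on failure, refine, is the essential feedback mechanism; real systems replace each stub with an LLM invocation and genuine environment execution.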

Key Claims and Findings

  • Planning for complex, multi-step tasks is a formidable challenge due to the complexity and variability of real-world environments, necessitating methods that simplify the execution.
  • Task decomposition simplifies complicated tasks using an algorithmic strategy called “divide and conquer,” which decomposes a single task into several simpler sub-tasks.
  • The five research directions regarding LLM-agent planning—Task Decomposition, Multi-plan Selection, External Planner-Aided, Reflection, and Memory—are interconnected rather than mutually exclusive.
  • HuggingGPT utilizes various multimodal models from the Huggingface Hub to construct an intelligent agent capable of handling tasks such as image generation, video annotation, and speech-to-text.
  • In External Planner-Aided Planning, the LLM mainly plays the role of formalizing the tasks, while the external planner module is employed to address issues of efficiency and plan infeasibility.
  • Current methods for task decomposition generally fall into two categories: decomposition-first and interleaved decomposition, which dictate the ordering of the decompose and sub-plan steps.
  • Memory-augmented planning treats stored commonsense knowledge, past experiences, and domain-specific knowledge as auxiliary signals that are retrieved when planning.
  • Reflection and Refinement strategies improve planning ability by encouraging the LLM to reflect on failures and explicitly refine the plan based on those reflections.

Terminology

  • $\Phi$: Denotes the external planner module used in External Planner-Aided Planning to elevate the planning procedure and address infeasibility issues.
  • $M$: Represents the memory module in Memory-augmented Planning, responsible for storing valuable information such as commonsense knowledge and past experiences.
  • $\mathcal{F}$: Represents the search strategy employed in Multi-plan Selection, such as a tree search algorithm, used to select the optimal plan from generated alternatives.
  • $E$: Represents the environment context provided to the LLM or the planning function within the algorithmic formulations.
  • $g$: Represents the goal, which is the target state or outcome the LLM agent is attempting to achieve through planning.
  • $\Theta$: Represents the parameters of the LLM or the underlying model used in the planning, decomposition, or reflection functions.
  • $h$: Represents the formalized information output by the LLM in the External Planner-Aided framework, which serves as input to the external planner $\Phi$.
  • $\text{plan}(\cdot)$: Represents the planning function or policy used to generate sequences of actions or sub-tasks given the environment and goal.
  • Decomposition-First: A specific manner of task decomposition where the task is decomposed into subgoals first, and planning for each sub-goal is executed successively.
  • Interleaved: A manner of task decomposition where the decomposition of sub-goals and the planning for them occur in an alternating, interleaved fashion rather than sequentially.
  • Sub-plan: The specific plan generated for a singular sub-task after the original complex task has been decomposed.
  • Multimodal tasks: Complex interactions involving different modalities such as image generation, speech-to-text, and video annotation, often requiring specialized models controlled by an LLM.