Chapter 4 of [[ai-ml/nvidia-certs/ncp-aai/cognition-planning-and-memory/Understanding-the-planning-of-LLM-agents-A-survey|Understanding the planning of LLM agents: A survey]]
Abstract
Chapter 4 and the introductory segments of Chapter 5 address the critical mechanisms for selecting optimal plans from diverse candidates generated by Large Language Models (LLMs). The central technical contribution involves a comparative analysis of heuristic search strategies, specifically contrasting self-consistency voting with tree-based search algorithms such as ToT, MCTS, and A*. Additionally, these chapters examine External Planner-Aided Planning, detailing the integration of LLMs with symbolic formalisms like PDDL and neural decision models to handle intricate environmental constraints. This progression from internal search to external integration highlights the trade-offs between computational scalability and solution reliability in autonomous agent planning.
Key Concepts
- Self-Consistency Voting: This strategy applies a naive majority vote mechanism to determine the optimal plan among multiple candidates generated by the LLM. By treating the plan receiving the most votes as the optimal choice, it leverages the model’s stochastic output without requiring a structured search topology. This approach serves as a baseline for plan selection but lacks the structured exploration capabilities of tree-based methods.
- Tree-of-Thought (ToT): ToT extends standard prompting by introducing a tree architecture that supports conventional search algorithms such as Breadth-First Search (BFS) and Depth-First Search (DFS). When expanding a node within this structure, the LLM is used to evaluate multiple distinct actions to identify the optimal path for growth. This architecture enables a more deliberative process compared to linear generation strategies.
- LLM-MCTS and RAP: These methods leverage the LLM as a heuristic policy function within the Monte Carlo Tree Search (MCTS) framework. Multiple potential actions are obtained through repeated calls to the model, which guides the exploration of the search space more effectively than standard rollout policies. Both approaches utilize the tree structure to assist in multi-plan search, combining LLM reasoning with probabilistic exploration.
- LLM-A* Search: This approach utilizes the classic A* algorithm from artificial intelligence to assist the LLM in navigating the search space efficiently. It employs the Chebyshev distance from the current position to the target position as the heuristic cost function for selecting the optimal path. This provides a directional bias towards the goal state that pure tree search methods may lack.
- Symbolic Planner Integration: This concept involves integrating LLMs with established symbolic planning formalisms, such as PDDL, to address environments with intricate constraints. The LLM is responsible for organizing natural language problems into formalized text prompts that adhere to the required symbolic language format. Subsequently, a solver executes the planning process on the formalized description to generate valid action sequences.
- LLM+P Procedure: This specific method enhances planning proficiency by incorporating a PDDL-based symbolic planner. The LLM organizes actions within the environment and specified tasks into the format of the PDDL language to create a formalized description. The Fast-Downward solver is then employed to process this description and derive the final plan.
- LLM-DP Framework: Designed specifically for dynamic interactive environments, LLM-DP adapts the symbolic planning approach to incorporate real-time feedback. Upon receiving feedback from the environment, the LLM processes the information and re-formalizes it into the PDDL language. A BFS solver is subsequently utilized to generate an updated plan based on the dynamic state changes.
- LLM+ASP Transformation: This method transforms problems described in natural language by the LLM into atomic facts to convert tasks into Answer Set Programming (ASP) problems. Once the transformation to the logical ASP format is complete, the CLINGO solver is utilized to generate the final plans. This shifts the planning burden from statistical inference to logical deduction.
- Neural Planning Models: These are deep models trained on collected planning data, using reinforcement learning or imitation learning, that demonstrate effective planning abilities within specific domains. For instance, DRRN models the planning process as a Markov Decision Process and applies reinforcement learning to obtain a deep decision model, while Decision Transformer (DT) trains a transformer to clone human decision-making behavior from planning data.
- Generalization Limitations: While well-trained neural planners exhibit excellent planning capabilities and superior efficiency thanks to their smaller parameter sizes, they struggle with complex or rarely encountered problems. When training data is scarce, these small models perform poorly due to insufficient generalization ability, which motivates combining LLMs with lightweight planners.
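The self-consistency voting strategy described above reduces to a majority count over sampled plans; a minimal sketch, where the candidate plan strings are invented for illustration:

```python
from collections import Counter

def self_consistency_vote(plans):
    """Return the plan that appears most often among sampled candidates."""
    counts = Counter(plans)
    best_plan, _ = counts.most_common(1)[0]
    return best_plan

# e.g., five stochastic samples of the same planning task
samples = ["A->B->C", "A->C->B", "A->B->C", "A->B->C", "A->C->B"]
print(self_consistency_vote(samples))  # "A->B->C"
```

Note that this only works when distinct samples of the same underlying plan serialize identically; practical systems normalize or cluster plans before voting.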
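The BFS variant of Tree-of-Thought described above can be sketched as a beam search over partial plans; here `propose` and `evaluate` are toy stand-ins for the LLM calls that generate and score thoughts, and the digit-sequence task is invented:

```python
import heapq

def tot_bfs(root, propose, evaluate, depth, beam=2):
    """Breadth-first Tree-of-Thought search.

    propose(state)  -> candidate next thoughts (stands in for an LLM call)
    evaluate(state) -> numeric score (stands in for LLM self-evaluation)
    Keeps only the `beam` best states at each depth.
    """
    frontier = [root]
    for _ in range(depth):
        # expand every frontier node with each proposed thought
        candidates = [s + [t] for s in frontier for t in propose(s)]
        # retain the highest-scoring partial plans
        frontier = heapq.nlargest(beam, candidates, key=evaluate)
    return max(frontier, key=evaluate)

# toy domain: build the largest-sum digit sequence
propose = lambda s: [1, 2, 3]
evaluate = lambda s: sum(s)
print(tot_bfs([], propose, evaluate, depth=3))  # [3, 3, 3]
```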
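The Chebyshev-guided A* search described above is classical; a minimal grid version, where the 4x4 grid and the `passable` predicate are illustrative assumptions:

```python
import heapq

def chebyshev(a, b):
    """Heuristic used in LLM-A*: max coordinate difference to the goal."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def a_star(start, goal, passable):
    """Plain A* on an 8-connected grid with the Chebyshev heuristic."""
    moves = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
             if (dx, dy) != (0, 0)]
    # frontier entries: (f = g + h, g, node, path so far)
    frontier = [(chebyshev(start, goal), 0, start, [start])]
    seen = {start: 0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for dx, dy in moves:
            nxt = (node[0] + dx, node[1] + dy)
            if passable(nxt) and g + 1 < seen.get(nxt, float("inf")):
                seen[nxt] = g + 1
                heapq.heappush(frontier,
                               (g + 1 + chebyshev(nxt, goal), g + 1,
                                nxt, path + [nxt]))
    return None  # goal unreachable

path = a_star((0, 0), (3, 3),
              passable=lambda p: 0 <= p[0] <= 3 and 0 <= p[1] <= 3)
print(len(path) - 1)  # 3 diagonal steps
```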
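In the LLM+P procedure described above, the LLM's main job is to emit a formalized PDDL problem description. A sketch of that assembly step, with illustrative blocksworld predicates; a real pipeline would hand the resulting string to the Fast Downward solver:

```python
def to_pddl_problem(objects, init, goal, domain="blocksworld"):
    """Assemble a PDDL problem file of the kind LLM+P asks the LLM
    to produce before invoking an external solver."""
    return "\n".join([
        f"(define (problem p1) (:domain {domain})",
        "  (:objects " + " ".join(objects) + ")",
        "  (:init " + " ".join(init) + ")",
        "  (:goal (and " + " ".join(goal) + ")))",
    ])

problem = to_pddl_problem(
    objects=["a", "b"],
    init=["(ontable a)", "(ontable b)", "(clear a)", "(clear b)",
          "(handempty)"],
    goal=["(on a b)"],
)
print(problem)
```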
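In LLM+ASP, the LLM instead reduces the task to atomic facts. A sketch with invented predicate names; a real pipeline would feed these facts, together with domain rules, to the CLINGO solver:

```python
def to_asp_facts(blocks, on_table, goal_on):
    """Translate a block-stacking task into ASP atomic facts of the
    kind LLM+ASP derives from a natural-language description."""
    facts = [f"block({b})." for b in blocks]
    facts += [f"on_table({b}, 0)." for b in on_table]   # state at time 0
    facts += [f"goal_on({x}, {y})." for x, y in goal_on]
    return "\n".join(facts)

facts = to_asp_facts(["a", "b"], ["a", "b"], [("a", "b")])
print(facts)
```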
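The MDP framing behind neural planners such as DRRN can be illustrated with tabular Q-learning on a toy chain MDP; this is only a stand-in for the deep model, showing the reinforcement-learning formulation rather than DRRN itself:

```python
import random

def q_learning(n_states=5, episodes=300, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a chain MDP: keep moving right to reach
    the terminal goal state. Behavior policy is uniformly random;
    Q-learning is off-policy, so the greedy policy still converges."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(n_states) for a in ("left", "right")}
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            a = rng.choice(("left", "right"))
            s2 = min(s + 1, n_states - 1) if a == "right" else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # standard Q-learning temporal-difference update
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, "left")],
                                                  q[(s2, "right")])
                                  - q[(s, a)])
            s = s2
    # extract the greedy policy for each non-terminal state
    return [max(("left", "right"), key=lambda a: q[(s, a)])
            for s in range(n_states - 1)]

print(q_learning())
```

After training, the greedy policy chooses "right" in every state, i.e. the learned decision model encodes the optimal plan for this tiny domain.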
Key Equations and Algorithms
- Self-Consistency Majority Vote: $p^{*} = \arg\max_{p} \sum_{i=1}^{n} \mathbb{1}(p_i = p)$, selecting the optimal plan by plurality vote over the $n$ generated candidates $p_1, \dots, p_n$. The indicator function counts the frequency of each plan, designating the most frequent one as the choice.
- ToT Node Expansion: This algorithmic step describes the selection of a node for expansion by using the LLM to evaluate multiple candidate actions for the current state $s$ and select the optimal one, e.g. $a^{*} = \arg\max_{a} V_{\mathrm{LLM}}(s, a)$, to extend the tree structure.
- LLM-MCTS Policy Function: $a_i \sim \pi_{\mathrm{LLM}}(a \mid s)$, where the LLM acts as the heuristic policy over the state $s$. Multiple potential actions $a_1, \dots, a_k$ are obtained through repeated calls to the model during the Monte Carlo Tree Search process.
- LLM-A* Heuristic: $h(n) = \max\left(|x_n - x_g|,\, |y_n - y_g|\right)$, where $h(n)$ represents the estimated cost from the current position $n$ to the target $g$. The Chebyshev distance serves as the metric to assist the LLM in selecting the optimal path.
- Symbolic Solver Execution: This procedural description outlines the execution where a solver (e.g., Fast-Downward, CLINGO) takes the textual PDDL representation as input. The output is a valid plan generated by the external reasoning engine rather than the LLM directly.
Key Claims and Findings
- Scalability of multi-plan selection provides broader exploration of potential solutions in expansive search spaces but introduces inherent trade-offs regarding resource consumption.
- Increased computational demands pose practical challenges, particularly when generating many candidate plans consumes large token budgets or when online services operate under tight resource constraints.
- Reliance on the LLM for plan evaluation introduces new challenges because LLM performance in ranking tasks remains under scrutiny and requires further validation.
- The stochastic nature of LLMs adds randomness to the selection process, potentially affecting the consistency and reliability of the chosen plans.
- Integration with external symbolic planners addresses challenges in environments featuring intricate constraints such as mathematical problem-solving or generating admissible actions.
- Well-trained neural planners demonstrate superior planning efficiency due to smaller parameter sizes but perform poorly on complex problems where training data is scarce.
- LLM+PDDL proposes using the plan generated by the LLM as an initial heuristic solution to accelerate the search process of local search planners.
- Dynamic interactive environments require methods like LLM-DP that can process feedback from the environment and re-formalize it into planning languages.
Terminology
- PDDL: The Planning Domain Definition Language, a symbolic formalized model employed in automated planning to describe actions and states in text format.
- ASP: Answer Set Programming, a logical programming paradigm used in conjunction with the CLINGO solver to generate plans from atomic facts derived by the LLM.
- Neural Planners: Deep models trained on collected planning data, utilizing reinforcement learning or imitation learning to mimic decision-making behavior within specific domains.
- Symbolic Planner: A fundamental component in automated planning that uses symbolic reasoning to identify optimal paths from initial states to desired goal states based on formalized models.
- Fast-Downward: A specific solver referenced for the planning process when tasks are formalized into PDDL language descriptions by the LLM.
- CLINGO: An ASP solver utilized to generate plans after the LLM transforms natural language problems into atomic facts and ASP logical structures.
- Heuristic Policy Function: The role assigned to the LLM in algorithms like LLM-MCTS and RAP, where it guides the search by determining potential actions.
- Chebyshev Distance: The metric serving as the heuristic cost function in LLM A*, measuring the distance from the current position to the target position.
- DRRN: Deep Reinforcement Relevance Network, a model that represents the planning process as a Markov Decision Process and is trained through reinforcement learning.
- Decision Transformer: A transformer model architecture empowered to clone human decision-making behavior using planning data through imitation learning.
- MCTS: Monte Carlo Tree Search, an algorithmic strategy used in LLM-MCTS and RAP to manage the search tree using LLM as the heuristic guide.
- ToT: Tree-of-Thought, a framework extending standard prompting by organizing intermediate thoughts into a tree structure that supports deliberate exploration with search algorithms such as BFS and DFS.