Section 3 of Building Agentic AI Applications with LLMs

Abstract

This section establishes the foundational architecture for Simple LLM Agent Systems, defining the transition from passive large language models to active systems capable of iterative reasoning and tool interaction. The central technical contribution is the formalization of the agent loop, which structures inference as a sequential process of observation, action, and state update rather than a single-turn generation task. This framework matters within the deck’s progression because it provides the minimal viable structure upon which more complex multi-agent or hierarchical systems are built. By isolating the mechanics of the single-agent loop, the section clarifies the computational costs and failure modes inherent to basic agentic workflows.

Key Concepts

  • Agentic Loop Structure: This concept defines the core operational cycle where the LLM observes a state, formulates a reasoning step, selects an action, and processes the resulting observation. It contrasts with standard inference by introducing a temporal dimension where the model must manage context across multiple turns to achieve a goal. The loop continues until a termination condition is met, typically defined by the completion of a task or the exhaustion of a step budget.
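
The cycle described above can be sketched in a few lines of Python. Here `call_llm` and `execute` are hypothetical stand-ins for the model interface and the tool backend, and the `{"name": ..., "args": ...}` action format is an illustrative assumption, not a format prescribed by the section:

```python
def run_agent(call_llm, execute, task, max_steps=10):
    """Minimal agent loop: observe, act, update history, until done."""
    history = [("task", task)]
    for step in range(max_steps):
        action = call_llm(history)          # model maps history -> next action
        if action["name"] == "finish":      # termination condition met
            return action["args"].get("answer")
        observation = execute(action)       # deterministic tool execution
        history.append(("action", action))
        history.append(("observation", observation))
    return None                             # step budget exhausted
```

The controller, not the model, enforces the step budget; the model signals completion only through the `finish` action.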

  • Tool Interface Standardization: To enable action, the system requires a standardized interface for external functions, often implemented via function calling schemas or JSON formats. This concept abstracts the complexities of API calls into discrete, parsable actions that the LLM can predict via next-token probability. It ensures that the model’s output can be reliably interpreted by the executor engine without ambiguity.
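
A tool schema in this spirit, together with a strict parser for the model's output, might look as follows. The `search` tool and the top-level `{"name": ..., "args": ...}` envelope are illustrative assumptions in the style of common function-calling formats:

```python
import json

# Hypothetical tool schema, JSON-Schema style, as used by function-calling APIs.
SEARCH_TOOL = {
    "name": "search",
    "description": "Search a knowledge base for a query string.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

def parse_action(raw: str) -> dict:
    """Parse the model's JSON output into a discrete, unambiguous action."""
    action = json.loads(raw)  # raises on malformed JSON
    if "name" not in action or "args" not in action:
        raise ValueError("action must contain 'name' and 'args'")
    return action
```

Rejecting anything that fails to parse keeps the executor's input space closed, which is what makes the action interpretable "without ambiguity."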

  • Context Window Management: As the agent loop iterates, the history of thoughts, actions, and observations accumulates within the model’s context window. Efficient management strategies are required to truncate or summarize this history to prevent exceeding token limits while retaining critical task state. Failure to manage this effectively results in information loss or prohibitively high inference latency.
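
A simple truncation strategy along these lines keeps the task description plus the most recent turns that fit a token budget. The whitespace token counter is a deliberately crude placeholder for a real tokenizer:

```python
def truncate_history(history, max_tokens,
                     count_tokens=lambda s: len(s.split())):
    """Keep the task plus the newest turns that fit the token budget.

    The first entry (the task) is always retained as critical state;
    older intermediate turns are dropped first.
    """
    task, rest = history[0], history[1:]
    kept, used = [], count_tokens(task)
    for turn in reversed(rest):             # newest turns first
        cost = count_tokens(turn)
        if used + cost > max_tokens:
            break
        kept.append(turn)
        used += cost
    return [task] + list(reversed(kept))
```

Summarizing dropped turns instead of discarding them is the natural next refinement, trading an extra LLM call for retained state.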

  • Prompt Engineering for Control: Simple agents rely heavily on the system prompt to constrain the LLM’s behavior to the required output format for tool usage. This involves providing explicit schemas and examples (few-shot prompting) that guide the model to adhere to the action specification. The prompt acts as the control logic, replacing traditional code-based branching with probabilistic generation pathways.
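
A minimal system prompt acting as this kind of control logic might look as follows; the tool names and the one-shot example are illustrative, not prescribed by the section:

```python
# Hypothetical system prompt: the explicit schema plus a worked example
# (few-shot prompting) constrains generation to parsable actions.
SYSTEM_PROMPT = """You are an agent. At each step, output exactly one JSON object:
{"name": "<tool_name>", "args": {...}}
Available tools: search(query: str), finish(answer: str).

Example:
Task: What is the capital of France?
{"name": "search", "args": {"query": "capital of France"}}
Observation: Paris
{"name": "finish", "args": {"answer": "Paris"}}
"""
```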

  • Deterministic Execution Engine: The non-deterministic nature of the LLM generation must be coupled with a deterministic backend that executes the selected actions. This separation ensures that while the decision-making is probabilistic, the environmental changes (e.g., database updates) are consistent and reproducible. The engine validates inputs against tool schemas prior to execution to prevent runtime errors.
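
A sketch of such an engine, under the assumption of the same action format as above. Validation here is simplified to a required-argument check rather than full JSON Schema validation:

```python
def make_executor(tools, schemas):
    """Build a deterministic executor: validate args against the tool's
    schema, then run the plain Python function. Decision-making stays
    probabilistic; execution stays reproducible."""
    def execute(action):
        name, args = action["name"], action["args"]
        if name not in tools:
            raise KeyError(f"unknown tool: {name}")
        missing = [p for p in schemas[name]["required"] if p not in args]
        if missing:  # reject before execution to prevent runtime errors
            raise ValueError(f"missing required args: {missing}")
        return tools[name](**args)
    return execute
```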

  • Error Recovery Mechanisms: When the execution engine encounters a failure or the LLM generates an invalid action, the system must inject the error message back into the context loop. This allows the model to reason about the failure and formulate a corrective strategy in the subsequent iteration. This mechanism transforms runtime exceptions into feedback signals for the agent.
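
One way to sketch this feedback conversion is a wrapper that catches executor failures and returns the error text as the observation; the `safe_step` name and result shape are assumptions for illustration:

```python
def safe_step(execute, action):
    """Run an action; on failure, return the error text as the observation
    so the model can reason about the failure in the next iteration."""
    try:
        return {"ok": True, "observation": execute(action)}
    except Exception as exc:  # runtime exception becomes a feedback signal
        return {"ok": False,
                "observation": f"ERROR: {type(exc).__name__}: {exc}"}
```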

  • Termination Criteria: The system requires explicit logic to identify when the agent loop should stop generating actions. This is usually implemented via a special token emitted by the model or a maximum step count enforced by the controller. Without robust termination criteria, the agent may enter infinite loops or continue generating actions after the task is complete.
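
Both criteria reduce to a one-line predicate checked by the controller on every iteration; `finish` as the stop action is an assumed convention:

```python
def should_stop(action, step, max_steps, stop_name="finish"):
    """Halt when the model emits the stop action or the step budget runs out."""
    return action.get("name") == stop_name or step >= max_steps
```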

  • State Serialization: To persist the agent’s progress across different sessions or to debug the loop, the internal state (including history and step counters) must be serialized. This allows the agent system to be paused and resumed, and enables post-hoc analysis of the decision trace. It decouples the execution flow from the memory storage.
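
Because the state is plain data (history plus counters), JSON round-tripping is enough for a sketch; the file layout here is an assumption:

```python
import json

def save_state(path, history, step):
    """Serialize the agent's history and step counter for pause/resume."""
    with open(path, "w") as f:
        json.dump({"history": history, "step": step}, f)

def load_state(path):
    """Restore a previously saved agent state."""
    with open(path) as f:
        state = json.load(f)
    return state["history"], state["step"]
```

Note that JSON coerces tuples to lists, so history entries should be stored as lists (or re-normalized on load) for clean round-trips.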

Key Equations and Algorithms

  • Next-Token Probability Distribution: $P(x_t \mid x_{<t}; \theta)$, where $x_t$ represents the $t$-th token in the sequence generated by model $\theta$. This equation describes how the LLM samples the next token in the agent’s thought or action sequence conditional on the entire history of the conversation and task context.

  • Agent Step Function: $a_t = \mathrm{LLM}(o_t, H_{t-1})$, where $a_t$ is the action at step $t$, $o_t$ is the current observation, and $H_{t-1}$ is the history of previous states. This function encapsulates the LLM’s role in mapping the current environmental state and historical context to a specific executable command.

  • Termination Condition: $\mathrm{stop}_t = (a_t = \langle\mathrm{END}\rangle) \lor (t \geq T_{\max})$. This logical expression defines when the loop halts, triggered either by the generation of the specific stop token $\langle\mathrm{END}\rangle$ or by the iteration count $t$ exceeding the maximum allowed steps $T_{\max}$.

  • Loop Transition Function: $H_t = H_{t-1} \oplus (a_t, o_{t+1})$, where $\oplus$ denotes appending to the history buffer. This algebraic representation describes the update rule for the history buffer $H$, appending the newly generated action $a_t$ and the system’s next observation $o_{t+1}$ to the context for the subsequent model call.

  • Error Injection Logic: $o_{t+1} = \mathrm{err}(e_t)$ if executing $a_t$ raises error $e_t$, else $o_{t+1} = \mathrm{exec}(a_t)$. When an execution error occurs, this algorithm replaces the standard observation with human-readable feedback, ensuring the error is visible to the model in the next step.

Key Claims and Findings

  • Simple LLM Agent Systems effectively decompose complex, multi-step problems into atomic sub-tasks via iterative reasoning. This decomposition allows the system to solve problems that exceed the model’s context or reasoning depth in a single pass.

  • The latency of an agentic system scales linearly with the number of reasoning steps required to complete a task. Each iteration introduces a full LLM inference cost plus the network latency of any tool calls.

  • Hallucinated tool arguments can be mitigated by strict schema enforcement during the parsing stage. If the parser rejects the output, the error is fed back to the model, forcing a correction in the next step.

  • The quality of the agent’s performance is highly sensitive to the prompt structure used to define the action space. Ambiguities in the schema definition lead to recurrent output-parsing failures.

  • Simple agents are susceptible to getting stuck in loops where they repeatedly generate the same action. This is typically caused by context noise or a lack of progress indicators in the system prompt.

  • Execution safety is improved by sandboxing the tool environment, ensuring that any generated code or commands are isolated from the host system. This prevents the agent from accessing sensitive resources even if the LLM output is malicious.

  • Cumulative token consumption in agent loops grows quadratically with step count in the worst case if history is not summarized, since each step re-processes the entire accumulated context in addition to adding new observation and action tokens.
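
The quadratic growth claim above can be checked with a back-of-the-envelope calculation: if each step appends roughly $k$ tokens and step $t$ re-reads everything so far, total prompt tokens are $\sum_{t=1}^{n} t \cdot k = k\,n(n+1)/2$, i.e. $O(n^2)$. A toy sketch, assuming a fixed per-step token count:

```python
def total_prompt_tokens(steps, tokens_per_step):
    """Without summarization, step t re-reads all t prior turns, so total
    prompt tokens are the sum of t * k for t = 1..n, which is O(n^2)."""
    return sum(t * tokens_per_step for t in range(1, steps + 1))

# e.g. 4 steps at 10 tokens/step: 10 + 20 + 30 + 40 = 100 prompt tokens,
# versus 40 tokens of actual new content.
```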

Terminology

  • Agent: An autonomous system that uses an LLM as the reasoning engine to interact with an environment through a loop of perception and action.

  • Action Space: The set of all possible functions or tools that an agent can invoke, defined by their signatures and input parameters.

  • Observation: The feedback returned by the execution engine after an action is performed, which becomes part of the context for the next reasoning step.

  • Step Limit: A hyperparameter defining the maximum number of iterations the agent is allowed to perform before forced termination to prevent infinite looping.

  • Schema: A formal specification (e.g., JSON Schema) describing the arguments required for each tool in the action space.

  • Context Window: The maximum number of tokens the underlying LLM can process in a single inference call, limiting the total history of the agent’s interaction.

  • Executor: The component of the system responsible for validating and running the actions selected by the LLM.

  • Thought Trace: The textual sequence generated by the LLM where it reasons about the current state before selecting an action, often visible in the observation log.