An Introduction to Large Language Models: Prompt Engineering and P-Tuning (Cognition Perspective)

This is the Cognition, Planning, and Memory cross-section. Full summary: Agent Development — LLM Prompt Engineering and P-Tuning.

This article is relevant to the Cognition, Planning, and Memory topic area because prompt engineering and p-tuning are the primary mechanisms by which an agent’s reasoning behaviour is shaped at inference time — complementing the memory systems described in What Is Agent Memory?.

Prompting as a Cognitive Strategy

Zero-shot prompting relies entirely on the LLM’s pretraining knowledge; it tests the model’s latent capabilities without any in-context guidance. For agents, zero-shot prompting is appropriate when the task is well-covered by pretraining (e.g., standard JSON extraction, translation).
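
As a minimal sketch, zero-shot extraction can look like the following, assuming an OpenAI-style chat completions client; the model name is a placeholder:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Zero-shot: the instruction alone, with no examples. The model must rely
    # entirely on what it learned during pretraining.
    prompt = (
        "Extract the name, email, and company from the text below as JSON "
        "with keys 'name', 'email', 'company'.\n\n"
        "Text: Jane Doe (jane.doe@acme.com) is a solutions architect at Acme Corp."
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)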

Few-shot prompting provides examples that prime the model’s pattern-matching: the LLM generalises from the examples to the new input without updating any weights. For agents, this is the standard approach for tasks requiring a specific output format or reasoning style.
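
A few-shot prompt is typically assembled by interleaving worked examples with the new input. The sketch below uses an illustrative sentiment-labelling format; the examples and labels are assumptions, not a fixed convention:

    # Few-shot: two worked examples prime the output format; the model
    # generalises the pattern to the final input without any weight updates.
    examples = [
        ("The battery died after two days.", "negative"),
        ("Setup took thirty seconds and it just works.", "positive"),
    ]

    def build_few_shot_prompt(examples, new_input):
        lines = ["Classify the sentiment of each review as 'positive' or 'negative'.\n"]
        for text, label in examples:
            lines.append(f"Review: {text}\nSentiment: {label}\n")
        lines.append(f"Review: {new_input}\nSentiment:")
        return "\n".join(lines)

    prompt = build_few_shot_prompt(examples, "Great screen, terrible keyboard.")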

Chain-of-thought (CoT) prompting explicitly instructs the LLM to externalise its reasoning steps before producing an answer. This is the cognitive analogue of “thinking out loud” and improves accuracy on multi-step tasks. For agents operating in a plan-act-observe loop, CoT is directly relevant: the Thought phase of a ReAct loop is a form of chain-of-thought reasoning.
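
The sketch below shows an illustrative ReAct-style scaffold in which the Thought line is the externalised chain of thought; the exact wording and the action set are assumptions for illustration, not a fixed standard:

    # ReAct-style scaffold: the "Thought:" line is an explicit chain of
    # thought the model produces before each action.
    REACT_TEMPLATE = """Answer the question by interleaving Thought, Action, and Observation steps.

    Available actions: search[query], finish[answer]

    Question: {question}
    Thought:"""

    prompt = REACT_TEMPLATE.format(
        question="In which year was the author of 'Dune' born?"
    )
    # The model is expected to continue with, e.g.:
    #   Thought: I need to identify the author of 'Dune' first.
    #   Action: search[author of Dune]
    # The agent loop appends an Observation after executing each action.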

Zero-shot CoT (“Let’s think about this logically”) triggers reasoning without labelled examples — useful when few-shot examples are unavailable or when the task space is too broad to enumerate examples.
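
In code, the trigger is simply appended to the task input, as in this sketch:

    # Zero-shot CoT: append a reasoning trigger to the question instead of
    # providing worked examples.
    question = (
        "A warehouse has 4 shelves with 12 boxes each, and 7 more boxes arrive. "
        "How many boxes are there in total?"
    )
    prompt = f"{question}\nLet's think about this logically."
    # Expected shape of the answer: intermediate steps (4 * 12 = 48, 48 + 7 = 55)
    # followed by the final result, rather than a bare number.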

P-Tuning as Cognitive Customisation

P-tuning (prompt tuning) customises the cognitive “stance” of an LLM for a specific domain or task type — without modifying the model’s weights. The virtual tokens learned during p-tuning encode domain-specific priors that are prepended to every inference call, shifting the LLM’s output distribution toward the desired task.
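
The mechanics can be sketched in a few lines of PyTorch, assuming a decoder model whose input embeddings are accessible; production implementations (e.g., NVIDIA NeMo or Hugging Face PEFT) wrap this in considerably more machinery:

    import torch
    import torch.nn as nn

    class SoftPrompt(nn.Module):
        """Learnable virtual tokens prepended to the input embeddings.

        Only these embeddings are trained; the base model's weights stay frozen.
        """
        def __init__(self, num_virtual_tokens: int, embed_dim: int):
            super().__init__()
            self.prompt = nn.Parameter(torch.randn(num_virtual_tokens, embed_dim) * 0.02)

        def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
            # input_embeds: (batch, seq_len, embed_dim)
            batch = input_embeds.size(0)
            prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
            return torch.cat([prompt, input_embeds], dim=1)

    # Usage sketch: embed token ids with the frozen model's embedding table,
    # prepend the virtual tokens, then run the frozen transformer on the result.
    embed_dim, vocab = 768, 50257
    frozen_embeddings = nn.Embedding(vocab, embed_dim)  # stands in for the base model's table
    frozen_embeddings.requires_grad_(False)

    soft_prompt = SoftPrompt(num_virtual_tokens=20, embed_dim=embed_dim)
    token_ids = torch.randint(0, vocab, (2, 16))         # (batch, seq_len)
    inputs = soft_prompt(frozen_embeddings(token_ids))   # shape: (2, 20 + 16, embed_dim)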

For agents, p-tuning is relevant when:

  • The base model’s default responses are not well-calibrated for the agent’s domain (e.g., enterprise customer support vocabulary).
  • Full fine-tuning is cost-prohibitive or would risk catastrophic forgetting of general capabilities.
  • Multiple domain-specific stances are needed; multiple p-tuned token sets can be swapped at runtime without storing separate full models (see the sketch after this list).
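
Continuing the SoftPrompt sketch above, swapping stances at runtime amounts to selecting a different set of virtual tokens while the frozen base model stays resident; the domain names here are hypothetical:

    # Hypothetical registry: one set of virtual tokens per domain, all sharing
    # the same frozen base model. Swapping stances is a dictionary lookup,
    # not a model reload.
    stances = {
        "customer_support": SoftPrompt(num_virtual_tokens=20, embed_dim=embed_dim),
        "billing":          SoftPrompt(num_virtual_tokens=20, embed_dim=embed_dim),
    }

    def embed_for(domain: str, token_ids: torch.Tensor) -> torch.Tensor:
        return stances[domain](frozen_embeddings(token_ids))

    support_inputs = embed_for("customer_support", token_ids)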

This connects to the memory taxonomy in that p-tuning encodes a form of procedural memory — learned task-execution style — into the LLM’s input pipeline rather than into an external memory store.