Abstract
This survey establishes a structured taxonomy for memory mechanisms in LLM-driven AI systems by systematically mapping them to human cognitive neuroscience. The authors introduce a 3D-8Q classification framework that organizes AI memory across three axes: object (personal vs. system), form (parametric vs. non-parametric), and time (short-term vs. long-term), yielding eight functional quadrants. By synthesizing literature on memory construction, management, retrieval, and inference optimization, the paper provides a unified architectural lens for understanding how agents retain, recall, and adapt information. It concludes by outlining critical research frontiers, including multimodal integration, real-time stream memory, collaborative architectures, and collective privacy preservation.
Key Concepts
- 3D-8Q Memory Taxonomy: A multidimensional classification framework mapping AI memory capabilities based on object (personal/system), form (parametric/non-parametric), and retention duration (short-term/long-term).
- Personal vs. System Memory: Personal memory stores user-specific data for behavioral adaptation and personalization; system memory captures intermediate task execution states to enable reasoning, planning, and self-reflection.
- Parametric vs. Non-Parametric Memory Storage: Parametric memory encodes information directly within model weights via fine-tuning or editing, while non-parametric memory utilizes external structures (vectors, graphs, KV caches) dynamically accessible during inference.
- Neuroscience-Inspired Memory Dynamics: AI memory processes modeled on human cognition, including consolidation (stabilizing transient working-memory traces into long-term storage) and reconsolidation (reactivating and updating stored memories when new interactions arrive).
- KV Cache Management and Reuse: A parametric short-term system memory technique that accelerates autoregressive generation by caching and reusing attention keys and values across tokens, prompts, or serving requests.
- Paradigm Shifts in Memory Architecture: Evolution from unimodal to multimodal, batch/static to continuous stream processing, isolated to collaborative shared memory, and from rule-based to automated self-evolution.
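The KV-cache mechanism described above can be illustrated with a toy, single-head attention loop. This is a minimal NumPy sketch under invented dimensions and random weights, not any particular serving implementation: keys and values for past tokens are computed once, appended to a cache, and never recomputed.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                  # toy hidden size
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    """Scaled dot-product attention for a single query vector."""
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

K_cache = np.empty((0, d))             # grows by one row per decoded token
V_cache = np.empty((0, d))

for step in range(5):                  # pretend decoding steps
    x = rng.standard_normal(d)         # embedding of the newest token
    # Only the new token's K/V are computed; earlier rows come from the cache.
    K_cache = np.vstack([K_cache, (Wk @ x)[None, :]])
    V_cache = np.vstack([V_cache, (Wv @ x)[None, :]])
    out = attend(Wq @ x, K_cache, V_cache)

print(K_cache.shape)  # → (5, 8): one cached key row per generated token
```

Without the cache, step t would recompute K/V for all t prior tokens, giving quadratic rather than linear per-sequence cost.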
Key Equations and Algorithms
- None (The source is a conceptual survey and taxonomy; it presents no mathematical formulations, loss functions, or algorithmic pseudocode.)
Key Claims and Findings
- Categorizing AI memory exclusively by temporal duration is structurally insufficient; a multidimensional taxonomy incorporating object and representation form is required to capture the full scope of LLM-driven memory systems.
- Personal non-parametric long-term memory operates as a pipeline: construction (extraction and consolidation), management (deduplication, conflict resolution, and reconsolidation), retrieval (vector-, graph-, or SQL-based), and integration into downstream applications.
- Parametric short-term system memory dominates inference optimization, where KV cache compression, quantization, and token/sentence-level reuse significantly reduce computational overhead and latency during multi-turn generation.
- System non-parametric long-term memory enables continual agent improvement by storing, abstracting, and reusing successful reasoning traces, workflow paths, and failure reflections to refine future task execution.
- Current memory architectures face fundamental constraints in multimodal fusion, real-time state updating, cross-domain knowledge sharing, and scalable privacy frameworks, necessitating a shift toward adaptive, connected, and collectively secure memory ecosystems.
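The construction–management–retrieval pipeline claimed for personal non-parametric memory can be sketched in a few lines. All names here are hypothetical, and a toy bag-of-words embedding stands in for a real encoder and vector database; the point is only the shape of the pipeline (consolidate with deduplication, then retrieve by vector similarity).

```python
from collections import Counter
import math

def embed(text):
    """Toy embedding: a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MemoryStore:
    def __init__(self):
        self.entries = []                      # list of (text, vector)

    def consolidate(self, fact):
        """Construction + management: skip near-duplicates of stored facts."""
        vec = embed(fact)
        if any(cosine(vec, v) > 0.9 for _, v in self.entries):
            return False                       # deduplication
        self.entries.append((fact, vec))
        return True

    def retrieve(self, query, k=1):
        """Vector-based retrieval: top-k entries by cosine similarity."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = MemoryStore()
store.consolidate("user prefers vegetarian restaurants")
store.consolidate("user prefers vegetarian restaurants")   # dropped as duplicate
store.consolidate("user lives in Berlin")
print(store.retrieve("which city does the user live in"))  # → ['user lives in Berlin']
```

A production system would replace `embed` with a learned encoder, back `entries` with a vector index or graph store, and add conflict resolution and reconsolidation on top of the simple duplicate check shown here.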
Terminology
- 3D-8Q Taxonomy: The paper’s proposed classification schema that intersects three categorical dimensions (object, form, time) to define eight distinct memory quadrants, each with a specific cognitive role and architectural function.
- Parametric Memory: Knowledge or state encoded directly within a model’s trainable or editable parameters, enabling implicit retention, high compression, and direct inference access but at the cost of update scalability.
- Non-Parametric Memory: External, weight-independent storage structures (e.g., vector databases, knowledge graphs, document stores) that AI systems dynamically query during inference to supplement context without modifying weights.
- Memory Consolidation/Reconsolidation: The AI analog of neurobiological processes; consolidation stabilizes transient working memory into durable long-term storage, while reconsolidation dynamically updates or restructures memories upon reactivation.
- Stream Memory vs. Static Memory: Stream memory ingests and updates information continuously as data arrives in real time, whereas static memory operates on discrete, delayed batches of accumulated information.
- KV Cache Reuse: A serving-level optimization that stores previously computed attention key and value (K/V) tensors (queries are recomputed at each step rather than cached) and reuses them across sequential tokens, shared prompt prefixes, or semantically similar prompts to eliminate redundant computation.
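Cross-request reuse can be sketched with a simple memoized prefill. This is a hedged illustration, not a real serving stack: `compute_kv` is a hypothetical stand-in for the expensive prefill pass, and a string token takes the place of the actual K/V tensors.

```python
from functools import lru_cache

PREFILL_CALLS = 0

@lru_cache(maxsize=128)
def compute_kv(prefix: str):
    """Stand-in for the expensive prefill that builds K/V for a prefix."""
    global PREFILL_CALLS
    PREFILL_CALLS += 1
    return f"<kv for {len(prefix.split())} tokens>"   # real systems store tensors

def serve(prompt: str, shared_prefix: str):
    kv = compute_kv(shared_prefix)     # cache hit if this prefix was seen before
    suffix = prompt[len(shared_prefix):]
    return kv, suffix                  # only the suffix still needs prefill

system = "You are a helpful assistant. "
serve(system + "Summarize this article.", system)
serve(system + "Translate this sentence.", system)
print(PREFILL_CALLS)  # → 1: the shared prefix was prefilled only once
```

Real systems key the cache on token-ID prefixes (and, in semantic variants, on embedding similarity) rather than raw strings, and must also manage eviction under GPU memory pressure.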