Abstract
This survey establishes a structured taxonomy for memory mechanisms in LLM-driven AI systems by systematically mapping them to human cognitive neuroscience. The authors introduce a 3D-8Q classification framework that organizes AI memory across three axes: object (personal vs. system), form (parametric vs. non-parametric), and time (short-term vs. long-term), yielding eight functional quadrants. By synthesizing literature on memory construction, management, retrieval, and inference optimization, the paper provides a unified architectural lens for understanding how agents retain, recall, and adapt information. It concludes by outlining critical research frontiers, including multimodal integration, real-time stream memory, collaborative architectures, and collective privacy preservation.
Key Concepts
- 3D-8Q Memory Taxonomy: A multidimensional classification framework mapping AI memory capabilities based on object (personal/system), form (parametric/non-parametric), and retention duration (short-term/long-term).
- Personal vs. System Memory: Personal memory stores user-specific data for behavioral adaptation and personalization; system memory captures intermediate task execution states to enable reasoning, planning, and self-reflection.
- Parametric vs. Non-Parametric Memory Storage: Parametric memory encodes information directly within model weights via fine-tuning or editing, while non-parametric memory utilizes external structures (vectors, graphs, KV caches) dynamically accessible during inference.
- Neuroscience-Inspired Memory Dynamics: AI memory processes modeled on human cognition, including consolidation (stabilizing transient working-memory traces into long-term storage) and reconsolidation (reactivating and updating stored memories when new interactions arrive).
- KV Cache Management and Reuse: A parametric short-term system memory technique that accelerates autoregressive generation by caching and reusing attention keys and values across tokens, prompts, or serving requests.
- Paradigm Shifts in Memory Architecture: Evolution from unimodal to multimodal, batch/static to continuous stream processing, isolated to collaborative shared memory, and from rule-based to automated self-evolution.
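The KV-cache mechanism described above can be illustrated with a toy, single-head attention loop. This is a minimal NumPy sketch under invented dimensions and random weights, not any particular serving implementation: keys and values for past tokens are computed once, appended to a cache, and never recomputed.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                  # toy hidden size
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    """Scaled dot-product attention for a single query vector."""
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

K_cache = np.empty((0, d))             # grows by one row per decoded token
V_cache = np.empty((0, d))

for step in range(5):                  # pretend decoding steps
    x = rng.standard_normal(d)         # embedding of the newest token
    # Only the new token's K/V are computed; earlier rows come from the cache.
    K_cache = np.vstack([K_cache, (Wk @ x)[None, :]])
    V_cache = np.vstack([V_cache, (Wv @ x)[None, :]])
    out = attend(Wq @ x, K_cache, V_cache)

print(K_cache.shape)  # → (5, 8): one cached key row per generated token
```

Without the cache, step t would recompute K/V for all t prior tokens, giving quadratic rather than linear per-sequence cost.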
Key Equations and Algorithms
- None (The source is a conceptual survey and taxonomy; it presents no mathematical formulations, loss functions, or algorithmic pseudocode.)
Key Claims and Findings
- Categorizing AI memory exclusively by temporal duration is structurally insufficient; a multidimensional taxonomy incorporating object and representation form is required to capture the full scope of LLM-driven memory systems.
- Personal non-parametric long-term memory operates as a pipeline: construction (extraction and consolidation), management (deduplication, conflict resolution, and reconsolidation), retrieval (vector-, graph-, or SQL-based), and integration into downstream applications.
- Parametric short-term system memory dominates inference optimization, where KV cache compression, quantization, and token/sentence-level reuse significantly reduce computational overhead and latency during multi-turn generation.
- System non-parametric long-term memory enables continual agent improvement by storing, abstracting, and reusing successful reasoning traces, workflow paths, and failure reflections to refine future task execution.
- Current memory architectures face fundamental constraints in multimodal fusion, real-time state updating, cross-domain knowledge sharing, and scalable privacy frameworks, necessitating a shift toward adaptive, connected, and collectively secure memory ecosystems.
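The construction–management–retrieval pipeline claimed for personal non-parametric memory can be sketched in a few lines. All names here are hypothetical, and a toy bag-of-words embedding stands in for a real encoder and vector database; the point is only the shape of the pipeline (consolidate with deduplication, then retrieve by vector similarity).

```python
from collections import Counter
import math

def embed(text):
    """Toy embedding: a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MemoryStore:
    def __init__(self):
        self.entries = []                      # list of (text, vector)

    def consolidate(self, fact):
        """Construction + management: skip near-duplicates of stored facts."""
        vec = embed(fact)
        if any(cosine(vec, v) > 0.9 for _, v in self.entries):
            return False                       # deduplication
        self.entries.append((fact, vec))
        return True

    def retrieve(self, query, k=1):
        """Vector-based retrieval: top-k entries by cosine similarity."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = MemoryStore()
store.consolidate("user prefers vegetarian restaurants")
store.consolidate("user prefers vegetarian restaurants")   # dropped as duplicate
store.consolidate("user lives in Berlin")
print(store.retrieve("which city does the user live in"))  # → ['user lives in Berlin']
```

A production system would replace `embed` with a learned encoder, back `entries` with a vector index or graph store, and add conflict resolution and reconsolidation on top of the simple duplicate check shown here.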
Terminology
- 3D-8Q Taxonomy: The paper’s proposed classification schema that intersects three categorical dimensions (object, form, time) to define eight distinct memory quadrants, each with a specific cognitive role and architectural function.
- Parametric Memory: Knowledge or state encoded directly within a model’s trainable or editable parameters, enabling implicit retention, high compression, and direct inference access but at the cost of update scalability.
- Non-Parametric Memory: External, weight-independent storage structures (e.g., vector databases, knowledge graphs, document stores) that AI systems dynamically query during inference to supplement context without modifying weights.
- Memory Consolidation/Reconsolidation: The AI analog of neurobiological processes; consolidation stabilizes transient working memory into durable long-term storage, while reconsolidation dynamically updates or restructures memories upon reactivation.
- Stream Memory vs. Static Memory: Stream memory ingests and updates information continuously as data arrives in real time, whereas static memory operates on discrete, delayed batches of accumulated information.
- KV Cache Reuse: A serving-level optimization that stores previously computed attention key and value (K/V) tensors (queries are recomputed at each step rather than cached) and reuses them across sequential tokens, shared prompt prefixes, or semantically similar prompts to eliminate redundant computation.
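Cross-request reuse can be sketched with a simple memoized prefill. This is a hedged illustration, not a real serving stack: `compute_kv` is a hypothetical stand-in for the expensive prefill pass, and a string token takes the place of the actual K/V tensors.

```python
from functools import lru_cache

PREFILL_CALLS = 0

@lru_cache(maxsize=128)
def compute_kv(prefix: str):
    """Stand-in for the expensive prefill that builds K/V for a prefix."""
    global PREFILL_CALLS
    PREFILL_CALLS += 1
    return f"<kv for {len(prefix.split())} tokens>"   # real systems store tensors

def serve(prompt: str, shared_prefix: str):
    kv = compute_kv(shared_prefix)     # cache hit if this prefix was seen before
    suffix = prompt[len(shared_prefix):]
    return kv, suffix                  # only the suffix still needs prefill

system = "You are a helpful assistant. "
serve(system + "Summarize this article.", system)
serve(system + "Translate this sentence.", system)
print(PREFILL_CALLS)  # → 1: the shared prefix was prefilled only once
```

Real systems key the cache on token-ID prefixes (and, in semantic variants, on embedding similarity) rather than raw strings, and must also manage eviction under GPU memory pressure.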