Multi-Agent Memory from a Computer Architecture Perspective: Visions and Challenges Ahead

Authors: Zhongming Yu, Naicheng Yu, Hejia Zhang et al.

arXiv 2026

TL;DR

Multi-Agent Memory Architecture reframes multi-agent context as a three-layer memory hierarchy with cache sharing and memory access protocols, but, as a position paper, reports no quantitative results.

THE PROBLEM

Multi-agent systems lack formal memory consistency and protocols

The paper argues that multi-agent memory is a bottleneck: context becomes a dynamic, multi-format, partially persistent memory system rather than a static prompt.

Without explicit cache sharing and memory access protocols, collaborative LLM agents risk stale reads, overwritten updates, and incoherent shared context across agents.
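To make the failure mode concrete, here is a minimal hypothetical sketch (not from the paper; all names are illustrative) of the classic lost-update hazard when two agents read and write a shared context dict with no access protocol:

```python
# Hypothetical sketch: two agents share an unprotected context dict.
# Each performs a read-modify-write over a stale snapshot, so the
# second writer silently clobbers the first ("lost update").
shared_context = {"plan": "v1"}

def agent_update(agent_id, snapshot):
    # The agent reasons over whatever it read, then writes back.
    return f"{snapshot} + edit by {agent_id}"

# Both agents read the same version before either writes.
snap_a = shared_context["plan"]
snap_b = shared_context["plan"]

shared_context["plan"] = agent_update("A", snap_a)
shared_context["plan"] = agent_update("B", snap_b)  # overwrites A's edit

print(shared_context["plan"])  # "v1 + edit by B" -- A's update is lost
```

This is exactly the kind of stale-read/overwrite hazard that an explicit memory access protocol is meant to rule out.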

HOW IT WORKS

Architecture-inspired agent memory hierarchy and protocols

The paper introduces a three-layer hierarchy, the Agent IO Layer, Agent Cache Layer, and Agent Memory Layer, plus protocols for Agent Cache Sharing and Agent Memory Access.

Just as classical computers layer RAM, cache, and disk, the paper treats agent memory as a hierarchy in which fast caches sit between rich I/O and large persistent stores.

This architecture-inspired framing makes it possible to reason about bandwidth, caching, and consistency in multi-agent systems in ways a plain context window cannot express.
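The three layers can be sketched as a small Python program. This is a hypothetical illustration of the hierarchy described above; the class and method names are invented here, not defined by the paper:

```python
from dataclasses import dataclass, field

@dataclass
class AgentIOLayer:
    """Ingests raw multimodal inputs (text, audio, images, signals)."""
    def ingest(self, raw):
        return {"modality": type(raw).__name__, "payload": raw}

@dataclass
class AgentCacheLayer:
    """Small, fast store for compressed context and recent trajectories."""
    capacity: int = 4
    entries: dict = field(default_factory=dict)

    def put(self, key, value):
        if len(self.entries) >= self.capacity:
            # Evict the oldest entry (dicts preserve insertion order).
            self.entries.pop(next(iter(self.entries)))
        self.entries[key] = value

    def get(self, key):
        return self.entries.get(key)

@dataclass
class AgentMemoryLayer:
    """Large persistent store for full histories and external knowledge."""
    store: dict = field(default_factory=dict)

    def persist(self, key, value):
        self.store[key] = value

io, cache, memory = AgentIOLayer(), AgentCacheLayer(), AgentMemoryLayer()
item = io.ingest("user question")
cache.put("turn-1", item)       # hot path: recent working context
memory.persist("turn-1", item)  # cold path: durable long-term history
```

The analogy to a hardware hierarchy is direct: the cache layer is small and evicts under capacity pressure, while the memory layer keeps everything.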

DIAGRAM

Agent memory hierarchy and protocol framing

This diagram shows how the paper stacks the Agent IO, cache, and memory layers, with cache sharing and memory access protocols, for multi-agent scenarios.

DIAGRAM

Consistency model comparison across architectures

This diagram shows how the paper compares consistency goals and features across traditional architecture memory, agent memory, and multi-agent memory.

PROCESS

How Multi-Agent Memory Architecture handles a multi-agent scenario

  1. Agent IO Layer

    The Agent IO Layer ingests multimodal inputs, preparing audio, text, images, and network signals for downstream memory handling.

  2. Agent Cache Layer

    The Agent Cache Layer stores compressed context, recent trajectories, tool calls, and short-term latent state for fast reasoning.

  3. Agent Memory Layer

    The Agent Memory Layer persists full dialogue history, external knowledge-base content, and long-term storage artifacts.

  4. Agent Cache Sharing

    Agent Cache Sharing lets one agent transform and reuse another agent's cached artifacts, analogous to cache-to-cache transfers in multiprocessors.
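The cache-sharing step in the walkthrough above can be sketched as a small Python function. This is a hypothetical illustration, under the assumption that sharing means copying a cached artifact from one agent's cache to another's, optionally transforming it in flight; none of these names come from the paper:

```python
# Hypothetical sketch of Agent Cache Sharing: agent B transforms and
# reuses an artifact that agent A already cached, instead of
# recomputing it -- analogous to a cache-to-cache transfer between
# cores in a multiprocessor.
caches = {"A": {}, "B": {}}

def share(src, dst, key, transform=lambda v: v):
    """Copy a cached artifact from one agent's cache to another's,
    applying an optional transformation along the way."""
    artifact = caches[src][key]
    caches[dst][key] = transform(artifact)
    return caches[dst][key]

caches["A"]["summary"] = "compressed tool-call trace"
shared = share("A", "B", "summary", transform=str.upper)
print(shared)  # COMPRESSED TOOL-CALL TRACE
```

The source cache is left untouched: sharing is reuse, not migration, so agent A can keep serving its own reads from the same artifact.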

KEY CONTRIBUTIONS

Key Contributions

  • Architecture-inspired memory hierarchy

    The paper defines the Agent IO Layer, Agent Cache Layer, and Agent Memory Layer, treating agent memory as a bandwidth- and caching-constrained hierarchy.

  • Protocol extension for multi-agent scenarios

    The paper identifies Agent Cache Sharing and Agent Memory Access as missing protocols for cache reuse and structured cross-agent memory operations.

  • Vision for multi-agent memory consistency

    The paper compares architecture memory, agent memory, and multi-agent memory, calling for formal consistency models and verification frameworks.
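One concrete shape such a consistency mechanism could take is optimistic versioning, where every write must name the version it read, so stale writes are rejected rather than silently clobbering newer state. This is a hypothetical sketch of one possible design, not a protocol the paper specifies:

```python
# Hypothetical sketch of a versioned Agent Memory Access protocol.
# A write that is based on a stale read is rejected; the caller must
# re-read and retry (an optimistic-concurrency / compare-and-swap style).
class VersionedMemory:
    def __init__(self):
        self._data, self._version = {}, {}

    def read(self, key):
        """Return (value, version); absent keys read as (None, 0)."""
        return self._data.get(key), self._version.get(key, 0)

    def write(self, key, value, expected_version):
        """Commit only if the store is still at the version we read."""
        if self._version.get(key, 0) != expected_version:
            return False  # stale write rejected
        self._data[key] = value
        self._version[key] = expected_version + 1
        return True

mem = VersionedMemory()
_, v = mem.read("plan")
assert mem.write("plan", "draft by A", v)      # succeeds at version 0
assert not mem.write("plan", "draft by B", v)  # rejected: version moved on
```

Agent B's rejected write is detectable, which is precisely what an ad hoc shared prompt cannot offer: a hook where a verification framework could enforce ordering guarantees.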

RESULTS

By the Numbers

This is a conceptual position paper: it reports no benchmark tables, numeric results, or baselines, so there are no metrics to summarize. Its contribution is qualitative architectural framing.

BENCHMARK

Benchmark: No experimental table reported

No quantitative metric is provided, so this benchmark chart is intentionally left without numeric values.

KEY INSIGHT

The Counterintuitive Finding

The paper argues that multi-agent memory consistency is harder than hardware memory consistency, even though hardware deals with low-level atomic operations and strict ordering.

This is counterintuitive because many assume semantic agents are easier to coordinate, yet the paper argues that heterogeneous artifacts and implicit dependencies complicate safety guarantees.

WHY IT MATTERS

What this unlocks for the field

The paper provides a shared vocabulary for treating agent memory as a hierarchy with explicit cache-sharing and memory-access protocols.

With this framing, builders can design multi-agent systems that reason about bandwidth, coherence, and consistency instead of relying on ad hoc prompt stitching.


Related papers

RAG · Memory Architecture · Long-Term Memory

From RAG to Memory: Non-Parametric Continual Learning for Large Language Models

Bernal Jiménez Gutiérrez, Yiheng Shu et al.

ICML 2025

HippoRAG 2 combines **Offline Indexing**, a schema-less **Knowledge Graph**, **Dense-Sparse Integration**, **Deeper Contextualization**, and **Recognition Memory** into a neuro-inspired non-parametric memory system for LLMs. On the joint RAG benchmark suite, HippoRAG 2 achieves 59.8 average F1 versus 57.0 for NV-Embed-v2, including 71.0 F1 on 2Wiki compared to 61.5 for NV-Embed-v2.

Agent Memory · Memory Architecture

General Agentic Memory Via Deep Research

B.Y. Yan, Chaofan Li et al.

arXiv 2025

General Agentic Memory (GAM) combines a **Memorizer**, **Researcher**, **page-store**, and **memory** to keep full trajectories while constructing lightweight guidance for deep research. On RULER 128K retrieval, GAM achieves 97.70% accuracy compared to 94.25% for RAG using GPT-4o-mini, while also reaching 64.07 F1 on HotpotQA-56K.

Agent Memory · Long-Term Memory · Memory Architecture

Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Prateek Chhikara, Dev Khant et al.

arXiv 2025

Mem0 incrementally processes conversations using the **extraction phase**, **update phase**, **asynchronous summary generation module**, **tool call mechanism**, and a **vector database** to build scalable long-term memory. On the LOCOMO benchmark, Mem0 attains a J score of 67.13 on single-hop questions versus 63.79 for OpenAI and cuts p95 latency from 17.117s to 1.440s compared to the full-context baseline.