Multi-Agent Memory from a Computer Architecture Perspective: Visions and Challenges Ahead

Authors: Zhongming Yu, Naicheng Yu, Hejia Zhang et al.

arXiv 2026

TL;DR

Multi-Agent Memory Architecture reframes multi-agent context as a three-layer memory hierarchy with cache sharing and memory access protocols, but, as a position paper, reports no quantitative results.

THE PROBLEM

Multi-agent systems lack formal memory consistency and protocols

The paper argues that multi-agent memory is a bottleneck: context becomes a dynamic, multi-format, partially persistent memory system rather than a static prompt.

Without explicit cache sharing and memory access protocols, collaborative LLM agents risk stale reads, overwritten updates, and incoherent shared context across agents.
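To make the failure mode concrete, here is a minimal hypothetical sketch (not from the paper; all names are illustrative) of the classic lost-update hazard when two agents read and write a shared context dict with no access protocol:

```python
# Hypothetical sketch: two agents share an unprotected context dict.
# Each performs a read-modify-write over a stale snapshot, so the
# second writer silently clobbers the first ("lost update").
shared_context = {"plan": "v1"}

def agent_update(agent_id, snapshot):
    # The agent reasons over whatever it read, then writes back.
    return f"{snapshot} + edit by {agent_id}"

# Both agents read the same version before either writes.
snap_a = shared_context["plan"]
snap_b = shared_context["plan"]

shared_context["plan"] = agent_update("A", snap_a)
shared_context["plan"] = agent_update("B", snap_b)  # overwrites A's edit

print(shared_context["plan"])  # "v1 + edit by B" -- A's update is lost
```

This is exactly the kind of stale-read/overwrite hazard that an explicit memory access protocol is meant to rule out.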

HOW IT WORKS

Architecture-inspired agent memory hierarchy and protocols

The paper introduces a three-layer hierarchy, the Agent IO Layer, Agent Cache Layer, and Agent Memory Layer, plus protocols for Agent Cache Sharing and Agent Memory Access.

Just as classical computers layer RAM, cache, and disk, the paper treats agent memory as a hierarchy in which fast caches sit between rich I/O and large persistent stores.

This architecture-inspired framing makes it possible to reason about bandwidth, caching, and consistency in multi-agent systems in ways a plain context window cannot express.
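The three layers can be sketched as a small Python program. This is a hypothetical illustration of the hierarchy described above; the class and method names are invented here, not defined by the paper:

```python
from dataclasses import dataclass, field

@dataclass
class AgentIOLayer:
    """Ingests raw multimodal inputs (text, audio, images, signals)."""
    def ingest(self, raw):
        return {"modality": type(raw).__name__, "payload": raw}

@dataclass
class AgentCacheLayer:
    """Small, fast store for compressed context and recent trajectories."""
    capacity: int = 4
    entries: dict = field(default_factory=dict)

    def put(self, key, value):
        if len(self.entries) >= self.capacity:
            # Evict the oldest entry (dicts preserve insertion order).
            self.entries.pop(next(iter(self.entries)))
        self.entries[key] = value

    def get(self, key):
        return self.entries.get(key)

@dataclass
class AgentMemoryLayer:
    """Large persistent store for full histories and external knowledge."""
    store: dict = field(default_factory=dict)

    def persist(self, key, value):
        self.store[key] = value

io, cache, memory = AgentIOLayer(), AgentCacheLayer(), AgentMemoryLayer()
item = io.ingest("user question")
cache.put("turn-1", item)       # hot path: recent working context
memory.persist("turn-1", item)  # cold path: durable long-term history
```

The analogy to a hardware hierarchy is direct: the cache layer is small and evicts under capacity pressure, while the memory layer keeps everything.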

DIAGRAM

Agent memory hierarchy and protocol framing

This diagram shows how the paper stacks the Agent IO, cache, and memory layers, with cache sharing and memory access protocols, for multi-agent scenarios.

DIAGRAM

Consistency model comparison across architectures

This diagram shows how the paper compares consistency goals and features across traditional architecture memory, agent memory, and multi-agent memory.

PROCESS

How Multi-Agent Memory Architecture handles a multi-agent scenario

  1. Agent IO Layer

    The Agent IO Layer ingests multimodal inputs, preparing audio, text, images, and network signals for downstream memory handling.

  2. Agent Cache Layer

    The Agent Cache Layer stores compressed context, recent trajectories, tool calls, and short-term latent state for fast reasoning.

  3. Agent Memory Layer

    The Agent Memory Layer persists full dialogue history, external knowledge-base content, and long-term storage artifacts.

  4. Agent Cache Sharing

    Agent Cache Sharing lets one agent transform and reuse another agent's cached artifacts, analogous to cache-to-cache transfers in multiprocessors.
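The cache-sharing step in the walkthrough above can be sketched as a small Python function. This is a hypothetical illustration, under the assumption that sharing means copying a cached artifact from one agent's cache to another's, optionally transforming it in flight; none of these names come from the paper:

```python
# Hypothetical sketch of Agent Cache Sharing: agent B transforms and
# reuses an artifact that agent A already cached, instead of
# recomputing it -- analogous to a cache-to-cache transfer between
# cores in a multiprocessor.
caches = {"A": {}, "B": {}}

def share(src, dst, key, transform=lambda v: v):
    """Copy a cached artifact from one agent's cache to another's,
    applying an optional transformation along the way."""
    artifact = caches[src][key]
    caches[dst][key] = transform(artifact)
    return caches[dst][key]

caches["A"]["summary"] = "compressed tool-call trace"
shared = share("A", "B", "summary", transform=str.upper)
print(shared)  # COMPRESSED TOOL-CALL TRACE
```

The source cache is left untouched: sharing is reuse, not migration, so agent A can keep serving its own reads from the same artifact.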

KEY CONTRIBUTIONS

Key Contributions

  • Architecture-inspired memory hierarchy

    The paper defines the Agent IO Layer, Agent Cache Layer, and Agent Memory Layer, treating agent memory as a bandwidth- and caching-constrained hierarchy.

  • Protocol extension for multi-agent scenarios

    The paper identifies Agent Cache Sharing and Agent Memory Access as missing protocols for cache reuse and structured cross-agent memory operations.

  • Vision for multi-agent memory consistency

    The paper compares architecture memory, agent memory, and multi-agent memory, calling for formal consistency models and verification frameworks.
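One concrete shape such a consistency mechanism could take is optimistic versioning, where every write must name the version it read, so stale writes are rejected rather than silently clobbering newer state. This is a hypothetical sketch of one possible design, not a protocol the paper specifies:

```python
# Hypothetical sketch of a versioned Agent Memory Access protocol.
# A write that is based on a stale read is rejected; the caller must
# re-read and retry (an optimistic-concurrency / compare-and-swap style).
class VersionedMemory:
    def __init__(self):
        self._data, self._version = {}, {}

    def read(self, key):
        """Return (value, version); absent keys read as (None, 0)."""
        return self._data.get(key), self._version.get(key, 0)

    def write(self, key, value, expected_version):
        """Commit only if the store is still at the version we read."""
        if self._version.get(key, 0) != expected_version:
            return False  # stale write rejected
        self._data[key] = value
        self._version[key] = expected_version + 1
        return True

mem = VersionedMemory()
_, v = mem.read("plan")
assert mem.write("plan", "draft by A", v)      # succeeds at version 0
assert not mem.write("plan", "draft by B", v)  # rejected: version moved on
```

Agent B's rejected write is detectable, which is precisely what an ad hoc shared prompt cannot offer: a hook where a verification framework could enforce ordering guarantees.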

RESULTS

By the Numbers

This is a conceptual position paper: it reports no benchmark tables, numeric results, or baselines, so there are no metrics to summarize. Its contribution is qualitative architectural framing.

BENCHMARK

Benchmark: No experimental table reported

No quantitative metric is provided, so this benchmark chart is intentionally left without numeric values.

KEY INSIGHT

The Counterintuitive Finding

The paper argues that multi-agent memory consistency is harder than hardware memory consistency, even though hardware deals with low-level atomic operations and strict ordering.

This is counterintuitive because many assume semantic agents are easier to coordinate, yet the paper argues that heterogeneous artifacts and implicit dependencies complicate safety guarantees.

WHY IT MATTERS

What this unlocks for the field

The paper provides a shared vocabulary for treating agent memory as a hierarchy with explicit cache-sharing and memory-access protocols.

With this framing, builders can design multi-agent systems that reason about bandwidth, coherence, and consistency instead of relying on ad hoc prompt stitching.


Related papers

RAG · Memory Architecture · Long-Term Memory

From RAG to Memory: Non-Parametric Continual Learning for Large Language Models

Bernal Jiménez Gutiérrez, Yiheng Shu et al.

ICML 2025

HippoRAG 2 combines **Offline Indexing**, a schema-less **Knowledge Graph**, **Dense-Sparse Integration**, **Deeper Contextualization**, and **Recognition Memory** into a neuro-inspired non-parametric memory system for LLMs. On the joint RAG benchmark suite, HippoRAG 2 achieves 59.8 average F1 versus 57.0 for NV-Embed-v2, including 71.0 F1 on 2Wiki compared to 61.5 for NV-Embed-v2.

Agent Memory · Memory Architecture

General Agentic Memory Via Deep Research

B.Y. Yan, Chaofan Li et al.

arXiv 2025

General Agentic Memory (GAM) combines a **Memorizer**, **Researcher**, **page-store**, and **memory** to keep full trajectories while constructing lightweight guidance for deep research. On RULER 128K retrieval, GAM achieves 97.70% accuracy compared to 94.25% for RAG using GPT-4o-mini, while also reaching 64.07 F1 on HotpotQA-56K.

Agent Memory · Long-Term Memory · Memory Architecture

Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Prateek Chhikara, Dev Khant et al.

arXiv 2025

Mem0 incrementally processes conversations using the **extraction phase**, **update phase**, **asynchronous summary generation module**, **tool call mechanism**, and a **vector database** to build scalable long-term memory. On the LOCOMO benchmark, Mem0 attains a J score of 67.13 on single-hop questions versus 63.79 for OpenAI and cuts p95 latency from 17.117s to 1.440s compared to the full-context baseline.