Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Authors: Prateek Chhikara, Dev Khant, Saket Aryan et al.

arXiv 2025

TL;DR

Mem0 uses an extraction-and-update pipeline with tool calls over a vector database to reach a 67.13 LLM-as-a-Judge (J) score on LOCOMO single-hop questions, +3.34 over the OpenAI baseline.

THE PROBLEM

Long conversations exceed context windows and break coherence, with 17.117 s p95 full-context latency

Mem0 targets LLMs that lose persistent memory once conversations exceed fixed context windows, forcing full-context runs that incur 17.117 seconds of p95 latency.

When LOCOMO conversations reach around 26,000 tokens, full-context processing becomes too slow and expensive, causing forgetful agents and degraded multi-session coherence.

HOW IT WORKS

Mem0 — extraction and update for scalable long-term memory

Mem0 centers on an extraction phase, an update phase, an asynchronous summary generation module, a tool call mechanism, and a vector database that together manage conversational memories.

You can think of Mem0 like a human with a notepad and a filing cabinet: recent dialogue goes to a scratchpad, and distilled facts are then filed into organized, searchable memory.

This architecture lets Mem0 selectively store, merge, and delete facts over time, enabling consistent reasoning across sessions that a plain context window cannot maintain.
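The "searchable memory" side of this can be sketched as a minimal in-memory vector store. Everything below is illustrative, not Mem0's actual API: the class name, methods, and the toy bag-of-letters embedding (a real deployment would use a learned embedding model and a proper vector database).

```python
import math


class MemoryStore:
    """Minimal in-memory stand-in for the vector database Mem0 writes to.

    Hypothetical sketch: `embed` is a toy character-frequency embedding,
    not the learned model a real system would use.
    """

    def __init__(self):
        self.memories = {}  # memory id -> (text, vector)
        self._next_id = 0

    @staticmethod
    def embed(text):
        # Toy embedding: frequency of each lowercase letter a-z.
        vec = [0.0] * 26
        for ch in text.lower():
            if "a" <= ch <= "z":
                vec[ord(ch) - 97] += 1.0
        return vec

    @staticmethod
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    def add(self, text):
        self._next_id += 1
        self.memories[self._next_id] = (text, self.embed(text))
        return self._next_id

    def search(self, query, top_s=2):
        # Return the top-s memories most similar to the query.
        qv = self.embed(query)
        ranked = sorted(
            self.memories.items(),
            key=lambda kv: self.cosine(qv, kv[1][1]),
            reverse=True,
        )
        return [(mid, text) for mid, (text, _) in ranked[:top_s]]
```

Selective storage then reduces to deciding, per extracted fact, whether to add a new entry or modify one of the `search` hits.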

DIAGRAM

Mem0g graph memory extraction and query interaction

This diagram shows how Mem0g converts dialogue into entities and relationship triplets, updates the knowledge graph, and serves queries via dual retrieval.
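The triplet side of Mem0g can be sketched with a plain nested dict standing in for the Neo4j graph the paper describes; `GraphMemory`, its method names, and the sample triplets are hypothetical, and the triplets themselves would come from the LLM-based entity extractor and relationship generator.

```python
from collections import defaultdict


class GraphMemory:
    """Illustrative stand-in for Mem0g's knowledge graph.

    The paper stores triplets in Neo4j; a nested dict keeps this
    sketch self-contained and runnable.
    """

    def __init__(self):
        # subject -> relation -> set of objects
        self.triples = defaultdict(lambda: defaultdict(set))

    def upsert(self, subject, relation, obj):
        """Add a (subject, relation, object) triplet, merging duplicates."""
        self.triples[subject][relation].add(obj)

    def query(self, subject, relation):
        """Entity-centric lookup, one half of the dual retrieval path."""
        return sorted(self.triples[subject][relation])


# Triplets an LLM extractor might emit from "Alice moved to Berlin in 2021":
g = GraphMemory()
g.upsert("Alice", "lives_in", "Berlin")
g.upsert("Alice", "moved_in_year", "2021")
```

Because `upsert` merges into a set, re-extracting the same fact in a later session is a no-op rather than a duplicate node.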

DIAGRAM

LOCOMO evaluation pipeline for Mem0 and baselines

This diagram shows how Mem0 is evaluated on LOCOMO conversations, question categories, and metrics against multiple baseline systems.

PROCESS

How Mem0 Handles a Conversation Session

  1.

    Extraction phase

    Mem0 takes the current message pair with the conversation summary and recent messages to run the extraction phase and produce salient memory candidates.

  2.

    Asynchronous summary generation module

    In parallel, Mem0 periodically refreshes the conversation summary using the asynchronous summary generation module so extraction always sees up-to-date global context.

  3.

    Update phase

    Mem0 feeds each candidate fact, together with the top-s similar memories retrieved from the vector database, into the update phase to decide how to modify stored memories.

  4.

    Tool call mechanism

    Within the update phase, Mem0 uses the tool call mechanism to let the LLM choose ADD, UPDATE, DELETE, or NOOP and then applies changes in the vector database.
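The tool-call dispatch in step 4 can be sketched as follows. `apply_memory_op` and the `op` dict are hypothetical stand-ins: in Mem0 the LLM emits the operation as a tool call, and here a plain dict mapping memory id to text stands in for the vector database.

```python
def apply_memory_op(store, op):
    """Apply one update-phase decision to the memory store.

    `store` maps memory id -> text. `op` is a hypothetical dict standing
    in for the LLM's tool call, e.g. {"tool": "UPDATE", "id": 1, "text": ...}.
    """
    tool = op["tool"]
    if tool == "ADD":
        # Store a brand-new fact under a fresh id.
        store[max(store, default=0) + 1] = op["text"]
    elif tool == "UPDATE":
        # Overwrite an existing memory with a corrected fact.
        store[op["id"]] = op["text"]
    elif tool == "DELETE":
        # Remove a memory that is no longer true.
        store.pop(op["id"], None)
    elif tool == "NOOP":
        # The candidate adds nothing new; leave the store untouched.
        pass
    else:
        raise ValueError(f"unknown tool: {tool}")
    return store


memories = {1: "User lives in Paris"}
apply_memory_op(memories, {"tool": "UPDATE", "id": 1, "text": "User lives in Berlin"})
apply_memory_op(memories, {"tool": "ADD", "text": "User likes cycling"})
```

Keeping the four operations this small is what lets the LLM's choice be validated and applied deterministically against the store.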

KEY CONTRIBUTIONS

Key Contributions

  •

    Mem0 memory architecture

    Mem0 introduces an extraction phase and update phase with a tool call mechanism over a vector database, achieving J 67.13 on single-hop LOCOMO questions.

  •

    Mem0g graph memory architecture

    Mem0g extends Mem0 with entity extractor and relationship generator modules to build a Neo4j knowledge graph that improves temporal J to 58.13.

  •

    Comprehensive LOCOMO evaluation

    Mem0 is compared against LoCoMo, ReadAgent, MemoryBank, MemGPT, A-Mem, LangMem, Zep, OpenAI, RAG, and full-context baselines across four LOCOMO question types and deployment metrics.

RESULTS

By the Numbers

  • Single-hop J: 67.13 (+3.34 over OpenAI)

  • Temporal J: 55.51 (+6.20 over Zep)

  • Overall J: 66.88 (+13.98 over A-Mem)

  • Total p95 latency: 1.440 seconds (-15.677 seconds vs full-context)

On the LOCOMO long term conversational memory benchmark, Mem0 is evaluated across single-hop, multi-hop, temporal, and open-domain questions. These results show that Mem0 raises factual quality while dramatically reducing latency compared to full-context and A-Mem baselines.

BENCHMARK

Performance comparison of memory-enabled systems on LOCOMO single-hop J

LLM-as-a-Judge score J on LOCOMO single-hop questions.

KEY INSIGHT

The Counterintuitive Finding

Mem0g achieves an overall J of 68.44 while using only about 14k memory tokens per conversation, compared to Zep’s more than 600k tokens.

This is surprising because many expect richer graph memories to require more storage, yet Mem0g’s compact graph plus text memory beats Zep by 2.45 J with far fewer tokens.

WHY IT MATTERS

What this unlocks for the field

Mem0 enables production agents to maintain coherent long term memory with 91 percent lower p95 latency than full-context processing on LOCOMO conversations.

Builders can now deploy agents that remember user preferences across sessions without paying the 26,031-token context and 17.117-second latency cost of full-context baselines.


Related papers

Agent MemoryLong-Term Memory

Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents

Yi Yu, Liuyi Yao et al.

arXiv 2026 · 2026

Agentic Memory (AgeMem) exposes memory management tools, a three-stage progressive RL strategy, and step-wise GRPO directly inside the agent policy to jointly control long-term and short-term memory. On Qwen3-4B-Instruct, AgeMem attains 54.31% average performance across ALFWorld, SciWorld, PDDL, BabyAI, and HotpotQA, exceeding the best baseline A-Mem at 45.74%.

SurveyAgent Memory

Anatomy of Agentic Memory: Taxonomy and Empirical Analysis of Evaluation and System Limitations

Dongming Jiang, Yi Li et al.

arXiv 2026 · 2026

Anatomy of Agentic Memory organizes Memory-Augmented Generation into four structures and empirically compares systems like LOCOMO, AMem, MemoryOS, Nemori, MAGMA, and SimpleMem under benchmark saturation, metric validity, backbone sensitivity, and system cost. On the LoCoMo benchmark, Anatomy of Agentic Memory shows Nemori reaches 0.502 F1 while AMem drops to 0.116, and MAGMA achieves the top semantic judge score of 0.670 under the MAGMA rubric.

BenchmarkAgent Memory

Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions

Yuanzhe Hu, Yu Wang, Julian McAuley

ICLR 2026 · 2025

MemoryAgentBench standardizes multi-turn datasets into chunked conversations with memorization prompts, then evaluates long-context agents, RAG agents, and agentic memory agents across Accurate Retrieval, Test-Time Learning, Long-Range Understanding, and Selective Forgetting. On the overall score in Table 3, the GPT-4.1-mini long-context agent reaches 71.8 on Accurate Retrieval tasks compared to 49.2 for the GPT-4o-mini long-context baseline.