APEX-MEM: Agentic Semi-Structured Memory with Temporal Reasoning for Long-Term Conversational AI

Authors: Pratyay Banerjee, Masud Moshtaghi, Shivashankar Subramanian et al.

2026

TL;DR

APEX-MEM uses an append-only temporal property graph plus multi-tool Graph QnA agents to reach 88.88% accuracy on LOCOMO, +3.50 points over MIRIX.

THE PROBLEM

Longer context windows add noise, and models collapse under extended history (51.6% to 15.7% F1)

LLMs with larger context windows still fail on long conversations: GPT-4-Turbo drops from 51.6% F1 to 15.7% F1 under adversarial noise.

This breakdown hurts long-term conversational memory, causing inconsistent entities, broken temporal coherence, and unreliable answers across multi-session dialogues.

HOW IT WORKS

APEX-MEM: Property graphs, append-only events, and graph agents

APEX-MEM combines an Ontology, Entity and Property Resolution, Fact Extraction, and Graph Agents around a temporal property graph that stores evolving conversational facts.

Think of APEX-MEM as a card catalog plus timeline: entities are cards, events are dated entries, and Graph Agents are librarians using specialized tools.

This design lets APEX-MEM resolve conflicts at query time, track temporal validity, and answer complex questions that a plain context window cannot handle.
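As a rough illustration of the "card catalog plus timeline" idea, here is a minimal Python sketch of an append-only temporal property graph. The class names, field names, and the example person are assumptions made for illustration; the paper's actual schema is not given in this summary.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """A 'card' in the catalog (hypothetical fields)."""
    entity_id: str
    entity_class: str  # an ontology class name, assumed for illustration
    name: str


@dataclass(frozen=True)
class FactEvent:
    """A dated subject-property-value assertion; immutable once written."""
    subject_id: str
    prop: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None means no known end of validity


@dataclass
class TemporalGraph:
    entities: dict = field(default_factory=dict)
    events: list = field(default_factory=list)

    def add_entity(self, entity: Entity) -> None:
        # Keep the first card registered under an id; never replace it.
        self.entities.setdefault(entity.entity_id, entity)

    def append(self, event: FactEvent) -> None:
        # Append-only: new facts are added, older ones are never overwritten.
        if event.subject_id not in self.entities:
            raise KeyError(f"unknown entity: {event.subject_id}")
        self.events.append(event)

    def history(self, subject_id: str, prop: str) -> list:
        # Full timeline for one property, oldest first.
        matches = [e for e in self.events
                   if e.subject_id == subject_id and e.prop == prop]
        return sorted(matches, key=lambda e: e.valid_from)


graph = TemporalGraph()
graph.add_entity(Entity("p1", "Person", "Alice"))
graph.append(FactEvent("p1", "lives_in", "Denver", date(2024, 6, 1)))
graph.append(FactEvent("p1", "lives_in", "Boston", date(2023, 1, 1)))
print([e.value for e in graph.history("p1", "lives_in")])  # ['Boston', 'Denver']
```

Because both facts stay in the store, deciding which one answers a given question can be deferred to query time.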

DIAGRAM

APEX-MEM Graph QnA Agent Tool-use Sequence

This diagram shows how the APEX-MEM Graph QnA agent uses SCHEMAVIEWER, ENTITYLOOKUP, GRAPHSQL, and SEARCH tools to answer a question over the property graph.
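A tool-use sequence of this kind can be sketched with a small tool registry over an in-memory SQLite store. Only the four tool names come from the summary; the table layout, the tool signatures, the sample data, and the fixed call order are assumptions, and the scripted plan stands in for choices an LLM agent would make.

```python
import sqlite3

# Hypothetical toy store; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (id TEXT PRIMARY KEY, class TEXT, name TEXT);
CREATE TABLE facts (subject_id TEXT, property TEXT, value TEXT, valid_from TEXT);
CREATE TABLE turns (session INTEGER, text TEXT);
INSERT INTO entities VALUES ('p1', 'Person', 'Alice');
INSERT INTO facts VALUES ('p1', 'pet', 'a beagle named Max', '2024-03-02');
INSERT INTO turns VALUES (3, 'Alice: I adopted a beagle named Max last weekend.');
""")


def schema_viewer():
    """Return each table name with its column names."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {t: [col[1] for col in conn.execute(f"PRAGMA table_info({t})")]
            for t in tables}


def entity_lookup(name):
    """Find entities whose name contains the given string."""
    return conn.execute(
        "SELECT id, class, name FROM entities WHERE name LIKE ?",
        (f"%{name}%",)).fetchall()


def graph_sql(query, params=()):
    """Run a structured query over the stored facts."""
    return conn.execute(query, params).fetchall()


def search(keyword):
    """Keyword search over raw conversation turns."""
    return conn.execute(
        "SELECT session, text FROM turns WHERE text LIKE ?",
        (f"%{keyword}%",)).fetchall()


TOOLS = {
    "SCHEMAVIEWER": schema_viewer,
    "ENTITYLOOKUP": entity_lookup,
    "GRAPHSQL": graph_sql,
    "SEARCH": search,
}


def run_plan(plan):
    """Execute (tool_name, args) steps in order and keep a trace."""
    return [(tool_name, TOOLS[tool_name](*args)) for tool_name, args in plan]


# Scripted plan for the question "What pet does Alice have?"
plan = [
    ("SCHEMAVIEWER", ()),
    ("ENTITYLOOKUP", ("Alice",)),
    ("GRAPHSQL", ("SELECT value FROM facts WHERE subject_id = ? AND property = ?",
                  ("p1", "pet"))),
    ("SEARCH", ("beagle",)),
]
trace = run_plan(plan)
for tool_name, result in trace:
    print(tool_name, "->", result)
```

In this sketch the structured query supplies the answer and the keyword search supplies the supporting turn, which is one plausible way the tools could complement each other.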

DIAGRAM

Evaluation Pipeline across LOCOMO, LongMemEval, and SealQA-Hard

This diagram shows how APEX-MEM is constructed and evaluated on LOCOMO, LongMemEval, and SealQA-Hard with different QnA agents and baselines.

PROCESS

How APEX-MEM Handles a Conversational Question

  1. APEX-MEM Graph Construction

    APEX-MEM uses Fact Extraction and Entity and Property Resolution to build an append-only temporal property graph from conversational turns.

  2. Ontology and Fact Extraction

    APEX-MEM applies the Ontology during Fact Extraction to type entities, events, and subject-property-value assertions with temporal validity intervals.

  3. Graph Agents with Tools

    APEX-MEM Graph Agents invoke SCHEMAVIEWER, ENTITYLOOKUP, GRAPHSQL, and SEARCH to plan retrieval and reasoning over the property graph.

  4. Retrieval-Time Temporal Resolution

    APEX-MEM resolves conflicting facts at query time, using GRAPHSQL over events and facts to compute temporally valid answers for the user.
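The last step, resolving conflicts at query time, can be sketched as an "as of" query over append-only events. This is a minimal example using Python's built-in sqlite3 module; the `events` table, its columns, the `value_as_of` helper, and the sample rows are hypothetical and do not reflect the paper's actual GRAPHSQL dialect or schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "subject TEXT, property TEXT, value TEXT, valid_from TEXT, valid_to TEXT)")

# Append-only: the Acme row is never updated when Globex appears later.
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", [
    ("Alice", "employer", "Acme", "2022-01-10", None),
    ("Alice", "employer", "Globex", "2023-09-01", None),
    ("Alice", "city", "Boston", "2021-05-01", "2022-06-01"),
])


def value_as_of(subject, prop, as_of):
    """Return the value that holds on the ISO date `as_of`, or None.

    Among events whose validity interval covers `as_of`, the one with the
    latest start wins, so conflicts are settled when the question is asked.
    """
    row = conn.execute(
        """
        SELECT value FROM events
        WHERE subject = ? AND property = ?
          AND valid_from <= ?
          AND (valid_to IS NULL OR valid_to > ?)
        ORDER BY valid_from DESC
        LIMIT 1
        """,
        (subject, prop, as_of, as_of)).fetchone()
    return row[0] if row else None


print(value_as_of("Alice", "employer", "2023-01-01"))  # Acme
print(value_as_of("Alice", "employer", "2024-02-01"))  # Globex
print(value_as_of("Alice", "city", "2023-01-01"))      # None
```

ISO-formatted date strings compare correctly as text, which keeps the sketch free of date parsing. Because nothing is overwritten, the same store answers both "where did Alice work in early 2023?" and "where does she work now?".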

KEY CONTRIBUTIONS

  • Hybrid entity-event ontology for conversational memory

    APEX-MEM introduces an Ontology with 35 entity classes and temporally grounded events, enabling Fact Extraction to attach subject-property-value assertions with validity intervals.

  • Append-only event storage with temporal validity

    APEX-MEM stores all facts as append-only events instead of overwriting entities, allowing Graph Agents to perform retrieval-time temporal resolution over evolving information.

  • Multi-tool Graph QnA agent over the property graph

    APEX-MEM Graph Agents combine SCHEMAVIEWER, ENTITYLOOKUP, GRAPHSQL, and SEARCH, reaching 88.88% accuracy on LOCOMO and 86.2% on LongMemEval.

RESULTS

By the Numbers

  • Overall accuracy, LOCOMO: 88.88% (+3.50 points over MIRIX)

  • Temporal accuracy, LOCOMO: 90.63% (vs. 65.62% for MIRIX)

  • Overall score, LongMemEval: 86.2% (+11.6 points over Nemori at 74.6%)

  • Accuracy, SealQA-Hard: 40.1% (+5.5 points over O3 at 34.6%)

On LOCOMO and LongMemEval, which test long-term conversational memory and long-context reasoning, APEX-MEM scores 88.88% and 86.2% respectively, indicating robust temporal and multi-hop reasoning over extended histories.
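The reported gains can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this summary; the implied MIRIX overall score and the temporal gap are derived values, not figures stated in the summary.

```python
# (APEX-MEM score, baseline score, reported gain in points)
reported = {
    "LongMemEval overall vs Nemori": (86.2, 74.6, 11.6),
    "SealQA-Hard accuracy vs O3": (40.1, 34.6, 5.5),
}

for name, (apex, baseline, gain) in reported.items():
    # Round to one decimal to avoid floating-point noise.
    computed = round(apex - baseline, 1)
    print(f"{name}: computed +{computed}, reported +{gain}")

# Derived, not quoted: MIRIX overall on LOCOMO implied by the +3.50 gain.
implied_mirix_overall = round(88.88 - 3.50, 2)
# Derived, not quoted: gap between the two temporal accuracies on LOCOMO.
temporal_gap = round(90.63 - 65.62, 2)

print("Implied MIRIX overall on LOCOMO:", implied_mirix_overall)  # 85.38
print("Temporal accuracy gap on LOCOMO:", temporal_gap)           # 25.01
```

The temporal-category gap of about 25 points is much larger than the 3.50-point overall gap, which fits the summary's emphasis on temporal reasoning.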

BENCHMARK

LOCOMO Category Type Evaluation Results

Overall accuracy on LOCOMO Question Answering benchmark.

BENCHMARK

APEX-MEM Ablations of different tools

Overall LOCOMO accuracy for APEX-MEM Graph QnA Agent with different tool subsets.

KEY INSIGHT

The Counterintuitive Finding

APEX-MEM with the full tool set reaches 87.0% on LOCOMO, while the GRAPHSQL-only configuration needs 3.3x more tool calls to reach just 79.45%.

This is surprising because structured SQL reasoning alone might be expected to suffice, yet APEX-MEM shows that combining SEARCH with GRAPHSQL is both more accurate and more efficient.

WHY IT MATTERS

What this unlocks for the field

APEX-MEM unlocks temporally coherent, entity-consistent conversational memory that can resolve conflicting facts at query time instead of overwriting history.

Builders can now create assistants that survive weeks-long, noisy interactions while still answering temporal and multi-hop questions, with over 88% accuracy on LOCOMO.
