
How AI Agent Memory Works: From Conversations to Long-Term Knowledge

Mario Simic

6 min read

Every time you start a new ChatGPT conversation, it has no memory of who you are. It does not know you prefer concise answers, what timezone you are in, what projects you are working on, or how you communicate. You start from zero every single time. This is not an inherent limitation of language models; it is a deliberate design choice. Most AI products treat conversations as stateless sessions. AI agents that are genuinely useful long-term cannot afford to make that choice.

Three Types of Memory

Working memory is the context window: the text the model can see during a single inference call. Modern models support 8,000 to 128,000 tokens of context. Every message in the current conversation exists in working memory. It is ephemeral: when the conversation ends, working memory is gone.

Episodic memory is the conversation history saved to disk. When you reopen a previous conversation, the prior messages can be loaded back into the context window. This gives basic persistence, but has limits: long conversations overflow the context window, and loading extensive history is slow and expensive.

Semantic memory is extracted, abstracted knowledge: not "on January 7th Mario said he prefers dark mode" but simply "Mario prefers dark mode." Facts, preferences, habits, and relationships distilled from conversations into something compact and retrievable. This is the foundation of an agent that feels like it knows you.

Useful AI agents need all three, managed intelligently. Working memory handles the immediate conversation. Episodic memory provides conversation continuity. Semantic memory enables long-term personalisation.
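The three tiers can be pictured as a single container. This is an illustrative sketch only; the class and field names are hypothetical and not Skales's actual types:

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    # Tier 1: messages visible in the current context window (ephemeral)
    working: list[str] = field(default_factory=list)
    # Tier 2: archived past conversations, reloadable on demand
    episodic: list[list[str]] = field(default_factory=list)
    # Tier 3: distilled facts, e.g. {"ui_theme": "dark mode"}
    semantic: dict[str, str] = field(default_factory=dict)

    def end_session(self) -> None:
        """Archive the current conversation and clear working memory."""
        if self.working:
            self.episodic.append(self.working)
            self.working = []

mem = AgentMemory()
mem.working.append("User: I prefer dark mode")
mem.semantic["ui_theme"] = "dark mode"  # extracted, abstracted fact survives
mem.end_session()                        # working memory is gone; the rest persists
```

Note how ending the session wipes working memory but leaves the episodic archive and the semantic facts intact, which is exactly the split the three definitions above describe.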

How Skales Implements Persistent Memory

Skales maintains two JSON files that accumulate semantic memory over time: soul.json and human.json, stored in ~/.skales-data/.

soul.json is the agent's self-model: its name, personality configuration, communication style preferences, the skills it has been given, the tools it has access to, and the base system prompt. This file defines what the agent is. It is modified when you change settings or install new skills.

human.json is what the agent has learned about you: your name, your occupation, your preferences, your active projects, your communication style, notable facts you have mentioned in conversations. This is a structured extract of information you have provided. It is not surveillance: it contains only what you have told the agent, in a format you can view and edit directly at any time. It is modified by the agent when it learns something new and decides to persist it.

At the start of every conversation, both files are loaded into the system prompt. The agent knows who it is and who you are before you type anything.
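A minimal sketch of that session-start step, assuming hypothetical schemas for the two files (the real soul.json and human.json formats may differ):

```python
import json

# Hypothetical contents of ~/.skales-data/soul.json and human.json;
# in practice these would be read from disk with json.load().
soul = {"name": "Skales", "style": "concise",
        "base_prompt": "You are a helpful personal agent."}
human = {"name": "Mario", "preferences": {"ui_theme": "dark mode"},
         "projects": ["blog redesign"]}

def build_system_prompt(soul: dict, human: dict) -> str:
    # Both profiles are injected before the first user message, so the
    # agent knows who it is and who the user is from turn one.
    return (soul["base_prompt"]
            + "\nAgent profile: " + json.dumps(soul)
            + "\nUser profile: " + json.dumps(human))

prompt = build_system_prompt(soul, human)
```

Because the files are plain JSON, the same structures the agent reads are the ones you can open and edit by hand.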

Context Chunking for Long Conversations

Long conversations eventually exceed the context window. Loading a six-month conversation history verbatim would be prohibitively expensive and slow. Skales handles this through a three-tier chunking strategy.

The most recent messages (typically the last 20 to 30 exchanges) are always included verbatim, since the agent needs precise recall of what was just discussed. Older messages from the current session are summarised in groups: each block of exchanges is compressed to a shorter summary that preserves key information without the full text. Summaries from previous sessions are stored separately and loaded selectively when topic similarity suggests they are relevant to the current conversation.
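The first two tiers can be sketched as follows. `summarise` stands in for an LLM summarisation call, and the constants and function names are illustrative, not Skales's actual implementation (tier three, cross-session retrieval by topic similarity, is omitted):

```python
VERBATIM_TAIL = 25  # most recent exchanges kept word-for-word (tier 1)
BLOCK_SIZE = 10     # older exchanges compressed in groups of this size (tier 2)

def summarise(messages: list[str]) -> str:
    # Placeholder: a real system would call the model to compress the block.
    return f"[summary of {len(messages)} messages]"

def build_context(history: list[str]) -> list[str]:
    recent = history[-VERBATIM_TAIL:]           # tier 1: verbatim tail
    older = history[:-VERBATIM_TAIL]
    summaries = [summarise(older[i:i + BLOCK_SIZE])  # tier 2: grouped summaries
                 for i in range(0, len(older), BLOCK_SIZE)]
    return summaries + recent

# 60 messages shrink to 25 verbatim items plus 4 summaries
ctx = build_context([f"msg {i}" for i in range(60)])
```

The trade-off is visible in the shape of the output: exact text where precision matters most, compressed summaries where it matters less.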

The result is that the agent's effective memory extends well beyond the physical context window, at the cost of some fidelity on older details. For most practical purposes, such as knowing your preferences, your projects, and your communication history, this is entirely adequate.

Bi-Temporal Storage

Memory entries in Skales carry two timestamps, not one. The valid time is when the fact was true in the world. The recorded time is when the agent learned it. These timestamps differ when you tell the agent about past events, correct a previous statement, or update a preference over time.

Consider: in January you mention you work at Acme Corp. In March you say you just started a new job at Beta Inc. A naive memory system would overwrite the old fact. The bi-temporal system records: "Acme Corp, valid from January to March." "Beta Inc, valid from March onwards." This allows the agent to answer time-referenced questions correctly, such as "where was I working when we discussed the contract in February?", without losing historical context.
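The employer example maps onto a simple record shape. This is a minimal sketch; the field names and dates are illustrative, not Skales's actual schema:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Fact:
    value: str
    valid_from: date          # when the fact became true in the world
    valid_to: Optional[date]  # None means still true
    recorded_at: date         # when the agent learned it

# The naive system would keep only Beta Inc; here both survive.
employer = [
    Fact("Acme Corp", date(2024, 1, 10), date(2024, 3, 5), date(2024, 1, 10)),
    Fact("Beta Inc",  date(2024, 3, 5),  None,             date(2024, 3, 5)),
]

def value_at(facts: list[Fact], when: date) -> Optional[str]:
    """Answer a time-referenced question: what was true at `when`?"""
    for f in facts:
        if f.valid_from <= when and (f.valid_to is None or when < f.valid_to):
            return f.value
    return None

# "Where was I working when we discussed the contract in February?"
february_employer = value_at(employer, date(2024, 2, 15))
```

Keeping `recorded_at` separate from the validity interval is what makes the representation bi-temporal: it also lets the system distinguish "this changed" from "I only just found out about this".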

Most personal AI tools do not implement this level of temporal precision. The practical benefit is that the agent's knowledge remains accurate as your life changes, without losing the ability to reason about your past. See all personalisation features in Skales.

Try it yourself 🦎

Skales is free for personal use. No Docker. No account.

Download Free →