Why most agent memory implementations are wrong
May 9, 2025
Most agent memory systems conflate storage with retrieval strategy, and that’s the root of the problem.
When developers add memory to an agent, they typically do one of two things: stuff everything into the context window, or vector-search the conversation history and inject the top-k results. Both approaches fail in predictable ways.
The context stuffing trap
Putting everything in context is the naive solution. It works until it doesn't: in practice, somewhere around 50 turns of conversation, or as soon as the user switches topics, or whenever the model loses track of temporal ordering.
The core issue is that LLMs aren’t databases. Long contexts introduce attention dilution. Important facts from turn 3 can get drowned out by irrelevant chatter from turns 40–60.
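Here is a minimal Python sketch of the naive approach. It assumes a fixed token budget and approximates tokens by whitespace-separated words, which is an assumption made only to keep it runnable (real systems count with the model's tokenizer). It shows only the hard failure, where early turns are evicted once the history outgrows the budget. Attention dilution inside the window is a property of the model and can't be reproduced in code like this. The function name `build_stuffed_context` is hypothetical.

```python
def count_tokens(text: str) -> int:
    """Rough token estimate: one token per whitespace-separated word (an assumption)."""
    return len(text.split())


def build_stuffed_context(turns: list[str], budget: int) -> list[str]:
    """Naive context stuffing: keep as many recent turns as fit in `budget` tokens.

    Once the history outgrows the budget, the oldest turns are dropped first,
    which is exactly where early facts silently disappear.
    """
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


# Sixty turns of filler with one important fact at turn 3.
turns = [f"turn {i}: some small talk about the weather" for i in range(1, 61)]
turns[2] = "turn 3: I prefer dark mode"
context = build_stuffed_context(turns, budget=200)
print(len(context))                             # 25
print("turn 3: I prefer dark mode" in context)  # False
```

Only turns 36 to 60 survive; the preference stated at turn 3 is gone without any error being raised.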
Vector search is better, but not enough
Semantic retrieval solves the context length problem but introduces a new one: you’re now retrieving based on surface-level similarity rather than causal relevance.
If a user said “I prefer dark mode” in turn 2 and you’re now at turn 100 discussing UI preferences, a vector search will find it — but only if the query semantically overlaps. If you’re discussing “interface settings,” you might not retrieve it.
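The failure mode is easier to see with a toy retriever. The sketch below uses a bag-of-words vector as a stand-in for a real embedding model (an assumption for illustration; real embeddings handle synonyms far better, so the gaps here are exaggerated) and ranks stored turns by cosine similarity to the query. The names `embed`, `cosine`, and `top_k`, and the `threshold` parameter, are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A stand-in for a real model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, memories: list[str], k: int = 1, threshold: float = 0.1) -> list[str]:
    """Return up to k stored memories whose similarity to the query clears `threshold`."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(m)), m) for m in memories), reverse=True)
    return [m for score, m in scored[:k] if score >= threshold]


memories = ["I prefer dark mode", "My name is Sam", "The deploy failed on Tuesday"]
print(top_k("which display mode do I prefer", memories))  # ['I prefer dark mode']
print(top_k("interface settings", memories))              # []
print(top_k("what are my interface settings", memories))  # ['My name is Sam']
```

The last query is the telling one: it is causally about the dark-mode preference, but the retriever returns an unrelated fact because both strings happen to contain "my". Ranking is by similarity, not by relevance.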
What actually works
The systems that work in practice maintain multiple memory types:
- Episodic — recent conversation turns (last N)
- Semantic — extracted facts about the user/task (key-value or graph)
- Procedural — learned patterns about how the user likes to work
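One way to hold the three memory types side by side, sketched in Python. The three categories come from the list above; the class name `AgentMemory`, its fields, and the plain-text layout of `render_context` are hypothetical choices for illustration. Episodic memory is a bounded `deque`, so the oldest turn is evicted automatically, while semantic facts live in a key-value dict and procedural patterns in a simple list.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Hypothetical container for episodic, semantic, and procedural memory."""

    max_episodes: int = 10
    episodic: deque = field(init=False, default_factory=deque)   # last N turns
    semantic: dict[str, str] = field(default_factory=dict)      # extracted facts
    procedural: list[str] = field(default_factory=list)         # working-style patterns

    def __post_init__(self) -> None:
        # Rebuild the deque with a bound so old turns fall off on append.
        self.episodic = deque(maxlen=self.max_episodes)

    def add_turn(self, turn: str) -> None:
        self.episodic.append(turn)

    def render_context(self) -> str:
        """Assemble a prompt preamble from all three stores."""
        facts = "\n".join(f"- {k}: {v}" for k, v in sorted(self.semantic.items()))
        habits = "\n".join(f"- {p}" for p in self.procedural)
        recent = "\n".join(self.episodic)
        return f"Known facts:\n{facts}\n\nWorking style:\n{habits}\n\nRecent turns:\n{recent}"


memory = AgentMemory(max_episodes=2)
for turn in ["user: hi", "agent: hello", "user: switch to dark mode"]:
    memory.add_turn(turn)
memory.semantic["ui_theme"] = "dark"
memory.procedural.append("prefers short answers")
print(memory.render_context())
```

The design point is that each store has its own eviction rule: the episodic window forgets "user: hi" after two newer turns, but the extracted `ui_theme` fact persists no matter how long the conversation runs.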
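The trick is writing the memory, not just reading it. Most implementations treat memory as read-only retrieval over a growing log. The agents that actually feel intelligent are the ones that actively maintain a model of the user: extracting facts as they come up and overwriting them when they change, so that "I prefer light mode now" replaces the earlier "I prefer dark mode" instead of sitting next to it.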
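A minimal sketch of the write path in Python. The regex rules are a stand-in for fact extraction (an assumption made to keep it runnable; a real system would more likely ask the model itself to extract facts), and `EXTRACTION_RULES`, `write_memory`, and the keys `ui_theme` and `preferred_name` are hypothetical. The important part is the upsert: a newer statement replaces the stale value rather than being appended alongside it.

```python
import re

# Hypothetical extraction rules: (pattern, fact key). Regexes stand in for an
# LLM-based extractor purely so the sketch runs on its own.
EXTRACTION_RULES = [
    (re.compile(r"\bI prefer (dark|light) mode\b", re.IGNORECASE), "ui_theme"),
    (re.compile(r"\bcall me (\w+)", re.IGNORECASE), "preferred_name"),
]


def write_memory(semantic: dict[str, str], turn: str) -> dict[str, str]:
    """Extract facts from a user turn and upsert them into semantic memory.

    Returns only the keys whose values changed, so callers can log updates.
    """
    changed: dict[str, str] = {}
    for pattern, key in EXTRACTION_RULES:
        match = pattern.search(turn)
        if match:
            value = match.group(1)
            if semantic.get(key) != value:
                semantic[key] = value  # newer statement overwrites the stale fact
                changed[key] = value
    return changed


semantic: dict[str, str] = {}
print(write_memory(semantic, "I prefer dark mode, thanks"))         # {'ui_theme': 'dark'}
print(write_memory(semantic, "Actually, I prefer light mode now"))  # {'ui_theme': 'light'}
print(semantic)                                                     # {'ui_theme': 'light'}
```

A read-only retriever over the raw log would still hold both statements and could surface either one; after the write step, the store holds a single current answer.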
Tools worth looking at: Mem0 for managed memory, Zep for session graphs, Letta for persistent agent state.