How to Spec Agent Memory for Your AI Product

For AI product managers and technical leads · Based on IBM CoALA Four-Type Agent Memory Framework

// TL;DR

For AI product managers and technical leads, the CoALA framework gives you a structured language to spec agent memory requirements without over-engineering or under-investing. Classify your agent's complexity tier (reflex, narrow-purpose, or full autonomous) to determine the minimum viable memory stack. Use the four memory types — working, semantic, procedural, episodic — as your requirements vocabulary. This prevents teams from defaulting to 'just use a bigger context window' or building expensive episodic memory systems for agents that only need a Markdown knowledge file.

Why should product managers care about agent memory architecture?

The difference between a chatbot and an agent is memory. A chatbot gives a response. An agent gives a response shaped by persistent knowledge, accumulated experience, remembered preferences, and recorded mistakes. When your AI product fails to retain user context, violates its own guidelines, or can't improve over time, these are memory architecture failures — and they're specifiable, solvable problems.

The CoALA framework from Princeton research gives PMs a structured vocabulary for agent memory: four types (working, semantic, procedural, episodic), three complexity tiers, and clear decision criteria for what your product actually needs.

How do you translate agent memory types into product requirements?

Each CoALA memory type maps to a product capability:

- Working memory = the agent can handle the current conversation and loaded data. Every agent has this. The product question is: how much can it hold before quality degrades?

- Semantic memory = the agent consistently applies your product's rules, brand guidelines, or domain knowledge across every session. Requirement: "The agent must follow our style guide and product documentation in every interaction without being re-prompted."

- Procedural memory = the agent can execute structured workflows reliably. Requirement: "The agent must follow a defined step-by-step process for handling refund requests." Specify progressive disclosure so the agent doesn't load every workflow simultaneously.

- Episodic memory = the agent learns from past interactions with this specific user or project. Requirement: "The agent must remember that this user prefers concise responses and had a billing issue resolved last month."

How do you avoid over-engineering or under-engineering memory?

Classify your product's agent into a complexity tier:

Tier A — Reflex: Does your agent perform one deterministic task? (Routing, classification, simple lookup.) Spec only working memory. Don't let engineering build a vector database.

Tier B — Narrow-Purpose: Does your agent follow structured procedures in one domain? (Onboarding, troubleshooting, form completion.) Spec working memory + procedural memory. A skill.md file per workflow is sufficient.

Tier C — Full Autonomous: Does your agent need to maintain awareness across sessions, handle diverse tasks, and improve? (Personal assistant, coding copilot, research analyst.) Spec all four types. Budget for the episodic memory engineering work, including the forgetting problem.

The most common PM mistake is speccing Tier C memory for a Tier B product — building expensive episodic memory and vector databases when a Markdown file and a skill library would suffice.

What should you include in the memory section of your product spec?

Your spec should include:

1. Agent tier classification with justification

2. Memory types required (working, semantic, procedural, episodic) with specific product requirements each type serves

3. Implementation approach per type — at minimum: Markdown file vs. vector DB for semantic, skill.md with progressive disclosure for procedural, distilled notes vs. raw transcripts for episodic

4. Forgetting policy if episodic memory is in scope — what expires, what triggers deletion, how context shifts are handled

5. Known failure modes the memory architecture must resolve

This gives engineering a clear, actionable specification rather than the vague "the agent should remember things" that leads to either over-building or under-building.

What's the next step?

Review your current agent product spec. Does it explicitly address memory architecture? If not, classify your agent's tier, identify the memory types it needs using the CoALA framework, and add a memory architecture section to your spec with specific requirements, implementation guidance, and a forgetting policy where applicable.

// FREQUENTLY ASKED QUESTIONS

How do I explain the CoALA memory framework to my engineering team?

Frame it as four layers of memory, each solving a different problem: working memory is RAM (current session), semantic memory is the knowledge base (persistent facts), procedural memory is the skill library (how to do things), and episodic memory is the journal (what happened before and what we learned). Classify your agent's tier and tell engineering which layers you need and why — this gives them actionable architecture requirements.

How much does agent memory architecture add to development cost?

Tier A (working memory only) adds nothing. Tier B (add procedural) adds modest effort — writing skill.md files and implementing progressive disclosure. Tier C (all four types) adds significant engineering, especially episodic memory with forgetting policies. The biggest cost savings come from correctly classifying your agent tier and not building memory types you don't need. Over-engineering memory for a Tier A agent wastes weeks; under-engineering memory for Tier C causes costly production failures.

What's the most common memory-related product failure?

The most common failure is an agent that loses all context between sessions because it has no semantic or episodic memory — only working memory. Users report the agent 'forgetting everything,' violating rules it previously followed, and requiring constant re-prompting. The fix is usually straightforward: add a semantic memory file with persistent rules and documentation loaded at session start.