How to Design Agent Memory Architecture with CoALA

For AI/ML engineers building production agents · Based on the IBM CoALA Four-Type Agent Memory Framework

// TL;DR

The CoALA framework gives you a systematic way to select the right memory types for your agent's complexity tier. Classify your agent as reflex (working memory only), narrow-purpose (add procedural), or full autonomous (all four types). Use progressive disclosure to manage procedural memory without overloading the context window. Implement episodic memory as distilled experience notes, not raw transcripts, with explicit deletion policies. Apply the framework during initial architecture design and whenever an agent exhibits memory-related failures.

Why do production agents need a formal memory architecture?

Many agent failures in production trace back to memory problems: the agent forgets project conventions between sessions, repeats mistakes it already resolved, loses track of instructions buried in a long context window, or accumulates stale memories that contradict current reality. The IBM CoALA Four-Type Agent Memory Framework, derived from the Princeton Cognitive Architectures for Language Agents (CoALA) research, gives AI engineers a systematic way to diagnose these failures and design the correct memory stack.

The framework identifies four distinct memory types: working memory (context window), semantic memory (persistent knowledge), procedural memory (skill libraries), and episodic memory (distilled past experience). Not every agent needs all four.

How do you classify an agent and pick the right memory types?

Start by classifying your agent into one of three complexity tiers:

- Tier A — Reflex/Simple: Deterministic, single-task agents like routing bots. These need only working memory.

- Tier B — Narrow-Purpose: Structured single-domain agents like onboarding assistants. These need working memory plus procedural memory (encoded skills).

- Tier C — Full Autonomous: Multi-task agents that must maintain context and improve over time, like coding assistants or research agents. These need all four memory types.

Classification drives the architecture. Over-engineering a Tier A agent with episodic memory wastes resources. Under-engineering a Tier C agent with only working memory guarantees failure.
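The tier-to-memory mapping above can be written down as a small lookup table. The following is a minimal sketch, not part of the framework itself: the `classify` helper and its three yes/no questions are assumptions chosen for illustration, and a real classification would likely weigh more factors.

```python
from enum import Enum


class MemoryType(Enum):
    WORKING = "working"
    SEMANTIC = "semantic"
    PROCEDURAL = "procedural"
    EPISODIC = "episodic"


class Tier(Enum):
    REFLEX = "A"
    NARROW = "B"
    AUTONOMOUS = "C"


# Memory stack per tier, as described in the tier list above.
MEMORY_STACK = {
    Tier.REFLEX: {MemoryType.WORKING},
    Tier.NARROW: {MemoryType.WORKING, MemoryType.PROCEDURAL},
    Tier.AUTONOMOUS: set(MemoryType),
}


def classify(multi_task: bool, improves_over_time: bool, needs_encoded_skills: bool) -> Tier:
    """Hypothetical classifier: the three questions are illustrative assumptions."""
    if multi_task or improves_over_time:
        return Tier.AUTONOMOUS
    if needs_encoded_skills:
        return Tier.NARROW
    return Tier.REFLEX
```

For example, `MEMORY_STACK[classify(False, False, True)]` yields working plus procedural memory, the Tier B stack.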

How should engineers implement each memory type in practice?

Working memory is always present: it is your model's context window. Keep it lean. Even with million-token windows, model performance degrades when the context is overloaded, so don't bulk-load skills or documentation.
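One way to keep working memory lean is to enforce a token budget before each model call. The sketch below is an illustration under stated assumptions: `estimate_tokens` uses a crude four-characters-per-token heuristic rather than a real tokenizer, and `fit_to_budget` is a hypothetical helper that keeps the most recent messages that fit.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate (assumption): roughly four characters per token."""
    return max(1, len(text) // 4)


def fit_to_budget(system_prompt: str, messages: list[str], budget: int) -> list[str]:
    """Keep the newest messages that fit in the budget left after the system prompt."""
    remaining = budget - estimate_tokens(system_prompt)
    kept: list[str] = []
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if cost > remaining:
            break
        kept.append(message)
        remaining -= cost
    return list(reversed(kept))  # restore chronological order
```

In practice you would swap the heuristic for your model's tokenizer and pick a budget well below the advertised window size.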

Semantic memory in most production systems is a well-structured Markdown file (like CLAUDE.md) loaded at session start. It contains architecture, conventions, build commands, and rules. Upgrade to a vector database only when the knowledge base outgrows what fits in context.
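Loading such a file at session start can be as simple as appending it to the system prompt. This is a minimal sketch: `build_system_prompt`, the `max_chars` guard, and the "Project knowledge" heading are assumptions for illustration, not a prescribed interface.

```python
from pathlib import Path


def build_system_prompt(base_instructions: str, memory_file: Path, max_chars: int = 20_000) -> str:
    """Append the persistent knowledge file to the base instructions, if it exists."""
    if not memory_file.exists():
        return base_instructions
    knowledge = memory_file.read_text(encoding="utf-8").strip()
    if len(knowledge) > max_chars:
        # Assumed threshold: a signal that the file has outgrown the context
        # and a retrieval system may be the better fit.
        raise ValueError(f"{memory_file} exceeds {max_chars} characters")
    return f"{base_instructions}\n\n# Project knowledge\n\n{knowledge}"
```

The size guard makes the upgrade trigger explicit: when the file no longer fits comfortably, the call fails loudly instead of silently overloading the context.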

Procedural memory uses the skill.md format. Each skill lives in its own folder with a Markdown file containing name, description, and step-by-step instructions. Apply progressive disclosure: hold only the skill index (~100 tokens per skill) in working memory; load full instructions on demand; pull referenced files only during execution.
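The first two stages of progressive disclosure (index in context, full instructions on demand) can be sketched as follows. The file layout is an assumption: each `skill.md` is taken to start with `name:` and `description:` lines followed by a blank line and then the instructions. All function names here are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SkillEntry:
    name: str
    description: str
    path: Path


def parse_header(text: str) -> dict[str, str]:
    """Read 'key: value' lines up to the first blank line (assumed layout)."""
    header: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            break
        key, sep, value = line.partition(":")
        if sep:
            header[key.strip().lower()] = value.strip()
    return header


def build_index(skills_dir: Path) -> list[SkillEntry]:
    """Scan one folder per skill and keep only name and description."""
    entries = []
    for skill_file in sorted(skills_dir.glob("*/skill.md")):
        header = parse_header(skill_file.read_text(encoding="utf-8"))
        name = header.get("name", skill_file.parent.name)
        entries.append(SkillEntry(name, header.get("description", ""), skill_file))
    return entries


def render_index(entries: list[SkillEntry]) -> str:
    """Compact index that stays in working memory."""
    return "\n".join(f"- {e.name}: {e.description}" for e in entries)


def load_skill(entries: list[SkillEntry], name: str) -> str:
    """Load the full instructions only when the agent selects a skill."""
    for entry in entries:
        if entry.name == name:
            return entry.path.read_text(encoding="utf-8")
    raise KeyError(name)
```

The third stage, pulling referenced files during execution, is left out of this sketch; it would follow the same pattern of reading a file only at the moment a step needs it.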

Episodic memory is the hardest to get right. Distill each session into compressed, decision-relevant notes rather than raw transcripts. Define explicit save criteria, update rules, and deletion triggers. Forgetting matters as much as remembering: without expiry policies, stale memories accumulate and degrade performance.
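Save criteria, update rules, and deletion triggers can all be made explicit in a small store. The sketch below is one possible shape, under assumptions: the `decision_relevant` flag stands in for whatever save criterion you define, the update rule is "newest note per topic wins", and deletion is a simple time-to-live. None of these names come from the framework.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class EpisodicNote:
    topic: str
    lesson: str           # distilled takeaway, not a transcript
    created_at: datetime
    ttl: timedelta        # deletion trigger (assumed: fixed time-to-live)

    def expired(self, now: datetime) -> bool:
        return now >= self.created_at + self.ttl


class EpisodicStore:
    def __init__(self) -> None:
        self._notes: dict[str, EpisodicNote] = {}

    def save(self, note: EpisodicNote, decision_relevant: bool) -> bool:
        """Save criterion: only keep notes flagged as decision-relevant."""
        if not decision_relevant:
            return False
        # Update rule: a newer note on the same topic replaces the old one.
        self._notes[note.topic] = note
        return True

    def forget_expired(self, now: datetime) -> int:
        """Delete notes past their time-to-live; return how many were removed."""
        stale = [topic for topic, note in self._notes.items() if note.expired(now)]
        for topic in stale:
            del self._notes[topic]
        return len(stale)

    def recall(self, now: datetime) -> list[str]:
        """Return lessons that are still valid at the given time."""
        return [n.lesson for n in self._notes.values() if not n.expired(now)]
```

Replacing by topic is what keeps contradictory memories from piling up; a production store would likely add event-based deletion triggers as well, such as invalidating notes when a convention changes.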

How do you audit an existing agent's memory using CoALA?

Document your agent's current memory mechanisms: what's in the context window, what persists across sessions, what retrieval systems exist. Compare against the tier-appropriate memory stack. Common gaps include:

- No semantic memory → agent violates conventions it was told about in prior sessions.

- No progressive disclosure → all skills bulk-loaded, context overloaded.

- Raw transcript storage → episodic memory exists but is too noisy to surface decision-relevant lessons.

- No forgetting policy → contradictory memories accumulate.

Each gap maps directly to a remediation action.
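The gap-to-remediation mapping can be captured as a checklist. This is a minimal sketch that assumes a Tier C agent, where all four gaps apply; the `MemoryAudit` fields and the remediation wording are illustrative, drawn from the implementation guidance earlier in this article.

```python
from dataclasses import dataclass


@dataclass
class MemoryAudit:
    """Answers gathered while documenting the agent's current memory mechanisms."""
    has_semantic_memory: bool
    uses_progressive_disclosure: bool
    stores_raw_transcripts: bool
    has_forgetting_policy: bool


def find_gaps(audit: MemoryAudit) -> list[tuple[str, str]]:
    """Return (gap, remediation) pairs for every check that fails."""
    checks = [
        (not audit.has_semantic_memory,
         "No semantic memory",
         "Load a Markdown knowledge file at session start"),
        (not audit.uses_progressive_disclosure,
         "No progressive disclosure",
         "Keep only the skill index in context and load full skills on demand"),
        (audit.stores_raw_transcripts,
         "Raw transcript storage",
         "Distill each session into decision-relevant notes"),
        (not audit.has_forgetting_policy,
         "No forgetting policy",
         "Define expiry and deletion triggers for stored notes"),
    ]
    return [(gap, fix) for failed, gap, fix in checks if failed]
```

For lower tiers, you would drop the checks for memory types the tier does not call for.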

What's the next step?

Describe your agent — what it does, what problems it has, and what memory it currently uses. Then walk through the 8-step CoALA workflow to produce a complete memory architecture specification, including tier classification, memory type assignments, implementation recommendations, and forgetting policies where applicable.

// FREQUENTLY ASKED QUESTIONS

What's the simplest way to add semantic memory to an existing agent?

Create a Markdown file containing your project's architecture, conventions, rules, and key facts. Load it into the agent's context window at the start of every session via the system prompt or a file-loading mechanism. This single file eliminates the most common memory failure — agents forgetting persistent knowledge between sessions. Keep it concise and update it whenever conventions change.

How do I know if my agent's context window is overloaded?

Watch for: the agent ignoring instructions that appear in the middle of long contexts, inconsistent behavior on identical tasks, failure to reference information you know is loaded, and generally degraded response quality as context grows. Test by moving critical instructions to the beginning or end of context — if performance improves, mid-context information was being lost.
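The position test described above can be automated with a small harness. This sketch assumes you supply two callables of your own: `run_agent`, which sends a prompt to your agent and returns its reply, and `passes`, which checks whether the reply followed the critical instruction. Both names, and the `trials` default, are hypothetical.

```python
from typing import Callable


def place_instruction(filler: list[str], instruction: str, position: str) -> str:
    """Insert the critical instruction at the start, middle, or end of the context."""
    if position == "start":
        index = 0
    elif position == "middle":
        index = len(filler) // 2
    elif position == "end":
        index = len(filler)
    else:
        raise ValueError(f"unknown position: {position}")
    return "\n\n".join(filler[:index] + [instruction] + filler[index:])


def position_sweep(
    filler: list[str],
    instruction: str,
    run_agent: Callable[[str], str],
    passes: Callable[[str], bool],
    trials: int = 5,
) -> dict[str, float]:
    """Return the pass rate for each placement of the instruction."""
    results: dict[str, float] = {}
    for position in ("start", "middle", "end"):
        prompt = place_instruction(filler, instruction, position)
        wins = sum(1 for _ in range(trials) if passes(run_agent(prompt)))
        results[position] = wins / trials
    return results
```

A noticeably lower pass rate for "middle" than for "start" or "end" suggests that mid-context information is being lost.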

Should I use LangChain memory modules or build custom memory?

Use CoALA to decide your architecture first — which memory types you need and why. Then evaluate whether LangChain's modules fit your requirements. LangChain provides useful conversation buffers and retrieval tools, but its built-in memory is primarily working memory management. You'll likely need custom implementations for progressive disclosure of procedural memory and for distilled episodic memory with forgetting policies.