IBM CoALA Four-Type Agent Memory Framework
Design the right memory architecture for any AI agent by selecting and configuring the correct combination of the four CoALA memory types — working, semantic, procedural, and episodic — matched to the agent's complexity and purpose.
// TL;DR
The IBM CoALA Four-Type Agent Memory Framework is a systematic approach to designing AI agent memory architectures based on four memory types from the Princeton CoALA research: working memory (context window), semantic memory (persistent knowledge), procedural memory (skill libraries), and episodic memory (distilled past experience). Use it whenever you're building, auditing, or debugging an AI agent to determine exactly which memory types it needs. It prevents over-engineering simple agents and ensures complex agents don't lose context, repeat mistakes, or fail to improve across sessions.
// When should you use the CoALA four-type agent memory framework?
Use this skill whenever you are designing, auditing, or debugging an AI agent and need to decide what kinds of memory it requires. Also apply it when an agent is making repeated mistakes, losing context across sessions, or failing to improve over time.
// What inputs do you need to design an agent's memory architecture?
- agent_description (required)
A plain-English description of what the agent does, what tasks it handles, and how it interacts with users or systems.
- agent_complexity_level
A rough classification of the agent: reflex/simple, narrow-purpose, or full autonomous agent. Can be inferred from agent_description if not stated.
- current_memory_setup
What memory mechanisms, if any, the agent already uses (e.g. context window only, vector database, saved transcripts, markdown files).
- known_failure_modes
Any specific problems the agent currently exhibits, e.g. repeating mistakes, forgetting project context, not following conventions.
// What are the core principles behind the CoALA agent memory framework?
Memory Is What Separates a Chatbot from an Agent
A chatbot gives a response. An agent gives a response shaped by persistent knowledge, accumulated experience, remembered preferences, and recorded mistakes. The presence and quality of memory architecture is the defining distinction.
Four CoALA Memory Types
Every well-designed agent draws on some combination of four memory types derived from the CoALA framework (Cognitive Architectures for Language Agents, Princeton): working memory, semantic memory, procedural memory, and episodic memory. Not every agent needs all four — match the memory stack to the agent's complexity.
Progressive Disclosure for Procedural Memory
Agent skills should never be bulk-loaded into the context window. The agent maintains a lightweight index (name + description per skill, ~100 tokens each); full skill instructions are only loaded when a matching task is triggered, and referenced resources are pulled in only during execution.
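As a rough illustration of this loading strategy, the sketch below keeps only name and description pairs resident and fetches full instructions on demand. The `SkillLibrary` class, the keyword-overlap matcher, and the in-memory `BODIES` dict (standing in for skill files on disk) are all assumptions made for the example, not part of any published standard.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SkillEntry:
    """Lightweight index entry: the only part that stays in working memory."""
    name: str
    description: str


class SkillLibrary:
    """Hypothetical skill library that loads full instructions lazily."""

    def __init__(self, index: list[SkillEntry], load_body: Callable[[str], str]):
        self.index = index
        self._load_body = load_body          # called only when a skill is triggered
        self.loaded: dict[str, str] = {}     # full instructions fetched so far

    def index_prompt(self) -> str:
        """Text that is always present in the context window."""
        return "\n".join(f"- {e.name}: {e.description}" for e in self.index)

    def match(self, task: str) -> Optional[SkillEntry]:
        """Naive matcher (assumption): pick the description sharing most words with the task."""
        task_words = set(task.lower().split())
        best, best_score = None, 0
        for entry in self.index:
            score = len(task_words & set(entry.description.lower().split()))
            if score > best_score:
                best, best_score = entry, score
        return best

    def instructions_for(self, task: str) -> Optional[str]:
        """Load full instructions only for the skill that matches the task."""
        entry = self.match(task)
        if entry is None:
            return None
        if entry.name not in self.loaded:
            self.loaded[entry.name] = self._load_body(entry.name)
        return self.loaded[entry.name]


# Stand-in for skill files on disk (illustrative content only).
BODIES = {
    "code-review": "1. Read the diff.\n2. Check conventions.\n3. Report findings.",
    "pr-summary": "1. List changed files.\n2. Summarise intent.",
}

library = SkillLibrary(
    [
        SkillEntry("code-review", "review a code change for bugs and style"),
        SkillEntry("pr-summary", "summarise a pull request for reviewers"),
    ],
    load_body=BODIES.__getitem__,
)
```

In a real agent the model itself would usually decide which skill applies by reading the index; the word-overlap matcher is only there to keep the sketch runnable without a model call.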
Distilled Experience over Raw Transcripts
Naive episodic memory — saving full conversation transcripts — is technically valid but rarely useful. Production systems distil sessions into compressed, decision-relevant notes (e.g. 'last time we debugged the auth module, the issue was in the middleware layer') rather than storing verbatim logs.
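One possible shape for a distilled note is sketched below. In practice the distillation step would probably be a model call that summarises the session; here a simple rule (keep only lines tagged `FINDING:` or `DECISION:`) stands in for it so the example runs on its own. The `EpisodicNote` type, the markers, and the `distil` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EpisodicNote:
    """A compressed, decision-relevant record of one session."""
    topic: str
    note: str
    recorded_on: date


# Assumption: the session log tags important lines with these markers.
MARKERS = ("FINDING:", "DECISION:")


def distil(transcript: list[str], topic: str, today: date) -> list[EpisodicNote]:
    """Keep only tagged lines and drop the rest of the transcript."""
    notes = []
    for line in transcript:
        text = line.strip()
        for marker in MARKERS:
            if text.startswith(marker):
                notes.append(EpisodicNote(topic, text[len(marker):].strip(), today))
                break
    return notes


transcript = [
    "user: login fails with 401",
    "agent: checking the token parser",
    "FINDING: the issue was in the middleware layer",
    "user: thanks",
]
notes = distil(transcript, topic="auth module", today=date(2025, 1, 15))
```

Only the single tagged line survives as a note; the other three transcript lines are discarded rather than stored.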
Forgetting Is an Engineering Problem
Humans forget naturally and usefully. For agents, deciding what to delete, when information becomes obsolete, and how to handle context that no longer applies (e.g. a user changes jobs) is an explicit engineering challenge that must be designed for, not ignored.
Working Memory Is Volatile and Bounded
The context window is fast and immediately accessible, but its contents disappear when the session ends and model performance degrades when it is overloaded. Even very large context windows (1M+ tokens) have a ceiling, and information buried in the middle of a long context is more likely to be overlooked by the model.
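A crude way to make the bound visible is to account for what is currently loaded against a budget. The sketch below assumes roughly four characters per token, which is only a rule of thumb; a real audit would use the tokenizer of the model in question. The function names and the budget figure are illustrative.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def audit_context(parts: dict[str, str], budget_tokens: int) -> dict:
    """Report estimated usage per component and whether the total exceeds the budget."""
    usage = {name: estimate_tokens(text) for name, text in parts.items()}
    total = sum(usage.values())
    return {"usage": usage, "total": total, "over_budget": total > budget_tokens}


report = audit_context(
    {"system_instructions": "a" * 400, "conversation_history": "b" * 4000},
    budget_tokens=1000,
)
```

Breaking usage down by component makes it easier to see which part of working memory (instructions, history, loaded files) is the one crowding out the rest.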
// How do you apply the CoALA memory framework step by step?
- 1
Classify the agent's complexity tier
Map the agent to one of three tiers: (A) Reflex/Simple — deterministic, narrow, no learning needed (e.g. thermostat, basic routing bot); (B) Narrow-Purpose — single domain, structured tasks, no cross-session learning needed (e.g. password reset agent); (C) Full Autonomous — multi-task, project-aware, must improve over time (e.g. coding agent). This tier determines the minimum viable memory stack.
- 2
Assign Working Memory baseline
Every agent at every tier requires working memory — the context window. Document what currently lives in it: system instructions, conversation history, loaded files/data. Flag any signs of context window overload (performance degradation, model losing track of items buried mid-context). Working memory is always present; the question is what else is needed.
- 3
Determine whether Semantic Memory is required
Ask: does the agent need persistent factual knowledge, project architecture, coding conventions, rules, or documentation that must be consistent across every session? If yes, implement semantic memory. The simplest production form is a Markdown file (e.g. a CLAUDE.md-style file) loaded into the context window at session start. More complex implementations include vector databases or knowledge graphs. Without semantic memory, the agent will repeat the same mistakes because it has no persistent knowledge to draw from.
- 4
Determine whether Procedural Memory is required
Ask: does the agent need to execute repeatable, structured procedures (skills)? If yes, implement procedural memory using the skill.md format — each skill is a folder containing a Markdown file with the skill name, description, and step-by-step instructions. Apply progressive disclosure: expose only a lightweight index (name + description) in working memory at all times; load full skill instructions only when a task matches that skill; pull in referenced files, templates, and scripts only during execution. Narrow-purpose agents with a single defined procedure need this; reflex agents typically do not.
- 5
Determine whether Episodic Memory is required
Ask: does the agent need to learn and improve across sessions, remember past decisions, or carry forward project-specific discoveries? If yes, implement episodic memory. Do NOT default to raw transcript storage — design a distillation mechanism that extracts decision-relevant notes from each session. Define explicit policies for: what gets saved, what gets overwritten, and what gets deleted (the forgetting problem). Episodic memory is the hardest type to get right and is typically only warranted for full autonomous agents.
- 6
Map the agent to its final memory stack
Produce an explicit memory stack declaration for the agent. Tier A: Working only. Tier B: Working + Procedural (at minimum). Tier C: Working + Semantic + Procedural + Episodic. Justify any deviation from the tier default based on known failure modes or specific requirements identified in the inputs.
- 7
Audit existing memory setup against the target stack
If current_memory_setup was provided, compare it to the target stack. Identify gaps (missing memory types causing known failure modes) and overloads (e.g. bulk-loading all skills into working memory instead of using progressive disclosure, or storing raw transcripts instead of distilled experience). Produce specific remediation recommendations.
- 8
Flag the forgetting problem if Episodic Memory is in scope
If episodic memory is part of the design, explicitly answer: What is the deletion/expiry policy? What marks a memory as obsolete? How does the agent handle context shifts (e.g. user changes role, project is deprecated)? This must be answered as an engineering decision, not left open.
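The tier defaults and the audit in the steps above can be sketched as a small lookup plus a set comparison. The names `MemoryType`, `TIER_DEFAULTS`, `target_stack`, and `audit` are hypothetical, and treating extra memory types as "unneeded" is a simplification: the audit step also covers misuse of a memory type, such as bulk-loading skills, which a set comparison cannot detect.

```python
from enum import Enum


class MemoryType(Enum):
    WORKING = "working"
    SEMANTIC = "semantic"
    PROCEDURAL = "procedural"
    EPISODIC = "episodic"


# Tier defaults as stated in the framework's final mapping step.
TIER_DEFAULTS = {
    "A": {MemoryType.WORKING},
    "B": {MemoryType.WORKING, MemoryType.PROCEDURAL},
    "C": set(MemoryType),
}


def target_stack(tier: str, extra=frozenset()) -> set:
    """Tier default plus any justified additions (e.g. from known failure modes)."""
    return TIER_DEFAULTS[tier] | set(extra)


def audit(current: set, target: set) -> dict:
    """Compare the existing setup to the target stack."""
    return {
        "gaps": sorted(m.value for m in target - current),
        "unneeded": sorted(m.value for m in current - target),
    }


# A coding agent (Tier C) that currently relies on the context window only.
findings = audit({MemoryType.WORKING}, target_stack("C"))
```

A deviation from the default, such as a Tier B agent that keeps forgetting conventions, would be expressed as `target_stack("B", {MemoryType.SEMANTIC})`.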
// What does the CoALA memory framework look like in real agent examples?
A simple internal routing bot that reads incoming support tickets and assigns them to the correct team queue based on keywords.
Tier A — Reflex agent. Assign Working Memory only. The agent reads the current ticket (working memory/context window), applies routing logic, and routes. No persistent knowledge, no skills library, no cross-session learning needed. Any attempt to add semantic or episodic memory here would add complexity with no benefit.
A narrow-purpose HR agent whose sole function is to walk employees through a structured onboarding checklist.
Tier B — Narrow-purpose agent. Assign Working Memory + Procedural Memory. The onboarding checklist is encoded as a skill.md file. The agent holds a lightweight skill index in working memory; when the onboarding task is triggered, it loads the full checklist instructions. No semantic memory needed (knowledge is baked into the skill); no episodic memory needed (each onboarding is stateless).
A coding agent that works on a long-running software project across many sessions, follows project-specific conventions, executes structured workflows like code review, and needs to remember past debugging discoveries.
Tier C — Full autonomous agent. Assign all four types. Working memory holds the current session context. Semantic memory is implemented as a project Markdown file containing architecture, conventions, build commands, and anti-patterns, loaded at session start. Procedural memory holds skills like 'run structured code review' and 'create pull request summary', exposed via a lightweight index with progressive disclosure. Episodic memory distils key discoveries across sessions (e.g. 'the auth module issues trace to middleware') rather than storing raw transcripts. Forgetting policy must address what to do when a module is deprecated or refactored.
// What are the most common mistakes when designing AI agent memory?
- Defaulting every agent to all four memory types — reflex and narrow-purpose agents only need a subset; over-engineering memory adds latency, cost, and complexity for no gain.
- Storing raw conversation transcripts as episodic memory — this is technically episodic memory but is rarely useful in production; always distil sessions into compressed, decision-relevant notes instead.
- Bulk-loading all skill instructions into working memory at once — this violates progressive disclosure, blows through the working memory budget, and degrades performance; expose only the lightweight index by default.
- Treating working memory as unlimited — even million-token context windows degrade when overloaded, particularly when relevant information gets buried in the middle of the context window.
- Ignoring the forgetting problem when implementing episodic memory — failing to define deletion and expiry policies means the agent accumulates stale, contradictory, or irrelevant memories that degrade rather than improve its performance.
- Conflating semantic and procedural memory — semantic memory stores what the agent knows (facts, rules, documentation); procedural memory stores how the agent does things (step-by-step skill instructions). They serve different purposes and are implemented differently.
- Implementing semantic memory only as vector databases or knowledge graphs — in many production agentic systems, a well-structured Markdown file loaded at session start is sufficient and far simpler to maintain.
// What do the key terms in the CoALA memory framework mean?
- CoALA
- Cognitive Architectures for Language Agents — a framework from a Princeton research team that maps out four distinct types of memory that AI agents need: working, semantic, procedural, and episodic.
- Working Memory
- The agent's context window — everything the agent can see right now, including the current conversation, system instructions, and any loaded files or data. Analogous to RAM: fast and immediately accessible, but volatile (gone when the session ends) and bounded in size.
- Semantic Memory
- The agent's knowledge base — persistent facts, rules, conventions, and documentation the agent needs across all sessions. In production, commonly implemented as a Markdown file (e.g. CLAUDE.md) loaded into the context window at session start, though vector databases and knowledge graphs are also valid implementations.
- Procedural Memory
- How the agent knows how to do things — a library of agent skills encoded in skill.md files, each containing the skill name, description, and step-by-step execution instructions. Governed by progressive disclosure.
- Episodic Memory
- The agent's record of what happened in past interactions, past decisions, and what it learned from them — implemented as distilled, compressed experience rather than raw transcripts. This is where agent memory starts to genuinely look like learning.
- Agent Skills
- An open standard for procedural memory using the skill.md file format. Each skill is a folder containing a Markdown file that describes the skill and provides step-by-step instructions for performing it.
- skill.md
- The file format used for agent skills in procedural memory — a Markdown file containing the skill name, a description of what the skill does, and step-by-step instructions for performing it.
- Progressive Disclosure
- The loading strategy for procedural memory: the agent only holds a lightweight index (name + description, ~100 tokens per skill) in working memory at all times; full skill instructions are loaded only when a matching task is triggered; referenced files, templates, or scripts are pulled in only during execution.
- Distilled Experience
- The correct form of episodic memory in production systems — compressed, decision-relevant notes extracted from past sessions (e.g. 'last time we debugged the auth module, the issue was in the middleware layer') rather than full conversation transcripts.
- The Forgetting Problem
- The engineering challenge unique to episodic memory: deciding what to delete, when information becomes obsolete, and how to handle context shifts. Humans forget naturally and usefully; for agents, forgetting must be explicitly designed and is the hardest part of episodic memory to get right.
- Reflex Agent
- The simplest agent tier — deterministic, narrow-purpose agents like a thermostat or basic routing bot that typically require only working memory.
// Frequently asked questions
What is the CoALA agent memory framework?
The CoALA (Cognitive Architectures for Language Agents) framework is a Princeton research-derived system that identifies four distinct memory types every AI agent may need: working memory (the context window), semantic memory (persistent knowledge and facts), procedural memory (executable skills and workflows), and episodic memory (distilled records of past experiences and decisions). Not every agent needs all four — you match the memory stack to the agent's complexity tier.
What are the four types of memory an AI agent needs?
The four types are: (1) Working memory — the context window, volatile and bounded, present in every agent. (2) Semantic memory — persistent facts, rules, conventions, and documentation loaded across sessions. (3) Procedural memory — a library of encoded skills with step-by-step instructions, managed via progressive disclosure. (4) Episodic memory — compressed, decision-relevant notes distilled from past sessions that enable the agent to learn and improve over time.
How do I decide which memory types my AI agent needs?
Classify your agent into one of three tiers. Tier A (reflex/simple agents like routing bots) needs only working memory. Tier B (narrow-purpose agents like onboarding bots) needs working memory plus procedural memory. Tier C (full autonomous agents like coding assistants) needs all four types. Start with the tier default, then adjust based on specific failure modes — for example, if a Tier B agent keeps forgetting project conventions, add semantic memory.
How do you implement semantic memory for an AI agent?
The simplest and most common production implementation is a well-structured Markdown file — like a CLAUDE.md or project knowledge file — loaded into the agent's context window at session start. This file contains architecture decisions, coding conventions, rules, anti-patterns, and persistent documentation. For larger knowledge bases, vector databases or knowledge graphs are valid alternatives, but a Markdown file is sufficient for many production agentic systems and far simpler to maintain.
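A minimal sketch of the Markdown-file approach follows, assuming the knowledge file is simply appended to the base instructions when a session starts. The function name, the "# Project knowledge" heading, and the fallback behaviour for a missing file are illustrative choices, not a prescribed format.

```python
from pathlib import Path


def build_system_prompt(base_instructions: str, knowledge_file: Path) -> str:
    """Prepare the session-start prompt, adding persistent knowledge if the file exists."""
    if not knowledge_file.exists():
        # No semantic memory configured: fall back to the base instructions alone.
        return base_instructions
    knowledge = knowledge_file.read_text(encoding="utf-8").strip()
    return f"{base_instructions}\n\n# Project knowledge\n{knowledge}"
```

Calling `build_system_prompt("You are a coding agent.", Path("CLAUDE.md"))` at the start of every session gives each session the same conventions, and the file can be reviewed and edited like any other project document.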
How does CoALA agent memory compare to just using RAG?
RAG (Retrieval-Augmented Generation) is one implementation pattern that primarily addresses semantic memory — retrieving relevant facts from a knowledge base. CoALA is a broader architectural framework that covers four distinct memory types, including procedural memory (how the agent executes skills) and episodic memory (how it learns from past sessions). RAG doesn't address skill management, progressive disclosure, or experience distillation. Using CoALA, you might implement semantic memory via RAG but still need separate systems for procedural and episodic memory.
When should I add episodic memory to my AI agent?
Add episodic memory only when your agent needs to genuinely learn and improve across sessions: remembering past decisions, debugging discoveries, or user preferences that change over time. This is typically warranted only for Tier C full autonomous agents. If your agent handles stateless, repeatable tasks (password resets, ticket routing), episodic memory adds complexity with no benefit. When you do implement it, always distil sessions into compressed notes rather than storing raw transcripts.
What is progressive disclosure in agent memory design?
Progressive disclosure is the loading strategy for procedural memory that prevents context window overload. Instead of bulk-loading all skill instructions into working memory, the agent holds only a lightweight index — each skill's name and description at roughly 100 tokens each. Full skill instructions load only when a matching task is triggered, and referenced files, templates, or scripts are pulled in only during execution. This keeps working memory lean and prevents performance degradation.
What results can I expect from applying the CoALA memory framework?
Agents designed with an appropriate CoALA memory stack should make fewer repeated mistakes, keep context more reliably across sessions, follow project conventions more consistently, and improve over time through distilled experience. Avoiding over-engineered memory on simple agents should also reduce cost and latency. Signals worth tracking include user corrections per session, adherence to documented rules, and whether the agent references past debugging discoveries without being re-prompted.
What's the difference between semantic memory and procedural memory?
Semantic memory stores what the agent knows — persistent facts, rules, documentation, architecture decisions, and coding conventions. Procedural memory stores how the agent does things — step-by-step skill instructions encoded in skill.md files that the agent executes when a matching task appears. Confusing them is a common design mistake. A project's coding standards belong in semantic memory; the workflow for performing a code review belongs in procedural memory.
Why does my AI agent keep forgetting things between sessions?
Your agent likely relies only on working memory (the context window), which is volatile and disappears when the session ends. To retain information across sessions, you need semantic memory (for persistent facts and rules) or episodic memory (for past decisions and discoveries). The simplest fix is creating a Markdown knowledge file loaded at session start. For learning from past interactions, implement distilled episodic memory with explicit policies for what gets saved, updated, and deleted.
What is the forgetting problem in AI agent memory?
The forgetting problem is the engineering challenge of deciding what an agent should delete, when information becomes obsolete, and how to handle context shifts — like a user changing jobs or a project being deprecated. Humans forget naturally and usefully, but agents accumulate stale or contradictory memories unless forgetting is explicitly designed. This requires defining deletion triggers, expiry policies, and contradiction-resolution rules as part of the episodic memory implementation.
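One way to make those three policies concrete is sketched below, under simple assumptions: each memory carries a key and an optional time-to-live, expired memories are dropped, memories under explicitly retired keys are deleted, and when two memories share a key the most recent one wins. The `MemoryRecord` type, the key naming, and the "newest wins" rule are illustrative, and real systems may need finer-grained resolution.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class MemoryRecord:
    key: str                        # what the memory is about, e.g. "user.employer"
    value: str
    recorded_on: date
    ttl_days: Optional[int] = None  # None means no automatic expiry


def is_expired(m: MemoryRecord, today: date) -> bool:
    """Expiry policy: a memory with a TTL is dropped once the TTL has passed."""
    return m.ttl_days is not None and today - m.recorded_on > timedelta(days=m.ttl_days)


def prune(memories, today: date, retired_keys=frozenset()):
    """Apply deletion triggers, expiry, and newest-wins contradiction resolution."""
    latest = {}
    for m in memories:
        if m.key in retired_keys or is_expired(m, today):
            continue  # deletion trigger or expiry
        kept = latest.get(m.key)
        if kept is None or m.recorded_on >= kept.recorded_on:
            latest[m.key] = m  # a newer memory replaces the older, contradictory one
    return list(latest.values())


memories = [
    MemoryRecord("user.employer", "Acme", date(2024, 1, 10)),
    MemoryRecord("user.employer", "Globex", date(2025, 3, 1)),  # user changed jobs
    MemoryRecord("build.flaky_test", "test_login is flaky", date(2025, 1, 1), ttl_days=30),
    MemoryRecord("legacy.billing", "uses SOAP", date(2024, 6, 1)),
]
kept = prune(memories, today=date(2025, 6, 1), retired_keys={"legacy.billing"})
```

After pruning, only the newer employer record remains: the older one is superseded, the flaky-test note has expired, and the deprecated billing module's memory is deleted by the retired-key trigger.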