Frequently Asked Questions About IBM CoALA Four-Type Agent Memory Framework

25 answers covering everything from basics to advanced usage.

// Basics

What is working memory in an AI agent?

Working memory is the agent's context window — everything the model can see right now, including the current conversation, system instructions, and any loaded files or data. It's analogous to RAM: fast, immediately accessible, but volatile (it disappears when the session ends) and bounded in size. Even million-token context windows degrade when overloaded, especially when relevant information gets buried in the middle of the context.

What is semantic memory in an AI agent?

Semantic memory is the agent's persistent knowledge base: facts, rules, conventions, architecture documentation, and domain knowledge that must remain consistent across every session. In production, it's commonly implemented as a structured Markdown file (like CLAUDE.md) loaded into the context window at session start. Vector databases and knowledge graphs are more complex alternatives for larger-scale knowledge requirements.

What is episodic memory in an AI agent?

Episodic memory is the agent's record of what happened in past interactions — past decisions, debugging discoveries, and lessons learned. Critically, production episodic memory should consist of distilled, compressed notes (e.g., 'the auth issue traced to middleware') rather than raw conversation transcripts. Episodic memory is what makes an agent genuinely learn over time and is the hardest memory type to implement correctly because of the forgetting problem.

What is a skill.md file and how do I create one?

A skill.md file is a common format for encoding agent skills in procedural memory. It's a Markdown file, stored in its own folder, that contains the skill name, a description of what the skill does, and step-by-step instructions for executing it. To create one, define the skill's name, write a concise description (this is what goes into the lightweight index), then document each step the agent should follow. Keep instructions specific and actionable: the agent loads the full file only when a matching task triggers it.
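The parts of a skill file described above can be sketched in Python. This assumes a hypothetical layout, loosely modeled on the YAML frontmatter used by Anthropic's SKILL.md files, in which `name` and `description` sit in a `key: value` header between `---` lines and the step-by-step instructions form the body. The hand-rolled parser handles only that simple header, not full YAML; `Skill` and `parse_skill_md` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str          # goes into the lightweight index
    description: str   # goes into the lightweight index
    instructions: str  # loaded only when the skill is triggered


def parse_skill_md(text: str) -> Skill:
    """Parse a skill file whose header is a '---'-delimited block of key: value lines."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("expected a '---' header at the top of the file")
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("header is not closed with '---'")
    header = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:  # skip header lines without a colon
            header[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return Skill(header["name"], header["description"], body)


EXAMPLE = """---
name: create-pr
description: Open a pull request with a conventional title and a test summary.
---
1. Run the test suite and record the result.
2. Write a title in the form "type(scope): summary".
3. Push the branch and open the PR with the test summary in the body.
"""

skill = parse_skill_md(EXAMPLE)
print(skill.name, "-", skill.description)
```

Only `name` and `description` are needed to build the index; `instructions` stays on disk until the skill is triggered.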

How do I know if my agent is Tier A, B, or C?

Tier A (reflex/simple): the agent performs one deterministic task with no learning, like routing tickets or triggering automations. Tier B (narrow-purpose): the agent handles structured tasks in a single domain but doesn't need to learn across sessions, like running an onboarding checklist. Tier C (full autonomous): the agent works across multiple tasks, maintains project awareness over time, and must improve from experience, like a coding assistant or research agent. When in doubt, ask what happens if the agent forgets everything between sessions: if that's acceptable, it's Tier A or B; if not, it's Tier C.
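The tier rule of thumb can be written as a small decision function. This is a hypothetical helper that reduces the tier descriptions to two yes/no questions; real agents may need more judgment than two booleans capture.

```python
def classify_tier(single_deterministic_task: bool, must_learn_across_sessions: bool) -> str:
    """Return 'A', 'B', or 'C' using the forgetting test from the tier descriptions."""
    if must_learn_across_sessions:
        # Forgetting between sessions is not acceptable: full autonomous agent.
        return "C"
    # Forgetting is acceptable: reflex/simple if it does one deterministic task, else narrow-purpose.
    return "A" if single_deterministic_task else "B"


examples = {
    "ticket router": classify_tier(True, False),
    "onboarding checklist runner": classify_tier(False, False),
    "coding assistant": classify_tier(False, True),
}
for agent, tier in examples.items():
    print(f"{agent}: Tier {tier}")
```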

// How To

How do I implement progressive disclosure for my agent's procedural memory?

Structure your skill library so each skill.md file has a name and short description extractable as an index entry (~100 tokens per skill). At session start, load only this lightweight index into working memory. When the agent identifies a task matching a skill, dynamically load the full skill.md instructions into the context. During skill execution, pull in referenced templates, scripts, or files as needed. After execution, unload the full instructions to free working memory space.
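The load/unload cycle can be sketched as a small in-memory skill store. Everything here is hypothetical: `SkillLibrary`, its methods, and the two sample skills are illustrative names, and the caller stands in for the model's decision about which skill matches the task. Pulling in referenced templates or scripts during execution is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class SkillLibrary:
    # name -> (short description for the index, full instructions)
    skills: Dict[str, Tuple[str, str]]
    # full instructions currently occupying working memory
    loaded: Dict[str, str] = field(default_factory=dict)

    def index(self) -> str:
        # Session start: one "name: description" line per skill enters the context.
        return "\n".join(f"- {name}: {desc}" for name, (desc, _) in sorted(self.skills.items()))

    def load(self, name: str) -> str:
        # A task matched this skill: bring its full instructions into working memory.
        self.loaded[name] = self.skills[name][1]
        return self.loaded[name]

    def unload(self, name: str) -> None:
        # Execution finished: drop the full instructions to free working memory.
        self.loaded.pop(name, None)

    def context(self) -> str:
        # What the model sees: the index plus any skills loaded right now.
        parts = [self.index()] + [f"## {n}\n{body}" for n, body in self.loaded.items()]
        return "\n\n".join(parts)


lib = SkillLibrary({
    "code-review": ("Review a diff against project conventions.",
                    "1. Read the diff.\n2. Check naming.\n3. Report issues."),
    "create-pr": ("Open a pull request with a test summary.",
                  "1. Run tests.\n2. Push the branch.\n3. Open the PR."),
})
print(lib.context())   # index only
lib.load("create-pr")
print(lib.context())   # index plus the full create-pr instructions
lib.unload("create-pr")
```

The index cost grows by one short line per skill, while a skill's full body occupies the context only between `load` and `unload`.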

How do I build a distilled episodic memory system for my agent?

After each session, run a distillation step that extracts decision-relevant notes — key discoveries, resolved bugs, important user preferences, or changed requirements. Store these as structured entries with timestamps and context tags. Define explicit save criteria (what's worth recording), update rules (when to overwrite older entries), and deletion triggers (when context shifts make entries obsolete). Avoid storing raw transcripts; aim for entries like 'auth module failures trace to middleware config' rather than full conversation logs.
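A minimal sketch of the storage side of that distillation step follows. It assumes the distilled notes already exist (in practice a model would extract them from the session) and uses hypothetical names throughout: the `SAVE_KINDS` set stands in for your save criteria, keeping one entry per `topic` implements the update rule, and `drop_tag` is one possible deletion trigger.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional, Set

# Assumed save criteria: only these kinds of notes are worth recording.
SAVE_KINDS = {"discovery", "resolved_bug", "user_preference", "requirement_change"}


@dataclass
class Episode:
    topic: str           # key for the update rule: at most one entry per topic
    note: str            # distilled, decision-relevant note (never a raw transcript)
    kind: str
    tags: Set[str]       # context tags used by deletion triggers
    timestamp: datetime


class EpisodicStore:
    def __init__(self) -> None:
        self.entries: Dict[str, Episode] = {}

    def save(self, topic: str, note: str, kind: str, tags: Set[str],
             when: Optional[datetime] = None) -> bool:
        if kind not in SAVE_KINDS:
            return False  # save criteria: not worth recording
        ts = when or datetime.now(timezone.utc)
        existing = self.entries.get(topic)
        if existing is not None and existing.timestamp > ts:
            return False  # update rule: an older note never overwrites a newer one
        self.entries[topic] = Episode(topic, note, kind, tags, ts)
        return True

    def drop_tag(self, tag: str) -> int:
        # Deletion trigger: when the context behind a tag goes away, remove its entries.
        stale = [topic for topic, e in self.entries.items() if tag in e.tags]
        for topic in stale:
            del self.entries[topic]
        return len(stale)


store = EpisodicStore()
store.save("auth-failures", "Auth module failures trace to middleware config.", "discovery", {"auth"})
store.save("greeting", "User said hello.", "smalltalk", set())  # rejected by the save criteria
store.save("auth-failures", "Fixed by pinning the middleware config.", "resolved_bug", {"auth"})
print(len(store.entries))
```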

How do I set up a CLAUDE.md-style semantic memory file?

Create a Markdown file in your project root containing all persistent knowledge the agent needs: project architecture overview, coding conventions, naming standards, build and deploy commands, known anti-patterns, key dependencies, and any domain-specific rules. Structure it with clear headers so the agent can reference specific sections. Load this file into the agent's context window at the start of every session. Update it whenever conventions change. Keep it concise — aim for the minimum knowledge needed for consistent behavior.
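A minimal sketch of the session-start step, assuming the file is called `CLAUDE.md` and uses `## ` headers for its sections. `build_session_context` and `get_section` are hypothetical helpers, and the sample file describes an imaginary project. Agents such as Claude Code load CLAUDE.md automatically, so a helper like this is only needed in a custom agent loop.

```python
import tempfile
from pathlib import Path


def build_session_context(project_root: Path, system_prompt: str,
                          filename: str = "CLAUDE.md") -> str:
    """Put the semantic memory file right after the system prompt, ahead of any conversation."""
    memory_file = project_root / filename
    sections = [system_prompt]
    if memory_file.exists():
        sections.append(memory_file.read_text(encoding="utf-8").strip())
    return "\n\n".join(sections)


def get_section(markdown: str, heading: str) -> str:
    """Return the lines under '## heading', stopping at the next '## ' heading."""
    out, capturing = [], False
    for line in markdown.splitlines():
        if line.startswith("## "):
            capturing = line[3:].strip() == heading
            continue
        if capturing:
            out.append(line)
    return "\n".join(out).strip()


SAMPLE = """# billing-service
## Architecture
- HTTP handlers in src/api, background jobs in src/jobs
## Conventions
- snake_case module names
## Build commands
- make test
- make deploy
"""

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "CLAUDE.md").write_text(SAMPLE, encoding="utf-8")
    context = build_session_context(root, "You are a coding agent for billing-service.")

print(get_section(SAMPLE, "Build commands"))
```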

How do I audit my existing agent's memory architecture using CoALA?

Document what memory mechanisms your agent currently uses: context window contents, any persisted files, databases, or retrieval systems. Classify your agent into Tier A, B, or C based on its purpose and complexity. Compare the current setup to the tier's recommended memory stack. Identify gaps (missing memory types explaining known failures) and overloads (bulk-loaded skills in context, raw transcript storage). Produce specific remediation steps — like adding a semantic memory file or implementing progressive disclosure for skills.
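The comparison step of the audit can be sketched as a set difference plus two overload checks. The per-tier stacks for A and C follow this FAQ (working memory only for Tier A, all four types for Tier C); the Tier B stack is an assumption, and `audit` and its flags are hypothetical names.

```python
from typing import List, Set

# Tier A and Tier C stacks follow this FAQ; Tier B is an assumption
# (no cross-session learning, so no episodic memory).
RECOMMENDED = {
    "A": {"working"},
    "B": {"working", "semantic", "procedural"},
    "C": {"working", "semantic", "procedural", "episodic"},
}


def audit(tier: str, current: Set[str], bulk_loads_skills: bool,
          stores_raw_transcripts: bool) -> List[str]:
    # Gaps: memory types the tier calls for that the agent does not have.
    findings = [f"gap: add {m} memory" for m in sorted(RECOMMENDED[tier] - current)]
    # Overloads: memory types present but implemented in a way that bloats context.
    if bulk_loads_skills:
        findings.append("overload: load a skill index and use progressive disclosure")
    if stores_raw_transcripts:
        findings.append("overload: replace raw transcripts with distilled episodic notes")
    return findings


# A coding agent with only a context window and a folder of skills it loads all at once.
report = audit("C", {"working", "procedural"}, bulk_loads_skills=True, stores_raw_transcripts=False)
for line in report:
    print(line)
```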

How often should I update semantic memory?

Update semantic memory whenever the underlying facts change — new conventions are adopted, architecture is refactored, dependencies are updated, or rules are modified. Treat the semantic memory file like living documentation: review it at regular intervals (e.g., per sprint or monthly) and update it as part of your development workflow. Stale semantic memory is nearly as harmful as no semantic memory, because the agent will confidently apply outdated rules. Assign ownership of the file to prevent it from drifting.

// Troubleshooting

My agent keeps repeating the same mistakes — which memory type is missing?

If the mistakes involve violating project conventions or ignoring documented rules, you're missing semantic memory — add a persistent knowledge file loaded at session start. If the agent made and resolved the same error in a previous session but doesn't remember the fix, you're missing episodic memory — implement distilled experience notes that capture past debugging discoveries. If the agent executes tasks inconsistently, you may be missing procedural memory — encode the correct workflow as a skill.md file.

My agent loses context in the middle of long conversations — what's wrong?

Your working memory is overloaded. Even large context windows (1M+ tokens) degrade when too much information is loaded, especially when critical details get buried in the middle — a well-documented phenomenon called 'lost in the middle.' Remediate by applying progressive disclosure (don't bulk-load all skills), moving persistent knowledge to a semantic memory file loaded at the top of context, and summarizing or truncating older conversation turns. Keep working memory lean and well-organized.
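These remediation steps can be combined into a simple context assembler. This sketch is hypothetical: it approximates tokens by counting whitespace-separated words (a real agent should use its model's tokenizer), and it replaces dropped turns with a placeholder line where a real system would insert a model-generated summary. Persistent knowledge goes first and the newest turns last, so the most important material sits at the edges of the context rather than in the middle.

```python
from typing import List


def approx_tokens(text: str) -> int:
    # Rough proxy: whitespace-separated words. Swap in your model's tokenizer.
    return len(text.split())


def assemble_context(semantic: str, skill_index: str, turns: List[str], budget: int) -> List[str]:
    """Semantic memory and the skill index first, then the newest turns that fit the budget."""
    header = [semantic, skill_index]
    remaining = budget - sum(approx_tokens(h) for h in header)
    kept: List[str] = []
    for turn in reversed(turns):  # walk from the newest turn backwards
        cost = approx_tokens(turn)
        if cost > remaining:
            break
        kept.append(turn)
        remaining -= cost
    kept.reverse()  # restore chronological order
    dropped = len(turns) - len(kept)
    # Placeholder for a real summary; this line is not counted against the budget.
    summary = [f"[summary of {dropped} earlier turns omitted]"] if dropped else []
    return header + summary + kept


turns = [f"turn {i}: " + "word " * 20 for i in range(10)]
ctx = assemble_context("Conventions: use snake_case.", "- create-pr: open a PR", turns, budget=80)
print(len(ctx), ctx[2])
```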

My agent's episodic memory keeps growing and its performance is getting worse — how do I fix it?

You've hit the forgetting problem. Without explicit deletion and expiry policies, episodic memory accumulates stale, contradictory, or irrelevant entries that confuse the agent. Fix it by defining: (1) expiry triggers — memories older than X days without access get archived, (2) contradiction resolution — newer entries override older ones on the same topic, (3) context shift handling — when a project is deprecated or a user changes roles, related memories are flagged for review or deletion. Treat forgetting as a core engineering requirement.
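The three policies can be sketched as a single pass over stored entries. The `Memory` record, the status labels, and the 90-day window in the usage example are hypothetical choices standing in for "X days". Archiving and flagging are modeled as status changes rather than physical deletion, so nothing is lost before someone reviews it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Set


@dataclass
class Memory:
    topic: str
    note: str
    created: datetime
    last_accessed: datetime
    project: str
    status: str = "active"  # becomes "superseded", "review", or "archived"


def apply_forgetting(memories: List[Memory], now: datetime, max_idle_days: int,
                     deprecated_projects: Set[str]) -> List[Memory]:
    """Mark entries in place and return the ones still active."""
    newest: Dict[str, Memory] = {}
    for m in memories:
        if m.topic not in newest or m.created > newest[m.topic].created:
            newest[m.topic] = m
    for m in memories:
        if m is not newest[m.topic]:
            m.status = "superseded"  # (2) contradiction resolution: newer entry on the topic wins
        elif m.project in deprecated_projects:
            m.status = "review"      # (3) context shift: flag for review or deletion
        elif now - m.last_accessed > timedelta(days=max_idle_days):
            m.status = "archived"    # (1) expiry: not accessed within the window
    return [m for m in memories if m.status == "active"]


now = datetime(2024, 6, 1)
mems = [
    Memory("auth", "Auth failures trace to middleware config.", datetime(2024, 1, 5), datetime(2024, 1, 6), "api"),
    Memory("auth", "Fixed by pinning the middleware config.", datetime(2024, 5, 20), datetime(2024, 5, 30), "api"),
    Memory("theme", "Legacy theme needs manual overrides.", datetime(2024, 2, 1), datetime(2024, 5, 31), "legacy-ui"),
    Memory("ci", "Clear the CI cache after lockfile changes.", datetime(2024, 1, 10), datetime(2024, 2, 1), "api"),
]
active = apply_forgetting(mems, now, max_idle_days=90, deprecated_projects={"legacy-ui"})
print([m.status for m in mems])
```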

What happens if I skip semantic memory for my agent?

Without semantic memory, your agent has no persistent knowledge base. It will not remember project conventions, architecture decisions, or documented rules between sessions. Every new session starts from zero understanding of the project or domain. The most visible symptom is the agent repeatedly violating rules it was previously told about, because those rules existed only in a past session's working memory, which is now gone. Adding even a simple Markdown knowledge file, loaded at the start of every session and kept up to date, largely eliminates this class of failures.

// Comparisons

How does CoALA compare to LangChain memory or LlamaIndex for agent memory?

CoALA is an architectural framework that tells you which memory types your agent needs and why — it's implementation-agnostic. LangChain and LlamaIndex are implementation libraries that provide specific tools for building memory (conversation buffers, vector stores, retrievers). You use CoALA to decide your memory architecture, then you might use LangChain or LlamaIndex to build it. CoALA prevents the common mistake of jumping to implementation before understanding what kinds of memory the agent actually requires.

How does the CoALA framework compare to just using a large context window for everything?

Relying solely on a large context window means relying only on working memory. This fails in three ways: (1) context is volatile, so everything disappears when the session ends; (2) performance degrades when the window is overloaded, even at 1M+ tokens; and (3) information buried mid-context gets lost. CoALA adds persistent layers (semantic, procedural, and episodic memory) that survive across sessions and are loaded strategically rather than all at once. A large context window is necessary but not sufficient for complex agents.

Is the CoALA framework only for LLM-based agents or does it work for other AI systems?

CoALA was specifically designed for language agents — AI systems built on large language models. The four memory types are modeled on cognitive science principles but are mapped to LLM-specific implementations: context windows as working memory, Markdown files or vector databases as semantic memory, skill.md files as procedural memory, and distilled session notes as episodic memory. The conceptual framework could inform other AI architectures, but the specific implementation guidance targets LLM-based agent systems.

Do simple chatbots need the CoALA memory framework?

Basic chatbots that handle single-turn or short multi-turn interactions with no cross-session persistence typically need only working memory — the context window. CoALA classifies these as Tier A (reflex/simple) agents. Applying the full four-type framework would be over-engineering. However, if your 'simple chatbot' needs to remember user preferences across sessions, follow documented guidelines, or improve over time, it's actually a more complex agent and CoALA helps you identify which memory types to add.

What's the difference between the CoALA framework and a basic memory buffer in LangChain?

A LangChain memory buffer is a specific implementation of working memory: it stores recent conversation history in the context window. CoALA is a complete architectural framework that identifies four distinct memory types and helps you decide which ones your agent needs. A memory buffer addresses only one of the four. For any agent beyond Tier A, CoALA would point you to semantic memory for persistent rules, procedural memory for structured skills, and potentially episodic memory for cross-session learning, none of which a basic buffer provides.

// Advanced

Can I use a vector database instead of a Markdown file for semantic memory?

Yes. Vector databases are a valid implementation of semantic memory, especially for large-scale knowledge bases where you need retrieval over thousands of documents. However, for many production agentic systems — particularly coding agents and project assistants — a well-structured Markdown file loaded at session start is sufficient and dramatically simpler to build and maintain. Start with a Markdown file; upgrade to a vector database only when the knowledge base exceeds what fits cleanly in working memory.

How do you handle conflicting information between semantic memory and episodic memory?

Design an explicit precedence hierarchy. Typically, episodic memory (recent discoveries) should override semantic memory (baseline knowledge) for the specific context where the discovery applies, while semantic memory remains the default for general cases. For example, if semantic memory says 'use REST for all APIs' but episodic memory records 'the payments team requires GraphQL,' the episodic override applies only for payments-related tasks. Document these precedence rules in the agent's system instructions.
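The payments example maps onto a two-step lookup. This is a sketch under assumptions: `Rule`, `resolve`, and the idea of tagging each task with context tags are hypothetical, semantic rules are treated as global defaults, and episodic overrides must carry a scope to take effect.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Rule:
    key: str                     # what the rule governs, e.g. "api_style"
    value: str
    scope: Optional[str] = None  # None = applies everywhere; otherwise a context tag


def resolve(key: str, semantic: List[Rule], episodic: List[Rule],
            context_tags: Set[str]) -> Optional[str]:
    # 1. A scoped episodic override wins, but only inside its own context.
    #    (Assumes the episodic list is ordered newest first; unscoped episodic rules are ignored.)
    for rule in episodic:
        if rule.key == key and rule.scope is not None and rule.scope in context_tags:
            return rule.value
    # 2. Otherwise the semantic baseline applies.
    for rule in semantic:
        if rule.key == key:
            return rule.value
    return None


semantic = [Rule("api_style", "REST")]
episodic = [Rule("api_style", "GraphQL", scope="payments")]
print(resolve("api_style", semantic, episodic, {"payments", "backend"}))  # GraphQL
print(resolve("api_style", semantic, episodic, {"search"}))               # REST
```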

How many skills can an agent handle with progressive disclosure?

With progressive disclosure, each skill occupies only ~100 tokens in the working memory index (name + description). At that rate, an index of 500 skills takes about 50,000 tokens, so a typical 128K-token context window could hold an index of hundreds of skills while leaving most of the window for the active conversation and loaded instructions. Practical limits depend on how well the agent can match tasks to skills from the index alone. For agents with 50+ skills, consider grouping skills into domains and loading only the indexes for domains relevant to the conversation.
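The budget arithmetic and the domain-scoping advice can be sketched together. The 10% cap on the index is an assumed budget, not a figure from the text, and the skill names and domains are hypothetical; in practice the active domains would be inferred from the conversation.

```python
from typing import Dict, List, Set

TOKENS_PER_SKILL = 100     # per-skill index cost from the answer above
CONTEXT_WINDOW = 128_000
INDEX_BUDGET_PERCENT = 10  # assumption: cap the skill index at 10% of the window

# Integer arithmetic: 128,000 * 10% = 12,800 tokens, / 100 tokens per skill = 128 skills.
max_skills = CONTEXT_WINDOW * INDEX_BUDGET_PERCENT // 100 // TOKENS_PER_SKILL
print(f"index budget fits {max_skills} skills")

# Hypothetical skill-to-domain mapping.
SKILL_DOMAINS: Dict[str, str] = {
    "code-review": "engineering",
    "create-pr": "engineering",
    "write-tests": "engineering",
    "draft-invoice": "finance",
    "reconcile-ledger": "finance",
}


def domain_index(skill_domains: Dict[str, str], active_domains: Set[str]) -> List[str]:
    # Only skills from domains relevant to the current conversation enter the index.
    return sorted(name for name, domain in skill_domains.items() if domain in active_domains)


finance_skills = domain_index(SKILL_DOMAINS, {"finance"})
print(finance_skills, len(finance_skills) * TOKENS_PER_SKILL, "tokens")
```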

Should episodic memory be stored in the same system as semantic memory?

No. Store and manage them separately, because they serve different purposes and have different lifecycle requirements. Semantic memory is relatively stable: it's updated deliberately when rules or facts change. Episodic memory is dynamic: it grows with each session and requires active deletion and expiry management. Mixing them in one system makes it hard to apply different update and deletion policies. In practice, semantic memory might live in a Markdown file while episodic memory lives in a structured log or lightweight database.

What's the minimum viable memory setup for a coding agent?

A coding agent working on a persistent project is a Tier C (full autonomous) agent and needs all four memory types: working memory (context window with current code and conversation), semantic memory (a project knowledge file with architecture, conventions, and build commands), procedural memory (skills like code review, PR creation, and test generation, loaded with progressive disclosure), and episodic memory (distilled notes from past debugging sessions). Skipping any of these typically shows up as the agent losing context, ignoring conventions, or repeating mistakes it has already resolved.

Can I use the CoALA framework with multi-agent systems?

Yes. In multi-agent systems, apply CoALA to each agent individually — each agent gets its own memory stack classification based on its specific role and complexity tier. Additionally, you may need a shared semantic memory layer (common knowledge accessible to all agents) and shared episodic memory (collective records of system-wide decisions). The framework scales naturally: a simple router agent in the system might be Tier A while the orchestrator agent is Tier C, each with appropriately different memory architectures.
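A minimal sketch of per-agent stacks plus a shared layer. `SharedMemory`, `Agent`, and the rule that only Tier C agents keep private episodic notes are assumptions for illustration (following this FAQ's description of Tier C as the tier that learns from experience); working and procedural memory are left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SharedMemory:
    semantic: Dict[str, str] = field(default_factory=dict)  # common knowledge for every agent
    episodic: List[str] = field(default_factory=list)       # system-wide decision log


@dataclass
class Agent:
    name: str
    tier: str
    shared: SharedMemory
    private_semantic: Dict[str, str] = field(default_factory=dict)
    private_episodic: List[str] = field(default_factory=list)

    def knowledge(self) -> Dict[str, str]:
        # Role-specific facts override shared facts with the same key (assumed precedence).
        return {**self.shared.semantic, **self.private_semantic}

    def record(self, note: str, system_wide: bool = False) -> None:
        if system_wide:
            self.shared.episodic.append(f"{self.name}: {note}")  # visible to all agents
        elif self.tier == "C":
            self.private_episodic.append(note)  # only Tier C agents learn across sessions
        # Tier A/B agents keep nothing from ordinary sessions.


shared = SharedMemory(semantic={"api_style": "REST"})
router = Agent("router", "A", shared)
orchestrator = Agent("orchestrator", "C", shared, private_semantic={"max_parallel_tasks": "4"})

orchestrator.record("Payments calls moved to GraphQL.", system_wide=True)
orchestrator.record("Splitting tasks by module works better than by file.")
router.record("Routed a billing ticket.")
print(shared.episodic)
```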