Christensen Durable Sessions AI UX Framework

Diagnose why your AI chat experience breaks under real-world conditions and rebuild it around Durable Sessions so it is resilient, multi-surface, and supports live agent control.

// WHEN TO USE

Use this skill whenever you are designing or auditing an AI chat or agent-driven product experience and need to evaluate whether your streaming architecture can handle disconnections, multi-device continuity, user-initiated control, or concurrent multi-agent activity.

// INPUTS REQUIRED

  • Current streaming architecture (required)
    How your app currently delivers agent responses to the client — e.g. SSE via Vercel AI SDK, direct WebSocket, polling, etc.
  • Target interaction patterns (required)
    Which of the three capabilities you need: resilient delivery, cross-surface continuity, live agent control, or all three.
  • Agent topology
    Whether you have a single agent, an orchestrator + sub-agents, or a full multi-agent architecture.
  • Client surfaces
    Which surfaces users access the experience on — e.g. web tab, mobile, background notifications.

// PRINCIPLES

The Single-Connection Trap

The default direct HTTP streaming model couples the health of the response stream to the health of one client's connection. If that connection drops, the stream is gone. This fundamentally limits the quality and richness of AI product experiences.

Three Foundational Capabilities

The best AI products invest in exactly three capabilities that separate a fragile demo from a great AI product experience: Resilient Delivery (streams that survive disconnections), Continuity Across Surfaces (session follows the user across tabs and devices), and Live Control (clients can steer, interrupt, or communicate with an agent while it is working).

Durable Sessions

A Durable Session is a persistent, stateful, shared resource that sits between the agent layer and the client layer. Agents write events to the session; clients connect to the session. Neither party holds a private pipe to the other, which unlocks all three foundational capabilities simultaneously.

Agent-Client Decoupling

By decoupling the agent layer from the client layer, the agent stops managing connection health, replay logic, and client state. The agent only cares about writing events to the session; all delivery complexity is handled by the session layer.
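
As a rough illustration of that decoupling, the sketch below hands the agent nothing but a write-only sink. `EventSink`, `ListSink`, and `run_agent` are hypothetical names invented for this example, and the uppercase "tokens" are a placeholder for real model output.

```python
from typing import Protocol


class EventSink(Protocol):
    """Write-only view of a session; the only thing the agent is handed."""
    def publish(self, source: str, payload: str) -> None: ...


class ListSink:
    """Hypothetical in-memory sink standing in for a real session channel."""
    def __init__(self) -> None:
        self.events: list[tuple[str, str]] = []

    def publish(self, source: str, payload: str) -> None:
        self.events.append((source, payload))


def run_agent(prompt: str, sink: EventSink) -> None:
    # No socket, request object, or client ID appears anywhere in the agent.
    sink.publish("agent", "status:started")
    for token in prompt.upper().split():  # placeholder for model output
        sink.publish("agent", token)
    sink.publish("agent", "status:finished")


sink = ListSink()
run_agent("hello durable sessions", sink)
payloads = [payload for _, payload in sink.events]
print(payloads)
```

Because the agent depends only on `publish`, replay, reconnection, and fan-out can change behind the sink without touching agent code.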

SSE Resume-Cancel Conflict

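SSE is strictly one-way: the server pushes events and the client has no upstream channel on the same connection. The only signal a client can send over that connection is closing it, and closure is ambiguous. It could mean 'I disconnected, please buffer and let me resume' or 'I pressed stop, please cancel.' If closure is treated as cancel, resume is impossible; if it is treated as a disconnect, cancel is impossible. Resume and cancel are therefore mutually exclusive when connection closure is the only signal, which is why live control needs a bidirectional transport (e.g. WebSockets) that can carry an explicit cancel message.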
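
One way to picture the distinction, as a sketch rather than a prescribed design: a closed connection changes nothing, while a cancel is a message the agent reads between chunks. The `Generation` class, the `control_inbox` list, and the `{"type": "cancel"}` message shape are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Generation:
    """Hypothetical handle for one in-progress agent response."""
    chunks: list[str]
    cancelled: bool = False
    buffered: list[str] = field(default_factory=list)

    def run(self, control_inbox: list[dict]) -> Iterator[str]:
        for chunk in self.chunks:
            # Drain explicit upstream messages before producing more output.
            while control_inbox:
                message = control_inbox.pop(0)
                if message.get("type") == "cancel":
                    self.cancelled = True
            if self.cancelled:
                return
            self.buffered.append(chunk)
            yield chunk


def on_connection_closed(generation: Generation) -> None:
    # A closed connection says nothing about intent, so it changes nothing:
    # output keeps accumulating for the client to resume from later.
    pass


# Case 1: network drop after the first chunk. No cancel message arrives.
dropped = Generation(chunks=["a", "b", "c"])
stream = dropped.run([])
next(stream)
on_connection_closed(dropped)
for _ in stream:
    pass

# Case 2: stop button after the first chunk. An explicit cancel arrives.
stopped = Generation(chunks=["a", "b", "c"])
inbox: list[dict] = []
stream = stopped.run(inbox)
next(stream)
inbox.append({"type": "cancel"})
for _ in stream:
    pass

print(dropped.buffered, dropped.cancelled)
print(stopped.buffered, stopped.cancelled)
```

The dropped generation keeps all three chunks for a later resume, while the stopped one halts after the first chunk.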

Pub/Sub as the Foundation

Durable Sessions map naturally onto a pub/sub channel model. Publishers (agents) and subscribers (clients) communicate through a shared, independently addressable, persistent, and fully resumable channel rather than directly with each other.
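
A minimal sketch of that channel model, assuming an in-memory list in place of a real pub/sub service. `Event`, `DurableSession`, `get_session`, and `read_after` are hypothetical names chosen for illustration, not any product's API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Event:
    seq: int      # assigned by the session, never by the agent
    source: str   # who published (agent or client name)
    payload: Any


@dataclass
class DurableSession:
    """In-memory stand-in for a persistent pub/sub channel (hypothetical)."""
    session_id: str
    _log: list[Event] = field(default_factory=list)

    def publish(self, source: str, payload: Any) -> int:
        # Append-only: the event outlives whichever connection wrote it.
        seq = len(self._log) + 1
        self._log.append(Event(seq, source, payload))
        return seq

    def read_after(self, last_seq: int = 0) -> list[Event]:
        # Resume: return exactly the events the caller has not seen, in order.
        return self._log[last_seq:]


# Sessions are reached by ID, not through a per-request pipe.
SESSIONS: dict[str, DurableSession] = {}


def get_session(session_id: str) -> DurableSession:
    return SESSIONS.setdefault(session_id, DurableSession(session_id))


# The agent writes three chunks; the client sees two of them, then drops.
session = get_session("chat-42")
for chunk in ["Dur", "able ", "Sess"]:
    session.publish("agent", chunk)
seen = session.read_after(0)[:2]
last_seq = seen[-1].seq  # the client's cursor at the moment it disconnects

# The agent keeps writing while nobody is connected.
session.publish("agent", "ions")

# On reconnect the client asks for everything after its cursor.
missed = get_session("chat-42").read_after(last_seq)
text = "".join(event.payload for event in seen + missed)
print(text)
```

The client's cursor is the only state it needs to resume, and the agent never learns that a disconnect happened.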

Orchestrator Dual-Purpose Problem

In naive multi-agent architectures, the orchestrator is forced to both coordinate subtasks and proxy granular progress updates back to the client. This adds unnecessary complexity. With Durable Sessions, every sub-agent writes directly to the session, eliminating the orchestrator's relay role.
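
The shape of that change can be sketched with `asyncio`, assuming a trivial in-memory `Session` and two made-up sub-agents (`search` and `summarise`). All names here are illustrative only.

```python
import asyncio


class Session:
    """Minimal shared channel; a hypothetical stand-in for a pub/sub service."""
    def __init__(self) -> None:
        self.events: list[tuple[str, str]] = []

    def publish(self, source: str, message: str) -> None:
        self.events.append((source, message))


async def sub_agent(name: str, steps: int, session: Session) -> str:
    for step in range(1, steps + 1):
        # Progress goes straight to the session, not back through the caller.
        session.publish(name, f"step {step}/{steps}")
        await asyncio.sleep(0)  # yield so other sub-agents can interleave
    return f"{name}: done"


async def orchestrator(session: Session) -> list[str]:
    # The orchestrator only delegates and collects final results.
    return await asyncio.gather(
        sub_agent("search", 2, session),
        sub_agent("summarise", 3, session),
    )


session = Session()
results = asyncio.run(orchestrator(session))
sources = sorted({source for source, _ in session.events})

print(results)
print(len(session.events), sources)
```

The orchestrator never touches a progress event, yet a client reading `session.events` sees every step from both sub-agents.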

// WORKFLOW

  1. Audit your current streaming model against the Single-Connection Trap

    Identify whether you are using SSE (e.g. Vercel AI SDK, LangChain streaming) or raw WebSockets. Then answer three questions: Does a connection drop destroy the stream? Is the stream a private pipe between one client and the agent? Can a second tab or device see the live response? The failing answers are yes, yes, and no respectively. If any one of them applies, you are inside the Single-Connection Trap.

  2. Score your product against the Three Foundational Capabilities

    For each capability — Resilient Delivery, Continuity Across Surfaces, Live Control — determine whether your current architecture supports it, partially supports it, or breaks. This produces a gap map that prioritises your redesign effort.

  3. Identify which failure modes apply to your architecture

    There are four canonical failure modes to check: (1) Stream lost on disconnect — no resume logic, (2) Resume-Cancel SSE conflict — stop button ambiguity, (3) Second tab/device blindness — no visibility of live response, (4) Orchestrator relay bottleneck — sub-agent updates forced through a central agent. Note which apply.

  4. Design a Durable Sessions layer between your agent layer and client layer

    The Durable Session must be: independently addressable (any client or agent reaches it by name/ID), persistent (messages outlive any individual connection), and fully resumable (clients reconnecting receive exactly the events they missed, in order). A pub/sub channel model is the natural implementation substrate.

  5. Redirect agent output to write to the Durable Session, not to the client connection

    Agents should publish events — LLM token chunks, tool call results, status updates — directly into the session channel. Agents must never hold a reference to a specific client connection. This is the core architectural inversion.

  6. Redirect clients to subscribe to the Durable Session, not to a per-request connection

    Clients maintain a persistent connection to the session channel that is always active — not just alive for the duration of a single request. Clients receive all events from all agents writing to that session. Resumability is handled at the session layer, not the agent layer.

  7. Replace SSE with a bidirectional transport if Live Control is required

    If you need a stop button, steering messages, or follow-up prompts mid-generation, SSE cannot serve as your transport. Switch to WebSockets or an equivalent bidirectional channel so clients have an upstream channel to the agent. Resolve the Resume-Cancel Conflict by using explicit cancel signals rather than connection closure.

  8. Flatten multi-agent architectures by having all agents write directly to the session

    Remove any orchestrator relay logic for progress updates. Each sub-agent publishes granular updates independently to the Durable Session. Clients subscribe once to the session and receive full visibility of all concurrent agent activity. The orchestrator only cares about final results from sub-agents.

  9. Validate the three foundational capabilities against your redesigned architecture

    Test: (1) Drop the client connection mid-stream — does the client reconnect and resume exactly from where it left off without agent-side logic? (2) Open the session on a second tab or device — does it see the live response immediately? (3) Send a cancel or steering message from a different tab — does the agent receive it? All three must pass.

  10. Layer on additional AI UX capabilities now that the session substrate is in place

    With Durable Sessions established, further capabilities become straightforward to add: push notifications for async background agent work, shared subscribable data objects for collaborative human-AI sessions, and human-agent handoff (adding a human participant into an existing session with full interaction history visible).
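
The audit and scoring in steps 1 and 2 reduce to a small checklist, sketched below. The `Support` levels, both function names, and the sample scores are assumptions made for illustration; the ordering rule (worst gap first) is one reasonable way to prioritise, not the only one.

```python
from enum import IntEnum


class Support(IntEnum):
    BREAKS = 0
    PARTIAL = 1
    SUPPORTED = 2


def in_single_connection_trap(drop_destroys_stream: bool,
                              stream_is_private_pipe: bool,
                              second_surface_sees_live: bool) -> bool:
    # Step 1: any one failing answer places the design inside the trap.
    return (drop_destroys_stream
            or stream_is_private_pipe
            or not second_surface_sees_live)


def gap_map(scores: dict[str, Support]) -> list[str]:
    # Step 2: list unmet capabilities, worst first, to order the redesign.
    gaps = [name for name, level in scores.items() if level != Support.SUPPORTED]
    return sorted(gaps, key=lambda name: scores[name])


# Hypothetical audit of an SSE app that has bolted on a replay buffer.
trapped = in_single_connection_trap(
    drop_destroys_stream=False,
    stream_is_private_pipe=True,
    second_surface_sees_live=False,
)
priorities = gap_map({
    "Resilient Delivery": Support.PARTIAL,
    "Continuity Across Surfaces": Support.BREAKS,
    "Live Control": Support.BREAKS,
})

print(trapped)
print(priorities)
```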

// EXAMPLES

A SaaS product has an AI assistant that streams responses via SSE. Users on mobile frequently lose their response when switching networks. The team has added Redis buffering but the agent code is increasingly complex.

This is a classic Single-Connection Trap with a Resilient Delivery failure. The fix is to introduce a Durable Sessions layer: the agent writes token chunks to a persistent channel, and sequence numbers are assigned by the session substrate, not the agent. All replay and reconnect logic, including the Redis buffering, is removed from the agent code. Mobile clients reconnect to the channel and resume automatically. The agent's complexity drops and resilience improves.

A coding assistant product wants to let users send steering messages (e.g. 'actually use TypeScript') while the agent is mid-generation, and also wants a stop button.

This hits the SSE Resume-Cancel Conflict directly. The team must replace SSE with a bidirectional transport. With a Durable Sessions layer on WebSockets, the stop button sends an explicit cancel signal upstream through the session channel rather than closing the connection. The agent receives a clear cancel intent, stops generation, and does not buffer further tokens. Resume and cancel are no longer ambiguous.

A research automation product uses an orchestrator agent that delegates to five specialist sub-agents and wants to show users a live progress feed of all sub-agent activity.

This is the Orchestrator Dual-Purpose Problem. The current architecture forces the orchestrator to relay all sub-agent updates, creating a bottleneck and complex proxying code. With Durable Sessions, each sub-agent writes its granular progress directly to the shared session channel. The orchestrator is freed to focus only on delegation and final result aggregation. The client subscribes to one session and sees the activity of all five sub-agents multiplexed over that single subscription, with no additional coordination code.

A customer support product wants to allow seamless escalation from an AI agent to a human support representative, with the human having full context of the prior AI conversation.

Because the entire interaction is already materialised in the Durable Session channel, adding a human agent is simply adding another participant who subscribes to the same session. The human support representative connects to the session and has full visibility of all prior AI agent and customer activity. They can then send messages into the same session channel, which the customer receives in their existing interface. No context transfer, no transcript export — the session is the shared medium.
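
A small sketch of that handoff, assuming an in-memory `Session` whose `join` returns the history so far. The class, its methods, and the sample conversation are hypothetical.

```python
class Session:
    """Hypothetical in-memory session: an ordered, shared event history."""
    def __init__(self) -> None:
        self.history: list[tuple[str, str]] = []
        self.participants: set[str] = set()

    def join(self, name: str) -> list[tuple[str, str]]:
        # Joining returns the full history, so no transcript export is needed.
        self.participants.add(name)
        return list(self.history)

    def send(self, sender: str, text: str) -> None:
        self.history.append((sender, text))


session = Session()
session.join("customer")
session.join("ai-agent")
session.send("customer", "My invoice is wrong.")
session.send("ai-agent", "I can see a duplicate charge. Escalating.")

# Escalation: the human rep is just one more participant on the same session.
context = session.join("human-rep")
session.send("human-rep", "I have refunded the duplicate charge.")

print(len(context))
print(session.history[-1][0])
```

The representative starts with both earlier messages in `context`, and their reply lands in the same history the customer is already reading.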

// PITFALLS

  • Building resume logic inside the agent itself — this creates per-client replay complexity that scales poorly and couples agent code to connection management concerns that belong in the session layer.
  • Using SSE and relying on connection closure as a cancel signal — this creates an irresolvable ambiguity between user-initiated cancel and network disconnect, making resume and cancel mutually exclusive.
  • Treating the orchestrator as a relay for sub-agent progress updates — this adds a dual-purpose role to the orchestrator (coordination + proxying) that dramatically increases architectural complexity without benefit.
  • Establishing client connections only on request initiation rather than maintaining persistent session subscriptions — this means any client not present at the moment a request was made has zero visibility of live activity.
  • Assuming a bidirectional transport (WebSockets) alone solves multi-device and multi-surface problems — bidirectionality is necessary for live control but does not solve the shared-visibility problem; a Durable Sessions layer is still required.
  • Focusing engineering effort on model quality and agent logic while neglecting the delivery and connectivity layer — the gap between a fragile demo and a great AI product experience is almost entirely in the infrastructure, not the model.

// GLOSSARY

Durable Sessions
A persistent, stateful, shared resource that sits between the agent layer and the client layer. Agents write events to it; clients subscribe to it. Neither holds a direct connection to the other. Messages outlive any individual connection, device, or agent instance.
Three Foundational Capabilities
The three capabilities that separate a fragile demo from a great AI product experience: Resilient Delivery, Continuity Across Surfaces, and Live Control.
Resilient Delivery
The ability of a stream to survive client disconnections, allowing clients to reconnect and pick up exactly from where they left off without data loss or agent-side replay logic.
Continuity Across Surfaces
The property whereby a conversation session follows the user across tabs and devices, keeping all clients fully in sync including any live in-progress activity.
Live Control
The ability for clients to communicate with an agent while it is actively working — sending steering messages, follow-up prompts, or cancellation signals — rather than being limited to sequential request-response.
Single-Connection Trap
The failure mode of direct HTTP streaming where stream health is coupled to a single client's connection health, preventing resilience, multi-surface continuity, and live control.
SSE Resume-Cancel Conflict
The fundamental ambiguity in SSE-based architectures where closing a connection cannot be distinguished between a network disconnect (requiring resume) and a user cancel (requiring termination), making the two behaviours mutually exclusive.
Orchestrator Dual-Purpose Problem
The architectural antipattern where an orchestrator agent is forced to both coordinate sub-agent tasks and proxy granular sub-agent progress updates back to clients, unnecessarily coupling orchestration logic to delivery concerns.
Direct HTTP Streaming
The default pattern where a client establishes a persistent point-to-point connection directly to an agent, which pipes LLM-generated events back over that connection via SSE or similar. The foundational approach that creates the Single-Connection Trap.
Agent-Client Decoupling
The architectural principle of removing any direct connection between agents and clients, routing all communication through a shared Durable Sessions layer so neither party is aware of the other's connection state.
Fragile Demo
An AI product experience that works in controlled conditions but breaks under real-world constraints such as network drops, multi-device usage, or concurrent agent activity — contrasted with a great AI product experience.