Christensen Durable Sessions AI UX Framework
Diagnose why your AI chat experience breaks under real-world conditions and rebuild it around Durable Sessions so it is resilient, works across surfaces, and supports live agent control.
// WHEN TO USE
Use this skill whenever you are designing or auditing an AI chat or agent-driven product experience and need to evaluate whether your streaming architecture can handle disconnections, multi-device continuity, user-initiated control, or concurrent multi-agent activity.
// INPUTS REQUIRED
- Current streaming architecture (required)
How your app currently delivers agent responses to the client, e.g. SSE via Vercel AI SDK, direct WebSocket, polling, etc.
- Target interaction patterns (required)
Which of the three capabilities you need: resilient delivery, cross-surface continuity, live agent control, or all three.
- Agent topology
Whether you have a single agent, an orchestrator + sub-agents, or a full multi-agent architecture.
- Client surfaces
Which surfaces users access the experience on, e.g. web tab, mobile, background notifications.
// PRINCIPLES
The Single-Connection Trap
The default direct HTTP streaming model couples the health of the response stream to the health of one client's connection. If that connection drops, the stream is gone. This fundamentally limits the quality and richness of AI product experiences.
Three Foundational Capabilities
The best AI products invest in exactly three capabilities that separate a fragile demo from a great AI product experience: Resilient Delivery (streams that survive disconnections), Continuity Across Surfaces (session follows the user across tabs and devices), and Live Control (clients can steer, interrupt, or communicate with an agent while it is working).
Durable Sessions
A Durable Session is a persistent, stateful, shared resource that sits between the agent layer and the client layer. Agents write events to the session; clients connect to the session. Neither party holds a private pipe to the other, which unlocks all three foundational capabilities simultaneously.
Agent-Client Decoupling
By decoupling the agent layer from the client layer, the agent stops managing connection health, replay logic, and client state. The agent only cares about writing events to the session; all delivery complexity is handled by the session layer.
SSE Resume-Cancel Conflict
SSE is strictly one-way: the server sends and the client only listens. This creates a fundamental conflict, because a client closing an SSE connection is ambiguous. It could mean 'I disconnected, please buffer and let me resume' or 'I pressed stop, please cancel.' When connection closure is the only signal the client can give, resume and cancel are mutually exclusive under SSE, making bidirectional transport (e.g. WebSockets) necessary for live control.
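The conflict can be made concrete with a minimal sketch. The names here (`GenerationState`, `on_connection_closed`, `on_upstream_message`, and the `{"type": "cancel"}` message shape) are hypothetical and not taken from any particular SDK. The sketch shows that a close handler receives no intent, while an explicit upstream message does.

```python
from dataclasses import dataclass


@dataclass
class GenerationState:
    """Server-side state for one in-flight response (hypothetical shape)."""
    running: bool = True
    buffering: bool = False


def on_connection_closed(state: GenerationState) -> None:
    # A closed connection carries no intent, so this sketch assumes the only
    # safe reading is "network drop": keep generating and buffer for resume.
    if state.running:
        state.buffering = True


def on_upstream_message(state: GenerationState, message: dict) -> None:
    # An explicit message states its intent, so cancel is unambiguous.
    if message.get("type") == "cancel":
        state.running = False
        state.buffering = False


# Case 1: the connection simply drops.
dropped = GenerationState()
on_connection_closed(dropped)

# Case 2: the user presses stop and the client sends an explicit signal.
stopped = GenerationState()
on_upstream_message(stopped, {"type": "cancel"})
```

With connection closure reserved for "resume later", the stop button needs its own upstream message, which is why the upstream path matters.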
Pub/Sub as the Foundation
Durable Sessions map naturally onto a pub/sub channel model. Publishers (agents) and subscribers (clients) communicate through a shared, independently addressable, persistent, and fully resumable channel rather than directly with each other.
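As a rough illustration of the channel model, the following in-memory sketch keeps a per-channel history and fans events out to every subscriber. `Broker`, `Channel`, and the channel name `"session:123"` are assumptions made for this example. A real deployment would use a persistent pub/sub service rather than process memory.

```python
from typing import Callable, Dict, List

Event = dict
Subscriber = Callable[[Event], None]


class Channel:
    """One named channel: stored history plus live fan-out."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.history: List[Event] = []
        self.subscribers: List[Subscriber] = []

    def publish(self, event: Event) -> None:
        # Store first so the event outlives any individual connection.
        self.history.append(event)
        for deliver in list(self.subscribers):
            deliver(event)

    def subscribe(self, deliver: Subscriber, rewind: bool = True) -> None:
        # A late joiner is caught up from history before receiving live events.
        if rewind:
            for event in self.history:
                deliver(event)
        self.subscribers.append(deliver)


class Broker:
    """Looks channels up by name, so any agent or client can reach them."""

    def __init__(self) -> None:
        self._channels: Dict[str, Channel] = {}

    def channel(self, name: str) -> Channel:
        if name not in self._channels:
            self._channels[name] = Channel(name)
        return self._channels[name]


broker = Broker()
tab_a: List[str] = []
tab_b: List[str] = []

broker.channel("session:123").subscribe(lambda e: tab_a.append(e["text"]))
broker.channel("session:123").publish({"text": "Hel"})   # agent writes
broker.channel("session:123").subscribe(lambda e: tab_b.append(e["text"]))
broker.channel("session:123").publish({"text": "lo"})    # agent writes again
```

The second tab subscribes mid-response and still ends up with the same content as the first, because neither tab owns the stream.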
Orchestrator Dual-Purpose Problem
In naive multi-agent architectures, the orchestrator is forced to both coordinate subtasks and proxy granular progress updates back to the client. This adds unnecessary complexity. With Durable Sessions, every sub-agent writes directly to the session, eliminating the orchestrator's relay role.
// WORKFLOW
- 1
Audit your current streaming model against the Single-Connection Trap
Identify whether you are using SSE (e.g. Vercel AI SDK, LangChain streaming) or raw WebSockets. Map out: does a connection drop destroy the stream? Is the stream a private pipe? Can a second tab or device see the live response? If the answer to either of the first two questions is yes, or the answer to the third is no, you are inside the Single-Connection Trap.
- 2
Score your product against the Three Foundational Capabilities
For each capability — Resilient Delivery, Continuity Across Surfaces, Live Control — determine whether your current architecture supports it, partially supports it, or breaks. This produces a gap map that prioritises your redesign effort.
- 3
Identify which failure modes apply to your architecture
There are four canonical failure modes to check: (1) Stream lost on disconnect — no resume logic, (2) Resume-Cancel SSE conflict — stop button ambiguity, (3) Second tab/device blindness — no visibility of live response, (4) Orchestrator relay bottleneck — sub-agent updates forced through a central agent. Note which apply.
- 4
Design a Durable Sessions layer between your agent layer and client layer
The Durable Session must be: independently addressable (any client or agent reaches it by name/ID), persistent (messages outlive any individual connection), and fully resumable (clients reconnecting receive exactly the events they missed, in order). A pub/sub channel model is the natural implementation substrate.
- 5
Redirect agent output to write to the Durable Session, not to the client connection
Agents should publish events — LLM token chunks, tool call results, status updates — directly into the session channel. Agents must never hold a reference to a specific client connection. This is the core architectural inversion.
- 6
Redirect clients to subscribe to the Durable Session, not to a per-request connection
Clients maintain a persistent connection to the session channel that is always active — not just alive for the duration of a single request. Clients receive all events from all agents writing to that session. Resumability is handled at the session layer, not the agent layer.
- 7
Replace SSE with a bidirectional transport if Live Control is required
If you need a stop button, steering messages, or follow-up prompts mid-generation, SSE cannot serve as your transport. Switch to WebSockets or an equivalent bidirectional channel so clients have an upstream channel to the agent. Resolve the Resume-Cancel Conflict by using explicit cancel signals rather than connection closure.
- 8
Flatten multi-agent architectures by having all agents write directly to the session
Remove any orchestrator relay logic for progress updates. Each sub-agent publishes granular updates independently to the Durable Session. Clients subscribe once to the session and receive full visibility of all concurrent agent activity. The orchestrator only cares about final results from sub-agents.
- 9
Validate the three foundational capabilities against your redesigned architecture
Test: (1) Drop the client connection mid-stream — does the client reconnect and resume exactly from where it left off without agent-side logic? (2) Open the session on a second tab or device — does it see the live response immediately? (3) Send a cancel or steering message from a different tab — does the agent receive it? All three must pass.
- 10
Layer on additional AI UX capabilities now that the session substrate is in place
With Durable Sessions established, further capabilities become straightforward to add: push notifications for async background agent work, shared subscribable data objects for collaborative human-AI sessions, and human-agent handoff (adding a human participant into an existing session with full interaction history visible).
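The scoring from steps 2 and 3 can be recorded as a small gap map. This is only a sketch: the `Support` levels mirror the "supports, partially supports, or breaks" wording of step 2, while the ordering rule (broken before partial) and the sample scores are assumptions for illustration.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class Support(IntEnum):
    # Lower value means a larger gap. This ordering is an assumption.
    BROKEN = 0
    PARTIAL = 1
    SUPPORTED = 2


def gap_map(scores: Dict[str, Support]) -> List[Tuple[str, Support]]:
    """Return the capabilities that need work, largest gap first."""
    gaps = [(name, level) for name, level in scores.items()
            if level is not Support.SUPPORTED]
    # sorted() is stable, so equal scores keep their original order.
    return sorted(gaps, key=lambda item: item[1])


# Hypothetical audit of an app that streams over SSE with some buffering.
scores = {
    "Resilient Delivery": Support.PARTIAL,
    "Continuity Across Surfaces": Support.BROKEN,
    "Live Control": Support.BROKEN,
}
priorities = gap_map(scores)
priority_names = [name for name, _ in priorities]
```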
// EXAMPLES
A SaaS product has an AI assistant that streams responses via SSE. Users on mobile frequently lose their response when switching networks. The team has added Redis buffering but the agent code is increasingly complex.
This is a classic Single-Connection Trap with a Resilient Delivery failure. The fix is to introduce a Durable Sessions layer: the agent writes token chunks to a persistent channel, with sequence numbers managed by the session substrate, not the agent. All replay and reconnect logic is stripped from the agent code. Mobile clients reconnect to the channel and resume automatically. The agent's complexity drops; resilience improves.
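A minimal sketch of that resume behaviour, assuming an in-memory log in place of a real persistent channel. `DurableSession`, `read_after`, and `MobileClient` are hypothetical names. The point is that the session assigns sequence numbers and the client carries its own cursor, so the agent holds no replay logic.

```python
from typing import List, Tuple


class DurableSession:
    """Append-only log; sequence numbers are assigned here, not by the agent."""

    def __init__(self) -> None:
        self._log: List[str] = []

    def append(self, chunk: str) -> int:
        self._log.append(chunk)
        return len(self._log)  # 1-based sequence number of this chunk

    def read_after(self, seq: int) -> List[Tuple[int, str]]:
        # Everything the caller has not seen yet, in order.
        return [(i + 1, c) for i, c in enumerate(self._log[seq:], start=seq)]


class MobileClient:
    def __init__(self) -> None:
        self.last_seq = 0
        self.text = ""

    def sync(self, session: DurableSession) -> None:
        # Called on connect and on every reconnect.
        for seq, chunk in session.read_after(self.last_seq):
            self.text += chunk
            self.last_seq = seq


def agent_write(session: DurableSession, chunks: List[str]) -> None:
    # The agent only writes; it knows nothing about who is connected.
    for chunk in chunks:
        session.append(chunk)


session = DurableSession()
client = MobileClient()

agent_write(session, ["Dur", "able"])
client.sync(session)                      # connected: sees "Durable"
agent_write(session, [" Ses", "sions"])   # client is offline during this
client.sync(session)                      # reconnect: receives only the gap
```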
A coding assistant product wants to let users send steering messages (e.g. 'actually use TypeScript') while the agent is mid-generation, and also wants a stop button.
This hits the SSE Resume-Cancel Conflict directly. The team must replace SSE with a bidirectional transport. With a Durable Sessions layer on WebSockets, the stop button sends an explicit cancel signal upstream through the session channel rather than closing the connection. The agent receives a clear cancel intent, stops generation, and does not buffer further tokens. Resume and cancel are no longer ambiguous.
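One way to picture this is an agent loop that checks an upstream control queue between tokens. The sketch below is deliberately simplified: `Session`, `run_agent`, and the `steer`/`cancel` message types are hypothetical, the generator stands in for a real streaming model call, and a steering message is only acknowledged rather than fed back into a model.

```python
from collections import deque
from typing import Deque, Iterator, List


class Session:
    def __init__(self) -> None:
        self.events: List[dict] = []          # downstream: agent to clients
        self.control: Deque[dict] = deque()   # upstream: clients to agent


def run_agent(session: Session, tokens: List[str]) -> Iterator[None]:
    for token in tokens:
        # Drain control messages before producing the next token.
        while session.control:
            message = session.control.popleft()
            if message["type"] == "cancel":
                session.events.append({"type": "cancelled"})
                return  # stop generating; nothing further is buffered
            if message["type"] == "steer":
                session.events.append(
                    {"type": "steer_ack", "text": message["text"]})
        session.events.append({"type": "token", "text": token})
        yield  # hand control back, as a real agent would while awaiting I/O
    session.events.append({"type": "done"})


session = Session()
agent = run_agent(session, ["def", " add", "(a", ", b", ")"])

next(agent)  # first token is written
session.control.append({"type": "steer", "text": "actually use TypeScript"})
next(agent)  # steering is acknowledged, then the next token is written
session.control.append({"type": "cancel"})
leftover = list(agent)  # agent sees the explicit cancel and stops

event_types = [event["type"] for event in session.events]
```

Because the cancel travels as a message, the client connection can stay open or drop without changing what the agent understands.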
A research automation product uses an orchestrator agent that delegates to five specialist sub-agents and wants to show users a live progress feed of all sub-agent activity.
This is the Orchestrator Dual-Purpose Problem. The current architecture forces the orchestrator to relay all sub-agent updates, creating a bottleneck and complex proxying code. With Durable Sessions, each sub-agent writes its granular progress directly to the shared session channel. The orchestrator is freed to focus only on delegation and final result aggregation. The client subscribes to one session and sees the activity of all five sub-agents multiplexed on that single channel, with no relay code in the orchestrator.
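The shape of that flattened architecture might look like the sketch below. It uses two sub-agents instead of five for brevity, and round-robin stepping of generators as a stand-in for real concurrency. `Session.publish`, `sub_agent`, `orchestrate`, and the agent names are all hypothetical.

```python
from typing import Dict, Generator, List


class Session:
    def __init__(self) -> None:
        self.events: List[dict] = []

    def publish(self, agent: str, text: str) -> None:
        self.events.append({"agent": agent, "text": text})


def sub_agent(session: Session, agent_id: str,
              steps: List[str]) -> Generator[None, None, str]:
    for step in steps:
        session.publish(agent_id, step)  # progress goes straight to the session
        yield
    return f"{agent_id}: {len(steps)} steps complete"  # final result only


def orchestrate(session: Session, plan: Dict[str, List[str]]) -> Dict[str, str]:
    running = {aid: sub_agent(session, aid, steps) for aid, steps in plan.items()}
    results: Dict[str, str] = {}
    while running:
        for aid in list(running):
            try:
                next(running[aid])
            except StopIteration as finished:
                # The orchestrator sees final results, never progress events.
                results[aid] = finished.value
                del running[aid]
    return results


session = Session()
results = orchestrate(session, {
    "search": ["query sources", "rank hits"],
    "summarise": ["draft summary"],
})
feed = [(event["agent"], event["text"]) for event in session.events]
```

The client-facing feed is just `session.events`, and the orchestrator code contains no line that forwards a progress update.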
A customer support product wants to allow seamless escalation from an AI agent to a human support representative, with the human having full context of the prior AI conversation.
Because the entire interaction is already materialised in the Durable Session channel, adding a human agent is simply adding another participant who subscribes to the same session. The human support representative connects to the session and has full visibility of all prior AI agent and customer activity. They can then send messages into the same session channel, which the customer receives in their existing interface. No context transfer, no transcript export — the session is the shared medium.
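The handoff can be sketched as one more participant joining the same session. This assumes an in-memory `SupportSession` with hypothetical `join` and `post` methods. Joining returns the full prior history, and posting delivers to every other subscriber.

```python
from typing import Callable, Dict, List

Message = dict


class SupportSession:
    def __init__(self) -> None:
        self.messages: List[Message] = []
        self.subscribers: Dict[str, Callable[[Message], None]] = {}

    def join(self, participant: str,
             deliver: Callable[[Message], None]) -> List[Message]:
        # A new participant gets the whole interaction so far, then live updates.
        self.subscribers[participant] = deliver
        return list(self.messages)

    def post(self, sender: str, text: str) -> None:
        message = {"sender": sender, "text": text}
        self.messages.append(message)
        for participant, deliver in self.subscribers.items():
            if participant != sender:
                deliver(message)


session = SupportSession()
customer_inbox: List[Message] = []
human_inbox: List[Message] = []

session.join("customer", customer_inbox.append)
session.post("customer", "My invoice looks wrong.")
session.post("ai_agent", "I will bring in a colleague to help.")

# Escalation: the human representative simply joins the existing session.
history = session.join("human_rep", human_inbox.append)
session.post("human_rep", "Hi, I have read the thread and can take it from here.")
```

Nothing is exported or copied during escalation; the representative reads `history` and replies into the channel the customer is already watching.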
// PITFALLS
- Building resume logic inside the agent itself — this creates per-client replay complexity that scales poorly and couples agent code to connection management concerns that belong in the session layer.
- Using SSE and relying on connection closure as a cancel signal — this creates an irresolvable ambiguity between user-initiated cancel and network disconnect, making resume and cancel mutually exclusive.
- Treating the orchestrator as a relay for sub-agent progress updates — this adds a dual-purpose role to the orchestrator (coordination + proxying) that dramatically increases architectural complexity without benefit.
- Establishing client connections only on request initiation rather than maintaining persistent session subscriptions — this means any client not present at the moment a request was made has zero visibility of live activity.
- Assuming a bidirectional transport (WebSockets) alone solves multi-device and multi-surface problems — bidirectionality is necessary for live control but does not solve the shared-visibility problem; a Durable Sessions layer is still required.
- Focusing engineering effort on model quality and agent logic while neglecting the delivery and connectivity layer — the gap between a fragile demo and a great AI product experience is almost entirely in the infrastructure, not the model.
// GLOSSARY
- Durable Sessions
- A persistent, stateful, shared resource that sits between the agent layer and the client layer. Agents write events to it; clients subscribe to it. Neither holds a direct connection to the other. Messages outlive any individual connection, device, or agent instance.
- Three Foundational Capabilities
- The three capabilities that separate a fragile demo from a great AI product experience: Resilient Delivery, Continuity Across Surfaces, and Live Control.
- Resilient Delivery
- The ability of a stream to survive client disconnections, allowing clients to reconnect and pick up exactly from where they left off without data loss or agent-side replay logic.
- Continuity Across Surfaces
- The property whereby a conversation session follows the user across tabs and devices, keeping all clients fully in sync including any live in-progress activity.
- Live Control
- The ability for clients to communicate with an agent while it is actively working — sending steering messages, follow-up prompts, or cancellation signals — rather than being limited to sequential request-response.
- Single-Connection Trap
- The failure mode of direct HTTP streaming where stream health is coupled to a single client's connection health, preventing resilience, multi-surface continuity, and live control.
- SSE Resume-Cancel Conflict
- The fundamental ambiguity in SSE-based architectures where a closed connection could mean either a network disconnect (requiring resume) or a user cancel (requiring termination) and the server cannot tell which, making the two behaviours mutually exclusive.
- Orchestrator Dual-Purpose Problem
- The architectural antipattern where an orchestrator agent is forced to both coordinate sub-agent tasks and proxy granular sub-agent progress updates back to clients, unnecessarily coupling orchestration logic to delivery concerns.
- Direct HTTP Streaming
- The default pattern where a client establishes a persistent point-to-point connection directly to an agent, which pipes LLM-generated events back over that connection via SSE or similar. This is the approach that creates the Single-Connection Trap.
- Agent-Client Decoupling
- The architectural principle of removing any direct connection between agents and clients, routing all communication through a shared Durable Sessions layer so neither party is aware of the other's connection state.
- Fragile Demo
- An AI product experience that works in controlled conditions but breaks under real-world constraints such as network drops, multi-device usage, or concurrent agent activity — contrasted with a great AI product experience.