Durable Sessions vs Agent-Ready Engineering: Which Framework?
// TL;DR
Use the Schmid Agent-Ready Engineering Framework if you are building or fixing AI agents that behave unreliably — it addresses the core engineering mindset shifts needed for non-deterministic systems. Use the Christensen Durable Sessions Framework if your agent already works but the delivery experience breaks under real-world conditions like disconnections, multi-device usage, or multi-agent streaming. Most teams hit agent reliability problems before delivery problems, so start with Schmid, then layer on Christensen.
// HOW DO THEY COMPARE?
| Dimension | Christensen Durable Sessions AI UX Framework | Schmid Agent-Ready Engineering Framework |
|---|---|---|
| Best for | Fixing broken AI chat/streaming UX — disconnections, multi-device, live control | Fixing unreliable AI agents — flaky outputs, rigid workflows, poor tool design |
| Primary problem addressed | Infrastructure and delivery layer between agent and client | Agent logic, prompt design, error handling, and testing strategy |
| Complexity | High — requires rearchitecting streaming infrastructure, introducing pub/sub layer, replacing SSE with WebSockets | Medium — requires mindset shifts and refactoring existing agent code, prompts, and tests |
| Time to apply | Days to weeks — infrastructure changes, transport layer swap, session layer deployment | Hours to days — prompt rewrites, tool doc improvements, eval setup, error handling changes |
| Prerequisites | A working agent that produces correct outputs; problems are in delivery, not logic | An agent that exists but is unreliable, over-controlled, or poorly tested |
| Output type | Architectural redesign: Durable Sessions layer, pub/sub channels, bidirectional transport | Refactored agent: goal-based prompts, agent-ready tools, eval suite, error-as-input handling |
| Creator background | Mike Christensen, Ably — real-time infrastructure and streaming delivery specialist | Philipp Schmid, Google DeepMind — ML engineering and LLM agent development |
| Multi-agent support | Excellent — directly solves orchestrator relay bottleneck with shared session channels | Indirect — improves individual agent quality but does not address inter-agent delivery |
| Testing philosophy | Validation tests: disconnect/reconnect, multi-device sync, cancel signal delivery | Probabilistic evals: reliability thresholds, LLM-as-a-judge, observe-adjust loops |
| Model dependency | Model-agnostic — operates entirely at the infrastructure layer | Model-aware — principles adapt as models improve; explicitly embraces 'build to delete' |
What does the Christensen Durable Sessions AI UX Framework do?
The Christensen Durable Sessions Framework diagnoses why AI chat experiences break under real-world conditions — network drops, device switching, multi-agent streaming — and provides an architectural solution. The core insight is that most AI products use direct HTTP streaming (typically SSE), which couples the response stream to a single client connection. When that connection drops, the stream is lost.
The framework introduces Durable Sessions: a persistent, shared layer between agents and clients. Agents write events to the session; clients subscribe to it. Neither holds a direct pipe to the other. This unlocks three foundational capabilities: Resilient Delivery (streams survive disconnections), Continuity Across Surfaces (sessions follow users across tabs and devices), and Live Control (clients can steer or cancel agents mid-generation).
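The write/subscribe split can be sketched in a few lines of Python. This is a minimal, single-process illustration under stated assumptions:

- `DurableSession`, `append`, and `read_from` are hypothetical names.
- The log lives in memory.
- Clients read by offset instead of receiving pushed events.

A real Durable Sessions layer would persist the log and deliver it over a pub/sub channel. The resume idea is the same: the client remembers the last offset it saw and asks for everything after it.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    offset: int  # position in the session log, assigned on append
    source: str  # who wrote the event (an agent or a client)
    data: str    # payload, e.g. a streamed token


@dataclass
class DurableSession:
    """Hypothetical append-only session log shared by agents and clients."""
    session_id: str
    _log: list[Event] = field(default_factory=list)

    def append(self, source: str, data: str) -> int:
        """Agents write here; nothing is tied to any client connection."""
        offset = len(self._log)
        self._log.append(Event(offset, source, data))
        return offset

    def read_from(self, offset: int) -> list[Event]:
        """Clients read from any offset; a reconnecting client passes last_seen + 1."""
        return self._log[offset:]


session = DurableSession("chat-42")
for token in ["Durable", " sessions", " outlive", " connections"]:
    session.append("agent", token)

# A phone receives the first two events, then its connection drops.
phone_seen = session.read_from(0)[:2]
last_seen = phone_seen[-1].offset

# The agent keeps writing while the phone is offline.
session.append("agent", ".")

# On reconnect the phone resumes after last_seen; a laptop can read the whole session.
missed = session.read_from(last_seen + 1)
print("".join(e.data for e in phone_seen + missed))  # prints "Durable sessions outlive connections."
laptop = session.read_from(0)
```

Because the agent only ever calls `append`, it needs no reconnect logic of its own. Resuming is entirely the client's job.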
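The framework also identifies the SSE Resume-Cancel Conflict. When an SSE connection closes, the server cannot tell whether the client dropped off the network (and will want to resume) or the user pressed stop (and wants the agent to halt). SSE itself is one-way, server to client, so it offers no in-band way to send an explicit cancel. The framework therefore argues for a bidirectional transport such as WebSockets. For multi-agent systems, it solves the Orchestrator Dual-Purpose Problem, in which the orchestrator must both coordinate sub-agents and relay their progress to the client. The fix is to let each sub-agent write directly to the session channel.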
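A minimal asyncio sketch shows why an explicit control channel removes the ambiguity. It rests on two assumptions:

- A hypothetical `control` queue stands in for the client-to-agent direction of a bidirectional transport.
- A plain list stands in for the session.

Neither is a real WebSocket or pub/sub API. The agent stops only when it receives a `cancel` message. A dropped connection sends nothing, so the agent keeps writing to the session for the client to pick up on reconnect.

```python
import asyncio
from typing import Optional


async def run_agent(tokens: list[str], session: list[str], control: asyncio.Queue) -> str:
    """Write tokens to the session, stopping only on an explicit 'cancel' message."""
    for token in tokens:
        # Check the client-to-agent direction of the (hypothetical) bidirectional channel.
        while not control.empty():
            if control.get_nowait() == "cancel":
                session.append("[cancelled]")
                return "cancelled"
        session.append(token)
        await asyncio.sleep(0)  # yield so the client side can run
    return "completed"


async def scenario(cancel_after: Optional[int]) -> tuple[str, list[str]]:
    session: list[str] = []
    control: asyncio.Queue = asyncio.Queue()
    agent = asyncio.create_task(run_agent(["a", "b", "c", "d"], session, control))
    if cancel_after is not None:
        # The user presses stop after seeing `cancel_after` tokens.
        while len(session) < cancel_after:
            await asyncio.sleep(0)
        await control.put("cancel")
    # A dropped connection sends no message at all, so the agent keeps writing
    # to the session and a reconnecting client can catch up later.
    return await agent, session


print(asyncio.run(scenario(cancel_after=2)))     # explicit user cancel
print(asyncio.run(scenario(cancel_after=None)))  # disconnect, or no interaction
```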
What does the Schmid Agent-Ready Engineering Framework do?
The Schmid Agent-Ready Engineering Framework identifies five specific mindset gaps that cause experienced software engineers to build unreliable AI agents, then provides a systematic process for fixing each one.
The five principles are: Text Is Our New State (replace Boolean flags with semantic context), Hand Over Control (define goals, not step-by-step workflows), Errors Are Just Inputs (feed failures back to the model instead of restarting), Move From Unit Tests to Evals (measure reliability across runs, not exact outputs), and Agents Evolve and APIs Don't (make every tool self-documenting for zero-context callers).
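To illustrate the first principle, here is a hypothetical travel-booking example. The class names and the scenario are invented for illustration and are not taken from the framework. A Boolean flag records that the user approved a plan but drops the conditions attached to that approval. Keeping the state as text preserves those conditions, so the next model call can reason over them.

```python
from dataclasses import dataclass, field


@dataclass
class FlagState:
    """Traditional deterministic state: nuance is lost once it becomes a Boolean."""
    plan_approved: bool = False


@dataclass
class TextState:
    """Semantic state: keep what happened in words the model can reason over."""
    notes: list[str] = field(default_factory=list)

    def record(self, note: str) -> None:
        self.notes.append(note)

    def as_context(self) -> str:
        # Rendered into the next model call instead of branching on flags in code.
        return "\n".join(f"- {n}" for n in self.notes)


user_reply = "Looks good, but keep it under $800 and no red-eye flights."

flags = FlagState(plan_approved=True)  # the budget and flight constraints are silently dropped

state = TextState()
state.record("Proposed plan: 3-day trip to Lisbon.")
state.record(f"User response: {user_reply}")

prompt = f"Conversation state so far:\n{state.as_context()}\n\nDecide the next step."
print(prompt)
```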
This framework is particularly powerful for senior engineers who default to traditional deterministic patterns. It reframes the engineer's role from traffic controller to dispatcher — you specify the destination and available options, then trust the LLM to navigate.
How do they compare?
These frameworks operate at fundamentally different layers of the AI product stack and are complementary, not competing.
The Christensen framework operates at the infrastructure and delivery layer. It assumes your agent produces good outputs but the experience of receiving those outputs is fragile. It is an architecture-heavy solution requiring pub/sub infrastructure, session management, and transport protocol changes.
The Schmid framework operates at the agent logic and development process layer. It assumes your agent itself is the problem — producing unreliable results, fighting the model with rigid workflows, or using poorly documented tools. It is a mindset-plus-refactoring solution that changes how you design prompts, handle errors, write tools, and test.
Where they overlap is in the shared belief that demo quality is not production quality. Christensen describes the result as a "fragile demo"; Schmid frames it as a reliability threshold an agent must meet before shipping. Both reject the idea that model improvements alone will solve product problems.
Key differences:
- Christensen is model-agnostic — it does not care what LLM you use. Schmid is model-aware and explicitly plans for model replacement with the "build to delete" principle.
- Schmid is faster to apply: rewriting prompts, tool docs, and tests takes hours to days. Christensen requires infrastructure deployment that takes days to weeks.
- Christensen is clearly better for multi-agent delivery — it directly solves the orchestrator relay bottleneck. Schmid improves individual agents but does not address how their outputs reach the client.
- Schmid is clearly better for agent reliability — if your agent makes wrong decisions, no amount of Durable Sessions will fix it.
Which should you choose?
Start with the Schmid Agent-Ready Engineering Framework if your agent itself is unreliable. If outputs are wrong, workflows are brittle, tools are poorly documented, or your test suite asserts exact outputs and constantly fails — these are agent-layer problems. Schmid gives you the diagnostic and fix for each one.
Move to the Christensen Durable Sessions Framework once your agent works correctly but the user experience breaks. If users lose responses on mobile, cannot continue conversations across devices, cannot press stop without losing their place, or your multi-agent orchestrator is drowning in relay logic — these are delivery-layer problems. Christensen gives you the architectural pattern.
If you are building a new AI product from scratch, apply Schmid's principles during agent development and Christensen's architecture during infrastructure design. They compose naturally: Schmid ensures the agent produces reliable outputs; Christensen ensures those outputs reach every client, on every device, without loss.
The one exception: if you are specifically building a multi-agent system where sub-agent visibility and real-time progress streaming are core UX requirements, prioritize the Christensen framework early — the architectural decisions it addresses are hard to retrofit.
Can you use both frameworks together?
Yes, and you should. The Schmid framework ensures your agents are reliable, goal-oriented, and properly tested. The Christensen framework ensures those reliable agents deliver their outputs through a resilient, multi-surface, controllable infrastructure. Applying Schmid without Christensen gives you a great agent with a fragile delivery pipe. Applying Christensen without Schmid gives you a beautifully resilient delivery pipe for unreliable outputs. The combination is what separates production-grade AI products from demos.
// FREQUENTLY ASKED QUESTIONS
Do I need both the Durable Sessions framework and the Agent-Ready Engineering framework?
Most production AI products eventually need both. They solve problems at different layers — Schmid fixes agent reliability (logic, prompts, tools, testing) while Christensen fixes delivery reliability (disconnections, multi-device, streaming). Start with whichever layer is currently broken. If your agent produces bad outputs, start with Schmid. If outputs are good but users lose them, start with Christensen.
Which framework should I use if my AI agent is flaky and unreliable?
Use the Schmid Agent-Ready Engineering Framework. It directly addresses why agents produce unreliable results: rigid step-by-step workflows, poor error handling, exact-output unit tests, and under-documented tools. It gives you five specific principles and a step-by-step audit process to diagnose and fix each reliability gap.
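For the tool-documentation gap, one option is to generate the tool's description from the function's own signature and docstring. The tool should also return error messages that tell a zero-context caller how to fix the call. The sketch below assumes a hypothetical `search_flights` tool and a generic JSON-Schema-style description format. Check your model provider's documentation for the exact tool format it expects.

```python
import inspect
import json

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def search_flights(origin: str, destination: str, max_price_usd: int = 1000) -> str:
    """Search one-way flights.

    origin: IATA airport code of the departure airport, e.g. "SFO".
    destination: IATA airport code of the arrival airport, e.g. "LIS".
    max_price_usd: Upper price bound in US dollars; results above it are dropped.
    Returns a plain-text list of flights, or an error message explaining how to fix the call.
    """
    if len(origin) != 3 or len(destination) != 3:
        # Actionable error text the model can act on, instead of a bare error code.
        return "Error: origin and destination must be 3-letter IATA codes such as 'SFO'."
    return f"No flights found from {origin} to {destination} under ${max_price_usd}."  # stub


def describe_tool(fn) -> dict:
    """Build a JSON-Schema-style tool description from a function's signature and docstring."""
    sig = inspect.signature(fn)
    summary, *lines = (inspect.getdoc(fn) or "").splitlines() or [""]
    # Lines of the form "param_name: description" document individual parameters.
    param_docs = {}
    for line in lines:
        name, sep, text = line.partition(":")
        if sep and name.strip() in sig.parameters:
            param_docs[name.strip()] = text.strip()
    properties, required = {}, []
    for name, param in sig.parameters.items():
        properties[name] = {
            "type": _JSON_TYPES.get(param.annotation, "string"),
            "description": param_docs.get(name, ""),
        }
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": summary,
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


print(json.dumps(describe_tool(search_flights), indent=2))
```

Generating the description from the code keeps the documentation and the implementation from drifting apart as the tool evolves.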
How do I fix my AI chat app losing responses when users switch networks?
Use the Christensen Durable Sessions Framework. This is the classic Single-Connection Trap — your streaming architecture couples stream health to one connection. The fix is introducing a Durable Sessions layer where the agent writes to a persistent channel and the client subscribes. On reconnect, the client resumes from exactly where it left off with no agent-side logic needed.
What is the difference between Durable Sessions and regular WebSocket connections?
WebSockets provide bidirectional transport but do not solve persistence or multi-device visibility. If a WebSocket drops, the stream is still lost. Durable Sessions add a persistent, shared layer on top of any transport — messages outlive connections, multiple clients can subscribe to the same session, and reconnecting clients receive missed events automatically. WebSockets are a transport; Durable Sessions are a session architecture.
Should I replace my unit tests with evals for AI agents?
Yes, for any test asserting exact deterministic outputs from a non-deterministic agent. The Schmid framework recommends supplementing or replacing unit tests with evals that measure reliability across multiple runs — e.g., 'this agent must succeed 8 out of 10 times.' Use LLM-as-a-judge for scalable qualitative scoring and set explicit reliability thresholds as production gates.
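A minimal sketch of a reliability gate follows. It assumes a hypothetical `flaky_agent` stand-in that cycles through canned answers so the result is reproducible. The `check` function here is a simple string test; in practice it could be an LLM-as-a-judge call. What matters is the shape of the test: run the same task many times, count successes, and compare the rate with an explicit threshold instead of asserting one exact output.

```python
import itertools
from typing import Callable

# Hypothetical stand-in for an LLM agent: right 9 times out of 10, phrased differently each time.
_answers = itertools.cycle([
    "Paris", "The capital is Paris.", "paris", "Paris, France", "It's Paris",
    "Paris.", "PARIS", "Lyon", "Paris!", "paris, fr",
])


def flaky_agent(task: str) -> str:
    return next(_answers)


def run_eval(agent: Callable[[str], str], task: str, check: Callable[[str], bool],
             runs: int = 10, threshold: float = 0.8) -> tuple[float, bool]:
    """Run a non-deterministic agent several times and gate on its success rate."""
    successes = sum(1 for _ in range(runs) if check(agent(task)))
    rate = successes / runs
    return rate, rate >= threshold


# Judge the meaning of the answer, not its exact wording.
rate, passed = run_eval(flaky_agent, "What is the capital of France?",
                        check=lambda out: "paris" in out.lower())
print(f"success rate {rate:.0%}, gate passed: {passed}")  # prints "success rate 90%, gate passed: True"
```

The same agent would fail an exact-match unit test nine times out of ten, even though it meets an 8-out-of-10 reliability threshold.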
How do I show live progress from multiple AI sub-agents to the user?
Use the Christensen Durable Sessions Framework. It solves the Orchestrator Dual-Purpose Problem by having each sub-agent write progress updates directly to a shared session channel. The client subscribes once and sees all sub-agents' activity multiplexed together. The orchestrator focuses only on coordination, so it no longer has to route every sub-agent update through its own code.
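The pattern can be sketched with asyncio. The sketch assumes a plain list as a stand-in for the shared session channel and uses invented sub-agent names. Each sub-agent appends its own tagged progress events. The orchestrator only starts the sub-agents and waits for them; it never copies their updates anywhere.

```python
import asyncio


async def sub_agent(name: str, steps: int, session: list[tuple[str, str]]) -> None:
    """Each sub-agent publishes its own progress straight to the shared session."""
    for i in range(1, steps + 1):
        session.append((name, f"step {i}/{steps}"))
        await asyncio.sleep(0)  # let other sub-agents interleave
    session.append((name, "done"))


async def orchestrator(session: list[tuple[str, str]]) -> None:
    """Coordinates work only: it never relays sub-agent updates to the client."""
    await asyncio.gather(
        sub_agent("researcher", 2, session),
        sub_agent("coder", 3, session),
        sub_agent("reviewer", 1, session),
    )


session: list[tuple[str, str]] = []
asyncio.run(orchestrator(session))

# The client subscribes once and sees every sub-agent's events, tagged by source.
for source, event in session:
    print(f"[{source}] {event}")
```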
What does 'errors are just inputs' mean for AI agent design?
It means when a tool call or API fails inside an agent flow, you feed the error back to the model as an informational message rather than throwing an exception or restarting. The model then reasons around the failure — trying alternatives, skipping that step, or asking the user. This is critical for long-running agents where a restart wastes minutes of compute and accumulated context.
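A minimal sketch of the loop uses two stand-ins invented for illustration: a hypothetical `get_weather` tool and a scripted `fake_model` in place of the LLM. When the tool raises, the loop appends the error text to the history as a tool message instead of aborting. The model's next decision is then based on that message, and the earlier context is kept.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str     # "user" or "tool"
    content: str


def get_weather(city: str) -> str:
    """Hypothetical tool that fails for cities its backend does not know."""
    known = {"Lisbon": "sunny"}
    if city not in known:
        raise LookupError(f"unknown city '{city}'; try the nearest major city")
    return known[city]


def fake_model(history: list[Message]) -> tuple[str, str]:
    """Scripted stand-in for an LLM: returns (action, argument)."""
    last = history[-1]
    if last.role == "tool" and last.content.startswith("ERROR"):
        return "call", "Lisbon"  # reason around the failure with an alternative
    if last.role == "tool":
        return "answer", f"Lisbon is {last.content}."
    return "call", "Sintra"      # first attempt uses a city the tool does not know


def run(task: str, max_steps: int = 5) -> tuple[str, list[Message]]:
    history = [Message("user", task)]
    for _ in range(max_steps):
        action, arg = fake_model(history)
        if action == "answer":
            return arg, history
        try:
            result = get_weather(arg)
        except LookupError as exc:
            # Errors are just inputs: record the failure and let the model decide what to do.
            result = f"ERROR from get_weather({arg!r}): {exc}"
        history.append(Message("tool", result))
    return "Gave up after max_steps.", history


answer, history = run("What's the weather near Sintra?")
print(answer)  # prints "Lisbon is sunny."
```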
Which framework is better for a multi-agent AI system?
For delivery and streaming of multi-agent outputs, Christensen's Durable Sessions is clearly better — it directly solves how sub-agent updates reach clients without bottlenecking through an orchestrator. For making each individual agent within the system reliable and well-designed, Schmid's Agent-Ready framework is better. In a multi-agent system, you likely need both.