Agent Harness vs Durable Sessions: Which Should You Use?
// TL;DR
Use the Tejas Agent Harness Engineering Framework if your AI agent produces unreliable outputs, hallucinates, or lies about success — it wraps any model in deterministic guardrails, verify steps, and handlers to guarantee correctness. Use the Christensen Durable Sessions Framework if your agent works correctly but the user experience breaks due to dropped streams, no multi-device continuity, or missing stop/steer controls. These frameworks solve different layers of the AI stack and are complementary, not competing.
// HOW DO THEY COMPARE?
| Dimension | Tejas Agent Harness Engineering Framework | Christensen Durable Sessions AI UX Framework |
|---|---|---|
| Best for | Making unreliable AI agents produce correct, verifiable results | Making AI chat/agent UX resilient, multi-device, and controllable |
| Layer of the stack | Agent execution logic — wraps the model and agent loop | Delivery infrastructure — sits between agent output and client |
| Core problem solved | Non-deterministic model outputs, hallucination, false success reports | Dropped streams, SSE limitations, no cross-device sync, no live control |
| Complexity to implement | Moderate — build agent loop, guardrails, verify step, and handlers in code | Moderate to high — requires persistent pub/sub session layer and transport changes |
| Time to apply | Hours to days for a single agent task; iterative | Days to weeks; architectural change across client and server |
| Prerequisites | An existing agent task, any LLM, basic coding ability | An existing streaming AI product, understanding of SSE/WebSockets, pub/sub infrastructure |
| Output type | A deterministic harness wrapping an agent that guarantees task correctness | A session architecture that guarantees delivery resilience and user control |
| Model dependency | Model-agnostic — explicitly designed to make cheap models reliable | Model-agnostic — operates below the model layer entirely |
| Creator background | Tejas Kumar — AI engineer focused on agentic reliability patterns | Mike Christensen (Ably) — real-time infrastructure, AI Engineer conference talk |
| Complementary use | Pair with Durable Sessions for end-to-end reliability from agent to client | Pair with Agent Harness for end-to-end reliability from model to user |
What does the Tejas Agent Harness Engineering Framework do?
The Tejas Agent Harness Engineering Framework solves the problem of unreliable AI agents at the execution layer. When an agent hallucinates, lies about completing a task, gets stuck in loops, or crashes on authentication walls, the instinct is to rewrite the prompt. Tejas's framework rejects that instinct entirely. Instead, you build a harness — a deterministic wrapper around the model that includes an agent loop, tool registry, guardrails (max iterations, max messages), a context compressor, deterministic handlers for known obstacles like login pages, and a verify step that inspects the agent's trace to confirm success or failure in code.
The key insight is that a cheap, outdated model wrapped in a strong harness outperforms an expensive frontier model running without one. The harness handles everything critical — authentication, secret injection, result verification — deterministically, so the model only handles what it's good at: reasoning and tool selection. You never change the prompt; you iterate on the harness.
What does the Christensen Durable Sessions Framework do?
The Christensen Durable Sessions Framework solves the problem of broken AI user experiences at the delivery layer. Even when your agent works perfectly, the experience can still break: a mobile user switches networks and loses the streamed response, a second tab can't see the in-progress answer, the stop button doesn't work reliably because SSE conflates disconnect with cancel, or an orchestrator bottlenecks because it's forced to relay sub-agent progress updates.
The fix is a Durable Session — a persistent, shared, addressable resource that sits between agents and clients. Agents write events to the session; clients subscribe to it. Neither holds a direct connection to the other. This unlocks three foundational capabilities simultaneously: Resilient Delivery (streams survive disconnections), Continuity Across Surfaces (session follows the user across devices), and Live Control (users can steer, stop, or message agents mid-generation). The natural implementation is a pub/sub channel model with bidirectional transport replacing SSE.
How do they compare?
These two frameworks operate on entirely different layers of the AI product stack and solve different categories of failure.
The Agent Harness is concerned with whether the agent does the right thing. Did it actually submit the form? Did it hallucinate a tool call? Is it lying about success? The harness answers these questions with deterministic verify steps and handlers — it is engineering reliability into the agent's execution.
The Durable Sessions framework is concerned with whether the user receives the right thing. Did the response survive a network drop? Can the user see it on another device? Can they stop or steer the agent? Durable Sessions engineer reliability into the delivery and interaction layer.
If your agent gives wrong answers, no amount of session infrastructure will fix it — you need the harness. If your agent gives right answers but users lose them mid-stream or can't control the agent, no amount of harness engineering will fix it — you need Durable Sessions.
A critical distinction: the Agent Harness is iterative and task-scoped. You define a task, observe failures, and add harness components one by one. Durable Sessions require an architectural change — you're redesigning how your entire streaming pipeline works, replacing SSE with bidirectional transport, and introducing a persistent session layer. The harness is faster to apply but narrower in scope; Durable Sessions are slower to implement but transform your entire product's UX resilience.
Which should you choose?
Choose the Tejas Agent Harness if your primary problem is agent correctness. Your agent hallucinates, claims it completed tasks it didn't, crashes on auth walls, loops infinitely, or produces inconsistent results. You want to make a cheap model reliable without prompt tweaking. This is the right starting point for anyone building agentic workflows that interact with external systems.
Choose Christensen Durable Sessions if your agent logic works but your user experience is fragile. Users lose responses on mobile, can't continue conversations across devices, can't stop or steer the agent mid-generation, or your multi-agent architecture creates orchestrator bottlenecks. This is the right framework for teams scaling an AI product beyond a demo into production UX.
Choose both if you're building a production AI product end-to-end. The harness guarantees the agent does the right thing; Durable Sessions guarantee the user receives it reliably and can interact with it. They are fully complementary. Apply the harness first to get agent correctness, then layer Durable Sessions to make the delivery bulletproof.
Neither framework is a substitute for the other. They are not competitors — they are adjacent layers in a complete AI product architecture. The harness is the agent-side anchor; the Durable Session is the client-side anchor. Together, they close the full reliability gap from model to user.
// FREQUENTLY ASKED QUESTIONS
Can I use the Agent Harness and Durable Sessions together?
Yes, and you should for production AI products. The Agent Harness ensures the agent produces correct results by wrapping it in deterministic guardrails and verify steps. Durable Sessions ensure those results reach the user reliably across disconnections, devices, and with live control. They operate on different layers and are fully complementary.
Do I need Durable Sessions if I'm already using the Vercel AI SDK?
Likely yes. The Vercel AI SDK uses SSE for streaming, which means you're in the Single-Connection Trap: a dropped connection kills the stream, a second device can't see the response, and the stop button creates resume-cancel ambiguity. Durable Sessions solve all three problems by decoupling agents from client connections.
Does the Agent Harness work with any LLM model?
Yes. The harness is explicitly model-agnostic. A core principle is that a cheap or small model wrapped in a strong harness outperforms an expensive frontier model running without one. You can swap models freely because all critical logic — authentication, verification, guardrails — lives in the harness, not the model.
What's the difference between an agent harness and an agent loop?
The agent loop is the inner while-true cycle that sends prompts and collects responses. The harness is everything around the model: the loop wrapper, tool registry, guardrails, context compressor, deterministic handlers, verify step, and retry logic. The agent loop is one component inside the harness, not the harness itself.
Why can't I just improve my prompt instead of building a harness?
Because prompting is probabilistic — it may improve outputs on average but cannot guarantee correctness for critical steps like authentication, form submission, or result verification. The harness handles these deterministically in code. In Tejas's demonstration, the prompt was never changed; the harness alone made an unreliable agent succeed every time.
What is the SSE resume-cancel conflict in Durable Sessions?
SSE is one-way, so the only signal a client can send is closing the connection. This creates an irresolvable ambiguity: does closing mean 'I disconnected, buffer my messages for resume' or 'I pressed stop, cancel generation'? These are mutually exclusive actions. Durable Sessions solve this by using bidirectional transport with explicit cancel signals.
How long does it take to implement each framework?
The Agent Harness can be applied in hours to days for a single agent task — you build iteratively, adding guardrails and handlers as you identify failure modes. Durable Sessions typically take days to weeks because they require architectural changes: introducing a persistent session layer, replacing SSE transport, and updating both server and client code.
Which framework should I use if my AI agent keeps saying it completed a task but actually didn't?
Use the Tejas Agent Harness. This is the exact failure mode it targets. You add a deterministic verify step that inspects the agent's trace — the history of tool calls and events — and confirms success or failure in code. The model's self-report is never trusted. The verify step removes the agent's ability to lie.