Durable Sessions vs DeepMind App-Building: Which Do You Need?

// TL;DR

These two frameworks solve different problems and are not substitutes. If your AI product streams responses to users and breaks on disconnect, multi-device use, or mid-generation control, use the Christensen Durable Sessions Framework to fix your real-time delivery architecture. If you need to build a new multimodal AI application using Google DeepMind's model suite (Gemini, Nano Banana, Veo, Lyria), use the DeepMind Generative Media App-Building Framework. Many teams building generative media apps will eventually need both: DeepMind's framework to select and chain models, and Durable Sessions to deliver those outputs reliably.

// HOW DO THEY COMPARE?

| Dimension | Christensen Durable Sessions AI UX Framework | Google DeepMind Generative Media App-Building Framework |
| --- | --- | --- |
| Best For | Fixing broken AI chat/agent streaming UX: disconnects, multi-device, live control | Building new multimodal AI apps using Google DeepMind's model suite (Gemini, image, video, music) |
| Problem Domain | Real-time delivery infrastructure and AI UX resilience | Model selection, prototyping, and deploying generative media pipelines |
| Complexity | High: requires rearchitecting the streaming layer, replacing SSE, adding a pub/sub session substrate | Moderate: follows a guided prototype-to-production path using AI Studio's built-in tooling |
| Time to Apply | Days to weeks, depending on existing architecture debt | Hours to days for prototyping; days to weeks for production pipelines |
| Prerequisites | Existing AI chat/agent product with streaming; familiarity with WebSockets, pub/sub, SSE | A Google AI Studio account; basic Python or TypeScript; understanding of target modalities |
| Output Type | Architectural redesign: a durable session layer between agents and clients | A working multimodal app with model pipeline, scaffolded code, and deployment target |
| Model Agnostic? | Yes: works with any LLM or agent framework | No: tightly coupled to Google DeepMind models (Gemini, Nano Banana, Veo, Lyria, Gemma) |
| Multi-Agent Support | Core strength: solves the orchestrator relay bottleneck for multi-agent architectures | Not addressed: focuses on model chaining, not agent coordination |
| Creator Background | Mike Christensen, Ably (real-time infrastructure company) | Paige Bailey & Guillaume Vernade, Google DeepMind |
| Vendor Lock-in | Low: principles apply to any pub/sub or WebSocket implementation | High: deeply integrated with Google's AI Studio, Vertex AI, and DeepMind model APIs |

What does the Christensen Durable Sessions AI UX Framework do?

The Christensen Durable Sessions Framework diagnoses and fixes a specific, pervasive problem: your AI chat or agent product streams responses to users over a fragile, single-connection pipe (typically SSE), and that pipe breaks the moment real-world conditions appear — network drops, tab switches, multi-device usage, or the user pressing a stop button.

The framework introduces the concept of a Durable Session: a persistent, shared resource that sits between your agent layer and your client layer. Agents write events (token chunks, tool results, status updates) to the session. Clients subscribe to the session. Neither holds a direct connection to the other. This architectural inversion unlocks three foundational capabilities: Resilient Delivery (streams survive disconnections), Continuity Across Surfaces (sessions follow users across tabs and devices), and Live Control (clients can steer or cancel agents mid-generation).
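
To make the inversion concrete, here is a minimal in-memory sketch. It is not Christensen's or Ably's implementation: the `DurableSession` class, its method names, and the generator-based `run_agent` are all hypothetical, and a real system would back the log with pub/sub or WebSocket infrastructure rather than a Python list. The sketch only illustrates the three capabilities named above.

```python
from dataclasses import dataclass, field


@dataclass
class DurableSession:
    """Hypothetical append-only event log shared by agents and clients."""
    events: list = field(default_factory=list)
    cancelled: bool = False  # control signal: set by a client, checked by the agent

    def append(self, kind: str, data: str) -> int:
        """Agent side: write an event and return its offset in the log."""
        offset = len(self.events)
        self.events.append({"offset": offset, "kind": kind, "data": data})
        return offset

    def read_from(self, offset: int) -> list:
        """Client side: fetch every event at or after `offset`."""
        return self.events[offset:]

    def cancel(self) -> None:
        """Client side: ask the agent to stop generating."""
        self.cancelled = True


def run_agent(session: DurableSession, tokens: list):
    """Hypothetical agent: a generator that writes one token per step."""
    for token in tokens:
        if session.cancelled:
            session.append("status", "cancelled")
            return
        session.append("token", token)
        yield  # hand control back so clients can act between tokens
    session.append("status", "done")


def render(events: list) -> str:
    """Join token events into the text a client would display."""
    return "".join(e["data"] for e in events if e["kind"] == "token")


session = DurableSession()
agent = run_agent(session, ["Dur", "able ", "ses", "sions"])
next(agent)  # agent writes offset 0
next(agent)  # agent writes offset 1

# The laptop has rendered offsets 0-1, then loses its connection.
laptop_text = render(session.read_from(0))
next(agent)  # agent keeps writing (offset 2) while the laptop is offline

# Resilient delivery: on reconnect the laptop asks only for what it missed.
missed = session.read_from(2)

# Continuity across surfaces: a phone joining late replays the same log.
phone_text = render(session.read_from(0))

# Live control: a client cancels, and the agent stops at its next step.
session.cancel()
next(agent, None)
```

Because the agent and the clients only ever touch the session, neither side needs to know whether the other is currently connected.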

The framework also solves the Orchestrator Dual-Purpose Problem in multi-agent setups, where the orchestrator is forced to both coordinate subtasks and relay progress updates. With Durable Sessions, every sub-agent writes directly to the session, eliminating the relay bottleneck.
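
The difference can be sketched in a few lines. Everything here is illustrative: `Session`, `sub_agent`, and `orchestrator` are hypothetical names, and the "work" is a placeholder string operation. The point is only that progress events carry the sub-agent's own name as their source, and the orchestrator writes nothing except its final result.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical shared session: every writer appends to the same log."""
    events: list = field(default_factory=list)

    def append(self, source: str, kind: str, data: str) -> None:
        self.events.append({"source": source, "kind": kind, "data": data})


def sub_agent(name: str, task: str, session: Session) -> str:
    """Sub-agent reports its own progress directly to the session."""
    session.append(name, "status", f"started: {task}")
    result = task.upper()  # placeholder for real work
    session.append(name, "status", f"finished: {task}")
    return result


def orchestrator(tasks: dict, session: Session) -> dict:
    """Coordinates subtasks only; it never relays progress for sub-agents."""
    results = {name: sub_agent(name, task, session) for name, task in tasks.items()}
    session.append("orchestrator", "result", " | ".join(results.values()))
    return results


session = Session()
results = orchestrator(
    {"researcher": "find sources", "writer": "draft summary"}, session
)

# A client subscribed to the session sees progress from every agent.
sources = [e["source"] for e in session.events]
```

In this sketch the orchestrator contributes one event out of five; the other four come straight from the sub-agents.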

What does the Google DeepMind Generative Media App-Building Framework do?

This framework is a step-by-step guide to building real, deployable multimodal AI applications using Google DeepMind's full model suite: Gemini for understanding and generation, Nano Banana 2 for images, Veo for video, Lyria for music, Gemini Live for real-time voice, Genie 3 for interactive worlds, and Gemma 4 for on-device inference.

The core workflow is: define your app goal → select the right model tier for each modality → prototype in AI Studio Playground → click 'Get Code' to export validated configurations → chain models using Gemini as a prompt factory with structured outputs → deploy to the right platform (AI Studio for personal projects, Developer API for consumer apps, Vertex AI for enterprise).
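
The "prompt factory with structured outputs" step can be sketched as follows. This is an assumption-laden outline, not code exported from AI Studio: `prompt_factory`, `generate_image`, and `generate_video` are hypothetical stubs standing in for real model calls, and the `MediaPlan` schema is invented for illustration. What it shows is the shape of the chain: one model returns JSON matching a fixed schema, the JSON is validated, and each field feeds a downstream generator.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = ("image_prompt", "video_prompt")


@dataclass
class MediaPlan:
    """Hypothetical schema the prompt-factory model is asked to fill in."""
    image_prompt: str
    video_prompt: str


def prompt_factory(goal: str) -> str:
    """Stub for a Gemini call configured to return JSON for MediaPlan."""
    return json.dumps({
        "image_prompt": f"Key art for: {goal}",
        "video_prompt": f"Five-second clip animating the key art for: {goal}",
    })


def parse_plan(raw: str) -> MediaPlan:
    """Validate the structured output before passing it downstream."""
    data = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"structured output is missing fields: {missing}")
    return MediaPlan(**{name: str(data[name]) for name in REQUIRED_FIELDS})


def generate_image(prompt: str) -> str:
    """Stub for an image-model call; returns a placeholder handle."""
    return f"image<{prompt}>"


def generate_video(prompt: str, reference_image: str) -> str:
    """Stub for a video-model call that takes an explicit reference image."""
    return f"video<{prompt} | ref={reference_image}>"


def run_pipeline(goal: str) -> dict:
    plan = parse_plan(prompt_factory(goal))
    image = generate_image(plan.image_prompt)
    video = generate_video(plan.video_prompt, reference_image=image)
    return {"image": image, "video": video}


outputs = run_pipeline("a lighthouse at dawn")
```

Validating the plan before calling the downstream models means a malformed response fails early, before any image or video generation is paid for.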

Key principles include using Gemini to generate prompts for downstream generative models (the stated rationale being that Gemini was trained on the same data), enforcing character consistency by passing explicit reference images, and defaulting to the cheapest model tier during development (Flash-Lite at ~$0.25/M tokens) before upgrading deliberately.

How do they compare?

These frameworks operate at entirely different layers of the AI product stack and are complementary, not competitive.

The Durable Sessions Framework operates at the delivery and infrastructure layer — how generated content gets from the agent to the user reliably. It is model-agnostic and applies whether you use OpenAI, Anthropic, Google, or open-source models. It solves problems that appear after you have a working model pipeline: disconnections, multi-device sync, live user control, and multi-agent progress visibility.

The DeepMind App-Building Framework operates at the model selection and application layer — choosing which models to use, how to chain them, and how to go from prototype to production. It is deeply coupled to Google's ecosystem and solves problems that appear before you have a delivery layer: which model handles which modality, how to maintain character consistency across generated images, and how to avoid building infrastructure the model will absorb.

Where they overlap is in the ambition to move AI products from fragile demos to production-quality experiences. Christensen attacks this from the connectivity side; Bailey and Vernade attack it from the model-selection and prototyping side.

A team building a multimodal generative media app on Google's stack would use the DeepMind framework to design and build the model pipeline, and would later need the Durable Sessions framework when they discover that streaming Veo-generated video or multi-agent progress updates to users over SSE breaks under real-world conditions.

Which should you choose?

Choose the Christensen Durable Sessions Framework if you already have an AI chat, agent, or streaming product and your users are experiencing broken streams on disconnect, no multi-device continuity, or inability to interrupt/steer an agent mid-generation. This framework gives you the architectural pattern to fix those problems permanently. It is especially critical if you are running multi-agent architectures where the orchestrator has become a bottleneck for progress updates.

Choose the DeepMind Generative Media App-Building Framework if you are starting a new project that involves image, video, music, or multimodal generation and want to build on Google's model suite. It gives you a clear path from prototype (AI Studio) to production (Developer API or Vertex AI) with specific guidance on model tiers, cost management, and pipeline design.

Choose both if you are building a production multimodal AI product that streams generative content to users across devices. Use DeepMind's framework to build the model pipeline, then layer Durable Sessions underneath to ensure reliable, resumable, multi-surface delivery.

Neither framework replaces the other. If you are debugging why your AI chat UX breaks when users switch Wi-Fi networks, the DeepMind framework will not help you. If you are trying to figure out whether to use Nano Banana 2 or Veo 3.1 Lite for your use case, the Durable Sessions framework has nothing to say about it.

// FREQUENTLY ASKED QUESTIONS

Can I use Durable Sessions with Google DeepMind models?

Yes. The Durable Sessions framework is model-agnostic. It sits between any agent layer (including agents powered by Gemini, Nano Banana, or Veo) and the client layer. If you're streaming DeepMind model outputs to users, Durable Sessions solves the delivery reliability problem regardless of which model generated the content.

Do I need both frameworks to build a production AI app?

It depends on your app. If you're building a multimodal app on Google's stack that streams results to users in real time, you likely need both — DeepMind's framework for model selection and pipeline design, and Durable Sessions for reliable delivery. If your app is batch-only with no live streaming, Durable Sessions may not be necessary.

Which framework should I use if my AI chatbot breaks when users lose internet?

Use the Christensen Durable Sessions Framework. This is the exact problem it solves — the Single-Connection Trap where stream health is coupled to one client's connection. Durable Sessions decouple the agent from the client, enabling automatic resume on reconnect without agent-side replay logic.
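
A rough sketch of resume-on-reconnect, under the assumption that the session is an ordered log and the client remembers the next offset it needs. `Session` and `Client` are hypothetical names, and polling stands in for a real subscription. Note that the agent side has no replay logic at all: it only appends.

```python
class Session:
    """Hypothetical ordered log that outlives any single client connection."""

    def __init__(self) -> None:
        self.chunks: list = []

    def append(self, chunk: str) -> None:
        self.chunks.append(chunk)

    def read_from(self, offset: int) -> list:
        return self.chunks[offset:]


class Client:
    """Tracks its own position, so reconnecting is just another read."""

    def __init__(self, session: Session) -> None:
        self.session = session
        self.next_offset = 0
        self.connected = True
        self.text = ""

    def poll(self) -> None:
        if not self.connected:
            return  # offline: nothing is received, but nothing is lost either
        for chunk in self.session.read_from(self.next_offset):
            self.text += chunk
            self.next_offset += 1


session = Session()
client = Client(session)

session.append("The ")
client.poll()

client.connected = False  # user walks out of Wi-Fi range
session.append("answer ")  # the agent keeps generating regardless
session.append("is 42.")
client.poll()
text_while_offline = client.text

client.connected = True  # connection returns
client.poll()  # picks up from next_offset, no agent involvement
text_after_reconnect = client.text
```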

Which framework helps me choose between Gemini Flash and Gemini Pro?

The Google DeepMind Generative Media App-Building Framework. It provides explicit model tier selection guidance: default to Flash-Lite (~$0.25/M tokens) during development, and upgrade to Pro only when the quality difference justifies the ~10x cost increase. The Durable Sessions framework does not address model selection.
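
To see what the ~10x gap means in practice, here is a back-of-the-envelope calculation. The two price figures are the approximate ones quoted above; the 40M-token development volume is a made-up number for illustration, and real pricing differs between input and output tokens.

```python
FLASH_LITE_USD_PER_M = 0.25  # approximate figure quoted in the text
PRO_COST_MULTIPLIER = 10     # "~10x" from the text, also approximate
PRO_USD_PER_M = FLASH_LITE_USD_PER_M * PRO_COST_MULTIPLIER


def cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of processing `tokens` at a flat per-million-token rate."""
    return tokens / 1_000_000 * usd_per_million


dev_tokens = 40_000_000  # hypothetical token volume for a development cycle

flash_lite_cost = cost_usd(dev_tokens, FLASH_LITE_USD_PER_M)  # 10.0
pro_cost = cost_usd(dev_tokens, PRO_USD_PER_M)                # 100.0
extra_for_pro = pro_cost - flash_lite_cost                    # 90.0
```

Under these assumptions, iterating on the cheaper tier costs about $10 where the same volume on Pro would cost about $100.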

Is the Durable Sessions framework tied to Ably's products?

The principles are vendor-agnostic — any pub/sub or WebSocket infrastructure can implement Durable Sessions. The talk was given by Ably's Mike Christensen, and Ably provides a natural implementation substrate, but the architectural pattern (persistent, resumable, shared session channels) can be built on other platforms.

Can the DeepMind framework help me build a multi-agent system?

Not directly. The DeepMind framework focuses on chaining models (Gemini → Nano Banana → Veo → Lyria) in a pipeline, not on orchestrating multiple autonomous agents. For multi-agent coordination and progress delivery, the Durable Sessions framework is the relevant one: it solves the orchestrator relay bottleneck.

Which framework is faster to implement?

The DeepMind framework is faster to get started with — you can prototype in AI Studio and export working code in hours. The Durable Sessions framework requires rearchitecting your streaming layer, which typically takes days to weeks depending on existing technical debt and whether you need to replace SSE with WebSockets.

What if I'm not using Google's models — is the DeepMind framework still useful?

Mostly no. The framework is tightly coupled to Google's model suite, AI Studio tooling, and deployment platforms (Vertex AI, Developer API). Some principles — like using structured outputs for chained pipelines and prototyping cheap before upgrading — are universally applicable, but the specific workflow assumes Google's ecosystem.