DeepMind App Framework vs Software Factory Primitives
// TL;DR
These two frameworks solve completely different problems. Use the Google DeepMind Generative Media App-Building Framework when you need to build multimodal AI applications — image, video, music, and text generation — using Google's model suite. Use the Software Factory Primitives Framework when you need to coordinate fleets of coding agents across your SDLC to automate software development itself. If you're building an AI-powered product for end users, choose DeepMind. If you're automating how your engineering team ships code with agents, choose Software Factory Primitives.
// HOW DO THEY COMPARE?
| Dimension | Google DeepMind Generative Media App-Building Framework | Lou Bichard Software Factory Primitives Framework |
|---|---|---|
| Best For | Building multimodal AI apps (image, video, music, text generation) for end users | Designing and scaling agentic coding pipelines that automate the SDLC |
| Primary Problem Solved | Selecting the right model, prototyping in AI Studio, and shipping production-ready generative media apps | Diagnosing which infrastructure primitive is blocking your agent swarm and building a coordination layer |
| Complexity | Moderate — guided by AI Studio playground with one-click code export; steepens at multi-model pipelines | High — requires decomposing SDLC into micro-steps, building coordination layers, and iterating via harness engineering |
| Time to Apply | Hours to days for a prototype; days to weeks for production multi-model pipeline | Weeks to months — infrastructure design, coordination layer build, and iterative harness engineering |
| Prerequisites | Google AI Studio account, familiarity with Python or TypeScript, basic prompt engineering | Existing coding agents (Claude Code, Cursor, etc.), DevOps/platform engineering capacity, understanding of SDLC stages |
| Output Type | A deployable app that generates or processes images, video, music, text, or combinations thereof | An infrastructure architecture and coordination layer that enables autonomous agent-driven software delivery |
| Target User | App developers, indie creators, product engineers building AI-powered consumer or enterprise products | Platform engineers, engineering managers, and DevOps leads scaling agentic coding across teams or orgs |
| Ecosystem Lock-in | Tied to Google DeepMind models (Gemini, Nano Banana 2, Veo, Lyria, Gemma); open-weight option via Gemma 4 | Model-agnostic: works with any coding agent (Claude Code, Cursor, custom agents); principles are infrastructure-level |
| Scaling Model | Vertical: upgrade model tier (Flash-Lite → Flash → Pro) and add service tier signaling for cost/reliability | Horizontal: Swarm (fan-out sub-agents), Fleet (fan-out across repos), Events (webhook-triggered background agents) |
| Creator Background | Paige Bailey & Guillaume Vernade, Google DeepMind; presented at the AI Engineer conference | Lou Bichard, Ona; presented at the AI Engineer conference, focused on agent coordination infrastructure |
What does the Google DeepMind Generative Media App-Building Framework do?
This framework gives you a structured method for building real, deployable applications using Google DeepMind's full model suite: Gemini for multimodal understanding and generation, Nano Banana 2 for image generation, Veo for video generation, Lyria for music generation, Gemini Live for real-time voice, and Gemma for on-device open-weight deployment.
The core workflow is: define your app goal and required modalities, select the right model tier for cost and quality, prototype everything in AI Studio's playground, then click "Get Code" to export a production-ready Python or TypeScript starting point. For complex generative media pipelines — like illustrating a book with consistent characters, then generating video and music for each chapter — you use Gemini as a "prompt factory" to generate structured prompts for downstream models.
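The "prompt factory" step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the framework's own code: `ChapterPlan`, `build_prompts`, and the sample chapter are all hypothetical, and the planning model's structured output is stubbed as a hand-written object rather than fetched from an API.

```python
from dataclasses import dataclass


@dataclass
class ChapterPlan:
    """Structured output we assume the planning model returns for one chapter."""
    title: str
    summary: str
    characters: list[str]
    mood: str


def build_prompts(plan: ChapterPlan) -> dict[str, str]:
    """Expand one structured plan into a prompt per downstream modality."""
    cast = ", ".join(plan.characters)
    return {
        "image": f"Illustration for '{plan.title}': {plan.summary} Characters: {cast}.",
        "video": f"Short clip for '{plan.title}', {plan.mood} mood: {plan.summary}",
        "music": f"Instrumental score, {plan.mood} mood, for a chapter titled '{plan.title}'.",
    }


# Made-up sample data standing in for a real structured model response.
plan = ChapterPlan(
    title="The Lighthouse",
    summary="Mira finds a hidden door under the stairs.",
    characters=["Mira", "Captain Oro"],
    mood="mysterious",
)
prompts = build_prompts(plan)
for modality, prompt in prompts.items():
    print(f"{modality}: {prompt}")
```

Keeping the plan as a typed structure means every downstream prompt draws on the same fields, which is one way to keep the image, video, and music for a chapter consistent with each other.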
Key principles include prototyping cheaply with the smallest capable model (Flash-Lite at ~$0.25/M tokens), using structured outputs for chained pipelines, passing explicit reference images for character consistency, and avoiding building infrastructure the model will absorb natively within months. The framework also covers full-stack app scaffolding via AI Studio Build, which accepts natural-language specs and generates apps with UI, database, and auth.
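The "smallest capable model" principle amounts to walking the tier ladder from cheapest to most capable and stopping at the first tier that passes your own quality check. The sketch below assumes a hypothetical `passes_eval` callback and uses placeholder tier labels, not real model IDs; the only price used is the ~$0.25/M figure quoted above.

```python
from typing import Callable, Optional

# Ordered cheapest to most capable; labels are placeholders, not API model IDs.
TIERS = ["flash-lite", "flash", "pro"]


def estimate_cost(tokens: int, usd_per_million: float) -> float:
    """Token cost in USD at a flat per-million-token price."""
    return tokens / 1_000_000 * usd_per_million


def smallest_capable_tier(passes_eval: Callable[[str], bool]) -> Optional[str]:
    """Return the first (cheapest) tier whose output passes your own eval."""
    for tier in TIERS:
        if passes_eval(tier):
            return tier
    return None


# Hypothetical eval result: pretend only "flash" and above meet the quality bar.
chosen = smallest_capable_tier(lambda tier: tier in {"flash", "pro"})
print(chosen)

# Cost of 2M prototype tokens at the ~$0.25/M figure quoted above.
print(estimate_cost(2_000_000, 0.25))
```

In practice `passes_eval` would run a small set of representative prompts against each tier and score the results; that part is left out here because it depends on the application.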
What does the Software Factory Primitives Framework do?
This framework addresses a fundamentally different problem: how to design, audit, and scale systems of coding agents that automate the software development lifecycle. It diagnoses exactly which infrastructure primitive is blocking your agentic coding pipeline and prescribes the coordination layer needed to remove humans from the loop.
The framework identifies four required primitives: Runtime (where agents run), Orchestration (scaling agents horizontally), Triggers (events that bring agents online), and Coordination (how agents interact and hand off work). The central thesis is that the first three are largely solved, but Coordination is the missing primitive that prevents true software factories from working.
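One way to make the diagnosis concrete is to treat the four primitives as a checklist and ask which ones a team has not yet covered. The following is an illustrative sketch only: the `Primitive` enum mirrors the descriptions above, while `missing_primitives` and the example team are hypothetical.

```python
from enum import Enum


class Primitive(Enum):
    RUNTIME = "where agents run"
    ORCHESTRATION = "scaling agents horizontally"
    TRIGGERS = "events that bring agents online"
    COORDINATION = "how agents interact and hand off work"


def missing_primitives(in_place: set[Primitive]) -> list[Primitive]:
    """Return the primitives not yet covered, in the framework's order."""
    return [p for p in Primitive if p not in in_place]


# Hypothetical team: has sandboxes, a scheduler, and webhooks, but no handoff layer.
have = {Primitive.RUNTIME, Primitive.ORCHESTRATION, Primitive.TRIGGERS}
gaps = missing_primitives(have)
for gap in gaps:
    print(f"Missing: {gap.name} ({gap.value})")
```

For the team described in the comment, the only gap reported is Coordination, which matches the framework's central thesis.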
Critical concepts include decomposing the coarse five-step SDLC into explicit micro-steps that agents can follow without skipping, combating "context rot" (agent degradation as context windows fill), and "Harness Engineering" — the practice of encoding everything possible back into the repository to keep agents on track. The framework defines three scaling patterns: Swarm (sub-agents funneling into one PR), Fleet (agents across hundreds of repos), and Events (webhook-triggered background agents).
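The Swarm pattern (sub-agents funneling into one PR) can be illustrated with a thread pool. This is a toy sketch under stated assumptions: `run_sub_agent` is a stand-in that returns a string instead of invoking a real coding agent, and the "PR" is just a dictionary, not a call to any code-hosting API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sub_agent(subtask: str) -> str:
    """Stand-in for a real coding agent; returns a fake patch description."""
    return f"patch for {subtask}"


def swarm(subtasks: list[str]) -> dict:
    """Fan subtasks out to sub-agents, then funnel every result into one PR."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Executor.map yields results in input order, regardless of finish order.
        patches = list(pool.map(run_sub_agent, subtasks))
    return {"title": f"Swarm PR ({len(patches)} patches)", "patches": patches}


pr = swarm(["update parser", "add tests", "fix docs"])
print(pr["title"])
```

Fleet and Events differ mainly in what is fanned out (repositories rather than subtasks) and what starts the run (a webhook rather than a direct call), so the same fan-out and collect shape applies.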
How do they compare?
These frameworks operate at entirely different layers of the AI application stack and do not compete with each other.
The DeepMind framework is about what you build — it helps you construct AI-powered products that use generative media capabilities. Your output is an application that end users interact with: a bookshelf cataloger, an illustrated book generator, a real-time multilingual voice assistant.
The Software Factory Primitives framework is about how you build — it restructures the engineering process itself so that coding agents, not humans, drive work through the SDLC. Your output is infrastructure: coordination layers, micro-step definitions, gating systems, and harness-engineered repositories.
On complexity, the DeepMind framework is more accessible. AI Studio's playground and one-click code export mean a solo developer can ship a working multimodal app in hours. Software Factory Primitives demands platform engineering expertise, iterative debugging of agent behavior, and weeks of infrastructure investment before seeing results.
On ecosystem lock-in, the DeepMind framework is tightly coupled to Google's model suite (though Gemma 4 offers an open-weight escape hatch). Software Factory Primitives is model-agnostic — it works with any coding agent and any CI/CD stack.
On who benefits, the DeepMind framework serves app developers and creators. Software Factory Primitives serves platform engineers and engineering leaders at organizations with multiple repositories and teams.
Which should you choose?
Choose the Google DeepMind Generative Media App-Building Framework if your goal is to build an application that generates, processes, or combines images, video, music, audio, or text for end users. This is the right choice for product engineers, indie developers, and creators who want to ship multimodal AI apps using Google's models. Start in AI Studio, validate your experience, click Get Code, and iterate toward production.
Choose the Software Factory Primitives Framework if your goal is to automate how your engineering organization ships software using coding agents. This is the right choice for platform engineers and engineering leaders who already have coding agents running but are hitting coordination failures — agents skipping steps, humans drowning in noise from GitHub/Linear, and no clear mechanism for agent-to-agent handoff. You need this when the bottleneck is not the model's capability but the infrastructure around it.
You might use both if you are building a generative media product (DeepMind framework for the app itself) and want to automate the development process for that product using coding agents (Software Factory Primitives for the engineering pipeline). They are complementary, not competitive.
If you are unsure which you need: if your primary question is "which model should I use and how do I call it," start with DeepMind. If your primary question is "why do my agents keep failing mid-task and how do I get them to finish autonomously," start with Software Factory Primitives.
// FREQUENTLY ASKED QUESTIONS
Can I use the DeepMind framework and Software Factory Primitives together?
Yes. They solve different problems. Use the DeepMind framework to design your multimodal AI application — model selection, prototyping, generative media pipelines. Use Software Factory Primitives to automate the development workflow for building that application with coding agents. One defines what you build; the other defines how your engineering process works.
Which framework is better for a solo developer building an AI app?
The Google DeepMind Generative Media App-Building Framework is clearly better. It's designed for individual developers: prototype in AI Studio, click Get Code, and ship. Software Factory Primitives requires platform engineering infrastructure and is designed for teams coordinating multiple coding agents across repositories — overkill for a solo developer.
Do I need Google Cloud or Vertex AI to use the DeepMind framework?
No. The framework recommends starting with AI Studio and the Developer API — just create an API key and build. Vertex AI is only needed when you have enterprise data residency requirements or an existing GCP setup with DevOps capacity. For on-device deployment, Gemma 4 runs via Ollama or LM Studio with no cloud dependency.
What coding agents work with the Software Factory Primitives framework?
The framework is model-agnostic. It works with Claude Code, Cursor, Codex, custom agents, or any coding agent that can execute tool calls. The principles — four primitives, micro-step decomposition, harness engineering, coordination layers — apply regardless of which underlying LLM or agent tooling you use.
What is the coordination layer and why is it the missing primitive?
The coordination layer is infrastructure that enables agents to interact, hand off tasks, gate progress through SDLC micro-steps, and collaborate. It's the missing primitive because Runtime, Orchestration, and Triggers are largely solved by existing tools, but no standard solution exists for agent-to-agent coordination. Without it, agents skip steps, lose context, and require constant human intervention.
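Gating progress through micro-steps can be sketched as a small state tracker that rejects any attempt to complete a step out of order. The step names and the `StepGate` class below are hypothetical; the source does not list the actual micro-steps or prescribe an implementation.

```python
# Hypothetical micro-steps; the article does not list the real ones.
MICRO_STEPS = ["plan", "implement", "write_tests", "run_tests", "open_pr"]


class StepGate:
    """Tracks one task and refuses to let an agent skip ahead."""

    def __init__(self, steps: list[str]) -> None:
        self.steps = steps
        self.done = 0  # number of steps completed so far

    def next_step(self):
        """Return the next required step, or None when all are complete."""
        return self.steps[self.done] if self.done < len(self.steps) else None

    def complete(self, step: str) -> None:
        """Mark a step complete only if it is the next one in sequence."""
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"cannot complete {step!r}; expected {expected!r}")
        self.done += 1


gate = StepGate(MICRO_STEPS)
gate.complete("plan")
try:
    gate.complete("open_pr")  # an agent trying to skip straight to the PR
except ValueError as err:
    print(f"blocked: {err}")
print(gate.next_step())
```

A real coordination layer would also persist this state and notify the next agent on handoff; the sketch covers only the gating rule.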
How long does it take to build a working app with the DeepMind framework?
A prototype can be built in hours using AI Studio's playground and Get Code export. A production multi-model pipeline (for example, generating illustrated book chapters with consistent characters, video, and music) takes days to weeks. The framework's emphasis on prototyping cheaply with Flash-Lite, and upgrading only as a deliberate step, minimizes wasted time and cost.
What is harness engineering and when do I need it?
Harness engineering is the practice of encoding process knowledge — agents.md files, skills, context files, unit tests — back into your repository so coding agents stay on track. You need it when agents are losing context, skipping steps, or drifting mid-task. It's the core feedback loop in the Software Factory Primitives framework for continuously improving agent reliability.
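A simple starting point is to audit whether a repository contains the harness files at all. The sketch below is an assumption-laden illustration: only `agents.md` is named in the text above, while the `skills` and `tests` paths and the `harness_gaps` helper are hypothetical. It runs against a throwaway temporary directory rather than a real repository.

```python
import tempfile
from pathlib import Path

# Hypothetical layout: only "agents.md" comes from the text; the other paths are assumed.
HARNESS_ITEMS = ["agents.md", "skills", "tests"]


def harness_gaps(repo: Path) -> list[str]:
    """List the harness items that are absent from the repository root."""
    return [item for item in HARNESS_ITEMS if not (repo / item).exists()]


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    # Simulate a repo that has agent instructions and tests but no skills directory.
    (repo / "agents.md").write_text("Run the unit tests before opening a PR.\n")
    (repo / "tests").mkdir()
    gaps = harness_gaps(repo)
    print(gaps)
```

Presence checks like this catch only missing files; whether the contents actually keep agents on track still has to be judged from agent behavior over repeated runs.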
Is the DeepMind framework only for generative media apps?
Primarily, yes. It's optimized for applications involving image generation, video generation, music generation, text-to-speech, multimodal understanding, and their combinations. However, the principles around AI Studio prototyping, Get Code export, model tier selection, and structured outputs apply to any application built on Gemini's API — including pure text or code generation tasks.