Alvoeiro Missions Multi-Agent Architecture

Last updated: 20 May 2026

Design and deploy a multi-agent system capable of running autonomously for days or weeks by composing four of the five frontier multi-agent patterns into a structured Missions workflow with serial execution, validation contracts, and structured handoffs.

Framework

multi-agent software-engineering autonomous-systems AI-architecture long-running-tasks validation agent-orchestration

// WHEN TO USE

Use this skill when a software task is too large or complex for a single agent session, or when the bottleneck is human attention rather than model intelligence — particularly for long-running builds, large refactors, migrations, or overnight feature prototyping.

// INPUTS REQUIRED

Project goal (required)
A plain-language description of what the system should build or accomplish.
Scope constraints (required)
Budget limits, timeline, architectural preferences, or any non-negotiable requirements to resolve during the Orchestrator scoping conversation.
Model roster (required)
The LLMs available for assignment to each role (orchestrator, worker, validator). At least two distinct models or providers are strongly preferred.
Validation criteria (seed)
Any known acceptance criteria, user flows, or functional requirements that can seed the validation contract before coding begins.
Skills / procedures
Any domain-specific skills, coding conventions, or agent.md files the orchestrator should inject into worker behaviour.

// PRINCIPLES

The Attention Bottleneck

The bottleneck in software engineering is no longer model intelligence — it is human attention. The goal of a Missions architecture is to let a human decide what to build while the system figures out how, running for hours or days without continuous supervision.

The Five Frontier Patterns

All multi-agent coordination can be decomposed into five building blocks: Delegation (parent spawns child agent), Creator-Verifier (separate agents for implementation and review), Direct Communication (peer-to-peer without coordinator), Negotiation (agents interact over a shared resource, ideally net-positive-sum), and Broadcast (one agent sends state or constraints to many). Missions deliberately composes four of these — Delegation, Creator-Verifier, Broadcast, and Negotiation — into a single workflow.

Validation Contract

A validation contract is written during planning, before any code is written, and defines correctness independently of implementation. It is a set of assertions (potentially hundreds) that the completed system must satisfy, so that tests are shaped by intent rather than by the code that was actually written. Tests written after implementation don't catch bugs — they confirm decisions.
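As a rough illustration, a validation contract could be held as a small data structure that is filled in during planning and only later receives results from validators. The class names, the `AUTH-001` id scheme, and the example assertions below are assumptions made for this sketch, not part of the Missions specification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Assertion:
    """One independently checkable statement about the finished system."""
    id: str          # hypothetical id scheme, e.g. "AUTH-001"
    statement: str   # behaviour described in terms of intent, not code


@dataclass
class ValidationContract:
    """Written during planning; results are recorded later by validators."""
    assertions: dict[str, Assertion] = field(default_factory=dict)
    results: dict[str, bool] = field(default_factory=dict)

    def add(self, assertion: Assertion) -> None:
        if assertion.id in self.assertions:
            raise ValueError(f"duplicate assertion id: {assertion.id}")
        self.assertions[assertion.id] = assertion

    def record(self, assertion_id: str, passed: bool) -> None:
        if assertion_id not in self.assertions:
            raise KeyError(f"unknown assertion: {assertion_id}")
        self.results[assertion_id] = passed

    def unmet(self) -> list[str]:
        """Ids that have failed or have not been checked yet."""
        return sorted(a for a in self.assertions if not self.results.get(a, False))


contract = ValidationContract()
contract.add(Assertion("AUTH-001", "A user with valid credentials reaches the dashboard."))
contract.add(Assertion("AUTH-002", "Three failed logins lock the account for 15 minutes."))
contract.record("AUTH-001", True)
print(contract.unmet())  # ['AUTH-002']
```

Note that the assertions say nothing about files, functions, or frameworks. Keeping them at the level of observable behaviour is what lets them be written before any code exists.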

Three-Role Architecture

Every Mission uses exactly three roles: the Orchestrator (planning, scoping, producing the validation contract, defining worker skills), Workers (clean-context implementation agents that each inherit a working codebase via Git commits), and Validators (adversarial verification agents that have never seen the code and are not invested in the implementation).

Serial Execution with Targeted Internal Parallelisation

Features run serially — only one worker or validator is active at any time — to prevent agents from conflicting, duplicating work, or making inconsistent architectural decisions. Parallelisation is permitted only for read-only operations within a feature (e.g., code search, API research) and for code review within a validation pass. Correctness compounds over many days; error rate matters more than raw throughput.
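The execution strategy can be sketched as a plain loop over features, with a thread pool used only for the read-only phase inside each feature. The `research` function is a hypothetical stand-in for a code search or API research sub-agent, and the feature names are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def research(query: str) -> str:
    """Stand-in for a read-only sub-task such as code search or API research."""
    return f"notes on {query}"


def run_feature(name: str, queries: list[str]) -> dict:
    # Read-only sub-tasks may fan out in parallel; they cannot conflict
    # because none of them writes to the codebase.
    with ThreadPoolExecutor(max_workers=4) as pool:
        notes = list(pool.map(research, queries))
    # The write phase (implementation and commit) stays single-threaded.
    return {"feature": name, "notes": notes}


def run_mission(features: dict[str, list[str]]) -> list[dict]:
    completed = []
    # Plain loop: the next feature starts only after the previous one returns,
    # so at most one worker is active at any time.
    for name, queries in features.items():
        completed.append(run_feature(name, queries))
    return completed


results = run_mission({
    "login": ["session API", "existing auth code"],
    "billing": ["invoice schema"],
})
print([r["feature"] for r in results])  # ['login', 'billing']
```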

Structured Handoffs

When a worker finishes a feature it fills out a structured handoff documenting: what was completed, what was left undone, every command run with its exit code, issues discovered, and whether it followed the orchestrator-defined procedures. This is the connective tissue that prevents context loss across agents and enables the system to self-heal at milestone boundaries.
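One possible shape for such a handoff is a typed record that serialises to JSON, so the next agent reads facts rather than a free-form summary. The field names and the example commands are assumptions for this sketch; the source only specifies which kinds of information the handoff must carry.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class CommandRecord:
    command: str
    exit_code: int


@dataclass
class Handoff:
    feature: str
    completed: list[str]
    left_undone: list[str]
    commands: list[CommandRecord]
    issues: list[str] = field(default_factory=list)
    followed_procedures: bool = True

    def failed_commands(self) -> list[str]:
        """Commands that exited non-zero and deserve a second look."""
        return [c.command for c in self.commands if c.exit_code != 0]

    def to_json(self) -> str:
        # asdict converts nested dataclasses, so the whole record serialises.
        return json.dumps(asdict(self), indent=2)


handoff = Handoff(
    feature="login",
    completed=["password login form", "session cookie"],
    left_undone=["rate limiting"],
    commands=[CommandRecord("pytest tests/auth", 0), CommandRecord("mypy src", 1)],
    issues=["mypy reports an untyped decorator in src/auth.py"],
)
print(handoff.failed_commands())  # ['mypy src']
```

Recording exit codes alongside commands means a later agent can tell "tests were run and passed" apart from "tests were mentioned", which is the kind of context that tends to get lost in prose summaries.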

Droid Whispering

Droid whispering is the skill of mentally modelling how different LLMs behave under pressure, where they fail, and how those failures compound over a multi-day run — then making deliberate choices about which model sits in which seat. Planning benefits from slow, careful reasoning; implementation from fast code fluency and creativity; validation from precise instruction-following. No single model or provider is best at all three.
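Seat assignment can be written down as explicit configuration and given a light sanity check. The sketch below is illustrative only: the model and provider names are placeholders, and the warnings are advisory heuristics based on the preference for a distinct provider in the validator seats, not rules taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    provider: str


def roster_warnings(seats: dict[str, Model]) -> list[str]:
    """Return advisory warnings; nothing here is a hard rule."""
    warnings = []
    providers = {m.provider for m in seats.values()}
    if len(providers) < 2:
        warnings.append("all seats share one provider")
    worker = seats["worker"]
    for seat, model in seats.items():
        if seat.endswith("validator") and model.provider == worker.provider:
            warnings.append(f"{seat} shares a provider with the worker")
    return warnings


# Placeholder names; substitute the models actually available to you.
seats = {
    "orchestrator": Model("careful-reasoner", "provider-a"),
    "worker": Model("fast-coder", "provider-b"),
    "scrutiny_validator": Model("strict-follower", "provider-a"),
    "user_testing_validator": Model("strict-follower", "provider-b"),
}
print(roster_warnings(seats))
# ['user_testing_validator shares a provider with the worker']
```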

Adversarial Validation by Design

Neither validator type (Scrutiny Validator or User Testing Validator) has seen the code before reviewing it. They carry no sunk-cost bias toward the implementation succeeding. Separation of context between creator and verifier is not incidental; it is the core mechanism that keeps long-running missions on track.

The Bitter Lesson Hedge

Orchestration logic should live in prompts and skills rather than hard-coded state machines, so the system improves automatically with every model upgrade. The only deterministic logic should be thin bookkeeping — running validation, blocking progress on unresolved handoff issues — while models provide the intelligence.
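A minimal sketch of this split, assuming a hypothetical `may_advance` gate and an invented prompt string: the deterministic code only decides whether the mission may proceed, while the question of how to repair a failure is handed to the model as editable text.

```python
def may_advance(validation_passed: bool, open_issues: list[str]) -> bool:
    """Deterministic gate: the only hard-coded rule in this sketch."""
    return validation_passed and not open_issues


# Orchestration behaviour lives in editable prompt text, not in branches.
# The wording is illustrative, not a prompt from the source.
ORCHESTRATOR_PROMPT = (
    "Review the handoffs for this milestone. For each unresolved issue, "
    "scope a corrective feature and assign it contract assertions."
)


def next_action(validation_passed: bool, open_issues: list[str]) -> str:
    if may_advance(validation_passed, open_issues):
        return "start next milestone"
    # How to repair is left to the model; the code only refuses to proceed.
    return f"blocked -> send to orchestrator: {ORCHESTRATOR_PROMPT}"


print(may_advance(True, []))                       # True
print(may_advance(True, ["flaky checkout test"]))  # False
```

Because the repair strategy is a prompt rather than a branch in code, swapping in a stronger model changes the quality of the recovery without touching the gate.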

// WORKFLOW

1
Define the mission goal and open a scoping conversation with the Orchestrator
Hand the Orchestrator a plain-language goal. The Orchestrator acts as a sounding board: it surfaces unclear requirements, asks strategic clarifying questions, and resolves ambiguities before any plan is produced. Do not skip this step — unresolved scope issues become compounding errors over a multi-day run.
2
Produce the validation contract before any coding begins
The Orchestrator outputs a validation contract — a set of assertions defining what 'done' means, derived from the scoping conversation and any seed criteria you provided. Each assertion must be checkable independently of implementation. Aim for comprehensive behavioural coverage (end-to-end flows, functional paths), not just syntactic checks. This document is the source of truth for the entire mission.
3
Approve the plan including features, milestones, and assertion assignments
The Orchestrator produces a plan decomposed into features and milestones. Each feature is assigned one or more assertions from the validation contract. The sum of all feature assertions must cover every assertion in the contract. Review and approve this plan; argue with the Orchestrator about scope now, not during execution.
4
Assign model roles using Droid Whispering principles
Explicitly choose which model sits in each seat: Orchestrator (slow, careful reasoning model), Worker (fast, code-fluent, creative model), and the Scrutiny Validator and User Testing Validator (precise instruction-following models). Consider using a different model provider for validation entirely to avoid shared training-data bias. If using open-weight models, the validation contract and milestone checkpoints help compensate for sub-frontier performance.
5
Execute features serially with Workers, each starting from a clean context
Each Worker receives only its assigned spec and the current codebase state (inherited via Git). No accumulated context from prior agents. Within a feature, parallelise only read-only sub-tasks (code search, API research). When the feature is complete, the Worker commits via Git and fills out a structured handoff before it is terminated.
6
Run the Scrutiny Validator at each milestone boundary
The Scrutiny Validator runs the test suite, type checking, and linting, then spawns dedicated code review agents for each completed feature in the milestone. Code review agents are parallelised (read-only). The validator has never seen the code and carries no implementation bias. Any failures are documented and scoped as corrective work.
7
Run the User Testing Validator at each milestone boundary
The User Testing Validator spawns the live application and interacts with it via computer use or an equivalent mechanism — filling forms, clicking buttons, verifying rendered pages, walking end-to-end functional flows. This step is the most time-consuming; most of a mission's wall-clock time is spent here waiting for real-world execution rather than generating tokens. Expect this validator to fail on the first pass; follow-up features from its findings are normal and expected.
8
Process handoff summaries and trigger self-healing at milestone boundaries
The Orchestrator reviews structured handoffs from all completed workers and both validators. Unresolved issues block forward progress (this is enforced deterministically). Corrective features are scoped, assigned assertions, and inserted into the next milestone. The mission self-heals by writing failures down and forcing resolution — not by hoping agents remember context.
9
Broadcast updated shared mission state to all subsequent agents
After each milestone, the Orchestrator broadcasts updated shared state (progress, revised constraints, known issues, architectural decisions) so that all future Workers and Validators start with coherent context. This is the Broadcast pattern maintaining global coherence across a long-running ecosystem.
10
Monitor via Mission Control and intervene only when necessary
Use a dedicated async monitoring view (Mission Control equivalent) to check active worker activity, handoff summaries, validator findings, budget burn, and overall completion percentage at a glance. The default posture is non-intervention — you are a project manager, not a pair programmer. Intervene only if the orchestrator surfaces a decision that requires human judgment on scope or architecture.
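Step 3 requires that the assertions assigned to features jointly cover every assertion in the contract. That condition is mechanical, so it can be checked with set arithmetic before the plan is approved. The function name, the assertion ids, and the feature names below are assumptions for this sketch.

```python
def coverage_gaps(contract_ids: set[str], plan: dict[str, set[str]]) -> dict[str, set[str]]:
    """Compare the contract against the assertions assigned to features."""
    assigned = set().union(*plan.values())
    return {
        "unassigned": contract_ids - assigned,  # in the contract, owned by no feature
        "unknown": assigned - contract_ids,     # in the plan, absent from the contract
    }


contract_ids = {"A1", "A2", "A3"}
plan = {
    "login": {"A1"},
    "billing": {"A2", "A9"},
}
print(coverage_gaps(contract_ids, plan))
# {'unassigned': {'A3'}, 'unknown': {'A9'}}
```

A plan is ready for approval only when both sets are empty. A non-empty `unassigned` set means some part of "done" has no owner, and a non-empty `unknown` set usually points to a typo or to scope that crept in after the contract was written.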

// EXAMPLES

An enterprise team wants to prototype a new internal tool overnight without pulling engineers off their current sprint.

The team describes the tool's goal to the Orchestrator, resolves scope in a brief conversation, and approves a plan with a validation contract covering the core user flows. Workers implement features serially through the night. Both validators run at each milestone. By morning, the team reviews the Mission Control summary, sees which assertions passed and which corrective features were auto-scoped, and receives a working prototype with 90%+ test coverage — without any engineer staying up to supervise.

A team needs to run a large codebase migration (e.g., upgrading a framework or changing a data layer) across thousands of files.

The Orchestrator scopes the migration into milestones by subsystem. The validation contract encodes behavioural correctness for each subsystem (not just that tests pass, but that end-to-end flows still work). Each Worker handles one subsystem with clean context, committing a working state before the next begins. The User Testing Validator catches regressions that unit tests would miss, because its checks come from a contract specified before the migration rather than from tests written after it. Serial execution prevents workers from producing conflicting migrations in the same files.

A solo developer wants to build a complex web application with features spanning authentication, real-time data, and third-party integrations.

The developer argues with the Orchestrator during scoping to lock down architecture decisions (e.g., which auth provider, which real-time approach) before any code is written. The validation contract asserts functional flows for each integration. Droid Whispering leads to assigning a reasoning-heavy model to orchestration, a fast code-generation model to workers, and a different provider's model to validation to avoid shared-training-data blind spots. The developer checks Mission Control periodically and returns to a largely complete application after several days, with the codebase ending up cleaner — with more tests and documented skills — than a manual implementation would have produced.

// PITFALLS

Skipping the validation contract and writing tests after implementation — this causes validators to confirm decisions rather than catch bugs, and the mission will drift over time.
Running all agents in full parallelism — agents conflict on shared files, duplicate work, and make inconsistent architectural decisions; coordination overhead eliminates speed gains while burning tokens.
Using a single model or provider across all three roles — no model is best at planning, code generation, and precise instruction-following simultaneously; locking into one provider means being constrained by that family's weakest capability.
Allowing workers to carry accumulated context from prior agents — each worker must start with clean context (only its spec and the current codebase state) to avoid degraded attention and compounding errors.
Relying only on Scrutiny Validation (lint, type-check, tests) without User Testing Validation — tests written during implementation confirm implementation decisions; behavioural validation of a live application catches an entirely different class of bugs.
Hard-coding orchestration logic into a deterministic state machine — this makes the architecture brittle to model improvements; orchestration logic should live in prompts and skills so the system improves as models improve.
Ignoring or deferring unresolved handoff issues — forward progress must be blocked until issues surfaced in handoff summaries are addressed; hoping agents will remember context across sessions is how missions go off the rails.
Not resolving scope ambiguity during the Orchestrator scoping conversation — ambiguities not caught in planning become compounding errors across a multi-day run.
Treating direct agent-to-peer communication (Direct Communication pattern) as a default — state fragments across conversations without a central coordinator and there is no single source of truth; prefer structured handoffs and broadcast through shared mission state instead.

// GLOSSARY

Mission: A long-running autonomous workflow — not a single agent session — combining Delegation, Creator-Verifier, Broadcast, and Negotiation patterns into an ecosystem of agents that communicate through structured handoffs and shared state to complete a goal over hours or days.
Validation Contract: A document produced during planning, before any code is written, that defines correctness independently of implementation as a set of assertions each completed feature must satisfy. It is the source of truth that prevents missions from drifting.
Three-Role Architecture: The Orchestrator (planning, scoping, producing the validation contract, defining worker skills), Workers (clean-context implementation agents), and Validators (adversarial verification agents with no prior exposure to the code).
Orchestrator: The planning agent that scopes the mission, asks strategic clarifying questions, produces the validation contract and feature plan, defines skills and procedures for workers, and manages milestone-boundary negotiation.
Worker: A clean-context implementation agent assigned a single feature spec. It reads its spec, implements the feature, commits via Git to hand off a clean working codebase, and fills out a structured handoff upon completion.
Scrutiny Validator: The first validator type, which runs the test suite, type checking, and linting, then spawns parallelised code review agents for each feature in the completed milestone. It has never seen the code before.
User Testing Validator: The second validator type, which spawns the live application and interacts with it via computer use or equivalent — filling forms, clicking buttons, verifying rendered pages — to confirm end-to-end functional flows work holistically.
Structured Handoff: A structured document a Worker fills out upon feature completion, detailing what was completed, what was left undone, every command run and its exit code, issues discovered, and adherence to orchestrator-defined procedures. It is the connective tissue that prevents context loss between agents.
Droid Whispering: The skill of mentally modelling how different LLMs behave, where they fail, and how failures compound over a multi-day run, then making deliberate choices about which model sits in which role (Orchestrator, Worker, Validator).
Serial Execution with Targeted Internal Parallelisation: The execution strategy in which only one worker or validator is active at any given time (serial), but read-only sub-tasks within a feature or validation pass (code search, API research, code review) are parallelised internally.
The Five Frontier Patterns: The five fundamental multi-agent coordination strategies: Delegation, Creator-Verifier, Direct Communication, Negotiation, and Broadcast.
Delegation: One agent spawns another and awaits a response. The simplest form of multi-agent communication.
Creator-Verifier: One agent builds something; a separate agent with fresh context checks that work. Separating the two roles keeps the implementing agent's sunk-cost bias out of the review.
Direct Communication: Agents communicate peer-to-peer without a central coordinator. Hard to scale because state fragments across conversations with no single source of truth.
Negotiation: Agents interact over a shared resource. Most powerful when structured as net-positive-sum trading (win-win). In Missions it appears at milestone boundaries where the Orchestrator evaluates handoff summaries and decides whether to rescope.
Broadcast: One agent sends shared state, new context, or constraints to many agents. Critical for maintaining coherence across a long-running mission.
Mission Control: A dedicated asynchronous monitoring view showing active worker activity, handoff summaries, validator findings, budget burn, and overall mission completion percentage — enabling oversight without continuous supervision.
The Attention Bottleneck: The insight that the limiting factor in modern software engineering is not model intelligence but human attention — the number of tasks a human can actively supervise at once.
The Bitter Lesson Hedge: The architectural principle that orchestration logic should live in prompts and skills rather than hard-coded state machines, ensuring the system improves automatically with each model release rather than becoming obsolete.
