Koc Dark Factory Agent Orchestration Method
Apply Vincent Koc's swim-lane factory model to ship code at industrial velocity using multiple parallel AI coding agents, without turning your codebase into a fire dump.
// TL;DR
The Koc Dark Factory Agent Orchestration Method is Vincent Koc's structured framework for running multiple parallel AI coding agents across isolated swim lanes — CI, features, bugs, and horizon-scanning — while maintaining codebase coherence. Use it when you're managing a high-velocity software project with AI agents and need to ship fast without turning your repo into a fire dump. It replaces brute-force 'commit maxing' with opinionated swim-lane assignments, test-harness gating, waffling detection, and dot-skills iteration. The engineer becomes a factory manager applying taste, not a craftsman typing code.
// When should I use the Koc Dark Factory Agent Orchestration Method?
Use this skill when you are managing a high-velocity software project with AI coding agents and need a structured, opinionated approach to parallelise work across multiple sessions. Trigger it whenever you are tempted to just 'commit max' or 'Ralph loop' without a process.
// What inputs do I need before running the Dark Factory method?
- Current codebase state (required)
Brief description of the repo: stability level, active areas of change, and any known P0/P1 issues.
- Work backlog (required)
List of open tasks: features, bugs, refactors, CI failures, PRs to triage. Grouped by type if possible.
- Team context
Number of maintainers/contributors active, their day-job constraints, and which areas of the repo each owns.
- Agent environment setup (required)
Which coding agent(s) you are running (e.g. Codex), how many parallel sessions your machine can support, and whether you use Git work trees or repo clones per session.
- Skills files
Any existing dot-skills files or prompt libraries you have authored for this project.
// What are the core principles of Dark Factory agent orchestration?
Factory Manager Mindset
Engineers are no longer craftsmen at the loom — they are factory managers. Your job is not to write every line; it is to run the production line. The bottleneck is no longer hands; the bottleneck is taste.
Swim Lanes
Partition all active work into parallel, isolated swim lanes — typically CI, features, bugs, and horizon-scanning. Each swim lane is one agent session with a clearly scoped mandate. Lanes that are stable require minimal babysitting; lanes touching novel features require active conversation.
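One way to make the lane split concrete is to write each lane down as data before opening any session. The sketch below is only an illustration: the `SwimLane` and `Oversight` names, the fields, and the example mandates are all assumptions, not part of the method as described.

```python
from dataclasses import dataclass
from enum import Enum


class Oversight(Enum):
    # Hypothetical labels for the two supervision styles described above.
    UNSUPERVISED = "commit when green"
    CONVERSATIONAL = "report back before committing"


@dataclass(frozen=True)
class SwimLane:
    number: int
    category: str   # e.g. "ci", "features", "bugs", "horizon"
    mandate: str    # one scoped sentence per lane
    oversight: Oversight


def plan_lanes(mandates: list[tuple[str, str, bool]]) -> list[SwimLane]:
    """Build one lane per (category, mandate, is_stable) entry."""
    lanes = []
    for number, (category, mandate, is_stable) in enumerate(mandates, start=1):
        oversight = Oversight.UNSUPERVISED if is_stable else Oversight.CONVERSATIONAL
        lanes.append(SwimLane(number, category, mandate, oversight))
    return lanes


# Example mandates are invented for illustration.
lanes = plan_lanes([
    ("ci", "Stabilise the flaky integration suite", True),
    ("features", "Add the new channel integration", False),
    ("horizon", "Watch the issue tracker for P0/P1 reports", False),
])
for lane in lanes:
    print(f"lane {lane.number} [{lane.category}] {lane.oversight.value}: {lane.mandate}")
```

Keeping the mandate to a single string per lane is a deliberate constraint: if it does not fit in one sentence, the lane is probably scoped too widely.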
In Harness We Trust
The test harness is your safety net, not your bottleneck. Over-fitting unit tests generated by AI are a feature, not a bug: as long as they go green after a massive refactor, you know you are close. Never rip out the harness before the refactor; it is the only truth signal you have.
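As a rough sketch of what "the harness is the truth signal" can mean mechanically, the gate below runs a test command and treats exit status 0 as green. The function names are hypothetical, and the stand-in commands exist only so the example runs anywhere; substitute whatever command actually runs your project's harness.

```python
import subprocess
import sys


def harness_is_green(command: list[str], cwd: str = ".") -> bool:
    """Run the test harness and report whether it exited with status 0."""
    result = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
    return result.returncode == 0


def merge_decision(command: list[str], cwd: str = ".") -> str:
    """Green harness means merge candidate; anything else is blocked."""
    return "merge candidate" if harness_is_green(command, cwd) else "blocked"


# Stand-in commands (assumptions): one always passes, one always fails.
# Replace with your real harness invocation, e.g. ["pytest", "-q"] if the
# project happens to use pytest.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]

print(merge_decision(passing))
print(merge_decision(failing))
```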
Feeling the Reasoning Tokens
Develop intuition for when an agent session is off — not from what it is doing but from how it is explaining itself. Waffling, circular reasoning, or explanations that don't cohere are the agent equivalent of a staff member who is bullshitting. Nuke the session and re-approach later rather than burning more tokens on a derailed context.
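The method treats waffling detection as human intuition, not as an algorithm. If you want a crude automated tripwire to sit alongside that intuition, one possible heuristic (an assumption of this example, not something the method prescribes) is to measure how much of an agent's explanation repeats itself. The trigram size and the 0.3 threshold are arbitrary starting points.

```python
from collections import Counter


def repetition_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram in the text."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(ngrams)


def looks_like_waffling(text: str, threshold: float = 0.3) -> bool:
    # Threshold is a guess; tune it against your own session logs.
    return repetition_ratio(text) >= threshold


focused = "Renamed the provider interface and updated the three call sites; tests pass."
looping = "I will now fix the test. " * 4

print(looks_like_waffling(focused))
print(looks_like_waffling(looping))
```

A check like this only catches literal repetition. Explanations that are fluent but incoherent still need a human read.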
Token Efficiency Over Commit Maxing
2025 was about token maxing — burning as many tokens as possible and hoping something ships. The mature posture is token efficiency and agent-in-the-loop: opinionated loops with a reward mechanism, not blind Ralph looping. Do not say yes to every PR; bloat turns the codebase into a fire dump.
Plugin Architecture as Scope Boundary
When contributor pressure on a monolithic codebase becomes unmanageable, the architectural answer is decomposition — a plugin model where external providers own their own slice. This is a 'no' mechanism that scales: instead of rejecting contributors, hand them an isolated surface they control.
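A minimal sketch of such an isolated surface, assuming a simple in-process registry: core owns the registry, and each contributor owns the function they register. `PluginRegistry`, the `Provider` signature, and the `echo` provider are hypothetical names chosen for illustration; a real plugin system would likely load providers from separate packages.

```python
from typing import Callable, Dict

# Hypothetical provider signature: takes a prompt, returns a reply.
Provider = Callable[[str], str]


class PluginRegistry:
    """Core owns the registry; each contributor owns one registered provider."""

    def __init__(self) -> None:
        self._providers: Dict[str, Provider] = {}

    def register(self, name: str) -> Callable[[Provider], Provider]:
        def decorator(func: Provider) -> Provider:
            if name in self._providers:
                raise ValueError(f"provider {name!r} is already registered")
            self._providers[name] = func
            return func
        return decorator

    def call(self, name: str, prompt: str) -> str:
        return self._providers[name](prompt)

    def names(self) -> list[str]:
        return sorted(self._providers)


registry = PluginRegistry()


# In practice this function would live in the contributor's own package, not in core.
@registry.register("echo")
def echo_provider(prompt: str) -> str:
    return f"echo: {prompt}"


print(registry.names())
print(registry.call("echo", "hello"))
```

The point of the shape is that a new provider never requires a change to core: the answer to a feature PR becomes "register it as a plugin" instead of a flat rejection.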
Dot-Skills as Engineering Artefacts
Skills are first-class engineering artefacts, versioned alongside dot-files. Co-create them with other engineers, iterate them by feeding agent session logs back through the skill, and deploy them as reusable context into every new session. A skill that improves through use is the compound interest of the factory.
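The log-to-skill loop can be sketched as follows, under a strong assumption: that human corrections appear in the session log on lines starting with `CORRECTION:`. That marker, the file name, and the log contents are invented for this example; real agent logs will need their own parsing, and the source does not specify a skill file format.

```python
import tempfile
from pathlib import Path

CORRECTION_PREFIX = "CORRECTION:"  # hypothetical marker, not a real agent log format


def extract_corrections(log_text: str) -> list[str]:
    """Collect the human corrections recorded in a session log."""
    return [
        line[len(CORRECTION_PREFIX):].strip()
        for line in log_text.splitlines()
        if line.startswith(CORRECTION_PREFIX)
    ]


def update_skill(skill_path: Path, corrections: list[str]) -> int:
    """Append corrections not already in the skill file; return how many were added."""
    existing = skill_path.read_text() if skill_path.exists() else ""
    # dict.fromkeys drops duplicates within one log while keeping order.
    new_rules = [c for c in dict.fromkeys(corrections) if c and c not in existing]
    if new_rules:
        with skill_path.open("a") as handle:
            for rule in new_rules:
                handle.write(f"- {rule}\n")
    return len(new_rules)


# Invented log contents for illustration.
log = """agent: refactoring provider loader
CORRECTION: Do not edit generated files under build/
agent: rerunning tests
CORRECTION: Run the full harness before committing
"""

with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "refactor.skill.md"
    skill.write_text("# Refactor skill\n")
    added_first = update_skill(skill, extract_corrections(log))
    added_second = update_skill(skill, extract_corrections(log))  # second pass adds nothing
    skill_text = skill.read_text()

print(added_first, added_second)
```

Because the skill file is plain text, committing it next to the code gives you the versioning the principle asks for without extra tooling.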
// How do you apply the Dark Factory method step by step?
- 1
Assess codebase stability and classify all open work
Sort the backlog into four buckets: CI/test health, active features, open bugs, and P0/P1 horizon issues. Stability level determines how many swim lanes you can run unsupervised vs. conversationally. A stable codebase supports more unsupervised lanes.
- 2
Deduplicate the PR/issue signal before opening any agent sessions
Before spinning up agents, cluster or semantically group incoming PRs and issues to find where pressure is concentrating. If many contributors independently flag the same problem, that is a signal it is big enough to prioritise. Do not let PR volume become noise that drowns engineering judgement.
- 3
Define swim lanes and assign mandates
Assign each swim lane a single, scoped mandate. Typical split: lanes 1-2 for test refactors or CI (low babysitting, 'take your time, commit when green'); lanes 3-4 for specific features or channel/integration work (active conversation, agent reports back before committing); lane 5+ for P0/P1 triage using live data sources such as GitHub or Discord. Scale lane count up or down based on machine compute and your own brain-space budget — not token cost.
- 4
Instantiate agent sessions, one per swim lane
Prefer cloning the repo N times over Git work trees at high lane counts — work trees under a heavy test harness can nuke your machine. Load the relevant dot-skills file into each session as context. Do not use plan mode or spec mode by default; have a direct conversation with the agent to align on the task, then let it run.
- 5
Monitor sessions for reasoning quality, not just output
Read the agent's explanations the way a manager reads a direct report's status update. Waffling, repetition, or explanations that loop without resolving are red flags. When you detect this pattern, nuke the session rather than pushing through. Redirect that work to another maintainer or return to it in several days with a fresh context.
- 6
Gate merges on the test harness, not on manual review of every diff
At high commit velocity, line-by-line review does not scale. The harness is the gate. AI-generated tests that over-fit the codebase are acceptable — they are canaries. Green harness equals merge candidate. Apply taste at the architectural and scope level, not the line level.
- 7
Apply the 'no' mechanism before merging feature PRs
Tokens are cheap; a bloated codebase is not recoverable cheaply. For every incoming feature PR, ask: does this belong in core or should it live in a plugin? Saying no to core inclusion while handing the contributor a plugin surface is the scalable answer. Vision maintenance — deciding what the codebase is not — is the factory manager's primary judgement call.
- 8
Feed session logs back into dot-skills and iterate
After a significant sprint or refactor, pass the agent session logs through your skill-improvement loop: read the logs, identify where the agent drifted or needed correction, and update the dot-skills file accordingly. Version and publish skills openly where possible so the broader contributor community benefits.
- 9
Run evaluation loops post-refactor
After large structural changes (e.g. a plugin migration), stand up synthetic evaluation environments — fake channel environments with both synthetic and real models — to verify that all providers and integrations still behave correctly. Evals are not optional at scale; they are the only way to confirm the factory's output is coherent.
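Step 9's evaluation loop can be approximated with an in-memory fake channel and a deterministic fake model. Everything named here (`FakeChannel`, `synthetic_model`, `broken_model`, and the pass criterion of "every prompt gets a non-empty reply") is an assumption made for the sketch; a real eval would add real models and richer checks.

```python
from typing import Callable, Dict, List

# Hypothetical provider signature: takes a prompt, returns a reply.
Provider = Callable[[str], str]


class FakeChannel:
    """In-memory stand-in for a chat channel: records what was sent to it."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def send(self, message: str) -> None:
        self.sent.append(message)


def synthetic_model(prompt: str) -> str:
    # Deterministic fake model so the eval is repeatable.
    return f"ack:{prompt}"


def broken_model(prompt: str) -> str:
    # Simulates a provider that regressed during the refactor.
    return ""


def run_eval(providers: Dict[str, Provider], prompts: List[str]) -> Dict[str, bool]:
    """A provider passes if every prompt yields a non-empty reply on the channel."""
    results = {}
    for name, provider in providers.items():
        channel = FakeChannel()
        for prompt in prompts:
            channel.send(provider(prompt))
        results[name] = len(channel.sent) == len(prompts) and all(channel.sent)
    return results


report = run_eval(
    {"synthetic": synthetic_model, "broken": broken_model},
    ["ping", "status"],
)
print(report)
```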
// What does the Dark Factory method look like in practice?
A small open-source team of 8 part-time maintainers needs to ship a major architectural decomposition (e.g. breaking a monolith into a plugin system) while simultaneously keeping CI green and processing a backlog of 200+ open PRs.
Deduplicate the PR backlog first to find pressure points. Open swim lanes 1-2 to stabilise CI with minimal oversight ('commit when green'). Open swim lanes 3-4 for the decomposition refactor with active agent conversation and frequent check-ins. Open swim lane 5 pointing at the issue tracker to surface any P0s the refactor might be introducing. Trust the over-fitted test harness as the merge gate. Apply the plugin architecture as the 'no' mechanism for every feature PR that arrives during the refactor window.
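The deduplication pass in this scenario can be sketched with a deliberately simple technique: greedy clustering of PR titles by Jaccard similarity of their word sets. This is a lexical stand-in for the semantic grouping the method mentions, and the titles, the 0.5 threshold, and the function names are all invented for illustration.

```python
import re


def tokens(title: str) -> set[str]:
    """Lower-cased alphanumeric words in a title."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_titles(titles: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Greedy clustering: join the first cluster whose seed title is similar enough."""
    clusters: list[list[str]] = []
    for title in titles:
        for cluster in clusters:
            if jaccard(tokens(title), tokens(cluster[0])) >= threshold:
                cluster.append(title)
                break
        else:
            clusters.append([title])
    # Largest clusters first: these are the pressure points.
    return sorted(clusters, key=len, reverse=True)


titles = [
    "Fix crash when provider config is missing",
    "Crash when provider config missing",
    "Add dark mode to settings page",
    "Fix crash if provider config is missing",
]
clusters = cluster_titles(titles)
for cluster in clusters:
    print(len(cluster), cluster[0])
```

Word overlap misses duplicates that are phrased differently, so treat the output as a first cut that a maintainer reviews, not as the final grouping.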
A solo engineer is running 10 parallel agent sessions on a feature sprint and notices one session has been producing verbose, circular explanations for 20 minutes without committing anything meaningful.
Recognise this as the waffling signal — the agent is bullshitting. Do not invest more tokens trying to recover that context. Nuke the session. Either reassign the task to a fresh session with a tighter mandate, or park it and return in a few days. Redistribute brain-space to the lanes that are producing coherent commits.
// What mistakes should I avoid when using the Dark Factory method?
- Commit maxing without an opinionated process — Ralph looping (burning tokens for 8-9 hours hoping something ships) produces noise, not velocity.
- Adopting Git work trees at high lane counts without machine-level safeguards — they will nuke your local environment under a heavy test harness. Prefer multiple repo clones.
- Saying yes to every PR because tokens are cheap — this turns the codebase into a fire dump. Token cost is not the constraint; codebase coherence is.
- Reviewing every diff line-by-line at scale — it does not work. Trust the harness as your gate; apply taste at the architectural level.
- Ignoring the waffling signal — if an agent session is explaining itself incoherently, pushing through wastes tokens and corrupts context. Nuke and restart.
- Treating skills as throw-away prompts rather than versioned engineering artefacts — skills that are not iterated and maintained decay in usefulness.
- Letting every maintainer cluster the incoming PR backlog their own way instead of agreeing on one shared deduplication step: the competing attempts create meta-noise on top of the original noise.
- Starting a large refactor without a test harness already in place — over-fitted AI-generated tests are your only safety net when 80%+ of the codebase changes in one sprint.
// What do the key terms in the Dark Factory method mean?
- Dark Factory
- A software production model where AI agents do the majority of the physical coding work in parallel, supervised by a small number of human factory managers who apply taste and architectural judgement rather than writing code directly.
- Swim Lanes
- Isolated, parallel agent sessions each scoped to a single category of work (CI, features, bugs, horizon-scanning). The factory manager decides how many lanes to run and how much active oversight each requires.
- Factory Manager
- The new role of the software engineer in the dark factory era — not a craftsman writing code, but a manager running a production line of agents, whose primary constraint is taste and brain-space, not typing speed.
- Commit Maxing
- An immature form of agent-driven development where the only goal is maximising commit volume, without an opinionated process or reward mechanism. Described as the 2025 phase to move beyond.
- Ralph Looping
- Giving an agent a task and letting it burn tokens for 8-9 hours with no structured intervention or feedback loop, hoping something useful emerges. A cautionary anti-pattern.
- Bot Looping
- A more opinionated alternative to Ralph looping — running agent loops with a defined reward mechanism and structured checkpoints, so the loop is goal-directed rather than open-ended.
- Token Maxing
- The practice of consuming as many tokens as possible to drive output volume. Characterised as the 2025 mode of working, now being superseded by token efficiency.
- Token Efficiency
- The 2026 posture — being deliberate about which tokens are spent, structuring agent-in-the-loop processes so that tokens drive meaningful, non-wasteful progress.
- Agent in the Loop
- A structured process where the human factory manager actively participates in agent sessions at defined checkpoints rather than leaving agents to run completely autonomously.
- In Harness We Trust
- The operating principle that the automated test harness — even over-fitted AI-generated tests — is the primary merge gate at high commit velocity. Human line-by-line review does not scale; the harness does.
- The Waffling Signal
- The behavioural pattern in an agent session where explanations become verbose, circular, and incoherent — analogous to a staff member who is bullshitting. The correct response is to nuke the session rather than invest more tokens.
- Dot-Skills
- Versioned skill files (analogous to dot-files) that encode reusable agent context, task-specific instructions, and methodology. Treated as first-class engineering artefacts, iterated using session logs, and deployed into new agent sessions as persistent context.
- The Great Refactor
- A pattern of large-scale structural decomposition (e.g. migrating a monolith to a plugin architecture) executed at high agent velocity, validated by the test harness rather than manual review. Used as a 'no' mechanism to manage contributor scope.
- Fire Dump
- What a codebase becomes when every incoming feature PR is merged without architectural judgement. The failure mode of saying yes to everything because tokens are cheap.
- Vibe Maintainer
- A maintainer who operates primarily through intuition, taste, and high-level oversight of agent output rather than direct code authorship. Coined externally (Steve Yegge) but adopted as a recognisable role in the dark factory model.
- Brain-Space
- The human factory manager's cognitive capacity to monitor and intervene across active swim lanes. Identified as the true scaling constraint — not tokens, not compute, but the manager's ability to hold context across sessions.
// Frequently asked questions
What is the Koc Dark Factory Agent Orchestration Method?
It is Vincent Koc's structured framework for running multiple parallel AI coding agents in isolated swim lanes — CI, features, bugs, and horizon-scanning — supervised by a human factory manager who applies architectural taste rather than writing every line of code. The method replaces brute-force token maxing with opinionated loops, test-harness gating, and dot-skills iteration to ship code at industrial velocity without codebase degradation.
What is a dark factory in software engineering?
A dark factory is a software production model where AI agents perform the majority of coding work in parallel, supervised by a small number of human engineers acting as factory managers. The term borrows from manufacturing, where lights-out factories run with minimal human presence. In the Koc method, the human's job shifts from writing code to running the production line: assigning swim lanes, gating merges on the test harness, and applying architectural judgement.
How do I set up swim lanes for parallel AI coding agents?
Partition all active work into isolated parallel sessions, each with a single scoped mandate. A typical split: lanes 1-2 handle CI and test refactors with minimal babysitting, lanes 3-4 tackle features or integrations with active agent conversation, and lane 5+ monitors P0/P1 issues via live data sources. Clone the repo separately for each lane rather than using Git work trees, which can crash your machine under heavy test suites. Load relevant dot-skills files into each session.
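A small script can prepare one clone per lane. The sketch below builds plain `git clone <url> <directory>` commands and, by default, only prints them; the URL, the `lanes` workspace, and the `lane-N` directory naming are placeholders chosen for this example.

```python
import subprocess
from pathlib import Path


def clone_commands(repo_url: str, workspace: Path, lane_count: int) -> list[list[str]]:
    """One independent `git clone` per lane, each into its own directory."""
    return [
        ["git", "clone", repo_url, str(workspace / f"lane-{n}")]
        for n in range(1, lane_count + 1)
    ]


def create_lanes(
    repo_url: str, workspace: Path, lane_count: int, dry_run: bool = True
) -> list[Path]:
    """Print the clone commands, or run them when dry_run is False."""
    paths = []
    for command in clone_commands(repo_url, workspace, lane_count):
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
        paths.append(Path(command[-1]))
    return paths


# Placeholder URL and workspace; dry_run=True only prints the commands.
lane_paths = create_lanes("https://example.com/your/repo.git", Path("lanes"), 3)
```

Call `create_lanes(..., dry_run=False)` once the printed commands look right. Each agent session then gets its own directory and its own Git state.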
How do I detect when an AI coding agent is going off track?
Look for the waffling signal — verbose, circular, or incoherent explanations from the agent that don't resolve toward a clear commit. This is the agent equivalent of a team member who is bullshitting. When you detect this pattern, nuke the session immediately rather than investing more tokens. Reassign the task to a fresh session with a tighter mandate, or park it and return with fresh context in a few days.
How does the Dark Factory method compare to just running Codex or Claude in a loop?
Running an agent in an unstructured loop — called Ralph looping — burns tokens for hours hoping something useful ships. The Dark Factory method replaces this with opinionated swim lanes, each with a scoped mandate, defined checkpoints, and a test-harness gate. It adds the factory manager's taste as a quality filter, uses dot-skills for persistent context, and applies a 'no' mechanism via plugin architecture to prevent codebase bloat. It's structured orchestration versus brute-force hoping.
When should I use the Dark Factory method instead of regular AI-assisted coding?
Use it when you are managing a high-velocity project with multiple parallel agent sessions and need structure to prevent codebase degradation. Trigger it whenever you are tempted to commit-max or Ralph loop without a process. It is especially valuable during large refactors, monolith-to-plugin decompositions, or when processing a backlog of 100+ open PRs with a small team. If you are running one agent on one task, standard AI-assisted coding suffices.
What are dot-skills files and how do I use them with AI agents?
Dot-skills are versioned skill files — similar to dot-files — that encode reusable agent context, task-specific instructions, and methodology. Treat them as first-class engineering artefacts: load them into each agent session as persistent context, iterate them by feeding agent session logs through your skill-improvement loop, and version them alongside your codebase. A dot-skills file that improves through use provides compound interest across every new agent session.
What results can I expect from adopting the Dark Factory method?
The intended result is higher shipping velocity with codebase coherence maintained. The method targets situations where a team must process a large PR backlog, execute a major architectural decomposition, and keep CI green at the same time, even with part-time maintainers. The key outcome is sustainable speed: instead of a burst of commits followed by a fire-dump cleanup, you get a steady production line with architectural integrity. Your bottleneck shifts from coding speed to taste and brain-space.
How do I prevent my codebase from becoming a fire dump when using AI agents?
Apply the 'no' mechanism before every merge. For each incoming feature PR, ask whether it belongs in core or should live in a plugin. Gate merges on the test harness, not on manual line-by-line review. Never say yes to every PR just because tokens are cheap — codebase coherence is the real cost. Use the plugin architecture as a scope boundary to hand contributors an isolated surface rather than rejecting them outright.
What inputs do I need before starting the Dark Factory method?
You need three required inputs: your current codebase state (stability level, active change areas, known issues), your work backlog (features, bugs, refactors, CI failures grouped by type), and your agent environment setup (which agents you're running, how many parallel sessions your machine supports, and whether you use repo clones or work trees). Optional inputs include team context and any existing dot-skills files you've authored for the project.