Koc Dark Factory Agent Shipping Method

Last updated: 6 June 2026

Apply the OpenClaw swim-lane factory methodology to ship code at extreme velocity using autonomous agents without collapsing into chaos or token waste.

// TL;DR

The Koc Dark Factory Agent Shipping Method is a structured framework for orchestrating multiple autonomous AI coding agents in parallel using swim lanes, .skills files, and a factory-manager mindset. Use it whenever you're running 5–20+ concurrent agent sessions across a software project and need to maintain quality, avoid token waste, and ship at extreme velocity without your codebase collapsing into chaos. It replaces the craftsman model of writing code by hand with an industrial model where the engineer triages, monitors, gates merges, and cultivates taste as the primary bottleneck.

Framework

// When should I use the Koc Dark Factory Agent Shipping Method?

Use this skill whenever you are managing multiple parallel AI coding agents across a software project and need a structured process for orchestrating them, triaging work, and maintaining quality at scale. It is especially relevant when commit velocity is high, PRs are accumulating, or a large refactor is underway.

// What do I need before I can run a Dark Factory agent workflow?

Active codebase or project (required)
The repository or software project the agents will operate on.
Work backlog (required)
A list of open issues, PRs, bugs, or feature requests to be triaged into swim lanes.
Number of available agent sessions (required)
How many concurrent coding agent sessions (e.g. Codex, Claude) you can run simultaneously.
Dot-skills files
Your personal .skills files defining agent behaviours, e.g. for writing technical docs, running tests, or committing work. Can start from open-source examples.
Test harness
Existing unit or integration tests the agents can run to self-validate. Even over-fitted tests are valuable as guardrails.

// What are the core principles behind the Dark Factory method?

Engineer as Factory Manager

Stop thinking of yourself as a craftsman writing code and start thinking of yourself as a factory manager overseeing a production line. The bottleneck is no longer your hands — it is your taste and your ability to direct, monitor, and intervene across many simultaneous work streams.

Swim Lanes

Divide all active work into parallel, isolated lanes — typically CI, feature development, bug fixing, and exploration of new P0/P1 issues. Each swim lane runs semi-autonomously; your job is to decide which lanes need babysitting and which can self-commit.
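One way to keep the lane layout explicit is to write it down as data. The following is a minimal sketch, assuming hypothetical names (`Autonomy`, `SwimLane`, `lanes_needing_attention`) that do not come from any tool mentioned here; it only records which lanes self-commit and which need babysitting.

```python
from dataclasses import dataclass
from enum import Enum


class Autonomy(Enum):
    SELF_COMMIT = "self_commit"  # standing instruction: commit once tests pass
    BABYSIT = "babysit"          # conversational: investigate, report back, direct


@dataclass
class SwimLane:
    name: str
    autonomy: Autonomy
    sessions: int = 1  # number of agent sessions assigned to this lane


def lanes_needing_attention(lanes):
    """Return the names of lanes that require active dialogue."""
    return [lane.name for lane in lanes if lane.autonomy is Autonomy.BABYSIT]


# Illustrative layout only; lane names follow the four categories above.
LANES = [
    SwimLane("ci", Autonomy.SELF_COMMIT),
    SwimLane("features", Autonomy.BABYSIT, sessions=2),
    SwimLane("bugs", Autonomy.SELF_COMMIT),
    SwimLane("p0-p1-exploration", Autonomy.BABYSIT),
]

if __name__ == "__main__":
    print(lanes_needing_attention(LANES))
```

Keeping the babysit list short is the point: it is a rough proxy for how much of your attention the factory currently demands.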

In Harness We Trust

The test harness is your factory floor safety net. Even over-fitted unit tests that agents generate are valuable: if they go green after a massive refactor, you are at least directionally correct. Never rip out tests during a large change.
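The harness-as-gate idea can be sketched as a function that runs the project's test command and reports only pass or fail. This is an illustration under assumptions: `harness_is_green` is a hypothetical helper, and the test command is whatever your project already uses.

```python
import subprocess
import sys


def harness_is_green(command, cwd=None):
    """Run the test command in `cwd`; True only if it exits with status 0."""
    result = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with your
    # project's real test command (assumption: it signals failure via exit code).
    stand_in = [sys.executable, "-c", "raise SystemExit(0)"]
    print("merge allowed" if harness_is_green(stand_in) else "merge blocked")
```

In a real lane you would pass the actual test command and the path of that lane's clone as `cwd`, and refuse to merge whenever the function returns `False`.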

Token Efficiency over Token Maxing

2025 was about token maxing: running loops and burning compute in the hope that something emerges. The next phase is about not wasting tokens. Be opinionated about what you run, why you run it, and when you nuke a session that is waffling.
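One possible way to make "when you nuke a session" concrete is a per-session budget. The sketch below is an assumption, not part of the method as described: `SessionBudget`, its `max_tokens` ceiling, and the "over budget with nothing committed" rule are all hypothetical, and token counts are assumed to be supplied by whatever agent tooling you use.

```python
from dataclasses import dataclass


@dataclass
class SessionBudget:
    lane: str
    max_tokens: int       # hypothetical ceiling chosen by the engineer
    used_tokens: int = 0
    commits: int = 0

    def record(self, tokens, committed=False):
        """Add token usage for one agent turn; note whether it produced a commit."""
        self.used_tokens += tokens
        if committed:
            self.commits += 1

    def should_nuke(self):
        """Over budget with nothing committed: stop spending on this session."""
        return self.used_tokens > self.max_tokens and self.commits == 0


if __name__ == "__main__":
    budget = SessionBudget(lane="bugs", max_tokens=1000)
    budget.record(600)
    budget.record(600)
    print(budget.should_nuke())
```

A hard rule like this is cruder than the judgment the method asks for, but it gives each lane an explicit exit criterion.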

Dot Skills (Agent Development Environment)

Maintain a personal library of reusable .skills files — versioned, open-source where possible — that encode how your agents should behave for recurring tasks (e.g. writing technical documentation, running evaluations). Treat skills as first-class engineering artefacts: run them, read the logs, improve them, redeploy.
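The format of a .skills file is not specified here, so the following sketch assumes the simplest case: a plain-text file of instructions that is prepended to each task prompt. The helper names `load_skill` and `compose_prompt` are hypothetical.

```python
from pathlib import Path


def load_skill(path):
    """Read a skill file as plain text (assumed format: free-form instructions)."""
    return Path(path).read_text(encoding="utf-8").strip()


def compose_prompt(skill_text, task):
    """Prepend the skill's instructions so defaults need no re-explaining."""
    return f"{skill_text.strip()}\n\nTask: {task}"


if __name__ == "__main__":
    # Inline text stands in for load_skill("<your skill file>").
    skill = "Write in short sentences. Include one example per section."
    print(compose_prompt(skill, "Document the public API"))
```

Because the skill text lives in a file, it can be versioned and improved between sessions like any other engineering artefact.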

Taste as the Bottleneck

In a world where tokens are cheap, the limiting factor is not capacity — it is the judgment to say no to bloat, to feel when an agent is waffling, and to know when a swim lane needs to be nuked. Cultivate this intuition deliberately through high-volume agent interaction.

Plugin Architecture as a Scaling Principle

When a codebase risks becoming a fire dump because everyone wants their feature merged, cut it into pieces. A plugin or modular architecture lets external contributors own their slice without polluting the core, and makes large refactors survivable.
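A minimal sketch of the plugin idea, assuming a simple in-process registry: the core only looks up and calls handlers, and each contributor owns a handler. The names `register`, `dispatch`, and the two example plugins are hypothetical and not taken from any project discussed here.

```python
from typing import Callable, Dict

# The core owns only this registry; plugin code lives elsewhere.
_PLUGINS: Dict[str, Callable[[str], str]] = {}


def register(name):
    """Decorator that adds a handler to the registry under `name`."""
    def decorator(func):
        if name in _PLUGINS:
            raise ValueError(f"plugin {name!r} already registered")
        _PLUGINS[name] = func
        return func
    return decorator


def dispatch(name, payload):
    """Look up a plugin by name and call it; the core knows nothing else about it."""
    if name not in _PLUGINS:
        raise KeyError(f"no plugin named {name!r}")
    return _PLUGINS[name](payload)


@register("echo")
def echo_plugin(payload):
    return payload


@register("shout")
def shout_plugin(payload):
    return payload.upper()


if __name__ == "__main__":
    print(dispatch("shout", "hello"))
```

With this shape, saying no to a feature in core does not mean saying no to the feature: it can ship as a plugin without touching the dispatch code.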

Feeling the Reasoning Tokens

Over time you develop an intuition for when an agent is genuinely working versus waffling. If an agent's self-explanation sounds off — circular, vague, or over-confident about the wrong thing — treat it like a staff member who is downright bullshitting. Nuke the session and reassign the work.
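The intuition described above cannot be reduced to code, but two of the named warning signs (circular and vague explanations) can be roughly approximated. The sketch below is a crude heuristic under assumptions: the phrase list, the thresholds, and the function names are all hypothetical and would need tuning against your own agent logs.

```python
import re

# Hypothetical phrase list; extend it from your own session logs.
VAGUE_PHRASES = ("should work", "probably", "i think", "seems to", "might be")


def waffle_signals(explanation):
    """Measure repeated sentences (circularity) and vague phrases in a status update."""
    sentences = [s.strip().lower() for s in re.split(r"[.!?]+", explanation) if s.strip()]
    if not sentences:
        return {"repeat_ratio": 0.0, "vague_hits": 0}
    repeat_ratio = 1 - len(set(sentences)) / len(sentences)
    lowered = explanation.lower()
    vague_hits = sum(lowered.count(phrase) for phrase in VAGUE_PHRASES)
    return {"repeat_ratio": repeat_ratio, "vague_hits": vague_hits}


def looks_like_waffling(explanation, max_repeat=0.3, max_vague=3):
    """Flag the update if either signal exceeds its (assumed) threshold."""
    signals = waffle_signals(explanation)
    return signals["repeat_ratio"] > max_repeat or signals["vague_hits"] > max_vague


if __name__ == "__main__":
    update = "I updated the parser. I updated the parser. I updated the parser. Tests pass."
    print(looks_like_waffling(update))
```

Treat a flag as a prompt to read the session yourself, not as an automatic kill switch; confident claims about the wrong thing will pass this check untouched.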

// How do you apply the Dark Factory method step by step?

1. Triage your backlog into swim lane categories
Classify all open work into at minimum: (a) CI/tests, (b) feature development, (c) bug fixes, (d) new P0/P1 issues. Use semantic clustering or pressure-signal heuristics: if many contributors are independently filing the same issue, it is a high-priority lane. Avoid treating a flood of PRs as a roadmap; deduplicate first.
2. Instantiate agent sessions per swim lane
Assign one or more agent sessions to each swim lane. Prefer simplicity: clone the repo N times and point N Codex/Claude sessions at separate clones rather than wrestling with Git worktrees. Aim for 5–20 active swim lanes depending on your brain-space budget. Sub-agents can expand this further but require more monitoring.
3. Set autonomy level per lane
For low-risk lanes (e.g. refactoring tests, formatting), give the agent a standing instruction: 'Take your time, make sure tests pass, just commit.' These lanes need minimal babysitting. For higher-risk lanes (new features, architectural changes), maintain an active conversation: let the agent investigate, have it report back, then direct next steps.
4. Load your .skills files into each session
Deploy the relevant .skills file for the task type (e.g. technical documentation skill, evaluation skill). These encode your opinionated defaults so you are not re-explaining context each time. If no skill exists yet for this task type, note it for creation after the session.
5. Monitor sessions for waffling signals
Actively read agent self-explanations as you would read a team member's status update. Warning signs: circular reasoning, vague assertions, over-explaining simple steps, or confident claims about the wrong thing. If a session feels off, nuke it. Do not try to rescue a waffling agent; reassign the lane or defer it.
6. Run the test harness as your ground truth
After any significant change, especially a large refactor touching a high percentage of the core codebase, the test harness is your single source of truth. Even over-fitted tests provide a directional signal. If tests go green, you are at least close. Do not merge until the harness passes.
7. Gate merges with taste, not throughput
Resist the temptation to merge everything because tokens are cheap. Every merge that adds bloat degrades the codebase. Apply the 'who do I say no to?' filter: does this feature belong in the core, or should it live in a plugin? Modular/plugin architecture is the structural answer to this problem at scale.
8. Run evaluation loops on critical integration points
For any system with multiple providers or channels, build a synthetic evaluation environment (e.g. a fake Slack with both synthetic and real models) so you can run evaluation loops after each release cycle to confirm that all integrations still work. Treat evals as a swim lane, not an afterthought.
9. Retrospect on .skills files after each heavy session
After a major shipping cycle, go through your agent logs and identify where instructions were ambiguous or produced poor output. Update your .skills files accordingly. Redeploy improved skills to your environment (e.g. vercel.skills.sh or equivalent). Treat skill improvement as a first-class engineering task.
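The pressure-signal heuristic from the triage step can be sketched as counting distinct reporters per deduplicated issue. This example swaps in simple title normalisation for the semantic clustering mentioned above, which is an assumption made to keep the sketch dependency-free; the function names, stopword list, and sample backlog are hypothetical.

```python
import re
from collections import defaultdict

# Small illustrative stopword list so word order and filler words do not split duplicates.
STOPWORDS = {"a", "an", "the", "on", "in", "to"}


def normalise(title):
    """Lowercase, drop punctuation and stopwords, and sort words into a dedup key."""
    words = set(re.findall(r"[a-z0-9]+", title.lower())) - STOPWORDS
    return " ".join(sorted(words))


def pressure_ranking(issues):
    """issues: iterable of (title, author). Returns (key, distinct_authors) by descending pressure."""
    authors_by_key = defaultdict(set)
    for title, author in issues:
        authors_by_key[normalise(title)].add(author)
    ranked = [(key, len(authors)) for key, authors in authors_by_key.items()]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


# Made-up backlog for illustration only.
BACKLOG = [
    ("Login fails on Safari", "ana"),
    ("login fails on safari!", "ben"),
    ("Safari: login fails", "cy"),
    ("Add dark mode", "ana"),
    ("Add dark mode", "ana"),
]

if __name__ == "__main__":
    for key, pressure in pressure_ranking(BACKLOG):
        print(pressure, key)
```

Counting distinct authors rather than raw reports is the design choice that matters: one contributor filing the same request twice adds no pressure, while three independent reports of the same failure do.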

// What does the Dark Factory method look like in real projects?

A small team of 5 engineers with day jobs needs to ship a major plugin architecture refactor across a large open-source monorepo while simultaneously handling incoming community PRs and bug reports.

Divide work into swim lanes: (1) CI stabilisation, (2) the core refactor — splitting the monorepo into plugin modules, (3) community PR triage using semantic clustering to identify high-pressure issues, (4) bug fixes on the existing stable surface. Assign agent sessions to each lane. Give CI and triage lanes standing commit instructions. Babysit the refactor lane actively — check reasoning explanations frequently for waffling. Keep over-fitted unit tests in place; use them as the go/no-go signal for the refactor. Gate the final merge on harness green, not on time pressure.

A solo developer is building a SaaS product and wants to run multiple agent sessions simultaneously — one writing features, one writing tests, one drafting documentation — without losing coherence.

Set up three swim lanes pointing at separate repo clones. Load a .skills file for technical documentation into the docs lane so the agent uses consistent structure without re-prompting. Let the test lane run autonomously with a standing 'commit when green' instruction. Keep the feature lane conversational — investigate, report back, then direct. Monitor all three for waffling. After each session, update .skills files with lessons learned. Total brain-space cost: low, because only one lane (features) requires active dialogue.

// What mistakes should I avoid when running a Dark Factory?

Commit maxing without an opinionated process: running long agent loops and hoping something emerges wastes tokens and produces unreviewable diffs. Move from token maxing to token efficiency.
Using Git worktrees with heavy test harnesses: this can nuke your local machine. Prefer cloning the repo multiple times and pointing separate agent sessions at each clone.
Trying to rescue a waffling agent session: if an agent is explaining itself in circles, nuke the session. Do not invest more tokens trying to fix it in place.
Merging everything because tokens are cheap: this turns the codebase into a fire dump. The discipline is saying no, not yes. Use plugin/modular architecture to offload features that do not belong in core.
Ignoring over-fitted unit tests: even tests that are over-fitted to old code structure are valuable guardrails during large refactors. Do not delete them; use them as directional signals.
Treating the PR flood as a roadmap: a mass of community PRs is noise until deduplicated. Use pressure signals (multiple independent contributors filing the same issue) as the real prioritisation mechanism.
Skipping .skills file maintenance: failing to update and version your .skills files after sessions means you re-explain context endlessly and lose compounding efficiency gains.
Expecting soft skills to be optional: managing 10+ agents requires the same interpersonal intuition as managing 10+ staff. If you cannot tell when a person is bullshitting you, you will not be able to tell when an agent is either.
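The clone-per-session advice in the list above can be sketched as a small helper that prepares one `git clone` per lane. The helper names, the `lane-<name>` directory scheme, and the example URL are assumptions for illustration; only the `git clone <url> <directory>` command itself is standard Git.

```python
import subprocess
from pathlib import Path


def clone_commands(repo_url, workspace, lanes):
    """Build one `git clone` command per lane, each into its own directory."""
    root = Path(workspace)
    return [["git", "clone", repo_url, str(root / f"lane-{lane}")] for lane in lanes]


def create_clones(repo_url, workspace, lanes):
    """Run the clone commands; raises CalledProcessError if any clone fails."""
    Path(workspace).mkdir(parents=True, exist_ok=True)
    for command in clone_commands(repo_url, workspace, lanes):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder URL; printing the commands avoids touching the network.
    for command in clone_commands("https://example.com/repo.git", "factory", ["ci", "features", "bugs"]):
        print(" ".join(command))
```

Full clones cost more disk space than worktrees, but each agent session gets a fully independent checkout, which keeps one lane's test run from interfering with another's.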

// What do all the Dark Factory terms mean?

Dark Factory: A software development environment where autonomous agents do the bulk of the production work, analogous to a lights-out manufacturing facility. Engineers are present as managers and taste-filters, not as hands-on producers.
Swim Lanes: Parallel, isolated agent work streams, each focused on a distinct category of work (e.g. CI, features, bugs, new P0/P1s). Each lane runs at its own autonomy level and pace.
Factory Manager: The role the engineer now occupies: not writing code directly, but orchestrating, directing, monitoring, and quality-gating a production line of agents. Analogous to the shift from cottage craftsman to industrial mill manager.
Dot Skills (.skills): A personal library of versioned instruction files — analogous to dotfiles — that encode how agents should behave for recurring task types. Deployed into agent sessions to provide opinionated defaults without re-prompting.
Agent Development Environment: The full personal infrastructure a developer uses to run, monitor, and improve their agent factory, including .skills files, Git worktrees or repo clones, test harnesses, evaluation loops, and session management tooling.
Token Maxing: The 2025-era practice of running agent loops for extended periods and burning large volumes of tokens, prioritising sheer output volume over efficiency or precision.
Token Efficiency: The next-phase discipline of being opinionated about which agent sessions to run, when to nuke them, and how to avoid wasting compute — prioritising quality of output over raw volume.
Feeling the Reasoning Tokens: The intuition, developed through high-volume agent interaction, to detect when an agent's reasoning is genuinely productive versus waffling. Manifests as a sense that an agent's self-explanation 'sounds off' before the output fails.
Waffling: The agent equivalent of a staff member bullshitting — circular, vague, or over-confident self-explanations that signal the agent does not actually know what it is doing. A trigger to nuke the session.
The Great Refactor: A reference to a large-scale, high-risk architectural change (e.g. splitting a monorepo into a plugin architecture) undertaken at speed using agents, survived only through a test harness and team coordination.
Vibe Maintainer: A developer who ships at extreme velocity by directing agents rather than writing code by hand — maintaining the vision, taste, and direction of a project without being the primary code author.
In Harness We Trust: The operating principle that the test harness — not human review of every line — is the ground truth for whether a large agent-driven change is safe to ship.
Plugin Architecture: A structural answer to codebase bloat at scale: splitting the core into isolated modules that external contributors or providers can own independently, preventing the core from becoming a fire dump.
P0 / P1: Priority classifications for issues: P0 = critical/must fix immediately, P1 = high priority. Used to identify which swim lanes demand active attention versus autonomous operation.

// FREQUENTLY ASKED QUESTIONS

What is the Koc Dark Factory Agent Shipping Method?

It is a structured methodology for shipping software at extreme velocity by orchestrating multiple autonomous AI coding agents in parallel swim lanes. Developed by Vincent Koc from the OpenClaw project, it treats the engineer as a factory manager rather than a hands-on coder — triaging work, monitoring agent sessions for waffling, gating merges with taste, and using .skills files and test harnesses as quality guardrails.

What is a dark factory in software development?

A dark factory is a software development environment where autonomous AI agents do the bulk of production work, analogous to a lights-out manufacturing facility. Engineers are present as managers and taste-filters, not as hands-on code producers. The term comes from manufacturing, where lights-out factories run without human operators on the floor. In this context it means agents write, test, and commit code while humans orchestrate and quality-gate.

How do swim lanes work in the Dark Factory method?

Swim lanes are parallel, isolated agent work streams each focused on a distinct category — typically CI/tests, feature development, bug fixes, and new P0/P1 issues. Each lane runs at its own autonomy level: low-risk lanes like formatting get standing commit instructions, while high-risk lanes like architectural changes require active conversation. You assign one or more agent sessions per lane and monitor them independently.

How do I set up .skills files for AI coding agents?

Create versioned instruction files — similar to dotfiles — that encode how agents should behave for recurring task types like writing documentation, running evaluations, or committing work. Load the relevant .skills file into each agent session at startup so it has opinionated defaults without re-prompting. After each heavy session, retrospect on logs and update your .skills files to capture lessons learned. Treat them as first-class engineering artefacts.

How does the Dark Factory method compare to just running Codex or Claude on a single task?

Running a single agent session is like having one worker on a factory floor. The Dark Factory method scales this to 5–20+ simultaneous sessions organised into swim lanes with varying autonomy levels. It adds structured triage, .skills files for consistency, waffling detection, test-harness gating, and merge discipline. A single-task approach lacks the orchestration layer needed to ship at extreme velocity across an entire project without chaos.

When should I use the Dark Factory method instead of writing code myself?

Use it whenever you're managing multiple parallel AI coding agents across a project and need structured orchestration. It's especially relevant when commit velocity is high, PRs are accumulating, a large refactor is underway, or you're a small team needing to ship faster than your headcount allows. If you're only making a single small change, the overhead of swim lanes isn't justified.

What results can I expect from using the Dark Factory agent shipping method?

Teams report shipping large refactors, plugin architectures, and feature batches at velocities impossible with manual coding — OpenClaw shipped faster than reviewers could read the diffs. Expect higher throughput with fewer wasted tokens, cleaner merges due to taste-gating, and compounding efficiency as your .skills files improve over time. The test harness catches regressions that would otherwise slip through at speed.

How do I tell when an AI agent is waffling?

Waffling manifests as circular reasoning, vague assertions, over-explaining simple steps, or confident claims about the wrong thing in an agent's self-explanation. Read agent outputs like status updates from a team member. If the reasoning sounds off — like someone bullshitting in a standup — nuke the session immediately. Do not invest more tokens trying to rescue it. Reassign the work to a fresh session instead.

What does token efficiency mean in the Dark Factory method?

Token efficiency is the discipline of being opinionated about which agent sessions to run, when to nuke them, and how to avoid wasting compute. It replaces the 2025-era practice of 'token maxing' — running long agent loops hoping something emerges. In practice, it means cutting waffling sessions early, loading .skills files to avoid re-explaining context, and only running lanes that have a clear purpose and exit criteria.

Do I need a test suite to use the Dark Factory method?

A test harness is strongly recommended but technically optional to start. Even over-fitted unit tests that agents generate are valuable as guardrails during large refactors — if they go green, you're at least directionally correct. The principle 'In Harness We Trust' makes the test suite your ground truth for whether agent-driven changes are safe to ship. Without one, you lose your primary automated quality gate.
