Frequently Asked Questions About the Koc Dark Factory Agent Shipping Method

21 answers covering everything from basics to advanced usage.

// Basics

Can I use the Dark Factory method with just one AI agent?

Yes, but you lose the core advantage of parallelism. With a single agent, you can still apply .skills files, test-harness gating, and waffling detection. However, the swim-lane structure and factory-manager mindset only deliver their full value when orchestrating 5+ concurrent sessions. Start with one agent to build intuition, then scale up as you learn to detect waffling and manage autonomy levels.

What are .skills files and how are they different from system prompts?

.skills files are versioned, reusable instruction files, analogous to dotfiles, that encode opinionated defaults for recurring agent task types. Unlike ad-hoc system prompts, they are treated as first-class engineering artefacts: version-controlled, open-sourced where possible, reviewed in retrospectives after sessions, and continuously improved. They compound in value over time, whereas system prompts are typically one-off and disposable.

Can I use the Dark Factory method for non-coding tasks like documentation or data pipelines?

Yes. The swim-lane structure and .skills files are task-agnostic. You can run documentation swim lanes with a technical writing .skills file, data pipeline lanes with a data engineering .skills file, and evaluation lanes for testing integrations. The key principles — factory-manager mindset, waffling detection, autonomy levels, and quality gating — apply to any task where autonomous agents work in parallel.

What's the difference between token maxing and token efficiency?

Token maxing is the 2025-era practice of running long agent loops and burning large volumes of tokens, hoping quality emerges from quantity. Token efficiency is the next-phase discipline: being opinionated about which sessions to run, nuking waffling agents early, using .skills files to avoid redundant context, and measuring output quality over raw volume. The Dark Factory method is built on token efficiency as a core principle.

Is the Dark Factory method only for open-source projects?

No. It originated in the OpenClaw open-source project but applies to any software project with enough concurrent work to justify parallel swim lanes. Enterprise teams, SaaS products, solo developers, and consultancies can all use it. Open-source projects benefit from the community PR triage workflow, but the core framework — swim lanes, .skills files, test-harness gating, waffling detection — is universal.

Can the Dark Factory method work for a solo developer?

Absolutely. A solo developer can run 3–5 swim lanes simultaneously, for example one for features (active conversation), one for tests (autonomous), and one for documentation (loaded with a writing .skills file). The factory-manager mindset is even more valuable for solo developers because it multiplies output without adding headcount. The key constraint is brain-space: only one lane should require active dialogue at a time.

// How To

How many swim lanes should I run at once?

Aim for 5–20 active swim lanes depending on your brain-space budget. Each lane requires some monitoring, with high-risk lanes needing active conversation and low-risk lanes running autonomously. Most practitioners find 8–12 is the sweet spot — enough parallelism for extreme velocity without overwhelming your ability to detect waffling and gate merges. Sub-agents can expand capacity but require more monitoring overhead.

How do I prioritize which swim lanes to babysit versus let run autonomously?

Set autonomy based on risk. Low-risk lanes — test refactoring, formatting, documentation — get standing commit instructions with minimal monitoring. High-risk lanes — new features, architectural changes, core refactors — require active conversation where the agent investigates and reports back before you direct next steps. P0 issues get immediate active attention. Use pressure signals (multiple people filing the same issue) to identify true priorities.
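
The risk-based rule above can be sketched as a small lookup. This is only an illustration under stated assumptions, not part of the method itself: the lane category names, the Autonomy enum, and assign_autonomy are hypothetical, and the low-risk and high-risk groupings simply mirror the examples in the answer.

```python
from enum import Enum


class Autonomy(Enum):
    AUTONOMOUS = "standing commit instructions, minimal monitoring"
    ACTIVE = "agent investigates and reports back before next steps"
    IMMEDIATE = "P0: immediate active attention"


# Hypothetical category names, taken from the examples in the answer above.
LOW_RISK = {"test-refactoring", "formatting", "documentation"}
HIGH_RISK = {"new-feature", "architectural-change", "core-refactor"}


def assign_autonomy(category: str, is_p0: bool = False) -> Autonomy:
    """Map a lane's category to an autonomy level; a P0 flag overrides the category."""
    if is_p0:
        return Autonomy.IMMEDIATE
    if category in LOW_RISK:
        return Autonomy.AUTONOMOUS
    # HIGH_RISK categories land here. Treating unknown categories as
    # high risk too is an assumption of this sketch, chosen as the safe default.
    return Autonomy.ACTIVE
```

For example, assign_autonomy("formatting") yields Autonomy.AUTONOMOUS, while assign_autonomy("formatting", is_p0=True) yields Autonomy.IMMEDIATE.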

How do I handle a flood of community PRs in the Dark Factory method?

Do not treat a PR flood as a road map. Deduplicate first using semantic clustering or pressure-signal heuristics — if multiple independent contributors file the same issue, that's a genuine high-priority item. Assign a dedicated triage swim lane to cluster and prioritize incoming PRs. Only then assign work to feature or bug-fix lanes. The discipline is filtering signal from noise before committing agent resources.
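
A pressure-signal heuristic of this kind might look like the sketch below. It assumes each report is an (author, title) pair and uses a crude word-set normalization as a stand-in for real semantic clustering; normalize, pressure_signals, and the min_reporters threshold are hypothetical names chosen for illustration.

```python
import re
from collections import defaultdict


def normalize(title: str) -> str:
    """Crude stand-in for semantic clustering: lowercase, drop punctuation, sort unique words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return " ".join(sorted(set(words)))


def pressure_signals(reports, min_reporters=2):
    """Group reports by normalized title and keep clusters filed by several distinct authors.

    `reports` is an iterable of (author, title) pairs.
    Returns a list of (cluster_key, reporter_count), strongest signal first.
    """
    authors_by_cluster = defaultdict(set)
    for author, title in reports:
        # A set counts each author once, so one person filing the
        # same issue repeatedly does not inflate the signal.
        authors_by_cluster[normalize(title)].add(author)
    ranked = [
        (key, len(authors))
        for key, authors in authors_by_cluster.items()
        if len(authors) >= min_reporters
    ]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))
```

Counting distinct authors rather than raw reports is the design choice that matters here: the answer above treats independent contributors, not volume, as the signal.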

What tools do I need to run a Dark Factory setup?

At minimum: a Git repository, a work backlog, and access to multiple concurrent AI coding agent sessions (e.g., OpenAI Codex, Claude, or similar). Recommended additions include a test harness, .skills files, and evaluation tooling. The method is tool-agnostic — it's an orchestration framework, not a specific product. You can run it with any combination of agents that support autonomous coding and session management.

// Troubleshooting

Should I use Git worktrees or separate repo clones for swim lanes?

Prefer cloning the repo multiple times over Git worktrees. All worktrees of a repository share the same Git directory, so parallel agents contend for the same repository state, and running heavy test suites in parallel on top of that can overwhelm your local machine. Separate clones are simpler and more isolated, and they avoid file-locking issues between lanes. Point each agent session at its own clone for clean separation between swim lanes.
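
One way to set up a clone per lane is sketched below. The function only builds the git clone commands and does not run them; clone_commands, the lane names, and the default "lanes" root directory are hypothetical choices for illustration.

```python
from pathlib import Path


def clone_commands(repo_url: str, lanes: list[str], root: str = "lanes") -> list[list[str]]:
    """Return one `git clone <url> <dir>` argv per swim lane, each with its own directory."""
    return [["git", "clone", repo_url, str(Path(root) / lane)] for lane in lanes]


# To actually create the clones, each argv can be passed to
# subprocess.run(argv, check=True) with git available on PATH.
```

Each resulting directory is a full, independent repository, so an agent session pointed at lanes/tests cannot interfere with the Git state used by the session in lanes/features.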

What happens when two swim lanes create conflicting changes?

This is managed through the merge-gating step. Since each swim lane operates on a separate repo clone, conflicts surface at merge time. The factory manager's job is to sequence merges by priority: typically CI and bug-fix lanes merge first, then features. If conflicts arise, have the lower-priority lane rebase onto the already-merged changes. Plugin architecture also reduces the conflict surface by isolating modules.
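
The merge sequencing described above can be sketched as a stable sort. This is a minimal illustration: the lane type names and the MERGE_PRIORITY table are hypothetical, and only the ordering (CI and bug fixes before features) comes from the answer.

```python
# Lower number merges first. CI and bug-fix lanes share the top priority,
# following the answer above; the lane type names are hypothetical.
MERGE_PRIORITY = {"ci": 0, "bugfix": 0, "feature": 1}


def merge_order(branches):
    """Sort (branch_name, lane_type) pairs by lane priority; unknown types go last.

    sorted() is stable, so branches with equal priority keep their input order.
    """
    last = max(MERGE_PRIORITY.values()) + 1
    return sorted(branches, key=lambda b: MERGE_PRIORITY.get(b[1], last))
```

A branch that sorts later is the natural candidate to rebase if it conflicts with one that merged ahead of it.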

What if I don't have an existing test suite to use as a harness?

Start by having your agents write tests as a dedicated swim lane. Even over-fitted tests generated by AI are valuable guardrails — the principle 'In Harness We Trust' explicitly values directional signal over perfect coverage. Assign an autonomous test-writing lane with standing commit instructions. Build the harness incrementally. Without any tests, you lose your primary automated quality gate for large refactors.

How do I know when to nuke an agent session versus give it more direction?

Nuke when the agent's self-explanation shows circular reasoning, vague assertions, or confident claims about the wrong thing. These are signs the agent is fundamentally off-track, and more tokens will not fix it. Give more direction when the agent is making progress but needs course correction — e.g., it solved the right problem the wrong way. The distinction is between waffling (nuke) and needing guidance (redirect). Err on the side of nuking.

// Comparisons

What's the difference between a dark factory and just using AI code assistants like Copilot?

Copilot-style assistants are inline helpers — they augment your typing. A dark factory replaces the typing entirely. You're not co-authoring code; you're managing a production line of autonomous agents working in parallel swim lanes. The mindset shifts from craftsman to factory manager. Copilot assists one task at a time; the Dark Factory orchestrates 5–20+ tasks simultaneously with structured triage, quality gates, and token efficiency.

How is the Dark Factory method different from just running multiple Codex sessions?

Running multiple Codex sessions without structure is token maxing — burning compute hoping something emerges. The Dark Factory adds an orchestration layer: swim-lane categorization, autonomy levels per lane, .skills files for consistency, waffling detection, test-harness gating, and merge discipline. It's the difference between a factory with a production manager and a warehouse full of unsupervised robots.

// Advanced

How do I build the intuition to detect when an agent is waffling?

You develop it through high-volume agent interaction — there is no shortcut. Read agent self-explanations as you would read status updates from team members. Warning signs include circular reasoning, vague assertions, over-explaining simple steps, and confident claims about the wrong thing. Vincent Koc compares it to detecting when a staff member is bullshitting. The more sessions you run, the faster this intuition develops.

How does plugin architecture help with the Dark Factory method?

Plugin architecture is the structural answer to codebase bloat at scale. When many agents and contributors want to merge features, the core risks becoming a dumping ground. By splitting the system into isolated modules, external contributors own their slice without polluting the core. This makes large refactors survivable, reduces merge conflicts between swim lanes, and gives the factory manager a clean 'no, make it a plugin' rejection path.

How do I retrospect on .skills files after a session?

After a major shipping cycle, review your agent logs and identify where instructions were ambiguous or produced poor output. Note which .skills files led to waffling, which needed more context, and which worked well. Update the files with clarified instructions, better defaults, or new guardrails. Redeploy the improved versions to your agent environment. Treat this as a first-class engineering task, not an afterthought — it's how you get compounding efficiency gains.

What soft skills does the Dark Factory method require?

Managing 10+ agents requires the same interpersonal intuition as managing 10+ staff members. You need the ability to detect bullshitting (waffling detection), prioritize under pressure (swim-lane triage), say no to feature bloat (taste as bottleneck), and delegate appropriately (autonomy levels). Vincent Koc explicitly states that if you cannot tell when a person is bullshitting you, you will not be able to tell when an agent is either.

How does the Dark Factory method handle evaluation and QA?

Evaluations are treated as a dedicated swim lane, not an afterthought. For systems with multiple providers or integration points, build a synthetic evaluation environment — e.g., a fake Slack with both synthetic and real models — and run evaluation loops after each release cycle. This confirms all integrations still work. Combined with the test harness as ground truth, this creates a two-layer quality system: unit-level harness plus integration-level evals.
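
The two-layer quality system can be sketched as a single gate that requires both layers to pass. This is an assumption-laden illustration: quality_gate is a hypothetical function, and it assumes the harness result arrives as one boolean and the eval results as a mapping from integration name to pass/fail.

```python
def quality_gate(harness_passed, eval_results):
    """Combine the unit-level harness and the integration-level evals.

    `eval_results` maps an integration name to True (passed) or False (failed).
    Returns (ok, failing), where `failing` is a sorted list of failed integrations.
    """
    failing = sorted(name for name, passed in eval_results.items() if not passed)
    # The gate opens only if the harness passed and no integration failed.
    return harness_passed and not failing, failing
```

Returning the list of failing integrations alongside the verdict tells the factory manager which lane to direct next, rather than only that the release is blocked.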