Frequently Asked Questions About the Koc Dark Factory Agent Orchestration Method
21 answers covering everything from basics to advanced usage.
// Basics
What is the difference between commit maxing and token efficiency?
Commit maxing is an immature approach focused on maximising commit volume without an opinionated process — just burning tokens hoping something ships. Token efficiency is the mature posture: being deliberate about which tokens are spent, structuring agent-in-the-loop processes with defined checkpoints and reward mechanisms so every token drives meaningful progress. The shift is from quantity of output to quality of orchestration.
What is Ralph looping and why is it bad?
Ralph looping is giving an AI coding agent a task and letting it burn tokens for 8-9 hours with no structured intervention or feedback loop, hoping something useful emerges. It's bad because it wastes tokens, produces incoherent output, and corrupts the agent's context window over long runs. The Dark Factory method replaces it with bot looping — opinionated loops with defined reward mechanisms and structured checkpoints that keep the agent goal-directed.
What is brain-space and why is it the real bottleneck?
Brain-space is the human factory manager's cognitive capacity to monitor and intervene across active swim lanes. It is the true scaling constraint — not tokens, not compute, not typing speed. You can spin up 20 agent sessions, but if you can't hold context across all of them to detect waffling, apply architectural judgement, and gate merges meaningfully, extra lanes add noise rather than velocity. Scale lanes to your brain-space budget first.
How many parallel agent sessions can I realistically manage?
This depends on your brain-space, the stability of each lane, and your machine's compute capacity. Experienced practitioners report running 5-10 lanes, of which 2-3 are unsupervised CI lanes and the rest require varying levels of active conversation. In practice the binding limit is rarely tokens or compute; it is your ability to detect waffling, apply architectural judgement, and hold context across sessions simultaneously. Start low and scale up as your factory-management intuition develops.
Can the Dark Factory method work with any AI coding agent or only specific ones?
The method is agent-agnostic. It works with Codex, Claude, Gemini, or any coding agent that can operate in a session-based mode. The orchestration layer — swim lanes, test-harness gating, waffling detection, dot-skills — sits above any particular agent. What matters is that the agent can accept a scoped mandate, produce commits, and explain its reasoning so you can monitor for quality signals. Choose the agent that best fits each swim lane's task type.
// How To
How do I decide how many swim lanes to run at once?
Scale lane count based on two factors: machine compute and your brain-space budget. A stable codebase supports more unsupervised lanes (CI and test work) that need minimal babysitting. Novel feature work requires active conversation, consuming more brain-space per lane. Start with 3-5 lanes and increase only when you can maintain reasoning-quality oversight across all active sessions. If you notice yourself rubber-stamping agent output, you have too many lanes.
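One way to make the two-factor rule concrete is to treat brain-space as a budget that each lane draws from. The sketch below is illustrative only: the `ATTENTION_COST` weights, the `Lane` type, and `admit_lanes` are hypothetical names and numbers, not part of the method, and you would calibrate the costs against your own experience.

```python
from dataclasses import dataclass

# Hypothetical attention cost per lane type (assumption, not from the method).
ATTENTION_COST = {"ci": 0.5, "feature": 2.0}

@dataclass
class Lane:
    name: str
    kind: str  # "ci" (unsupervised) or "feature" (active conversation)

def admit_lanes(candidates, brain_budget, max_by_compute):
    """Admit lanes in priority order until attention or compute runs out."""
    admitted, spent = [], 0.0
    for lane in candidates:
        if len(admitted) >= max_by_compute:
            break  # the machine cannot host more sessions
        cost = ATTENTION_COST[lane.kind]
        if spent + cost > brain_budget:
            continue  # this lane would overdraw attention; try a cheaper one
        admitted.append(lane)
        spent += cost
    return admitted

candidates = [
    Lane("ci-health", "ci"),
    Lane("auth-feature", "feature"),
    Lane("billing-feature", "feature"),
    Lane("flaky-tests", "ci"),
]
running = admit_lanes(candidates, brain_budget=3.0, max_by_compute=5)
```

With these made-up weights, the second feature lane is skipped because it would exceed the attention budget, while the cheaper CI lane still fits.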
How do I create and maintain dot-skills files?
Start by documenting the methodology, constraints, and context relevant to a specific task type in a versioned file alongside your dot-files. Load this file into each agent session as persistent context. After each significant sprint, feed agent session logs through the skill file: identify where the agent drifted or needed correction, then update the file accordingly. Co-create skills with other engineers, publish them openly, and treat them as compound-interest artefacts that improve through use.
How do I deduplicate a large PR backlog before opening agent sessions?
Cluster incoming PRs and issues semantically — group them by the area of the codebase they touch and the problem they address. If multiple contributors independently flag the same issue, that convergence signal means it's big enough to prioritise as a dedicated swim lane. Remove duplicate efforts, consolidate overlapping feature requests, and identify which PRs are variations of the same underlying change. This prevents meta-noise where every maintainer tries to solve the backlog their own way.
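As a rough illustration of the clustering step, the sketch below groups items by word overlap between their titles and touched paths. Jaccard similarity on tokens is a simple stand-in for real semantic clustering (for example, embeddings), and the item fields (`id`, `title`, `paths`), the function names, and the `0.4` threshold are all assumptions made for the example.

```python
import re

def tokens(item):
    """Lowercased word set from an item's title plus the paths it touches."""
    text = item["title"] + " " + " ".join(item["paths"])
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(items, threshold=0.4):
    """Greedy clustering: join the first cluster whose seed item is similar enough."""
    clusters = []
    for item in items:
        for group in clusters:
            if jaccard(tokens(item), tokens(group[0])) >= threshold:
                group.append(item)
                break
        else:
            clusters.append([item])  # no similar cluster found; start a new one
    return clusters

backlog = [
    {"id": 1, "title": "Fix login timeout", "paths": ["auth/session.py"]},
    {"id": 2, "title": "Login timeout fix for session", "paths": ["auth/session.py"]},
    {"id": 3, "title": "Add dark theme", "paths": ["ui/theme.css"]},
]
groups = cluster(backlog)
```

Clusters with more than one item are candidates for consolidation, and a cluster fed by several independent contributors is the convergence signal described above.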
Should I use Git worktrees or repo clones for parallel agent sessions?
Prefer cloning the repo N times over Git worktrees at high lane counts. Worktrees share underlying Git state, and under a heavy test harness that sharing can consume extreme memory and I/O, potentially crashing your local environment. Separate repo clones provide full isolation between swim lanes. The disk-space trade-off is worth it: a nuked local environment stops all lanes, while a failed clone stops only one.
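A minimal sketch of the one-clone-per-lane setup follows. The `lane-<name>` directory naming, the example URL, and the helper names are assumptions; the only real command used is plain `git clone <url> <dir>`. The default is a dry run that prints the commands rather than executing them.

```python
import subprocess
from pathlib import Path

def clone_commands(repo_url, base_dir, lanes):
    """Build one `git clone` command per swim lane, each into its own directory."""
    return [
        ["git", "clone", repo_url, str(Path(base_dir) / f"lane-{name}")]
        for name in lanes
    ]

def clone_all(repo_url, base_dir, lanes, dry_run=True):
    """Print the clone commands, or run them when dry_run is False."""
    for cmd in clone_commands(repo_url, base_dir, lanes):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises if a clone fails

# Hypothetical repository URL and lane names, for illustration only.
commands = clone_commands("https://example.com/repo.git", "factory", ["ci", "feature-a"])
```

Because each lane has its own clone, a failed or corrupted lane directory can be deleted and re-cloned without touching the others.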
// Troubleshooting
What do I do when an agent session starts waffling?
Nuke the session immediately. Do not invest more tokens trying to recover a derailed context — circular reasoning and verbose non-answers indicate the agent has lost the thread and additional prompting will compound the confusion. Reassign the task to a fresh session with a tighter, more explicit mandate. If the task itself seems to consistently cause agent confusion, park it entirely and return in several days, or redirect it to a human maintainer.
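Waffling detection is ultimately a human judgement call, but a crude automated proxy for circular reasoning can flag sessions worth a closer look. The sketch below measures how much of the agent's latest message repeats word trigrams from its earlier messages; the function names and the `0.6` threshold are assumptions, not part of the method.

```python
def trigrams(text):
    """Set of consecutive three-word sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}

def repetition_score(messages):
    """Fraction of the latest message's trigrams already seen in earlier messages."""
    if len(messages) < 2:
        return 0.0
    latest = trigrams(messages[-1])
    if not latest:
        return 0.0
    seen = set().union(*(trigrams(m) for m in messages[:-1]))
    return len(latest & seen) / len(latest)

def is_waffling(messages, threshold=0.6):
    """Flag a session whose latest message mostly repeats itself."""
    return repetition_score(messages) >= threshold

looping = [
    "checking the config loader again to be sure",
    "checking the config loader again to be sure",
]
progressing = [
    "added the retry wrapper around the client",
    "tests pass after fixing the import path",
]
```

A high score is only a prompt to read the session; verbose non-answers that use fresh wording will not be caught by this heuristic.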
What if my test harness is failing after a large agent-driven refactor?
This is expected and is exactly why the harness exists. AI-generated tests that over-fit the codebase act as canaries — failures after a refactor tell you where the structural changes broke assumptions. Work through failures systematically in a dedicated CI swim lane with minimal babysitting ('take your time, commit when green'). Never rip out the harness before the refactor; it is the only truth signal you have. Add new tests to cover gaps the refactor exposed.
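The "commit when green" instruction can be sketched as a small gate: run the harness, and invoke the commit step only on a zero exit status. The function name and the callable-based design are assumptions for illustration; in a real lane the test command might be something like `["pytest", "-q"]` and the commit step a `git commit` call.

```python
import subprocess
import sys

def commit_when_green(test_cmd, commit):
    """Run the test harness; call `commit` only if it exits with status 0."""
    result = subprocess.run(test_cmd, capture_output=True, text=True)
    if result.returncode == 0:
        commit()
        return True
    return False

# Demo with stand-in "harnesses": one that passes and one that fails.
committed = []
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]
green = commit_when_green(passing, lambda: committed.append("commit"))
red = commit_when_green(failing, lambda: committed.append("commit"))
```

Keeping the commit step behind the gate means a red harness never produces a commit, which preserves the harness as the truth signal during the refactor.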
My codebase is already a dumpster fire. Can the Dark Factory method help me recover?
Yes, but start with stabilisation, not parallelisation. Dedicate your first swim lanes entirely to CI health and test coverage before opening feature lanes. Build the test harness first — you need a truth signal before you can trust high-velocity changes. Then apply the plugin architecture principle to start decomposing the monolith into isolated surfaces. The 'no' mechanism is even more important for a degraded codebase: reject core additions until architectural coherence is restored.
What happens if I skip the deduplication step before opening agent sessions?
You get meta-noise on top of the original noise. Without deduplication, multiple swim lanes may tackle overlapping problems, producing conflicting PRs that are individually correct but collectively incoherent. Every maintainer — human or agent — will attempt to solve the backlog their own way, creating merge conflicts and architectural drift. The deduplication step clusters the real signal so each swim lane addresses a distinct, non-overlapping concern.
// Comparisons
How does the Dark Factory method compare to using Cursor or Copilot for AI-assisted development?
Cursor and Copilot are tools for individual AI-assisted coding — one engineer, one agent, inline suggestions or chat. The Dark Factory method is an orchestration framework for running multiple parallel agent sessions simultaneously across a production codebase. They operate at different levels: you might use Cursor as the agent inside a swim lane, but the Dark Factory method provides the swim-lane structure, merge-gating process, waffling detection, and architectural 'no' mechanism that Cursor alone doesn't offer.
How is the Dark Factory method different from a standard CI/CD pipeline?
A CI/CD pipeline automates build, test, and deploy steps for human-authored code. The Dark Factory method orchestrates the code authoring itself through parallel AI agents, using the CI pipeline as one component — specifically as the test-harness merge gate. The swim-lane model, factory-manager role, waffling detection, brain-space management, and dot-skills iteration are all concerns that sit above and around CI/CD, governing how code gets created, not just how it gets shipped.
What is the difference between bot looping and Ralph looping?
Ralph looping is open-ended: give the agent a task and let it burn tokens indefinitely with no checkpoints. Bot looping is structured: the agent runs in a loop with a defined reward mechanism, structured checkpoints, and clear exit conditions. Bot looping is goal-directed and produces measurable progress, while Ralph looping produces noise and hope. The Dark Factory method uses bot looping as one component within the broader swim-lane orchestration framework.
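The contrast can be sketched in code. The loop below has the three ingredients attributed to bot looping: a reward measured at every checkpoint, a target that ends the loop on success, and exit conditions for stalling or exhausting the step budget. The function signature, the `patience` parameter, and the exit-reason strings are hypothetical choices for this example.

```python
def bot_loop(step, reward, target, max_steps=20, patience=3):
    """Run `step` until the reward reaches `target`, stalls, or the budget ends.

    Returns (exit_reason, best_reward, history).
    """
    best, stalled, history = float("-inf"), 0, []
    for i in range(max_steps):
        state = step(i)
        score = reward(state)  # checkpoint: measure progress after every step
        history.append(score)
        if score >= target:
            return "target_reached", score, history
        if score > best:
            best, stalled = score, 0
        else:
            stalled += 1
            if stalled >= patience:
                return "stalled", best, history  # no improvement: stop burning tokens
    return "budget_exhausted", best, history

# Stand-in step and reward functions; a real loop would call the agent and the test harness.
improving = bot_loop(step=lambda i: i, reward=lambda s: s * 10, target=30)
stuck = bot_loop(step=lambda i: i, reward=lambda s: 1, target=30)
```

A Ralph loop, by comparison, is the same `for` loop with no reward, no target, and no stall check.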
// Advanced
Can I use the Dark Factory method as a solo developer?
Yes. The method was partly developed by solo and small-team maintainers. As a solo developer, brain-space is your hardest constraint. Cap yourself at 3-5 swim lanes: one or two unsupervised CI lanes, one or two active feature lanes, and one horizon-scanning lane. The plugin architecture 'no' mechanism is especially important for solo devs because you cannot afford to maintain sprawling feature additions. Be ruthless about nuking waffling sessions to preserve your limited cognitive budget.
How do I run evaluation loops after a large Dark Factory refactor?
Stand up synthetic evaluation environments — for example, fake channel environments with both synthetic and real models — to verify all providers and integrations behave correctly after structural changes. Evals are not optional at scale; they are the only way to confirm the factory's output is coherent beyond what the test harness covers. Automate these evaluations where possible and run them as a dedicated swim lane after any major architectural change like a plugin migration.
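A minimal sketch of such an evaluation pass, assuming each provider can be called as `provider(prompt, channel)`: a fake channel records replies instead of delivering them, and any provider that raises or stays silent is reported. `FakeChannel`, `run_evals`, and the provider calling convention are all hypothetical.

```python
class FakeChannel:
    """Synthetic channel that records what a provider sends instead of delivering it."""

    def __init__(self):
        self.sent = []

    def send(self, text):
        self.sent.append(text)

def run_evals(providers, prompts, channel_factory=FakeChannel):
    """Run every provider against every prompt; return a dict of failures by name."""
    failures = {}
    for name, provider in providers.items():
        channel = channel_factory()
        try:
            for prompt in prompts:
                provider(prompt, channel)
        except Exception as exc:
            failures[name] = repr(exc)
            continue
        if len(channel.sent) != len(prompts):
            failures[name] = f"expected {len(prompts)} replies, got {len(channel.sent)}"
    return failures

def broken_provider(prompt, channel):
    raise RuntimeError("integration broke during migration")

# Stand-in providers; real ones would wrap synthetic or real models.
providers = {
    "good": lambda prompt, channel: channel.send("ok: " + prompt),
    "silent": lambda prompt, channel: None,
    "broken": broken_provider,
}
failures = run_evals(providers, ["ping", "status"])
```

An empty result means every provider replied to every prompt; it says nothing about reply quality, which needs its own checks.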
How does the plugin architecture work as a scope boundary?
When contributor pressure on a monolithic codebase becomes unmanageable, decompose the codebase so that external providers own their own isolated slice through a plugin model. Instead of rejecting contributors, hand them a plugin surface they control. This is a 'no' mechanism that scales — it keeps the core codebase coherent while still accepting contributions. Every incoming feature PR should be evaluated: does this belong in core, or should it be a plugin?
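The scope boundary can be illustrated with a small registry: core owns the registry and the dispatch path, while each provider owns the code behind its own entry. This is a sketch under assumed names (`PluginRegistry`, `register`, `dispatch`) and an assumed string-in, string-out handler shape, not the method's actual plugin interface.

```python
from typing import Callable, Dict, List

class PluginRegistry:
    """Core owns the registry; each provider owns the handler behind its entry."""

    def __init__(self) -> None:
        self._providers: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, handler: Callable[[str], str]) -> None:
        """Add a provider; refuse to overwrite an existing one."""
        if name in self._providers:
            raise ValueError(f"provider {name!r} already registered")
        self._providers[name] = handler

    def dispatch(self, name: str, payload: str) -> str:
        """Route a payload to the named provider's handler."""
        if name not in self._providers:
            raise KeyError(f"no plugin for {name!r}")
        return self._providers[name](payload)

    def names(self) -> List[str]:
        return sorted(self._providers)

registry = PluginRegistry()
registry.register("shout", lambda payload: payload.upper())  # contributed outside core
reply = registry.dispatch("shout", "hello")
```

A feature PR that can be expressed as a new `register` call belongs in a plugin; only changes to the registry and dispatch path themselves need to land in core.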
Should I use plan mode or spec mode when starting an agent session?
Neither, by default. The Dark Factory method recommends against defaulting to plan mode or spec mode. Instead, have a direct conversation with the agent to align on the task, then let it run. Plan and spec modes add overhead that is unnecessary when you have already defined the swim-lane mandate clearly; the mandate itself is the plan. Save structured planning for cases where the agent consistently drifts despite clear conversational framing.
How do I know if over-fitted AI-generated tests are actually useful?
Over-fitted tests are useful precisely because they are canaries: they capture the exact current behaviour of the codebase. When a large refactor breaks them, each failure tells you something changed. If they all go green after a massive refactor, you have strong evidence the refactor preserved behaviour. They are not designed to be perfect software engineering tests — they are a safety net for high-velocity agent-driven changes where manual review of every diff is impossible.
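The canary idea resembles what is often called a characterisation or snapshot test: record the current outputs, then diff against them after the refactor. The sketch below assumes a pure function with JSON-serialisable inputs; `record_snapshot` and `diff_snapshot` are hypothetical helper names.

```python
import json

def record_snapshot(func, inputs):
    """Capture the function's current behaviour, keyed by JSON-encoded input."""
    return {json.dumps(x): func(x) for x in inputs}

def diff_snapshot(func, snapshot):
    """Return the inputs whose output changed since the snapshot was recorded."""
    changed = []
    for key, expected in snapshot.items():
        x = json.loads(key)
        if func(x) != expected:
            changed.append(x)
    return changed

def before_refactor(x):
    return x * 2

def faithful_refactor(x):
    return x + x  # different code, same behaviour

def lossy_refactor(x):
    return x * 2 if x < 3 else 0  # behaviour changed for x >= 3

snapshot = record_snapshot(before_refactor, [1, 2, 3])
preserved = diff_snapshot(faithful_refactor, snapshot)
broken = diff_snapshot(lossy_refactor, snapshot)
```

An empty diff is evidence that behaviour was preserved for the recorded inputs only; each entry in a non-empty diff points at a place where the refactor changed something.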