How Do Startup Engineering Leads Run a Dark Factory?
For engineering team leads at startups · Based on Koc Dark Factory Agent Shipping Method
// TL;DR
Startup engineering leads managing 3–10 engineers can use the Dark Factory method to multiply their team's throughput by adding AI agent swim lanes alongside human work streams. The lead becomes the factory manager — triaging the backlog into swim lanes, assigning autonomy levels, loading .skills files for consistency, and gating merges with taste. This is especially valuable during crunch periods, large refactors, or when the team needs to ship faster than headcount allows. The method's plugin architecture principle prevents the codebase from becoming a fire dump as velocity increases.
Why should an engineering lead think like a factory manager?
Startup engineering leads already manage people. The Dark Factory method extends that management skill to AI agents. By running parallel agent swim lanes alongside human work, a team of 5 engineers can aim to operate like a team of 20 or more. Without structure, though, the extra volume creates chaos: unreviewable diffs, conflicting merges, and a codebase that degrades as output grows.
The factory-manager mindset solves this. You don't write code. You triage, orchestrate, monitor for quality, and gate merges. The bottleneck shifts from your team's hands to your team's taste and judgment.
How do you integrate agent swim lanes with a human engineering team?
Assign swim lanes to both humans and agents based on risk and complexity:
- Humans handle: Architectural decisions, customer-facing feature design, complex debugging, and code review of high-risk agent output.
- Agents handle: Test writing, documentation, routine bug fixes, formatting, CI stabilization, and first-pass implementations of well-specified features.
Each agent swim lane runs on a separate repo clone with its own .skills file loaded. Human engineers can also use .skills files to maintain consistency when directing their own agent sessions. The test harness is the universal quality gate — both human and agent work must pass before merging.
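The split above can be sketched as a small routing function. This is a minimal illustration rather than part of the method itself: the category names, the `Task` fields, and the rule that unknown or under-specified work defaults to a human are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Owner(Enum):
    HUMAN = "human"
    AGENT = "agent"


# Hypothetical category labels, derived from the two lists above.
HUMAN_CATEGORIES = {"architecture", "feature_design", "complex_debugging", "high_risk_review"}
AGENT_CATEGORIES = {"tests", "docs", "routine_bugfix", "formatting", "ci", "first_pass_impl"}


@dataclass(frozen=True)
class Task:
    title: str
    category: str
    well_specified: bool = True


def assign_owner(task: Task) -> Owner:
    """Route a backlog item to a human or an agent swim lane."""
    if task.category in HUMAN_CATEGORIES:
        return Owner.HUMAN
    if task.category in AGENT_CATEGORIES and task.well_specified:
        return Owner.AGENT
    # Assumption: unknown or under-specified work goes to a human.
    return Owner.HUMAN


backlog = [
    Task("Add unit tests for billing module", "tests"),
    Task("Choose queueing architecture", "architecture"),
    Task("Implement export endpoint", "first_pass_impl", well_specified=False),
]
assignments = {task.title: assign_owner(task).value for task in backlog}
```

In this sketch the under-specified export endpoint stays with a human until the spec is tightened, which mirrors the "well-specified features" condition in the agent list.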
How do you prevent codebase degradation at high velocity?
This is the critical risk for startup leads. When agents and humans ship fast, every merge that adds bloat degrades the system. Apply three structural safeguards:
1. Merge gating with taste: Not everything that passes tests should merge. Ask 'does this belong in core, or should it be a plugin?' Apply the same judgment you'd apply to a junior engineer's PR.
2. Plugin architecture: Split the codebase so features can live in isolated modules. This prevents the core from becoming a fire dump and makes large refactors survivable.
3. Test harness as ground truth: 'In Harness We Trust' means the test suite — not human review of every line — is the primary quality signal. Maintain and expand the harness aggressively.
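One way to picture how the three safeguards combine is a gate that checks the harness first and then applies the core-versus-plugin question. The sketch below is illustrative only: `PullRequest`, its fields, and the `Decision` values are hypothetical names, and `belongs_in_core` stands in for the lead's judgment, which no code can compute.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    REJECT = "reject"
    MOVE_TO_PLUGIN = "move_to_plugin"
    MERGE = "merge"


@dataclass(frozen=True)
class PullRequest:
    tests_passed: bool      # result from the test harness
    touches_core: bool      # does the diff modify core modules?
    belongs_in_core: bool   # the lead's taste call, entered by a human


def gate(pr: PullRequest) -> Decision:
    """Apply the harness check first, then the core-vs-plugin question."""
    if not pr.tests_passed:
        return Decision.REJECT
    if pr.touches_core and not pr.belongs_in_core:
        # Passing tests is not enough: route the change to an isolated module.
        return Decision.MOVE_TO_PLUGIN
    return Decision.MERGE


decision = gate(PullRequest(tests_passed=True, touches_core=True, belongs_in_core=False))
```

Here a change that passes every test is still kept out of core, which is the point of gating with taste rather than with the harness alone.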
How do .skills files create team-wide consistency?
When multiple engineers each run their own agent sessions, output quality varies with how each engineer prompts. .skills files standardize this. Create a shared library of versioned .skills files encoding your team's conventions: documentation format, test patterns, commit message style, code review checklist.
Distribute these files across the team. After each sprint or heavy session, do a team retrospective on .skills file effectiveness. Where did agents produce poor output? Update the files. This creates a compounding asset that improves every agent session over time — far more valuable than one-off prompt engineering.
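Because the files are versioned and updated after retrospectives, it helps to know when an engineer's local copies have fallen behind the shared library. The sketch below assumes a simple integer version per file; the file names and the version scheme are hypothetical, since the method does not prescribe a format for .skills files.

```python
def stale_skills(shared: dict[str, int], local: dict[str, int]) -> list[str]:
    """Return names of skills that are missing locally or older than the shared version."""
    return sorted(
        name for name, version in shared.items()
        if local.get(name, 0) < version
    )


# Hypothetical shared library and one engineer's local copies.
shared_library = {"docs.skills": 3, "tests.skills": 2, "commits.skills": 1}
local_copies = {"docs.skills": 3, "tests.skills": 1}

stale = stale_skills(shared_library, local_copies)
```

In this example `tests.skills` is out of date and `commits.skills` was never pulled, so both would be refreshed before the next agent session.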
How do you scale agent orchestration across a growing team?
Start with 2–3 agent swim lanes per engineer: one that the engineer actively directs and one or two running autonomously. As the team builds waffling-detection intuition (the ability to spot when an agent is reasoning in circles or asserting things it cannot back up), scale to 5+ lanes per engineer. Use dedicated triage lanes to handle incoming issues, which matters most for startups receiving customer bug reports at volume.
Run evaluation swim lanes after each release cycle, especially for systems with multiple integrations. Build synthetic test environments for critical paths. Treat evals as a swim lane, not an afterthought.
The scaling path is: build .skills files → distribute to team → retrospect after each sprint → expand swim lanes per engineer → add evaluation lanes → adopt plugin architecture for the codebase.
What's the next step?
Run a pilot: pick one sprint, assign 3 agent swim lanes alongside your human team, load .skills files, and gate every merge through the test harness. After the sprint, retrospect on agent output quality, .skills file effectiveness, and merge discipline. You'll see where the method saves time and where your team needs to build waffling-detection intuition. Then scale from there.
// FREQUENTLY ASKED QUESTIONS
How do I train my engineering team to detect agent waffling?
Have each engineer run high-volume agent sessions and read the agents' self-explanations the way they would read status updates from a colleague. Share examples of waffling (circular reasoning, vague assertions, over-confident wrong claims) in team retrospectives. The intuition develops through practice. Vincent Koc compares it to detecting when a staff member is bullshitting: if you can read people, you can learn to read agents.
Should I let engineers merge agent-generated code without human review?
For low-risk swim lanes (test formatting, documentation, CI fixes), yes — if the test harness passes and .skills files enforce your conventions. For high-risk lanes (new features, architectural changes), require human review. The autonomy level should match the risk level. Gate merges with taste: passing tests is necessary but not sufficient for core changes.
How do I prevent merge conflicts when running many parallel swim lanes?
Use separate repo clones per swim lane to isolate work. Sequence merges by priority: CI and bug fixes first, then features. Plugin architecture reduces conflict surface by keeping modules isolated. When conflicts arise, have the lower-priority lane rebase onto the updated main branch. Treat conflict resolution as a factory-manager scheduling decision.
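The sequencing rule can be sketched as a stable sort over the lanes waiting to merge. The `Lane` type, the lane names, and the decision to place any other kind of lane last are assumptions for illustration; only the "CI and bug fixes first, then features" ordering comes from the answer above.

```python
from dataclasses import dataclass

# Lower number merges first. CI and bug fixes share the top priority.
PRIORITY = {"ci": 0, "bugfix": 0, "feature": 1}
DEFAULT_PRIORITY = 2  # assumption: any other kind of lane merges last


@dataclass(frozen=True)
class Lane:
    name: str
    kind: str


def merge_order(queue: list[Lane]) -> list[Lane]:
    """Order lanes for merging; sorted() is stable, so ties keep queue order."""
    return sorted(queue, key=lambda lane: PRIORITY.get(lane.kind, DEFAULT_PRIORITY))


queue = [
    Lane("export-feature", "feature"),
    Lane("flaky-ci", "ci"),
    Lane("search-feature", "feature"),
    Lane("login-bug", "bugfix"),
]
ordered_names = [lane.name for lane in merge_order(queue)]
```

Each lane that merges later rebases onto the branch left by the lanes ahead of it, so the lowest-priority work absorbs the conflict cost.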