How DevTool Companies Write AI-Facing Docs That Actually Work
For developer tool companies creating AI-facing documentation and skills · Based on Nick Nisi's Harness Engineering for AI Agents
// TL;DR
Developer tool companies need their SDKs and APIs to work well when AI agents, not just humans, use them. Harness Engineering provides a methodology for creating AI-facing documentation: instead of converting your full docs into agent-readable skills, identify the specific gotchas where agents reliably fail with your product, write targeted guidance under 600 lines, and validate it with evals. The principle 'Guide, Don't Prescribe' means exposing the implicit contracts the model can't know, not re-teaching it your programming language.
Why do devtool companies need AI-facing documentation?
Your users increasingly rely on AI agents to integrate your SDK, configure your platform, and troubleshoot issues. When those agents fail (importing deprecated modules, violating implicit framework contracts, using the wrong authentication flow), your support team gets the ticket. Agents don't read your docs the way a human does, and comprehensive documentation can actually degrade agent performance by diluting the model's attention.
Harness Engineering's 'Guide, Don't Prescribe' principle directly addresses this: the model already knows how to code. It knows JavaScript, Python, REST APIs, and common patterns. What it doesn't know are your product's specific implicit contracts — the landmines that even experienced human developers hit. Your AI-facing docs should expose only those.
How do you identify which gotchas to document for AI agents?
Nick Nisi's framework prescribes a measurement-first approach:
1. Build an eval suite: Create a set of tasks that represent common agent interactions with your product — installation, configuration, authentication, common CRUD operations, error handling. Define provable success conditions for each.
2. Run the baseline: Execute the eval suite with zero additional context (just the agent's pre-training knowledge of your product). Record the pass rate.
3. Analyze failures: For each failed task, identify the specific misunderstanding. Was it an implicit contract? A deprecated API the model has stale training data for? A configuration step that requires a non-obvious order?
4. Write targeted gotchas: Document only the failure points. Each gotcha should state the wrong thing agents do and the correct behavior. Aim for under 600 lines total.
5. Re-run evals: Measure the pass rate with gotchas loaded. If it improved, keep the gotcha. If a gotcha doesn't help or makes other tasks worse, delete it. Nick Nisi demonstrated that 553 lines of targeted gotchas outperform 10,000 lines of comprehensive documentation.
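The five steps above can be sketched as a small eval loop. This is a minimal Python sketch under stated assumptions, not Nick Nisi's implementation: the `Agent` callable, `EvalTask`, `run_suite`, `pass_rate`, and `prune_gotchas` are hypothetical names, each task's provable success condition is modeled as a function over the agent's output, and `fake_agent` is a deterministic stand-in for a real agent using a hypothetical `createClient()` API. Step 5 is modeled as leave-one-out ablation: a gotcha is kept only if removing it lowers the pass rate and no task passes without it but fails with the full file.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical agent interface: (task prompt, extra context) -> agent output.
Agent = Callable[[str, str], str]


@dataclass(frozen=True)
class EvalTask:
    name: str
    prompt: str
    check: Callable[[str], bool]  # provable success condition on the output


def run_suite(agent: Agent, tasks: list[EvalTask], context: str = "") -> dict[str, bool]:
    """Run each task once with the given context; map task name -> passed."""
    return {t.name: t.check(agent(t.prompt, context)) for t in tasks}


def pass_rate(results: dict[str, bool]) -> float:
    return sum(results.values()) / len(results) if results else 0.0


def prune_gotchas(agent: Agent, tasks: list[EvalTask], gotchas: dict[str, str]) -> dict[str, str]:
    """Leave-one-out ablation over a dict of gotcha id -> gotcha text."""
    full = run_suite(agent, tasks, "\n\n".join(gotchas.values()))
    kept = {}
    for gid, text in gotchas.items():
        rest = "\n\n".join(v for k, v in gotchas.items() if k != gid)
        without = run_suite(agent, tasks, rest)
        helps = pass_rate(full) > pass_rate(without)
        # A task that passes without this gotcha but fails with the full file.
        hurts = any(without[name] and not full[name] for name in full)
        if helps and not hurts:
            kept[gid] = text
    return kept


# Deterministic stand-in for a real agent: uses the (hypothetical) newer
# createClient() API only when the context tells it to.
def fake_agent(prompt: str, context: str) -> str:
    if "auth" in prompt:
        return "createClient()" if "createClient" in context else "new Client()"
    return "ok"


tasks = [
    EvalTask("auth", "set up auth", lambda out: "createClient" in out),
    EvalTask("crud", "list items", lambda out: out == "ok"),
]
gotchas = {
    "auth-flow": "Wrong: new Client(). Right: createClient().",
    "style": "Use 2-space indentation.",
}

print(pass_rate(run_suite(fake_agent, tasks)))  # baseline: 0.5
print(pass_rate(run_suite(fake_agent, tasks, "\n\n".join(gotchas.values()))))  # 1.0
print(list(prune_gotchas(fake_agent, tasks, gotchas)))  # ['auth-flow']
```

Real agents are nondeterministic, so in practice each task would likely need several runs per condition rather than the single run shown here.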
What mistakes do devtool companies make when creating AI-facing docs?
The biggest mistake is converting your entire documentation into an agent-readable format. This feels productive but violates the 'Measure, Don't Assume' principle: more tokens in context can actively degrade agent performance, because the model's attention is diluted across thousands of lines when it only needs to know about three specific gotchas.
The second mistake is assuming the model doesn't already know your product. If your SDK is popular and well-represented in training data, the model already knows the basics. Re-teaching fundamentals wastes context window and adds noise. Focus exclusively on what changed since the training cutoff and what the model consistently gets wrong.
How do you maintain AI-facing docs as your product evolves?
Treat your gotchas file like code: version it, test it, and iterate on it. Every product release, re-run your eval suite. New failures mean new gotchas. Resolved failures (API stabilized, deprecated module removed) mean pruned gotchas. The Retrospective Memory Loop principle applies here: after every eval run, update the gotchas file with lessons learned.
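One way to make "new failures mean new gotchas, resolved failures mean pruned gotchas" mechanical is to diff zero-context baseline results between releases. A minimal sketch, assuming baseline results are stored per release as task name -> passed, and assuming a hypothetical bookkeeping convention where each gotcha records the task it was written to fix; `plan_gotcha_updates` is an illustrative name. Prune candidates are only candidates: a re-run of the eval suite still decides.

```python
def plan_gotcha_updates(
    prev_baseline: dict[str, bool],
    curr_baseline: dict[str, bool],
    gotcha_tasks: dict[str, str],
) -> tuple[list[str], list[str]]:
    """Compare zero-context eval results from two releases.

    prev_baseline / curr_baseline: task name -> passed without any gotchas.
    gotcha_tasks: gotcha id -> task it was written to fix (hypothetical convention).
    Returns (tasks that newly fail, gotchas whose task now passes unaided).
    """
    # Tasks absent from the previous run count as previously passing,
    # so a brand-new task that fails shows up as a new failure.
    new_failures = sorted(
        t for t, ok in curr_baseline.items() if not ok and prev_baseline.get(t, True)
    )
    # If the agent now passes a task with no gotchas loaded, the gotcha
    # written for it is a candidate for removal.
    prune_candidates = sorted(
        g for g, t in gotcha_tasks.items() if curr_baseline.get(t, False)
    )
    return new_failures, prune_candidates


prev = {"install": True, "auth": False, "webhooks": True}
curr = {"install": True, "auth": True, "webhooks": False}
print(plan_gotcha_updates(prev, curr, {"use-createClient": "auth"}))
# (['webhooks'], ['use-createClient'])
```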
Publish your gotchas file alongside your standard documentation. Make it easy for agent harnesses to pull only the targeted guidance file rather than parsing your entire docs site. Some companies create dedicated `/ai` or `/agent` endpoints that serve only the gotchas: the minimal, high-signal content that measurably improves agent success rates.
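Serving the gotchas file from a dedicated endpoint can be as small as one route. A minimal sketch using Python's standard-library WSGI support, assuming the gotchas live in a single markdown string (the `GOTCHAS.md` filename is hypothetical); `make_app` and `serve` are illustrative helpers, and the `/ai` and `/agent` paths follow the convention described above. The same result could just as well be a static file on your existing docs host.

```python
from wsgiref.simple_server import make_server

AGENT_PATHS = ("/ai", "/agent")


def make_app(gotchas_markdown: str):
    """Build a WSGI app that serves only the gotchas file on the agent paths."""
    body = gotchas_markdown.encode("utf-8")

    def app(environ, start_response):
        if environ.get("PATH_INFO") in AGENT_PATHS:
            start_response("200 OK", [
                ("Content-Type", "text/markdown; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"not found"]

    return app


def serve(gotchas_markdown: str, port: int = 8000) -> None:
    """Blocking local server, e.g. serve(open("GOTCHAS.md").read())."""
    with make_server("127.0.0.1", port, make_app(gotchas_markdown)) as server:
        server.serve_forever()
```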
Next step: Build your initial eval suite with 10-20 common agent tasks for your product. Run the baseline without gotchas, identify the top 5 failure patterns, and write your first targeted gotchas file. Measure the delta. That's your proof of value.
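Picking the top five failure patterns is a counting exercise once each baseline failure has been labeled during failure analysis. A minimal sketch, assuming failures are recorded by hand as (task, label) pairs; the task names, labels, and `top_failure_patterns` are hypothetical.

```python
from collections import Counter


def top_failure_patterns(
    failures: list[tuple[str, str]], k: int = 5
) -> list[tuple[str, int]]:
    """Return the k most common failure labels from a baseline run."""
    return Counter(label for _task, label in failures).most_common(k)


baseline_failures = [
    ("install", "deprecated-import"),
    ("auth-login", "wrong-auth-flow"),
    ("auth-refresh", "wrong-auth-flow"),
    ("config", "init-order"),
    ("webhooks", "wrong-auth-flow"),
    ("upload", "deprecated-import"),
]
print(top_failure_patterns(baseline_failures, k=2))
# [('wrong-auth-flow', 3), ('deprecated-import', 2)]
```

Each top label then becomes one candidate gotcha, kept or dropped according to the re-run.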
// FREQUENTLY ASKED QUESTIONS
How many gotchas should a devtool company document for AI agents?
As few as needed to measurably improve eval pass rates, typically under 600 lines total. Each gotcha should address a specific, recurring agent failure point in your product — not general usage instructions. If an agent consistently uses the wrong authentication method, that's a gotcha. If it correctly handles basic CRUD operations, don't document those. Every line must earn its place through measured eval improvement.
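The line budget and the "wrong thing, correct behavior" shape of each gotcha are easy to check automatically before running evals. A minimal lint sketch, assuming a hypothetical file convention where each gotcha is a `## ` heading followed by a `Wrong:` line and a `Right:` line; `lint_gotchas` is an illustrative name, and the 600-line default mirrors the guideline above rather than a hard rule. Passing the lint says nothing about whether a gotcha helps; only the eval delta does.

```python
MAX_LINES = 600


def lint_gotchas(text: str, max_lines: int = MAX_LINES) -> list[str]:
    """Return a list of problems found in a gotchas markdown file."""
    problems = []
    lines = text.splitlines()
    if len(lines) > max_lines:
        problems.append(f"file has {len(lines)} lines; budget is {max_lines}")

    # Group lines under their "## " heading (hypothetical convention).
    sections: dict[str, list[str]] = {}
    current = None
    for line in lines:
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line.strip())

    # Every gotcha must state what agents do wrong and what is correct.
    for title, body in sections.items():
        for field in ("Wrong:", "Right:"):
            if not any(b.startswith(field) for b in body):
                problems.append(f"{title}: missing '{field}' line")
    return problems


good = "## Auth\nWrong: new Client(key)\nRight: createClient({ apiKey })\n"
bad = "## Imports\nWrong: import x from 'sdk/legacy'\n"
print(lint_gotchas(good))  # []
print(lint_gotchas(bad))   # ["Imports: missing 'Right:' line"]
```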
Should devtool companies replace their human docs with AI-facing docs?
No. AI-facing gotcha files supplement human documentation; they don't replace it. Human docs explain concepts comprehensively for learning, while gotcha files expose specific landmines for agents that already know how to code. Maintain both. The gotcha file is typically a separate markdown document or API endpoint: minimal, targeted, and validated by evals. Human docs remain your primary documentation for human developers.
How do devtool companies measure whether their AI-facing docs actually help?
Build an eval suite of 10-50 common agent tasks for your product with provable success conditions. Run the suite without gotchas (baseline), then with gotchas loaded, and compare pass rates. If the gotchas file increases the pass rate without introducing new failures, it's working. Re-run evals after every product release and every gotcha update. Nick Nisi's principle is clear: if you're not measuring, you cannot distinguish improvements from noise.
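Because the baseline and gotcha runs use the same tasks, a paired significance test is one way to tell an improvement from noise on a small suite. The sketch below uses an exact McNemar test (a two-sided binomial test on the tasks whose outcome changed); this is a swapped-in statistical choice, not something the source prescribes. It assumes one pass/fail result per task per condition; with nondeterministic agents you would likely aggregate several runs first. `mcnemar_exact` and the task names are hypothetical.

```python
from math import comb


def mcnemar_exact(
    baseline: dict[str, bool], with_gotchas: dict[str, bool]
) -> tuple[int, int, float]:
    """Exact two-sided McNemar test on paired pass/fail results.

    Returns (tasks gained, tasks lost, p-value). Only tasks whose outcome
    changed are informative; under "no effect" each change is a fair coin flip.
    """
    gained = sum(1 for t in baseline if with_gotchas[t] and not baseline[t])
    lost = sum(1 for t in baseline if baseline[t] and not with_gotchas[t])
    n = gained + lost
    if n == 0:
        return gained, lost, 1.0
    tail = sum(comb(n, i) for i in range(min(gained, lost) + 1)) / 2**n
    return gained, lost, min(1.0, 2 * tail)


# 20 tasks: baseline passes t0..t9; with gotchas t0 regresses and t10..t17 are fixed.
baseline = {f"t{i}": i < 10 for i in range(20)}
with_gotchas = {f"t{i}": 1 <= i < 18 for i in range(20)}
print(mcnemar_exact(baseline, with_gotchas))  # (8, 1, 0.0390625)

# A net gain of one task (2 gained, 1 lost) is indistinguishable from noise.
small = {**baseline, "t10": True, "t11": True, "t0": False}
print(mcnemar_exact(baseline, small))  # (2, 1, 1.0)
```

On a 10-50 task suite, only fairly large changes in outcome will reach conventional significance levels.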