How do DevTool founders build coding agent skills?
For DevTool and API platform founders · Based on Klingen Coding Agent Skill Architecture Method
// TL;DR
If you run a developer tools company with extensive documentation and flexible integration patterns, coding agents are becoming a primary onboarding channel, and they often get the setup wrong. The Klingen method gives you a systematic way to build a Skill MD (CLAUDE.md or equivalent) that guides agents through your product's setup correctly, using progressive disclosure, live doc references, and LLM-as-judge evals. Use it when a growing share of support tickets stems from agent-generated misconfigurations rather than human confusion.
Why are coding agents setting up my DevTool product incorrectly?
Coding agents like Claude, Cursor, and Codex rely on pre-training knowledge that goes stale within months. If your SDK has evolved — renamed methods, changed CLI flags, added new integration patterns — the agent is working from outdated context. Without a skill, the agent turns the Rubik's Cube randomly: it has all the moves but no methodology.
The Klingen method starts by auditing this pre-skill failure state. Run your most common user request ('add tracing to my app') through a coding agent with no skill installed. Capture the trace. Document what it got wrong: hallucinated API methods, stale CLI flags, missing instrumentation spans, wrong default configurations. This baseline tells you exactly what the skill needs to fix.
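One way to keep the baseline audit comparable across runs is to record each failure as a structured finding and tally by category. The sketch below is illustrative only: the category names mirror the failure types listed above, while `Finding`, `summarize`, and the example details are hypothetical and not part of the Klingen method.

```python
from collections import Counter
from dataclasses import dataclass

# Failure categories taken from the audit step above; extend as needed.
CATEGORIES = {
    "hallucinated_api_method",
    "stale_cli_flag",
    "missing_instrumentation_span",
    "wrong_default_config",
}


@dataclass(frozen=True)
class Finding:
    scenario: str  # the user request that was run, e.g. "add tracing to my app"
    category: str  # one of CATEGORIES
    detail: str    # what the agent actually did, in your own words


def summarize(findings: list[Finding]) -> list[tuple[str, int]]:
    """Count findings per category, most frequent first."""
    for finding in findings:
        if finding.category not in CATEGORIES:
            raise ValueError(f"unknown category: {finding.category}")
    return Counter(f.category for f in findings).most_common()


# Hypothetical findings from reading one pre-skill trace.
baseline = [
    Finding("add tracing to my app", "stale_cli_flag", "used a flag that no longer exists"),
    Finding("add tracing to my app", "stale_cli_flag", "used a renamed init flag"),
    Finding("add tracing to my app", "hallucinated_api_method", "called a method the SDK never had"),
]
print(summarize(baseline))
```

The categories that dominate the tally are reasonable candidates for the first style rules in the skill.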
How do I create a skill file for my DevTool product?
Build a Skill MD with two components. First, style rules: behavioural instructions like 'ask the user what application type they're building before recommending an integration pattern' and 'check the CLI's help output before assuming a parameter exists.' Second, an agent sitemap: a structured index of your documentation URLs organized by feature area.
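As a rough illustration of the two components, the sketch below renders style rules and an agent sitemap into a single Markdown file. The section titles, the `render_skill` helper, and the `docs.example.com` URLs are placeholders chosen for this example; the source does not prescribe a file layout.

```python
def render_skill(style_rules: list[str], sitemap: dict[str, str]) -> str:
    """Render a minimal Skill MD: style rules followed by an agent sitemap."""
    lines = ["# Skill: product setup", "", "## Style rules"]
    lines += [f"- {rule}" for rule in style_rules]
    lines += ["", "## Agent sitemap"]
    # Each entry points at a feature-area index page rather than copying its content.
    lines += [f"- {area}: {url}" for area, url in sitemap.items()]
    return "\n".join(lines) + "\n"


skill_md = render_skill(
    style_rules=[
        "Ask the user what application type they're building before recommending an integration pattern.",
        "Check the CLI's help output before assuming a parameter exists.",
    ],
    sitemap={
        # Placeholder URLs; substitute your real feature-area indexes.
        "Tracing": "https://docs.example.com/tracing/index.md",
        "Evaluation": "https://docs.example.com/evaluation/index.md",
    },
)
print(skill_md)
```

Generating the file from data rather than editing it by hand makes it easier to regenerate the sitemap whenever the docs structure changes.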
Critically, reference your docs rather than copying them in. Embedding documentation creates a duplicate that goes stale, reproducing the exact problem the skill was meant to solve. Instead, expose a natural-language search endpoint that returns relevant doc chunks, and serve pages as Markdown through HTTP content negotiation (for example, when a request sends an Accept: text/markdown header) so agents do not have to parse HTML.
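One plausible way to let agents skip HTML is standard HTTP content negotiation on the `Accept` request header. The sketch below shows the server-side decision only, assuming your docs server can render the same page as either HTML or Markdown; `prefers_markdown` is a hypothetical helper, and the parsing is simplified (it handles q-values but not every corner of the HTTP specification).

```python
def prefers_markdown(accept_header: str) -> bool:
    """Return True if the Accept header ranks text/markdown at least as high as HTML."""
    weights: dict[str, float] = {}
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not acceptable"
        weights[media_type] = quality
    markdown = weights.get("text/markdown", 0.0)
    # Fall back to the wildcard weight when text/html is not listed explicitly.
    html = weights.get("text/html", weights.get("*/*", 0.0))
    return markdown > 0 and markdown >= html


print(prefers_markdown("text/markdown, text/html;q=0.5"))  # agent asking for Markdown
print(prefers_markdown("text/html,application/xhtml+xml"))  # typical browser request
```

If the same URL can return either format, the response should normally include `Vary: Accept` so caches keep the two representations apart.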
For a DevTool with five feature areas and 400+ pages, your agent sitemap might have 10–15 top-level entries pointing to feature area indexes, with the search endpoint handling everything else.
How do I know if my skill is actually working?
Set up a basic eval suite using LLM-as-judge. Create a sample repository representing a typical customer app. Write 3–7 natural-language assertions: 'OpenAI instrumentation was added,' 'retrieval spans appear in trace,' 'no hardcoded API keys were introduced.' Then have an LLM judge each assertion against a diff of the repository's filesystem state before and after the agent run.
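A minimal sketch of such an eval harness, under stated assumptions: the repository is snapshotted before and after the agent run, the changed files are placed in a prompt, and a `judge` callable (a thin wrapper around whichever LLM API you use, not shown here) returns text starting with PASS or FAIL. All function names and the prompt wording are illustrative. This version covers filesystem assertions only; trace-level assertions would also need the captured trace in the prompt.

```python
import hashlib
from pathlib import Path
from typing import Callable


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's relative path to a hash of its contents."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths as added, removed, or modified between two snapshots."""
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "modified": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }


def build_prompt(assertion: str, changes: dict[str, list[str]], root: Path) -> str:
    """Assemble the judge prompt: the assertion plus the contents of touched files."""
    parts = [
        "You are grading a coding agent's changes to a repository.",
        f"Assertion: {assertion}",
        f"Removed files: {', '.join(changes['removed']) or 'none'}",
    ]
    for rel in changes["added"] + changes["modified"]:
        text = (root / rel).read_text(errors="replace")
        parts.append(f"--- {rel} (after the run) ---\n{text}")
    parts.append("Answer with PASS or FAIL on the first line, then one sentence of reasoning.")
    return "\n\n".join(parts)


def run_evals(
    assertions: list[str],
    changes: dict[str, list[str]],
    root: Path,
    judge: Callable[[str], str],
) -> dict[str, bool]:
    """Ask the judge about each assertion; True means the judge answered PASS."""
    return {
        assertion: judge(build_prompt(assertion, changes, root)).strip().upper().startswith("PASS")
        for assertion in assertions
    }
```

In use, call `snapshot` on the sample repository, run the agent, call `snapshot` again, and pass the result of `changed_files` to `run_evals` together with your assertions. Keeping the judge behind a plain callable means the harness can be exercised with a stub before any model is wired in.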
Don't wait for a perfect eval framework. A basic setup unblocks iteration immediately. Complement it with manual trace review — read 5–10 full execution traces to catch qualitative issues automated metrics miss, like the agent wandering instead of proceeding directly.
How do I keep my skill current as my product evolves?
Embed a creation timestamp and instruct the agent: 'If this skill is older than 14 days, alert the user.' Since your docs are referenced (not embedded), the agent always reads the latest version when it follows sitemap links or queries the search endpoint. Use production signals from search endpoint queries to discover new use cases and gaps. Run periodic auto-research loops with a carefully defined target function to generate improvement candidates — but human-review every suggestion.
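The 14-day rule is an instruction to the agent, but the same check is easy to run in CI so a stale skill is caught before release. A minimal sketch, assuming the skill stores its creation time as an ISO 8601 string; `is_stale` and the timestamp format are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_AGE = timedelta(days=14)  # threshold taken from the instruction above


def is_stale(created_at: str, now: Optional[datetime] = None) -> bool:
    """Return True if the skill's creation timestamp is more than 14 days old."""
    created = datetime.fromisoformat(created_at)
    if created.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        created = created.replace(tzinfo=timezone.utc)
    current = now if now is not None else datetime.now(timezone.utc)
    return current - created > MAX_AGE


# Fixed "now" so the example is reproducible.
check_time = datetime(2025, 1, 20, tzinfo=timezone.utc)
print(is_stale("2025-01-01T00:00:00+00:00", now=check_time))
```

A skill that is exactly 14 days old is still treated as current here; only a strictly older one triggers the alert.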
What should I do next?
Start with the single most common user entry scenario for your product. Build a minimal Skill MD with 5–10 style rules and an agent sitemap. Install it in your test environment, run three scenarios, read the traces, and refine. Compare the results against your pre-skill baseline; setup accuracy should improve quickly. Then layer on the search endpoint, LLM-as-judge evals, and auto-research as your skill matures.
// FREQUENTLY ASKED QUESTIONS
How many documentation pages does my product need before this method is worth it?
There's no strict minimum, but the method delivers the most value when your product has 50+ documentation pages across multiple feature areas with multiple valid integration paths. Even with fewer pages, the core principles — progressive disclosure, reference over duplication, and trace-based iteration — prevent the most common agent failures like hallucinated APIs and stale setup steps.
Should I build one skill for my entire product or separate skills per feature?
Start with one skill covering your primary onboarding workflow. A single Skill MD with style rules and an agent sitemap can handle multiple feature areas through progressive disclosure — the agent only fetches the docs relevant to each decision point. Split into separate skills only when trace data shows the single skill is causing confusion across unrelated workflows, or when different features have fundamentally different user entry scenarios.
Will building a coding agent skill reduce my support ticket volume?
Yes, if a significant portion of your support tickets stem from incorrect agent-driven setups. The skill prevents the most common failure modes — hallucinated APIs, wrong default configurations, and mismatched integration patterns. Production signals from the search endpoint also reveal gaps in your documentation itself, which reduces both agent and human confusion over time.