Question 1

What exactly goes into a CLAUDE.md or .clinerules skill file?

Accepted Answer

A Skill MD file has two core components: style rules and an agent sitemap. Style rules govern agent behaviour, for example 'ask clarifying questions before making architecture decisions' or 'fetch the CLI help flag before assuming parameters exist.' The agent sitemap is a structured index of documentation URLs organized by feature area. Crucially, the sitemap references documentation rather than copying it in, so the skill does not go stale as the docs change. The file also carries a creation timestamp and instructions for detecting staleness.
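To make the two components concrete, here is a minimal Python sketch that renders a skill file from style rules and a sitemap. The `SkillFile` class, the section headings, and the `docs.example.com` URLs are illustrative assumptions, not a required format. What it shows is the shape: the sitemap holds links only, and the timestamp and staleness instruction sit at the top where the agent reads them first.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SkillFile:
    """Illustrative model of a skill file: style rules plus an agent sitemap."""

    name: str
    created: date
    max_age_days: int
    style_rules: list[str] = field(default_factory=list)
    # Feature area -> documentation URLs. Links only; doc content is never copied in.
    sitemap: dict[str, list[str]] = field(default_factory=dict)

    def render(self) -> str:
        lines = [f"# {self.name}", "", f"Created: {self.created.isoformat()}"]
        # Staleness instruction the agent reads before acting on anything else.
        lines.append(
            f"If this skill is older than {self.max_age_days} days, alert the user "
            "and suggest fetching a fresh version."
        )
        lines += ["", "## Style rules"]
        lines += [f"- {rule}" for rule in self.style_rules]
        lines += ["", "## Agent sitemap"]
        for area, urls in self.sitemap.items():
            lines += ["", f"### {area}"]
            lines += [f"- {url}" for url in urls]
        return "\n".join(lines) + "\n"


# Hypothetical example skill; names and URLs are placeholders.
skill = SkillFile(
    name="Example product: setup skill",
    created=date(2025, 1, 15),
    max_age_days=30,
    style_rules=[
        "Ask clarifying questions before making architecture decisions.",
        "Fetch the CLI help flag before assuming parameters exist.",
    ],
    sitemap={
        "Tracing": ["https://docs.example.com/tracing/quickstart"],
        "Prompt management": ["https://docs.example.com/prompts/overview"],
    },
)
print(skill.render())
```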

Question 2

Can I use the Klingen method for products with fewer than 50 documentation pages?

Accepted Answer

Yes, but the method delivers the most value for products with deep, flexible documentation where multiple valid setup paths exist. For simpler products, you may only need a basic Skill MD with style rules and a few doc references. The core principles — progressive disclosure, reference over duplication, and trace-based iteration — still apply regardless of documentation size, and even small skills benefit from LLM-as-judge evals and production signal monitoring.

Question 3

What is progressive disclosure in the context of coding agent skills?

Accepted Answer

Progressive disclosure means the skill reveals only the documentation references and decision hints the agent needs at each step, rather than front-loading all possible context. This mirrors how a human expert guides a conversation — asking what type of application the user has before recommending an integration pattern. It reduces token waste, keeps the agent focused, and prevents information overload that leads to incorrect path selection.
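A minimal sketch of progressive disclosure as a decision tree, in Python. The tree, the question wording, the option keys, and the `docs.example.com` URLs are all hypothetical. The idea it illustrates: each step asks one question and reveals only the references for the path chosen so far, and an unrecognised answer re-asks the question instead of guessing a default path.

```python
# Hypothetical decision tree: inner nodes ask a question, leaves hold doc references.
DISCLOSURE_TREE = {
    "question": "What type of application are you instrumenting?",
    "options": {
        "chat": {"docs": ["https://docs.example.com/integrations/chat"]},
        "batch": {"docs": ["https://docs.example.com/integrations/batch"]},
        "rag": {
            "question": "Which retrieval framework do you use?",
            "options": {
                "langchain": {"docs": ["https://docs.example.com/integrations/rag/langchain"]},
                "custom": {"docs": ["https://docs.example.com/integrations/rag/custom"]},
            },
        },
    },
}


def next_step(tree: dict, answers: list[str]) -> dict:
    """Walk the tree using the answers collected so far.

    Returns {"ask": question, "choices": [...]} while more information is needed,
    or {"docs": [...]} once a leaf is reached. Only that leaf's references are revealed.
    """
    node = tree
    for answer in answers:
        options = node.get("options", {})
        if answer not in options:
            break  # unknown answer: stay here and ask again rather than guess
        node = options[answer]
    if "docs" in node:
        return {"docs": node["docs"]}
    return {"ask": node["question"], "choices": sorted(node["options"])}
```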

Question 4

How do I set up a search endpoint for my documentation that agents can use?

Accepted Answer

Expose a natural-language search endpoint that accepts a query string and returns relevant documentation chunks. This replaces page-by-page doc fetching and reduces the number of turns the agent needs to find information. Also support markdown content negotiation (for example, a /md suffix on documentation URLs, or honouring an Accept: text/markdown request header) and advertise it, so agents avoid parsing HTML and wasting tokens. The search endpoint also shows you, at query time, which problems users actually encounter, and that feeds skill improvement through production signals.
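A minimal, standard-library-only sketch of such an endpoint, written as a WSGI app so it can be exercised without a network. The keyword-overlap ranking stands in for real retrieval, and the `/search?q=` path, the sample chunks, and the in-memory `QUERY_LOG` are assumptions. It returns markdown when the Accept header includes `text/markdown`, JSON otherwise, and records every query as a production signal.

```python
import json
from urllib.parse import parse_qs

# Hypothetical documentation chunks; a real endpoint would query a search index.
CHUNKS = [
    {"url": "https://docs.example.com/tracing", "text": "Add tracing to an OpenAI app with the SDK."},
    {"url": "https://docs.example.com/prompts", "text": "Version prompts and fetch them at runtime."},
]

QUERY_LOG: list[str] = []  # production signal: what users actually ask


def search(query: str, limit: int = 3) -> list[dict]:
    """Naive keyword-overlap ranking, standing in for real retrieval."""
    terms = set(query.lower().split())
    scored = []
    for chunk in CHUNKS:
        score = len(terms & set(chunk["text"].lower().split()))
        if score:
            scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:limit]]


def app(environ, start_response):
    """WSGI app: GET /search?q=... returns markdown or JSON depending on Accept."""
    if environ.get("PATH_INFO") != "/search":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    query = parse_qs(environ.get("QUERY_STRING", "")).get("q", [""])[0]
    QUERY_LOG.append(query)
    results = search(query)
    if "text/markdown" in environ.get("HTTP_ACCEPT", ""):
        body = "\n\n".join(f"## {r['url']}\n\n{r['text']}" for r in results)
        content_type = "text/markdown; charset=utf-8"
    else:
        body = json.dumps(results)
        content_type = "application/json"
    data = body.encode("utf-8")
    start_response("200 OK", [("Content-Type", content_type), ("Content-Length", str(len(data)))])
    return [data]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` is enough; a production deployment would sit behind a proper WSGI server and a real search index.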

Question 5

How do I define a target function for auto-research without losing important behaviours?

Accepted Answer

Include every behaviour you want to preserve in the target function, not just the primary success metric. For example, if you optimise only on 'number of turns,' the agent will strip out documentation-fetching instructions, cutting itself off from up-to-date context. Include checks that instrumentation is correct, the desired spans are present, no APIs were hallucinated, appropriate clarifying questions were asked, and approval gates for sensitive operations were respected. Treat the target function with the same rigour as a product requirements document.
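One way to encode a target function that protects behaviours is to treat them as hard gates and let turn count act only as a tie-breaker among runs that pass every gate. The `RunResult` fields and the `max_turns` budget below are assumptions standing in for whatever your traces actually record. Gating is a design choice rather than the only option; a weighted sum is another, but it lets the optimiser trade a behaviour away for speed.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Hypothetical summary of one skill execution, extracted from its trace."""

    turns: int
    instrumentation_added: bool
    expected_spans_present: bool
    hallucinated_api_calls: int
    asked_clarifying_question: bool
    sensitive_action_without_approval: bool


def target_score(run: RunResult, max_turns: int = 40) -> float:
    """Score a run so that no behavioural requirement can be traded for fewer turns."""
    hard_checks = [
        run.instrumentation_added,
        run.expected_spans_present,
        run.hallucinated_api_calls == 0,
        run.asked_clarifying_question,
        not run.sensitive_action_without_approval,
    ]
    if not all(hard_checks):
        return 0.0  # any failed requirement zeroes the score
    # Efficiency only counts once every required behaviour is present.
    efficiency = max(0.0, 1.0 - run.turns / max_turns)
    return 1.0 + efficiency
```

Under this scoring, a 10-turn run that passes every check beats a 5-turn run that hallucinated one API call, because the faster run scores zero.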

Question 6

How do I handle staleness in my coding agent skill over time?

Accepted Answer

Embed the creation date in the skill file and instruct the agent: 'If this skill is older than N days, alert the user and suggest fetching a fresh version.' Reference documentation via URLs and search endpoints rather than embedding content — this ensures the agent always reads the latest docs. Currently, automatic skill updates are unreliable across coding agent environments, so timestamp-based staleness detection with manual refresh is the most practical approach.
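The staleness rule is an instruction to the agent, but the same check is easy to run mechanically, for example in CI or in a wrapper script before a session starts. This sketch assumes the skill records its date on a line such as `Created: 2025-01-15`; that line format and both function names are illustrative.

```python
import re
from datetime import date
from typing import Optional


def parse_created(skill_text: str) -> Optional[date]:
    """Find an ISO date on a line of the form 'Created: YYYY-MM-DD'."""
    match = re.search(r"^Created: (\d{4}-\d{2}-\d{2})[ \t]*$", skill_text, re.MULTILINE)
    return date.fromisoformat(match.group(1)) if match else None


def staleness_message(created: date, today: date, max_age_days: int) -> Optional[str]:
    """Return a warning for the user if the skill is older than max_age_days, else None."""
    age_days = (today - created).days
    if age_days <= max_age_days:
        return None
    return (
        f"This skill was created on {created.isoformat()} ({age_days} days ago), "
        f"past its {max_age_days}-day limit. Consider fetching a fresh version."
    )
```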

Question 7

How do I write natural-language assertions for LLM-as-judge evaluation?

Accepted Answer

Write 3–7 plain-English statements describing expected post-execution state. Examples: 'OpenAI instrumentation was added to the application,' 'Retrieval spans appear in the trace output,' 'No hardcoded API keys were introduced.' The LLM-as-judge compares filesystem or trace state before and after skill execution against these assertions. Start simple — imperfect coverage is acceptable. You can add more assertions as manual trace review reveals new failure modes.
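A sketch of how the assertions can be turned into a judge prompt and a per-assertion verdict. `call_llm` is a placeholder for whichever model client you use (no specific provider API is assumed), and the JSON reply shape is an assumption you would enforce through the prompt. Verdicts the judge omits count as failures, so a terse judge cannot inflate the pass rate.

```python
import json
from typing import Callable

ASSERTIONS = [
    "OpenAI instrumentation was added to the application.",
    "Retrieval spans appear in the trace output.",
    "No hardcoded API keys were introduced.",
]


def build_judge_prompt(assertions: list[str], before: str, after: str) -> str:
    numbered = "\n".join(f"{i}. {a}" for i, a in enumerate(assertions, start=1))
    return (
        "You are grading a coding-agent run. For each numbered assertion, decide "
        "whether it holds, given the state before and after the skill ran.\n\n"
        f"Assertions:\n{numbered}\n\n"
        f"BEFORE:\n{before}\n\n"
        f"AFTER:\n{after}\n\n"
        'Reply with JSON only, shaped like {"results": [{"id": 1, "pass": true, "reason": "..."}]}'
    )


def judge(
    assertions: list[str], before: str, after: str, call_llm: Callable[[str], str]
) -> dict[int, bool]:
    """Ask the judge model (via the placeholder call_llm) for one verdict per assertion."""
    reply = json.loads(call_llm(build_judge_prompt(assertions, before, after)))
    verdicts = {item["id"]: bool(item["pass"]) for item in reply["results"]}
    # Any assertion the judge skipped is treated as failed, not silently passed.
    return {i: verdicts.get(i, False) for i in range(1, len(assertions) + 1)}
```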

Question 8

My coding agent keeps hallucinating API methods that don't exist — will the Klingen method fix this?

Accepted Answer

Yes, this is one of the primary failure modes the method addresses. By providing an agent sitemap and search endpoint, the skill directs the agent to authoritative documentation before it falls back on stale pre-training knowledge. Style rules like 'fetch the CLI help flag before assuming parameters exist' and 'always consult the API reference before calling a method' explicitly prevent hallucination. Trace review during iteration will catch any remaining instances so you can add specific guards.
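Trace review often turns a recurring hallucination into a specific guard. One cheap guard for Python codebases is a static check over the files the agent wrote: flag any method called on the SDK client that does not appear in your API reference. It uses only the standard-library `ast` module; the `client` variable name and the `KNOWN_METHODS` allowlist are hypothetical, and in practice you would generate the allowlist from your real API reference.

```python
import ast

# Hypothetical allowlist; generate it from your actual API reference.
KNOWN_METHODS = {"trace", "span", "flush", "get_prompt"}


def unknown_client_calls(source: str, client_name: str = "client") -> list[str]:
    """Return method names called on `client_name` that are not in KNOWN_METHODS.

    Only direct calls such as `client.foo(...)` are caught; aliasing and
    dynamic attribute access are out of scope for this sketch.
    """
    found = set()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == client_name
            and node.func.attr not in KNOWN_METHODS
        ):
            found.add(node.func.attr)
    return sorted(found)
```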

Question 9

What if my auto-research loop keeps removing documentation-fetching steps?

Accepted Answer

This happens when your target function optimises on a proxy metric like turn count. The agent rationally removes doc-fetching instructions because they add turns. Fix this by adding explicit checks to the target function: 'agent consulted the documentation search endpoint,' 'agent used current API signatures, not pre-training versions,' 'agent asked at least one clarifying question.' The target function defines everything — if a behaviour isn't measured, it will be optimised away.
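Checks like 'agent consulted the documentation search endpoint' are easiest to enforce when they are computed from the trace rather than self-reported by the agent. The sketch below assumes a simplified, hypothetical trace format (`TraceEvent`), hypothetical tool names (`fetch`, `edit_file`), and a placeholder search URL; the clarifying-question test is a deliberately crude heuristic. Verifying that current API signatures were used needs a separate comparison against your API reference.

```python
from dataclasses import dataclass


@dataclass
class TraceEvent:
    """Hypothetical normalised trace event: a tool call or an assistant message."""

    kind: str  # "tool_call" or "assistant_message"
    name: str  # tool name, or "" for messages
    payload: str  # URL fetched, file edited, or message text


SEARCH_URL_PREFIX = "https://docs.example.com/api/search"  # hypothetical endpoint


def behaviour_checks(events: list[TraceEvent]) -> dict[str, bool]:
    consulted = any(
        e.kind == "tool_call" and e.payload.startswith(SEARCH_URL_PREFIX) for e in events
    )
    # Index of the first file edit; questions only count if asked before it.
    first_edit = next(
        (i for i, e in enumerate(events) if e.kind == "tool_call" and e.name == "edit_file"),
        len(events),
    )
    asked = any(
        e.kind == "assistant_message" and e.payload.rstrip().endswith("?")
        for e in events[:first_edit]
    )
    return {"consulted_search_endpoint": consulted, "asked_clarifying_question": asked}
```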

Question 10

The agent works fine for one use case but breaks for others — what's wrong?

Accepted Answer

Your skill likely lacks clarifying questions that differentiate between application types. An unopinionated product supports many valid integration paths (chat vs. batch vs. RAG vs. voice), and a skill without upfront disambiguation will recommend a single path regardless. Add style rules requiring the agent to ask about the user's application type, framework, and goals before selecting an integration pattern. Use production signals from your search endpoint to discover which use cases are actually occurring.

Question 11

How does the Klingen method compare to RAG-based documentation chatbots?

Accepted Answer

RAG chatbots answer questions about your product. The Klingen method produces skills that execute setup and integration workflows inside a coding agent. A RAG chatbot says 'here's how to add tracing'; a Klingen skill makes the agent actually add tracing correctly in the user's codebase. They're complementary — your search endpoint may even use RAG under the hood — but the skill adds procedural methodology, clarifying questions, style rules, and eval-driven iteration that a chatbot doesn't provide.

Question 12

How is the Klingen method different from just writing a really good system prompt?

Accepted Answer

A system prompt is ephemeral and applies to one conversation. A Klingen skill is a persistent, versioned instruction set installed in the agent's environment (CLAUDE.md, .clinerules) that applies across all sessions. It includes an agent sitemap, references to live documentation, and staleness detection, and it is designed for iterative improvement through traces, evals, and auto-research. A system prompt might capture some style rules, but it can't provide the structured methodology, progressive disclosure, or production-signal feedback loop that the Klingen method builds.

Question 13

Can I use the Klingen method with agents other than Claude, Cursor, or Codex?

Accepted Answer

Yes. The principles are agent-agnostic: progressive disclosure, reference over duplication, trace-based iteration, and LLM-as-judge evals work with any coding agent that accepts instruction files or system-level configuration. The specific file format varies (CLAUDE.md for Claude Code, .clinerules for Cline, .cursorrules for Cursor, AGENTS.md for Codex), but the architecture (style rules plus an agent sitemap plus a search endpoint) translates directly. The method explicitly avoids dependence on proprietary plugin marketplaces for this reason.

Question 14

What does 'unopinionated infrastructure' mean and why does it matter for skills?

Accepted Answer

Unopinionated infrastructure provides reliable, flexible primitives (e.g., tracing that scales to billions of events) without prescribing end-to-end workflows. This flexibility is great for power users but terrible for coding agents, which need clear direction. Skills become the natural completion layer: they add the opinionated guidance the product intentionally omits. The more unopinionated your product, the more critical a well-designed skill becomes for correct agent-driven setup.

Question 15

How do I know which use cases to prioritise when building my first skill?

Accepted Answer

Start with the most common user entry scenario — the request users most frequently type into a coding agent for your product. If you have an existing search endpoint or support logs, mine those for the top query patterns. If not, pick the single most representative setup workflow (e.g., 'add observability to my agent'). Build and iterate on this one skill first. Production signals will then reveal adjacent use cases you didn't anticipate, which become your next skill targets.
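If you have search or support logs, a rough frequency count over normalised queries is often enough to pick the first scenario. This sketch lowercases each query, drops a small, arbitrary stop-word list, and counts the remaining phrases. Real logs usually need fuzzier grouping (embeddings or manual tagging, for example), so treat it as a starting point rather than a finished analysis.

```python
import re
from collections import Counter

# Arbitrary illustrative stop words; tune for your own query logs.
STOP_WORDS = {"how", "do", "i", "to", "my", "a", "the", "with", "in"}


def normalise(query: str) -> str:
    """Lowercase, keep alphanumeric words, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return " ".join(w for w in words if w not in STOP_WORDS)


def top_query_patterns(queries: list[str], n: int = 5) -> list[tuple[str, int]]:
    """Count normalised queries and return the n most common patterns."""
    return Counter(normalise(q) for q in queries if q.strip()).most_common(n)
```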

Question 16

Should I accept all suggestions from the auto-research loop?

Accepted Answer

No. Expect to accept roughly 50% of suggestions. If you're accepting all of them, your target function is too narrow and isn't capturing all desired behaviours. Human-review every suggestion against the full intended behaviour, especially edge cases the target function didn't explicitly cover. Maintain an approval gate for any action that moves user data outside their local environment. Auto-research generates candidates; humans make decisions.

Question 17

How do production signals improve my skill over time?

Accepted Answer

Production signals — search endpoint queries, trace logs, execution records — reveal what users are actually trying to do with the skill versus what you assumed during design. If your search endpoint shows frequent queries about prompt versioning but your skill doesn't cover it, that's a gap. If traces show the agent consistently taking a wrong turn at a specific decision point, that's a broken path. These signals feed directly into skill file updates, eval assertion additions, and auto-research target function refinements.
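A simple way to surface coverage gaps is to flag queries that match none of the topics your skill currently covers. The topic map below is hypothetical and purely keyword-based; with it, a 'prompt versioning' query lands in the gap list, which is the kind of signal described above.

```python
import re

# Hypothetical map of topics the skill covers -> keywords that signal each topic.
SKILL_TOPICS = {
    "tracing": {"trace", "tracing", "span", "spans", "instrumentation"},
    "evaluation": {"eval", "evals", "evaluation", "judge"},
}


def coverage_gaps(queries: list[str]) -> list[str]:
    """Return queries that match none of the topics the skill currently covers."""
    gaps = []
    for query in queries:
        words = set(re.findall(r"[a-z0-9]+", query.lower()))
        if not any(words & keywords for keywords in SKILL_TOPICS.values()):
            gaps.append(query)
    return gaps
```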

Question 18

What's the minimum viable skill I can ship to start getting value?

Accepted Answer

A Skill MD file with 5–10 style rules and an agent sitemap covering your core documentation pages. Add a timestamp and staleness instruction. You don't need a search endpoint, auto-research, or a full eval suite to start. Run a few user scenarios manually, read the traces, and refine the style rules. This minimal version already prevents the most common failure modes — hallucinated APIs, stale context, and wrong setup paths — and gives you a foundation to iterate on.

Question 19

How do I audit agent-unfriendly UX assumptions inherited from human-facing design?

Accepted Answer

Review every place your product simplified UX for humans: defaulted data regions, omitted environment variables, skipped confirmation steps. Agents don't experience these shortcuts as helpful; they cause silent misconfigurations. For each simplification, ask: 'Would an expert ask a clarifying question here?' If yes, add a style rule requiring the agent to ask. An extra environment variable or configuration step costs an agent almost no effort but prevents incorrect default assumptions.
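One way to remove a silent default is to make the setting required and have the failure message carry the question an expert would ask. The environment variable names below (`EXAMPLE_DATA_REGION`, `EXAMPLE_ENVIRONMENT`) are hypothetical placeholders for whatever your product currently defaults.

```python
import os
from typing import Mapping, Optional


class MissingConfig(Exception):
    """Raised when a required setting is absent; the message lists questions to ask."""


# Hypothetical settings that a human-facing quickstart might silently default.
REQUIRED_SETTINGS = {
    "EXAMPLE_DATA_REGION": "Which data region should traces be stored in (for example 'eu' or 'us')?",
    "EXAMPLE_ENVIRONMENT": "Which environment is this (for example 'development' or 'production')?",
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Read every required setting and never fall back to a default."""
    env = os.environ if env is None else env
    missing = [question for key, question in REQUIRED_SETTINGS.items() if not env.get(key)]
    if missing:
        raise MissingConfig("Ask the user before continuing:\n" + "\n".join(missing))
    return {key: env[key] for key in REQUIRED_SETTINGS}
```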

Question 20

Can the Klingen method work for non-developer-facing products?

Accepted Answer

The method is designed for technical products with coding agents, but its core principles apply wherever an AI agent needs to follow a complex workflow reliably. If your product has a CLI, API, SDK, or configuration layer that an agent interacts with, the method applies directly. For purely GUI-based products without programmatic interfaces, the method has limited applicability since coding agents can't interact with graphical interfaces in the same way.

Question 21

Why shouldn't I distribute my skill through a plugin marketplace?

Accepted Answer

The Klingen method advises against relying on proprietary plugin marketplaces or integration layers because they create maintenance overhead across multiple agent environments and are a distraction for small teams. Skill files (CLAUDE.md, .clinerules) are simple text files that can live in your repo or be fetched via URL. This keeps distribution lightweight and avoids vendor lock-in. Focus your effort on skill quality and iteration, not distribution infrastructure.

Question 22

How many traces should I review manually before switching to automated evals?

Accepted Answer

Review at least 5–10 full execution traces across different user scenarios before building automated evals. You need enough traces to see patterns — recurring hallucinations, common wrong turns, missing clarifying questions, absent spans. Each observation becomes a concrete eval assertion or style rule. Manual trace review remains valuable even after you have automated evals; plan to revisit traces periodically as new use cases emerge from production signals.

Frequently Asked Questions About Klingen Coding Agent Skill Architecture Method
