Frequently Asked Questions About the Hablich Agent Interface Engineering Framework

23 answers covering everything from basics to advanced usage.

// Basics

What problem does the Hablich Agent Interface Engineering Framework solve?

It solves the problem of agents 'flying blind': AI agents that can generate code or take actions but cannot validate what they have done, that burn excessive tokens on raw data, that select the wrong tools because of poor descriptions, or that create security vulnerabilities through convenience features that bypass consent. The framework provides a systematic approach to making agent interfaces fuel-efficient, self-healing, discoverable, and trustworthy across different deployment contexts.

What does 'agents are a different user class' mean?

It means agents share the same goals and intents as human users but have fundamentally different cognitive bottlenecks. Humans struggle with visual complexity and need layout, colour, and signal density. Agents have no visual needs — their bottleneck is token cost and reasoning load. Designing for agents requires treating them as a separate user segment with their own non-functional requirements: efficiency, discoverability, security, and stability, rather than repurposing human-facing interfaces.

How is tokens per successful outcome different from just counting total tokens?

Tokens per successful outcome normalises cost against successful task completions only, not all attempts. A raw total lumps failed attempts, retries, and incomplete journeys in with the successes, which masks the true cost of achieving a result. The metric must also be measured per journey type: comparing a complex debugging session against a simple scraping task would produce meaningless data. This per-journey normalisation reveals which specific interfaces need optimisation.
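As a rough illustration, the metric can be computed from a log of agent runs. The sketch below assumes one reading of the definition: all tokens spent on a journey type, failed attempts included, divided by the number of successful runs. The `JourneyRun` record and the journey labels are hypothetical, not part of the framework.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JourneyRun:
    journey_type: str  # hypothetical labels such as "debugging" or "scraping"
    tokens: int        # tokens consumed by this attempt, retries included
    succeeded: bool    # did the agent complete the journey?


def tokens_per_successful_outcome(runs):
    """Return {journey_type: total tokens / number of successful runs}.

    Tokens from failed attempts stay in the numerator, so wasted work
    raises the cost of each success. A journey type with no successes
    maps to None because the ratio is undefined.
    """
    tokens = defaultdict(int)
    successes = defaultdict(int)
    for run in runs:
        tokens[run.journey_type] += run.tokens
        successes[run.journey_type] += int(run.succeeded)
    return {
        journey: (tokens[journey] / successes[journey]) if successes[journey] else None
        for journey in tokens
    }


runs = [
    JourneyRun("debugging", 12_000, True),
    JourneyRun("debugging", 8_000, False),
    JourneyRun("scraping", 1_500, True),
    JourneyRun("scraping", 1_500, True),
]
result = tokens_per_successful_outcome(runs)
print(result)
```

The two journey types are reported separately, so a change in the debugging figure can be tracked over time without being compared to scraping.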

Why shouldn't I compare token costs across different agent task types?

Different user journeys have inherently different complexity and token requirements. A multi-step debugging session will always consume more tokens than a simple page-scraping task, and that difference is appropriate — not a problem. Comparing them globally masks the real signal: whether each journey type is becoming more or less efficient over time. Measure tokens per successful outcome within each journey type separately, and use those per-journey metrics to prioritise optimisation effort where it matters most.

What is the 'schema is the UI' principle?

MCP tool descriptions — names, parameter schemas, and docstrings — function as the user interface that agents navigate. Just as a poorly designed visual UI causes human users to click wrong buttons or abandon tasks, poor tool descriptions cause agents to call wrong tools, skip correct ones, or fail entirely. The principle means that auditing and improving tool descriptions is a first-class engineering task, not a documentation afterthought. Research suggests 97% of MCP tool descriptions have quality smells that cause selection errors.

Is the Hablich framework only for MCP servers or does it apply to other agent interfaces?

It applies to any interface that agents consume — MCP servers, CLI tools, REST APIs, GraphQL endpoints, or custom agent-facing interfaces. The core principles (agents as a different user class, fuel efficiency, schema as UI, trust tiers, trade-off awareness) are interface-agnostic. The framework originated from Chrome DevTools MCP work but its concepts — semantic summaries, tool categorisation, self-healing error design, and tokens-per-outcome measurement — transfer directly to any system where an AI agent must discover, select, and use tools to complete tasks.

// How To

How do I replace raw data with semantic summaries in my MCP server?

For any tool that returns large raw payloads — traces, logs, AST dumps, JSON blobs — engineer an alternative response that returns structured markdown surfacing only actionable signal. Keep the raw output capability available for post-processing pipelines, but make the semantic summary the default agent-facing response. For example, instead of returning a full 50,000-line performance trace, return a summary listing the top 5 performance bottlenecks with file locations, metric values, and suggested fix categories.
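A minimal sketch of such a summary, assuming the raw trace has already been parsed into events with a location, a duration, and a category. The `TraceEvent` fields and the markdown layout are illustrative assumptions; a real server would derive them from its own trace format.

```python
from dataclasses import dataclass


@dataclass
class TraceEvent:
    file: str           # source file where the work was measured
    line: int
    duration_ms: float
    category: str       # assumed fix category, e.g. "layout" or "script"


def summarise_trace(events, top_n=5):
    """Render the slowest events as a short markdown summary."""
    slowest = sorted(events, key=lambda e: e.duration_ms, reverse=True)[:top_n]
    lines = [f"## Top {len(slowest)} performance bottlenecks", ""]
    for rank, event in enumerate(slowest, start=1):
        lines.append(
            f"{rank}. `{event.file}:{event.line}` took {event.duration_ms:.0f} ms "
            f"(fix category: {event.category})"
        )
    omitted = len(events) - len(slowest)
    lines.append("")
    lines.append(f"{omitted} further events omitted; the raw trace is available on request.")
    return "\n".join(lines)


events = [
    TraceEvent("app.js", 10, 120.0, "script"),
    TraceEvent("styles.css", 4, 340.0, "layout"),
    TraceEvent("vendor.js", 88, 15.0, "script"),
]
summary = summarise_trace(events, top_n=2)
print(summary)
```

The closing line tells the agent that more data exists without paying for it in the default response.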

How do I categorise tools to reduce context window bloat?

Review your full tool inventory and identify niche tools that apply to fewer than 20% of user journeys. Hide these behind command-line parameters or opt-in flags rather than exposing them in the default context. For cost-sensitive deployments, offer a Slim Mode with only 3 to 5 core tools. Always document the capability trade-offs explicitly: Slim Mode saves tokens but may force extra agent turns or block certain tasks entirely. Monitor whether agents in Slim Mode get stuck on tasks that require hidden tools.
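One way to express this is a small filter over the tool registry. In the sketch below the `Tool` record, the category names, and the tool names are all hypothetical; the only ideas taken from the text are that niche tools are opt-in and that Slim Mode exposes a handful of core tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    category: str = "core"  # "core" tools are always exposed; others are opt-in


def exposed_tools(tools, enabled_categories=(), slim=False, slim_limit=5):
    """Pick the tools to advertise in the agent's default context."""
    if slim:
        # Slim Mode: core tools only, capped at slim_limit.
        core = [t for t in tools if t.category == "core"]
        return core[:slim_limit]
    allowed = {"core", *enabled_categories}
    return [t for t in tools if t.category in allowed]


tools = [
    Tool("navigate"),
    Tool("click"),
    Tool("screenshot"),
    Tool("trace_performance", category="performance"),
    Tool("emulate_network", category="emulation"),
]
default_names = [t.name for t in exposed_tools(tools)]
perf_names = [t.name for t in exposed_tools(tools, enabled_categories=("performance",))]
slim_names = [t.name for t in exposed_tools(tools, slim=True, slim_limit=2)]
print(default_names, perf_names, slim_names)
```

In a real server the `enabled_categories` value would typically come from a command-line flag, so that hidden tools cost nothing unless a deployment asks for them.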

How do I write a minimum viable tool description for an MCP tool?

A minimum viable description includes two elements: purpose (what the tool does) and activation criteria (when an agent should call it). Use domain-specific vocabulary the agent will encounter in user prompts. For example: 'Finds SQL injection, XSS, and exposed secret vulnerabilities in source code. Use when asked to perform a security audit or check for common web application vulnerabilities.' Keep it as short as possible — every extra word costs context tokens and can bias smaller models toward over-selecting that tool.
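The two-element structure can be enforced with a small helper. This is a sketch only: the `build_description` function and its 40-word budget are assumptions chosen for illustration, not values given by the framework.

```python
def build_description(purpose, use_when, max_words=40):
    """Combine purpose and activation criteria into one short description.

    Raises ValueError if either element is missing or the result exceeds
    the word budget (an assumed limit; tune it for your own models).
    """
    purpose = purpose.strip().rstrip(".")
    use_when = use_when.strip().rstrip(".")
    if not purpose or not use_when:
        raise ValueError("a description needs both a purpose and activation criteria")
    text = f"{purpose}. Use when {use_when}."
    word_count = len(text.split())
    if word_count > max_words:
        raise ValueError(f"description has {word_count} words; budget is {max_words}")
    return text


description = build_description(
    purpose="Finds SQL injection, XSS, and exposed secret vulnerabilities in source code",
    use_when="asked to perform a security audit or check for common web application vulnerabilities",
)
print(description)
```

Failing loudly on an over-long description keeps the token cost visible at authoring time rather than at run time.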

How do I set up trust tiers for my agent interface?

Classify each deployment environment before any other design decision. Tier 1 (local dev): require explicit human consent at every sensitive action, use default browser profiles, and make consent time-bound. Tier 2 (CI/controlled): use containers, separate browser profiles, and remote debugging ports for data separation. Tier 3 (internet access): implement domain allow lists, prompt injection mitigations, and all Tier 2 controls. Share tools across tiers freely, but never share the security model. Document the tier classification explicitly in your architecture.
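The tier classification can be written down as data so that it is reviewable alongside the architecture. The following sketch encodes the controls listed above; the class names, the 900-second consent lifetime, the example domain, and the choice to disable per-action consent in unattended tiers are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TrustTier(Enum):
    LOCAL_DEV = 1      # Tier 1: developer machine, human present
    CI_CONTROLLED = 2  # Tier 2: CI or other controlled environment
    INTERNET = 3       # Tier 3: agent can reach the open internet


@dataclass(frozen=True)
class SecurityPolicy:
    consent_per_action: bool
    consent_ttl_seconds: Optional[int]      # None means consent is not used
    isolated_container: bool
    separate_browser_profile: bool
    domain_allow_list: Optional[frozenset]  # None means no domain restriction
    prompt_injection_mitigations: bool


POLICIES = {
    TrustTier.LOCAL_DEV: SecurityPolicy(
        consent_per_action=True,
        consent_ttl_seconds=900,         # assumed value; the text only says "time-bound"
        isolated_container=False,
        separate_browser_profile=False,  # Tier 1 uses the default profile
        domain_allow_list=None,
        prompt_injection_mitigations=False,
    ),
    TrustTier.CI_CONTROLLED: SecurityPolicy(
        consent_per_action=False,        # assumption: no human is present in CI
        consent_ttl_seconds=None,
        isolated_container=True,
        separate_browser_profile=True,
        domain_allow_list=None,
        prompt_injection_mitigations=False,
    ),
    TrustTier.INTERNET: SecurityPolicy(
        consent_per_action=False,
        consent_ttl_seconds=None,
        isolated_container=True,         # Tier 3 keeps all Tier 2 controls
        separate_browser_profile=True,
        domain_allow_list=frozenset({"example.com"}),  # hypothetical allow list
        prompt_injection_mitigations=True,
    ),
}


def policy_for(tier):
    """Look up the security model for a tier; tools are shared, policies are not."""
    return POLICIES[tier]


print(policy_for(TrustTier.INTERNET))
```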

How do I measure agent interface quality if I can't instrument everything perfectly?

Start measuring even if the measurement is imperfect: data-informed decisions beat gut-driven decisions. Track the basics: token cost, tool call count, duration, and task completion (binary: did the agent finish the journey?). Calculate tokens per successful outcome per journey type. Visualise this as a per-journey bar chart where bar length shows tokens per successful outcome, so the longest bars mark the least efficient journeys. Prioritise engineering effort on journeys with the worst ratios. You do not need perfect instrumentation to identify which journeys are most wasteful and where to focus optimisation.
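Even a text-only chart is enough to rank journeys. The sketch below assumes the per-journey ratios have already been calculated and are positive; the journey names and figures are made up, and bar length here represents tokens per successful outcome, so the longest bar is the most wasteful journey.

```python
def render_bar_chart(ratios, width=40):
    """Render {journey: tokens per successful outcome} as text bars, worst first.

    Assumes every ratio is a positive number.
    """
    if not ratios:
        return ""
    ranked = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
    worst = ranked[0][1]
    label_width = max(len(name) for name, _ in ranked)
    rows = []
    for name, value in ranked:
        # Scale each bar against the worst journey; always draw at least one mark.
        bar = "#" * max(1, round(width * value / worst))
        rows.append(f"{name.ljust(label_width)} | {bar} {value:,.0f}")
    return "\n".join(rows)


ratios = {"scraping": 1500, "debugging": 20000, "form_fill": 5000}
chart = render_bar_chart(ratios)
print(chart)
```

Sorting worst first puts the journey that most deserves engineering attention at the top of the output.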

// Troubleshooting

My agent keeps calling the wrong MCP tool — how do I fix that?

This is almost always a tool description quality problem. Audit each description for purpose clarity and activation criteria. Add explicit trigger signals using the exact domain vocabulary agents will encounter. Also add proactive detours — explicit redirections in tool descriptions that steer agents toward the correct tool before they reach for the wrong one. For example: 'Do NOT use this tool for accessibility checks — use audit_accessibility instead.' Check whether exposing too many tools is overwhelming the agent's selection ability, and consider tool categorisation or Slim Mode.

My agent keeps hitting context window limits with my MCP server

Your agent is likely in the dump zone — receiving raw, voluminous data that overwhelms its context window. Apply three fixes: First, replace raw data returns (full logs, trace files, AST dumps) with semantic summaries in structured markdown. Second, categorise tools and hide niche ones behind opt-in flags to reduce default context size. Third, trim tool descriptions to minimum viable descriptions. Measure tokens per successful outcome before and after changes to verify improvement.

My agent gets stuck on errors and can't recover without human help

Build self-healing capability into your interface. For each tool, enumerate failure modes and rewrite vague errors to include actionable recovery information. Create diagnostic playbooks — pre-built troubleshooting skills that activate on known recurring failures. Add proactive detours in tool descriptions to redirect agents away from common wrong-path choices. Every unhelpful error message forces the agent to burn tokens on retries; every self-healed error saves them. Measure recovery rate alongside tokens per successful outcome.
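A possible shape for an actionable error is shown below. The `ToolError` class, its fields, and the error code are hypothetical; the point is only that the message carries recovery steps the agent can follow instead of a bare failure string.

```python
from dataclasses import dataclass, field


@dataclass
class ToolError:
    code: str                 # stable identifier the agent can match on
    message: str              # what went wrong, in plain language
    recovery_steps: list = field(default_factory=list)
    retryable: bool = False   # would an unchanged retry plausibly succeed?

    def to_agent_text(self):
        """Format the error as short text aimed at the agent, not a human log."""
        lines = [f"Error [{self.code}]: {self.message}"]
        if self.recovery_steps:
            lines.append("Try next:")
            lines += [f"{i}. {step}" for i, step in enumerate(self.recovery_steps, 1)]
        hint = "may succeed." if self.retryable else "will not help."
        lines.append(f"Retrying without changes {hint}")
        return "\n".join(lines)


error = ToolError(
    code="BROWSER_NOT_CONNECTED",
    message="No browser is listening on the remote debugging port",
    recovery_steps=[
        "Start the browser with remote debugging enabled",
        "Call this tool again",
    ],
)
error_text = error.to_agent_text()
print(error_text)
```

Stating whether a plain retry will help is a cheap way to stop an agent from looping on the same failing call.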

What happens if I decompose a monolithic tool into too many small tools?

Over-decomposition creates a new version of the same problem. Too many tools inflate context window size, increase the chance of wrong-tool selection, and force agents to reason about tool orchestration instead of the actual task. The Hablich framework's core principle is that every trade-off shifts; it doesn't disappear. Apply tool categorisation aggressively: hide niche tools behind opt-in flags, offer Slim Mode for common journeys, and trim descriptions to minimum viable length. Monitor wrong-tool-selection rates after decomposition.

// Comparisons

How does the Hablich framework compare to just following MCP spec best practices?

The MCP specification defines the protocol — how tools are registered, called, and return data. The Hablich framework goes beyond protocol compliance to engineering the quality of the agent experience. It adds fuel-efficiency measurement, trust-tier security modelling, semantic summary design, tool categorisation strategies, description auditing methodologies, self-healing error playbooks, and the principle that every optimisation introduces trade-offs. You can be fully MCP-compliant and still have agents that fail, burn tokens, or create security vulnerabilities.

How does this framework compare to function-calling best practices from OpenAI or Anthropic?

Function-calling best practices from model providers focus on schema formatting, parameter typing, and prompt engineering for a single model's function-calling interface. The Hablich framework operates at the interface architecture level — across any model or harness. It adds deployment-tier security, cross-journey fuel-efficiency measurement, semantic summary engineering, tool categorisation with slim modes, proactive detours for training-data biases, and diagnostic playbooks. It's complementary: apply provider best practices within the framework's broader system design.

// Advanced

Should I use skills or individual tools in my MCP server?

Use individual tools as the default, and add skills only when a user journey has too many steps to be reliably assembled from individual tools alone. Skills are pre-built multi-step workflows that improve reliability for complex, repeatable journeys. However, skills are not free — each one inflates context window size and can cause agents to invoke skills inappropriately. Apply minimum viable description discipline to skills just as you do to tools. Monitor whether adding skills causes wrong-skill selection in unrelated journeys.

What is the Lethal Trifactor and how does it relate to agent interface security?

The Lethal Trifactor, which Simon Willison himself calls the 'lethal trifecta', is his framework for reasoning about three converging risk factors that make prompt injection dangerous in agentic browser automation: access to private data, exposure to untrusted content, and the ability to communicate externally. In the Hablich framework, it drives the principle of by-design friction in Tier 1 deployments: a human must consent at each sensitive action, and convenience features that remove that consent (like auto-remembering permissions or autoconnect) are treated as security risks, not UX improvements. The trifactor is specifically applied during Step 1 when establishing trust boundaries.

What are proactive detours in agent tool design?

Proactive detours are explicit redirections built into tool descriptions or schemas that counteract an agent's training-data biases. When model training data causes agents to reach for the wrong tool for a given task, proactive detours steer them toward the correct tool before the mistake happens. For example, a performance profiling tool might include: 'This tool measures runtime CPU performance. For memory leak detection, use detect_memory_leaks instead.' They reduce token waste from wrong-tool calls and subsequent corrections.
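Detours can be kept as data and appended to the description when tools are registered. In this sketch the `with_detours` helper is hypothetical; the wording it produces follows the example above.

```python
def with_detours(description, detours):
    """Append one redirect sentence per detour.

    detours maps a task this tool is NOT for to the tool to use instead.
    """
    sentences = [description.strip()]
    for task, tool_name in detours.items():
        sentences.append(f"For {task}, use {tool_name} instead.")
    return " ".join(sentences)


profiler_description = with_detours(
    "This tool measures runtime CPU performance.",
    {"memory leak detection": "detect_memory_leaks"},
)
print(profiler_description)
```

Keeping detours separate from the base description makes it easy to add or remove one after measuring wrong-tool-selection rates.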

Can I use the same MCP tools across different trust tiers?

Yes, tools can be shared across all three trust tiers. The critical rule is that the security model must never be shared. The same 'navigate_to_url' tool might exist in Tier 1 (requires human consent each time), Tier 2 (runs in an isolated container with a separate browser profile), and Tier 3 (restricted to domain allow lists with prompt injection mitigations). The tool interface stays the same; the security wrapper around it changes per tier.
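The idea of one tool with a per-tier wrapper might look like the sketch below. The `wrap_for_tier` function and the stand-in `navigate_to_url` body are assumptions; Tier 2 isolation is assumed to come from the container the process runs in, and prompt injection mitigations are out of scope for this example.

```python
from urllib.parse import urlparse


def navigate_to_url(url):
    """Stand-in for the shared tool; a real one would drive a browser."""
    return f"navigated to {url}"


def wrap_for_tier(tool, tier, *, ask_consent=None, allowed_domains=frozenset()):
    """Return the same tool guarded by the security model of the given tier."""
    def guarded(url):
        if tier == 1:
            # Tier 1: a human must approve every call.
            if ask_consent is None or not ask_consent(f"Allow navigation to {url}?"):
                raise PermissionError("human consent was not granted")
        elif tier == 2:
            # Tier 2: isolation is provided by the container and browser profile,
            # which live outside this process, so nothing is checked here.
            pass
        elif tier == 3:
            # Tier 3: only hosts on the allow list may be visited.
            host = urlparse(url).hostname or ""
            if host not in allowed_domains:
                raise PermissionError(f"{host!r} is not on the domain allow list")
        else:
            raise ValueError(f"unknown trust tier: {tier}")
        return tool(url)
    return guarded


tier1 = wrap_for_tier(navigate_to_url, 1, ask_consent=lambda prompt: True)
tier3 = wrap_for_tier(navigate_to_url, 3, allowed_domains=frozenset({"example.com"}))
print(tier1("https://example.com/"))
print(tier3("https://example.com/docs"))
```

The agent sees an identical interface in every tier; only the guard in front of the call differs.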

How do diagnostic playbooks work in agent interfaces?

Diagnostic playbooks are pre-built troubleshooting skills that activate when an agent encounters a known recurring failure mode — like a configuration error, connectivity issue, or environment setup problem. Instead of the agent burning tokens on trial-and-error recovery, the playbook provides a structured resolution path. For example, a 'fix_chrome_connection' playbook might check if the remote debugging port is open, verify the browser profile exists, and suggest specific fixes for each failure point. This enables self-healing without human intervention.
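The 'fix_chrome_connection' example could be sketched as an ordered list of checks, each paired with a suggested fix. Everything here is illustrative: the check callables are injected so the playbook can be tested, the profile path is hypothetical, and port 9222 is used only as a conventional default for Chrome remote debugging.

```python
import os
import socket


def port_is_open(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def fix_chrome_connection(port_open, profile_exists):
    """Run each check and return a markdown report the agent can act on."""
    steps = [
        ("Remote debugging port is open", port_open,
         "Start the browser with remote debugging enabled, then retry."),
        ("Browser profile directory exists", profile_exists,
         "Create the profile directory or point the server at an existing one."),
    ]
    report = []
    failures = 0
    for label, check, fix in steps:
        if check():
            report.append(f"- [ok] {label}")
        else:
            failures += 1
            report.append(f"- [failed] {label}. Suggested fix: {fix}")
    if failures == 0:
        report.append("- All checks passed; the failure is outside this playbook.")
    return "\n".join(report)


profile_dir = os.path.expanduser("~/.cache/example-agent-profile")  # hypothetical path
print(fix_chrome_connection(
    port_open=lambda: port_is_open("127.0.0.1", 9222),
    profile_exists=lambda: os.path.isdir(profile_dir),
))
```

Because the report names the failing check and its fix, the agent can act on it directly rather than probing the environment by trial and error.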

How do I handle the trade-off between adding more tool descriptions and context window size?

The Hablich framework's fifth principle states that every trade-off shifts; it never disappears. Richer tool descriptions improve discoverability but consume more context tokens and can bias smaller models. The approach is to write minimum viable descriptions: the shortest text that still provides purpose and activation criteria. Then measure: if wrong-tool selection rates drop without context-related failures, the description length is right. If smaller models start over-selecting newly described tools, trim the descriptions. Iterate continuously as models and harnesses evolve.