How Should PMs Staff and Scope Agentic AI Projects?

For product managers leading AI agent initiatives · Based on the Hetzel Agent Team Composition Framework

// TL;DR

The Hetzel Agent Team Composition Framework gives product managers a diagnostic for staffing agentic AI projects with the right cross-functional mix. As a PM, you're positioned to bridge the gap between ML engineers who default to technical metrics and domain experts whose input is often marginalised. The framework shows you how to classify your organisation type, audit your team for coverage gaps across three essential personas, and define functional performance criteria that ensure the agent actually solves the problem it was designed for.

Why do PMs need a specific framework for staffing AI agent teams?

Because agentic AI doesn't follow the same rules as traditional ML or standard product development. The Hetzel Agent Team Composition Framework addresses a problem that PMs encounter repeatedly: the agent project gets staffed by the ML team because it 'has AI in the name,' and the PM is left mediating between engineers who optimise for precision and recall and users who need a functional solution.

Phil Hetzel's framework gives PMs a structured way to advocate for the right team composition. It recognises that no single discipline owns agents — the ideal team is deliberately diverse, with data scientists, product engineers, and domain experts each contributing irreplaceable value at different stages.

As a PM, your leverage sits in three steps of the framework's seven-step workflow: Step 1 (classifying the organisation type to surface default risks), Step 6 (defining eval criteria that include functional performance), and Step 7 (pressure-testing whether the team has someone who actually understands the end user's problem).

How do I advocate for domain experts on my AI agent team?

Use the Hetzel framework's Proximity to the Problem principle. The people closest to the problem the agent is meant to solve — customer service reps, compliance officers, legal researchers, sales managers — hold disproportionate value in agentic development. Their proximity determines the quality of context engineering (the primary lever for changing agent behaviour) and human annotation (the ground truth for eval validation).

As a PM, frame domain expert involvement as a quality and risk issue, not a nice-to-have:

- Without domain experts in context engineering, your prompts will be written by engineers who understand the technology but not the nuances of the problem domain. The agent will produce outputs that are technically well-formed but functionally wrong.

- Without domain experts in human annotation, your eval pipeline has no reliable ground truth. LLM-as-judge evals are just prompts and models — they need human-labelled validation to be trustworthy.

- Without domain experts reviewing agent traces, failure modes that are obvious to someone who knows the domain will go undetected by engineers who don't.

Make the case that domain expert time is an investment in agent quality, not overhead.

How do I define success metrics for an AI agent as a PM?

The Hetzel framework draws a sharp distinction between technical metrics (precision, recall, F1) and functional performance (does the agent accomplish its purpose for real users). As a PM, you own functional performance.

Define success from the user's perspective first:

- Does the agent resolve the customer's issue without escalation?

- Does the agent produce accurate information the user can act on?

- Does the agent complete the workflow end-to-end without errors?

Then work with your data scientist to build evals around these criteria. Use LLM-as-judge automation for scale, validated against human labels from domain experts for accuracy. Technical metrics still matter — your data scientist should track them — but they're inputs to the functional assessment, not the assessment itself.
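One way to check that an LLM-as-judge eval tracks domain-expert judgement is to compare its verdicts with human labels on the same traces. The sketch below is illustrative only and is not taken from the Hetzel framework: the pass/fail labels, the sample data, and the function names are hypothetical, and raw agreement plus Cohen's kappa is just one reasonable choice of agreement statistic.

```python
from collections import Counter


def agreement_rate(human: list, judge: list) -> float:
    """Fraction of traces where the judge verdict matches the human label."""
    if len(human) != len(judge) or not human:
        raise ValueError("label lists must be non-empty and the same length")
    return sum(h == j for h, j in zip(human, judge)) / len(human)


def cohens_kappa(human: list, judge: list) -> float:
    """Agreement corrected for the agreement expected by chance."""
    n = len(human)
    observed = agreement_rate(human, judge)
    human_counts = Counter(human)
    judge_counts = Counter(judge)
    labels = set(human) | set(judge)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum((human_counts[l] / n) * (judge_counts[l] / n) for l in labels)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for eight agent traces (assumed data, not real results).
human_labels = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
judge_labels = ["pass", "pass", "fail", "fail", "fail", "pass", "pass", "pass"]

rate = agreement_rate(human_labels, judge_labels)   # 0.75
kappa = cohens_kappa(human_labels, judge_labels)    # about 0.47
print(f"agreement={rate:.2f} kappa={kappa:.2f}")
```

If agreement is low, the usual next step is to revise the judge prompt with the domain experts rather than to trust the automated scores at scale.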

Critically, build observability to monitor these functional metrics in production. The Hetzel framework warns that confidence built in experimentation doesn't transfer automatically to production. As a PM, you need to know when the agent starts failing in the real world, not just in test suites.
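A production monitor for a functional metric can be as simple as a rolling success rate with an alert threshold. The sketch below assumes a hypothetical signal (whether each conversation was resolved without escalation) and hypothetical window, threshold, and minimum-sample values; a real deployment would feed this from whatever tracing or logging system the team already runs.

```python
from collections import deque
from typing import Optional


class FunctionalMetricMonitor:
    """Tracks a rolling success rate over the most recent outcomes."""

    def __init__(self, window: int = 100, threshold: float = 0.8,
                 min_samples: int = 20) -> None:
        self.outcomes = deque(maxlen=window)  # oldest outcomes drop off automatically
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, resolved_without_escalation: bool) -> None:
        self.outcomes.append(resolved_without_escalation)

    def success_rate(self) -> Optional[float]:
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self) -> bool:
        # Avoid alerting on too few samples to be meaningful.
        if len(self.outcomes) < self.min_samples:
            return False
        return self.success_rate() < self.threshold


# Small window so the behaviour is easy to follow by hand.
monitor = FunctionalMetricMonitor(window=5, threshold=0.8, min_samples=5)
for _ in range(5):
    monitor.record(True)
healthy = monitor.should_alert()    # False: 5 of 5 resolved
monitor.record(False)
monitor.record(False)
degraded = monitor.should_alert()   # True: 3 of the last 5 resolved
```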

What should I do before my next AI agent project kickoff?

Before your next kickoff, run the Hetzel framework's seven-step workflow as a pre-mortem:

1. Classify your organisation as Traditional Enterprise or AI Native — this surfaces your default risks.

2. Audit your proposed team against the three personas — if any persona is missing, flag it immediately.

3. Confirm that data scientists are scoped to eval validation and guardrails, not model training.

4. Confirm that product engineers own the systems infrastructure and API integration layer.

5. Confirm that at least one domain expert has real ownership over context engineering and human annotation.

6. Define functional performance criteria with the full team before development begins.

7. Pressure-test: does someone on this team deeply understand what the end user actually needs?

If you can't complete or confirm all seven steps, restructure the team before writing the first spec.
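The seven steps above can be tracked as a simple checklist that blocks kickoff until every step is closed. This is a minimal sketch: the `Step` structure and function names are hypothetical, and the step descriptions paraphrase the list above.

```python
from dataclasses import dataclass


@dataclass
class Step:
    number: int
    description: str
    done: bool = False


def build_checklist() -> list:
    descriptions = [
        "Classify the organisation as Traditional Enterprise or AI Native",
        "Audit the proposed team against the three personas",
        "Data scientists scoped to eval validation and guardrails",
        "Product engineers own systems infrastructure and API integration",
        "A domain expert owns context engineering and human annotation",
        "Functional performance criteria defined with the full team",
        "Someone on the team deeply understands the end user's needs",
    ]
    return [Step(i, text) for i, text in enumerate(descriptions, start=1)]


def open_steps(checklist: list) -> list:
    """Step numbers that are not yet completed or confirmed."""
    return [step.number for step in checklist if not step.done]


def ready_for_kickoff(checklist: list) -> bool:
    return not open_steps(checklist)


# Example: everything is confirmed except domain-expert ownership (step 5).
checklist = build_checklist()
for step in checklist:
    step.done = step.number != 5

print(open_steps(checklist))         # [5]
print(ready_for_kickoff(checklist))  # False
```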

// FREQUENTLY ASKED QUESTIONS

As a PM, how do I know if my AI agent team is missing critical skills?

Run the Hetzel framework's three-persona audit. Map every team member to one of: data scientist/ML engineer, product/systems engineer, or non-technical domain expert. If any persona is missing, you have a structural gap. The most commonly missing persona in Traditional Enterprises is the domain expert, whose input is often treated as cosmetic rather than foundational. Flag this gap to leadership as a quality risk.
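The three-persona audit amounts to a coverage check: assign each team member a persona and list the personas nobody fills. The sketch below is one hypothetical way to do this; the persona identifiers and team member names are invented for illustration.

```python
PERSONAS = ("data_scientist", "product_engineer", "domain_expert")


def audit_team(team: dict) -> dict:
    """Group team members by persona; `team` maps member name to persona."""
    coverage = {persona: [] for persona in PERSONAS}
    for member, persona in team.items():
        if persona not in coverage:
            raise ValueError(f"unknown persona for {member}: {persona}")
        coverage[persona].append(member)
    return coverage


def missing_personas(team: dict) -> list:
    """Personas with no team member assigned: the structural gaps."""
    coverage = audit_team(team)
    return [persona for persona in PERSONAS if not coverage[persona]]


# Hypothetical team staffed only from engineering functions.
proposed_team = {
    "Ana": "data_scientist",
    "Ben": "product_engineer",
    "Chen": "product_engineer",
}

print(missing_personas(proposed_team))  # ['domain_expert']
```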

Should PMs own the eval criteria for AI agents or should the ML team?

PMs should co-own eval criteria with the full team, but they must ensure functional performance criteria are included alongside technical metrics. The Hetzel framework explicitly warns against letting precision, recall, and F1 be the primary eval signals. As a PM, you represent the user's perspective — define what 'good' looks like functionally, then let the data scientist build the technical eval infrastructure around those criteria.

How do I scope an AI agent project differently from a traditional ML project?

Recognise that the model is already built — your team's job is to implement, evaluate, and contextualise it, not train it. Scope the project around context engineering (prompt and input design), eval pipeline construction, observability infrastructure, and systems integration. Allocate significant time for domain expert annotation workflows. The Hetzel framework shifts the critical path from model development to agent quality assurance.