How Should Enterprise ML Teams Structure Agent Development?

For Enterprise AI/ML team leads · Based on Hetzel Agent Team Composition Framework

// TL;DR

If you lead an enterprise ML or data science team that has been handed agentic AI development, you are at high risk of the isolation mistake: trying to build agents with only data scientists. The Hetzel Agent Team Composition Framework shows you how to augment your team with systems engineers for orchestration and domain experts for context engineering, while refocusing your data scientists on guardrails and LLM-as-judge validation. Use this framework when your agent proof of concept (POC) isn't reaching production, or when your team is struggling despite strong ML skills.

Why Is My ML Team Struggling to Build Production Agents?

The most common reason enterprise ML teams stall on agent development is the isolation mistake: the organization assigned agent work to your team because generative AI has 'AI' in the name, and your team is now trying to apply a traditional ML workflow to a fundamentally different problem.

The foundation model has already been trained and deployed by Anthropic, OpenAI, or another provider. The entire upstream pipeline (data ingestion, training, cross-validation, deployment) is done. Your team's instinct to recreate that workflow is natural but counterproductive. Agent development happens downstream of the model: in context engineering, systems orchestration, evaluation, and observability.

What Roles Are Missing From My Agent Team?

The Hetzel framework requires three role types on every agent team:

1. Data Scientists / ML Engineers — Own guardrails, LLM-as-judge validation, labelled dataset creation, and fine-tuning only when required.

2. Product / Systems Engineers — Own API integration, distributed multi-agent orchestration, infrastructure, and the eval-observability pipeline.

3. Domain Experts / Subject Matter Experts / Product Managers — Own prompt and context engineering, human annotation, and defining what good agent behavior looks like.

If your team is 100% data scientists, you are missing two of the three required role types. This is the structural root cause of most enterprise agent team failures.

Audit your team today: list every person assigned to agent work and their background. Flag missing role types. Then begin recruiting or borrowing the missing capabilities from adjacent teams.
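The audit can live in a spreadsheet, but a few lines of code make the gap check explicit and repeatable as the team changes. The sketch below assumes you have already tagged each person with one of the three role types by hand; the `Role` enum, the `audit_team` function, and the roster names are hypothetical illustrations, not part of the framework.

```python
from collections import Counter
from enum import Enum


class Role(Enum):
    # The three role types the framework requires on every agent team.
    DATA_SCIENCE = "Data Scientist / ML Engineer"
    SYSTEMS = "Product / Systems Engineer"
    DOMAIN = "Domain Expert / SME / Product Manager"


def audit_team(roster: dict[str, Role]) -> dict[Role, int]:
    """Count people per required role type; a count of 0 is a missing role."""
    counts = Counter(roster.values())
    return {role: counts[role] for role in Role}


# Hypothetical roster: every name and assignment here is illustrative.
roster = {
    "Alice": Role.DATA_SCIENCE,
    "Bob": Role.DATA_SCIENCE,
    "Chen": Role.DATA_SCIENCE,
}

for role, n in audit_team(roster).items():
    flag = "  <-- MISSING" if n == 0 else ""
    print(f"{role.value}: {n}{flag}")
```

For this all-data-science roster, the report flags both the systems and domain roles as missing. Low but nonzero counts are worth reviewing too, since a role held by one borrowed person can still be underrepresented.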

How Should I Reassign My Data Scientists on the Agent Team?

Do not ask data scientists to generically 'own agents.' Their unique value on an agent team is threefold:

- Guardrails and risk assessment: They are the adults in the room who remind everyone that the LLM is predicting tokens, not knowing things. They stress-test outputs and flag overconfidence.

- LLM-as-judge validation: They create labelled datasets and use precision, recall, and F1 to measure how closely your automated evaluator agrees with human judgment. Without this check, the judge can drift away from human judgment (eval drift) without anyone noticing.

- Fine-tuning (when required): This is rare. Before assigning data scientists to fine-tune, confirm the problem cannot be solved with better context engineering.
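The judge-validation step is small enough to show concretely. This is a minimal sketch, assuming each agent trace has a human label and an LLM-judge label of either "pass" or "fail", and treating "fail" as the positive class because catching bad responses is the judge's main job. The `judge_agreement` function, the `JudgeAgreement` type, and the sample labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class JudgeAgreement:
    precision: float
    recall: float
    f1: float


def judge_agreement(human: list[str], judge: list[str], positive: str = "fail") -> JudgeAgreement:
    """Score an LLM judge against human labels, treating the humans as ground truth."""
    if len(human) != len(judge):
        raise ValueError("human and judge label lists must be the same length")
    pairs = list(zip(human, judge))
    tp = sum(h == positive and j == positive for h, j in pairs)  # judge caught a real failure
    fp = sum(h != positive and j == positive for h, j in pairs)  # judge flagged a good response
    fn = sum(h == positive and j != positive for h, j in pairs)  # judge missed a real failure
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return JudgeAgreement(precision, recall, f1)


# Hypothetical labels for eight agent traces.
human = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail"]
judge = ["pass", "fail", "pass", "pass", "fail", "fail", "pass", "fail"]
print(judge_agreement(human, judge))
# JudgeAgreement(precision=0.75, recall=0.75, f1=0.75)
```

Rerunning this whenever the judge's prompt or underlying model changes is one way to surface eval drift early. If scikit-learn is already in your stack, `precision_score`, `recall_score`, and `f1_score` (with `pos_label="fail"`) compute the same quantities.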

If your data scientists are doing systems engineering or prompt writing, that is a misallocation. Move them to where their statistical expertise creates irreplaceable value.

How Do I Bring Domain Experts Into the Agent Workflow?

Identify the people in your organization who are closest to the problem your agent is solving. For a loan processing agent, that is loan officers and underwriters. For a customer support agent, that is support team leads.

Give these domain experts direct control over — or significant input into — the prompts and context seeded into the agent. Do not gate this behind a technical intermediary. They should also be your primary human annotators, reviewing agent traces and labelling whether the agent performed well and why.

Domain experts have the highest proximity to the problem, which means they hold disproportionate value for context engineering and functional evaluation. Engineers rarely anticipate the edge cases that domain experts encounter daily.
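Annotation works best when every label carries its reason, because the "why" is what feeds back into prompt and context fixes and into the labelled datasets used to validate an LLM judge. A minimal sketch of an annotation record, assuming traces already have IDs from your observability tooling; the `TraceAnnotation` fields, the `to_jsonl` helper, and the example traces are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Literal


@dataclass
class TraceAnnotation:
    """One domain expert's judgment on one agent trace (all fields hypothetical)."""

    trace_id: str                     # ID assigned by your observability tooling
    annotator: str                    # e.g. an underwriter reviewing a loan processing agent
    verdict: Literal["pass", "fail"]  # did the agent perform well?
    rationale: str                    # why; the edge-case knowledge engineers lack

    def __post_init__(self) -> None:
        # Type hints are not enforced at runtime, so validate explicitly.
        if self.verdict not in ("pass", "fail"):
            raise ValueError(f"unknown verdict: {self.verdict!r}")
        if not self.rationale.strip():
            raise ValueError("every annotation needs a rationale")


def to_jsonl(annotations: list[TraceAnnotation]) -> str:
    """Serialize annotations as JSON Lines, one record per trace."""
    return "\n".join(json.dumps(asdict(a)) for a in annotations)


annotations = [
    TraceAnnotation("trace-001", "underwriter_a", "fail",
                    "Approved the application without requesting proof of income."),
    TraceAnnotation("trace-002", "underwriter_a", "pass",
                    "Asked for the correct supporting documents."),
]
print(to_jsonl(annotations))
```

Rejecting empty rationales is a deliberate design choice: a bare pass/fail tells engineers that something went wrong, but only the expert's explanation tells them what context the agent was missing.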

What Should I Do Next?

Start with the audit: classify your organization as Traditional Enterprise, list your current team members and roles, and identify gaps against the three required role types. Then reassign existing data scientists to guardrails and eval validation, recruit or borrow a systems engineer for orchestration, and schedule your first context engineering session with domain experts. Implement observability from day one so production traces can be annotated by domain experts immediately.

// FREQUENTLY ASKED QUESTIONS

Should my ML team own the entire AI agent project?

No — this is the isolation mistake. Your ML team should own guardrails, eval validation, and fine-tuning when needed, but agent projects also require systems engineers for orchestration and infrastructure, and domain experts for context engineering and human annotation. Build a cross-functional team rather than isolating agent work within ML.

How do I justify adding non-data-science roles to my agent team?

The foundation model is already built by providers like OpenAI and Anthropic. Agent work is primarily systems engineering (orchestration, APIs, infrastructure) and context engineering (prompts, instructions). These require skills your data science team likely lacks. Frame it as filling capability gaps, not adding headcount: the team cannot reach production without these roles.

What is the first step for an enterprise ML team adopting the Hetzel framework?

Audit your current team composition against the three required role types: data scientists, systems/product engineers, and domain experts. Flag any role type that is absent or underrepresented. Then reassign data scientists from generic agent development to their highest-leverage work — guardrails and LLM-as-judge validation — and recruit the missing roles.