How Do Platform Teams Deploy AI Agent Infrastructure at Scale?

For platform engineering teams at mid-to-large companies · Based on the Solmaz On-Demand Disposable Agent Orchestration Framework

// TL;DR

Platform engineering teams responsible for developer productivity can use the Solmaz framework to deploy self-service AI agent infrastructure across their organisation. Instead of having every engineer share one Codex or Claude Code instance, deploy a concierge agent on Slack that dispatches on-demand, disposable agent pods for specific tasks. This eliminates the single-instance bottleneck, provides clean isolation between tasks, and lets platform teams manage agent lifecycle through Kubernetes operators and Helm charts they already understand.

Why Can't We Just Give Everyone a Shared Agent Instance?

The single-instance bottleneck is the most common failure mode when companies first adopt AI coding agents. One Codex or Claude Code session shared across 100 engineers creates contention, context pollution, and queue delays. Engineers revert to manual work because the agent is always busy or loaded with someone else's context.

The Solmaz framework solves this with the disposable agent pattern: every task gets its own ephemeral Kubernetes pod with a full compute environment. There is no shared state, no contention, and no context leakage between tasks. Platform teams manage this through a Goal Operator that handles provisioning, lifecycle, and teardown — the same operational model they already use for other Kubernetes workloads.
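To make the one-pod-per-task idea concrete, here is a minimal sketch of the kind of Pod manifest an operator might generate for a single task. The source does not describe the Goal Operator's actual resource format, so the function name, label keys, image name, and defaults below are all assumptions for illustration. The Kubernetes fields used (restartPolicy, activeDeadlineSeconds, emptyDir, resources.limits) are standard Pod spec fields.

```python
import uuid


def build_agent_pod(task_id: str, harness_image: str, team: str,
                    cpu: str = "2", memory: str = "4Gi",
                    deadline_seconds: int = 3600) -> dict:
    """Build a Pod manifest for exactly one task (hypothetical helper).

    task_id is assumed to be lowercase alphanumeric so that the resulting
    name is a valid Kubernetes object name.
    """
    suffix = uuid.uuid4().hex[:8]  # unique per pod, so reruns never collide
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"agent-{task_id}-{suffix}",
            # Label keys are illustrative; they make pods attributable later.
            "labels": {"app": "disposable-agent", "task-id": task_id, "team": team},
        },
        "spec": {
            "restartPolicy": "Never",                   # run once, then finish
            "activeDeadlineSeconds": deadline_seconds,  # hard upper bound on lifetime
            "containers": [{
                "name": "agent",
                "image": harness_image,
                "resources": {"limits": {"cpu": cpu, "memory": memory}},
                "volumeMounts": [{"name": "workspace", "mountPath": "/workspace"}],
            }],
            # emptyDir is deleted with the pod, so no file state outlives the task.
            "volumes": [{"name": "workspace", "emptyDir": {}}],
        },
    }
```

The isolation in this sketch comes from two choices: every call produces a fresh pod name, and the only writable volume is an emptyDir that disappears when the pod is removed.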

How Do You Architect the Concierge Pattern for Enterprise Slack?

Deploy a single persistent concierge agent on your company's Slack workspace. When an engineer messages the concierge with a task — "debug the OOM errors in the payments service after last night's deploy" — the concierge uses ACPX to dispatch a disposable agent pod on the Kubernetes cluster for that specific task.

Since Slack does not natively support dynamic multi-agent provisioning, the concierge returns a UI link to a React app hosted in your cluster where the engineer interacts with the spawned agent's session. The engineer gets a dedicated agent environment; the concierge remains available for the next request.
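The request flow described above can be sketched as a small handler: accept a message, hand the task to a dispatcher, and reply with a link to the session UI. This is an illustrative sketch only. The dispatch callable stands in for whatever ACPX invocation you use (the source does not give its command-line syntax), and the base URL and path layout of the React app are assumptions.

```python
import uuid
from typing import Callable
from urllib.parse import quote


def handle_message(user_id: str, text: str,
                   dispatch: Callable[[str, str], None],
                   ui_base_url: str = "https://agents.example.internal") -> str:
    """Dispatch one disposable agent for `text` and return the Slack reply.

    `dispatch(session_id, task_text)` is a hypothetical stand-in for the
    ACPX call that asks the cluster to start a pod for this session.
    """
    session_id = uuid.uuid4().hex      # one session per request
    dispatch(session_id, text)         # the concierge does not run the task itself
    link = f"{ui_base_url}/sessions/{quote(session_id)}"
    # <@USER_ID> is Slack's mention syntax in message text.
    return f"<@{user_id}> your agent is starting. Follow along here: {link}"
```

Because the handler returns as soon as the dispatch call does, the concierge stays free for the next request while the spawned agent works in its own pod.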

Critical implementation detail: never manage Slack app manifests manually. Automate provisioning through the Goal Operator from day one. Manual app manifest management is the first thing that breaks when you scale past 10 concurrent agents.

How Do You Standardise Across Multiple Agent Harnesses?

Enterprise teams often need multiple harnesses — Codex for general tasks, Claude Code for specific languages, internal fine-tuned models for proprietary codebases. ACP (Agent Client Protocol) standardises the interface so that one adapter works across all harnesses. Platform teams write the integration once and swap harnesses by configuration, not code changes.
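As a rough illustration of "swap harnesses by configuration", the sketch below selects an implementation from a config value behind a single interface. It does not reproduce the ACP wire format; the Harness interface, the registry, and the EchoHarness stand-in are all hypothetical names used only to show the shape of the idea.

```python
from typing import Callable, Dict, Protocol


class Harness(Protocol):
    """The one interface callers depend on (hypothetical)."""
    def run(self, prompt: str) -> str: ...


class EchoHarness:
    """Stand-in for a real harness client; it just tags the prompt."""
    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


# Adding a harness means adding one registry entry, not touching callers.
REGISTRY: Dict[str, Callable[[], Harness]] = {
    "codex": lambda: EchoHarness("codex"),
    "claude-code": lambda: EchoHarness("claude-code"),
}


def harness_from_config(config: dict) -> Harness:
    """Pick the harness named in config["harness"]."""
    name = config["harness"]
    try:
        factory = REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown harness: {name!r}") from None
    return factory()
```

Calling code only ever sees run(), so changing {"harness": "codex"} to {"harness": "claude-code"} in configuration is the whole migration in this sketch.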

ACPX is the CLI layer that binds everything together. It lets any agent call any other agent over the command line, routes tasks to the appropriate harness based on task type, and executes SOP workflows as Argo-like DAGs. Platform teams define SOPs per task class (bug triage, PR review, incident response) and deploy them as reusable workflow templates.
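To show what executing an SOP as a DAG means in practice, here is a minimal sketch that runs steps in dependency order using Python's standard graphlib module. It is not ACPX's workflow format, which the source does not specify. The step names, the dependency map, and the returned strings are made-up placeholders for a bug-triage SOP.

```python
from graphlib import TopologicalSorter


def run_sop(steps: dict, deps: dict) -> dict:
    """Run each step after all of its prerequisites.

    steps: step name -> callable taking the results so far
    deps:  step name -> set of step names that must finish first
    """
    results = {}
    for name in TopologicalSorter(deps).static_order():
        results[name] = steps[name](results)
    return results


# Hypothetical bug-triage SOP: two independent checks feed one summary.
BUG_TRIAGE_DEPS = {
    "reproduce": set(),
    "check_logs": {"reproduce"},
    "find_suspect_commit": {"reproduce"},
    "summarise": {"check_logs", "find_suspect_commit"},
}

BUG_TRIAGE_STEPS = {
    "reproduce": lambda r: "reproduced",
    "check_logs": lambda r: "oom in worker",
    "find_suspect_commit": lambda r: "abc123",  # placeholder value
    "summarise": lambda r: f"{r['check_logs']}; suspect {r['find_suspect_commit']}",
}
```

This version runs steps one at a time. Steps with no dependency on each other (check_logs and find_suspect_commit here) are the ones a real workflow engine could run in parallel. A cyclic dependency map makes graphlib raise CycleError instead of looping forever.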

What Does the Operational Model Look Like Day-to-Day?

Platform teams manage the agent infrastructure the same way they manage any Kubernetes workload:

- Helm charts define pod templates for each harness type

- Goal Operator handles pod lifecycle, scaling, and failure recovery

- State synchronisation layer keeps file state consistent across parallel agents

- Structured JSON outputs from SOP steps feed into existing observability pipelines

- Escalation metrics track how often agents route to humans, revealing SOP improvement opportunities
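As one example of turning the structured outputs above into an escalation metric, the sketch below computes the share of steps routed to a human per SOP. The field names ("sop", "routing") and the "human" routing value are assumptions; substitute whatever your SOP steps actually emit.

```python
import json
from collections import Counter
from typing import Dict, Iterable


def escalation_rates(json_lines: Iterable[str]) -> Dict[str, float]:
    """Return, per SOP, the fraction of records routed to a human."""
    total: Counter = Counter()
    escalated: Counter = Counter()
    for line in json_lines:
        record = json.loads(line)
        total[record["sop"]] += 1
        if record["routing"] == "human":
            escalated[record["sop"]] += 1
    return {sop: escalated[sop] / total[sop] for sop in total}
```

An SOP whose rate stays high over time is a candidate for rewriting; one whose rate trends toward zero may be ready for less human review.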

The Ship of Theseus principle applies: you do not rebuild the infrastructure when a new model drops. Swap the model inside the harness, update the ACP adapter if needed, and iterate. The goal is continuity of use, not continuity of implementation.

Next step: Deploy the Goal Operator on a staging cluster, configure one harness via ACPX, and run a concierge agent on a test Slack channel with three engineers for one sprint to validate the pattern before company-wide rollout.

// FREQUENTLY ASKED QUESTIONS

How do we handle cost management for disposable agent pods?

Set resource limits per pod via Helm chart configuration and use Kubernetes resource quotas per team or namespace. Since each pod's lifecycle is tied to a single task, costs are predictable and attributable. Track cost-per-task through pod labels and feed that into your existing chargeback system. Per-task pods consume more raw resources than a shared instance, but they eliminate the hidden cost of engineers waiting for a shared agent.
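A worked example of cost-per-task attribution, under stated assumptions: cost is modelled as pod lifetime multiplied by the resources the pod was allotted, then summed by a pod label. The hourly rates are placeholders rather than real prices, and the record layout (labels, cpu_cores, memory_gib, lifetime_seconds) is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Placeholder rates for illustration only; use your own cloud or internal pricing.
CPU_RATE_PER_CORE_HOUR = 0.04
MEM_RATE_PER_GIB_HOUR = 0.005


def pod_cost(cpu_cores: float, memory_gib: float, lifetime_seconds: float) -> float:
    """Cost of one pod = hours alive * (CPU cost per hour + memory cost per hour)."""
    hours = lifetime_seconds / 3600
    return hours * (cpu_cores * CPU_RATE_PER_CORE_HOUR
                    + memory_gib * MEM_RATE_PER_GIB_HOUR)


def cost_by_label(pods: Iterable[dict], label: str = "team") -> Dict[str, float]:
    """Sum pod costs grouped by the value of one pod label."""
    totals: Dict[str, float] = defaultdict(float)
    for pod in pods:
        totals[pod["labels"][label]] += pod_cost(
            pod["cpu_cores"], pod["memory_gib"], pod["lifetime_seconds"])
    return dict(totals)
```

With these placeholder rates, a pod with 2 cores and 4 GiB that lives for 30 minutes costs 0.5 × (2 × 0.04 + 4 × 0.005) = 0.05 in whatever currency the rates are expressed in.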

Can we integrate this with our existing SSO and RBAC systems?

Yes. The concierge agent authenticates users through your existing Slack SSO. ACPX can be configured to check RBAC policies before dispatching agents — for example, restricting which teams can access which harnesses or limiting pod resource allocations by role. The Goal Operator respects Kubernetes RBAC natively. Map your existing permission model onto namespace-level or label-level access controls.
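A pre-dispatch policy check of the kind described above might look like the following sketch. The role names, the policy table, and the authorize function are hypothetical; in practice the roles would come from your identity provider and the policies from your own configuration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RolePolicy:
    harnesses: frozenset  # harness names this role may use
    max_cpu: int          # largest CPU request this role may make


# Illustrative policy table, not a recommended set of limits.
POLICIES = {
    "engineer": RolePolicy(frozenset({"codex"}), max_cpu=2),
    "staff-engineer": RolePolicy(frozenset({"codex", "claude-code"}), max_cpu=8),
}


def authorize(role: str, harness: str, cpu: int) -> Tuple[bool, str]:
    """Decide whether a dispatch request is allowed, with a reason."""
    policy = POLICIES.get(role)
    if policy is None:
        return False, f"unknown role {role!r}"  # deny by default
    if harness not in policy.harnesses:
        return False, f"role {role!r} may not use harness {harness!r}"
    if cpu > policy.max_cpu:
        return False, f"requested {cpu} CPU exceeds limit {policy.max_cpu}"
    return True, "ok"
```

Returning a reason alongside the decision lets the concierge tell the engineer why a request was refused instead of failing silently.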

What observability do we get into agent task execution?

Every SOP step emits structured JSON with the decision made, evidence examined, and routing choice. Feed these into your existing logging pipeline (ELK, Datadog, etc.) for full auditability. Track metrics like task completion rate, escalation frequency, average pod lifetime, and cost per task. These signals reveal which SOPs need tuning and which task classes are ready for full automation versus still requiring human sign-off.
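For illustration, one possible shape for such a per-step record is sketched below. The source names the three pieces of content (decision, evidence, routing) but not a schema, so the field names, the extra sop, step, and timestamp fields, and the StepRecord class are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class StepRecord:
    sop: str             # which SOP the step belongs to
    step: str            # step name within the SOP
    decision: str        # what the agent decided
    evidence: List[str]  # what it looked at to decide
    routing: str         # where the task goes next, e.g. "agent" or "human"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_json(self) -> str:
        """Serialise to a single line, suitable for line-oriented log shippers."""
        return json.dumps(asdict(self), sort_keys=True)
```

Emitting one JSON object per line keeps the records easy to ingest in most logging pipelines without a custom parser.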