How Do Platform Leads Build a Context Engine for AI Agents?

For platform engineering leads at mid-size companies (40-200 engineers) · Based on the Walsenuk Stop Babysitting Agents Framework

// TL;DR

If you're a platform engineering lead at a mid-size company, your AI agents are probably producing code that gets rejected at PR review because it misses existing patterns, shared services, and team conventions. The Walsenuk Stop Babysitting Agents Framework shows you how to build a Context Engine: infrastructure that performs exhaustive multi-surface retrieval across GitHub, Slack, Jira, and docs, resolves conflicts between sources, enforces permissions, and delivers token-optimised research packets to agents. This is the infrastructure layer that makes background agents viable at your org's scale.

Why Do AI Agents Keep Getting Rejected at PR Review?

As a platform engineering lead, you've likely seen this pattern: an agent generates a new integration or service, the PR lands, and the reviewing senior engineer immediately flags that it ignores the factory pattern, duplicates a shared utility, or contradicts a decision made in a Slack thread last month. The agent produced plausible code, but not correct code for your org.

This happens because of what Brandon Walsenuk calls Satisfaction of Search — the agent finds the first workable approach and stops looking. It never discovers your shared service layer, your canonical patterns, or the CTO's Slack message that changed the architecture direction. You are currently acting as the Context Engine, manually supplying all of this on every run.

How Do You Build a Context Engine at the Platform Layer?

The Context Engine is infrastructure, which means it's your responsibility as a platform lead. Here's the build sequence:

1. Audit your systems of record. List every place engineering context lives: GitHub (PRs, commit history, code patterns), Slack (decisions, overrides, tribal knowledge), Jira/Linear (tickets, priorities), internal docs, and runbooks. Your static CLAUDE.md file covers maybe 10% of this.

2. Build the Social Graph. Point an automated tool at your Git history. Generate a graph where nodes are engineers and edges represent PR reviews, co-authorship, and service ownership. This graph is the pivot point — when an agent query arrives, the engine scopes retrieval to the right codebases and history for the specific engineer or task.

3. Replace naive RAG with exhaustive retrieval. Do not stand up a single vector store. Build retrieval that constructs structured research queries, fans out across all systems in parallel, and runs until no new relevant signals remain. This is the antidote to Satisfaction of Search.

4. Add conflict resolution. When code in main contradicts a Slack thread from the CTO, weight each source by authority: how recent it is, the role of its author, and whether it is the canonical source. Surface both the conflict and the resolution with citations. Never silently pick one source.

5. Enforce permissions from day one. At your org size (40-200 engineers), data governance is non-negotiable. Carry OAuth context through every retrieval call. Private Slack channels must never leak into unauthorized research packets.

6. Compress and token-optimise. The engine's output is a small, high-signal research packet — not a raw dump. Reason across all surfaces, strip redundancy, and return only what the agent needs to act.
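
Steps 3 to 5 can be sketched as one retrieval loop. This is a minimal illustration under stated assumptions, not the framework's implementation: the `Signal` type, the in-memory `SOURCES` connectors, the `allowed_channels` set (a stand-in for real OAuth context), and the authority weights are all hypothetical. The loop fans out across sources in parallel, follows up on new leads until a round adds nothing, drops anything the caller may not see, and ranks conflicting claims while keeping every citation.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    source: str          # which system of record it came from
    ref: str             # citation (hypothetical identifiers)
    topic: str
    claim: str
    role_weight: float   # assumed authority of the author, 0..1
    age_days: int
    canonical: bool      # True if it lives in main or official docs
    channel: str = "public"
    follow_ups: tuple = ()  # further queries this signal suggests


# Hypothetical in-memory stand-ins for real GitHub and Slack connectors.
SOURCES = {
    "github": [
        Signal("github", "PR#1", "payments", "use PaymentFactory",
               role_weight=0.6, age_days=90, canonical=True,
               follow_ups=("factory",)),
    ],
    "slack": [
        Signal("slack", "msg-7", "payments", "migrate to PaymentGateway",
               role_weight=1.0, age_days=20, canonical=False,
               channel="eng-leads"),
        Signal("slack", "msg-9", "factory", "factories live in shared/",
               role_weight=0.5, age_days=200, canonical=False),
    ],
}


def search(source, query):
    return [s for s in SOURCES[source] if s.topic == query]


def exhaustive_retrieve(query, allowed_channels):
    """Fan out across all sources until a round yields no new queries."""
    seen, asked, pending = set(), set(), [query]
    while pending:
        batch = [q for q in pending if q not in asked]
        asked.update(batch)
        pending = []
        jobs = [(src, q) for q in batch for src in SOURCES]
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda job: search(*job), jobs))
        for hits in results:
            for s in hits:
                # Permission check: private channels need explicit access.
                if s.channel != "public" and s.channel not in allowed_channels:
                    continue
                if s not in seen:
                    seen.add(s)
                    pending.extend(s.follow_ups)
    return seen


def authority(s):
    # Assumed weighting of recency, role, and canonicity.
    recency = 1.0 / (1.0 + s.age_days / 30.0)
    return 0.4 * recency + 0.4 * s.role_weight + 0.2 * float(s.canonical)


def resolve(signals, topic):
    """Rank claims on a topic and report the conflict with citations."""
    ranked = sorted((s for s in signals if s.topic == topic),
                    key=authority, reverse=True)
    if not ranked:
        return None
    return {
        "winner": ranked[0].claim,
        "conflict": len({s.claim for s in ranked}) > 1,
        "citations": [s.ref for s in ranked],
    }
```

With these sample signals, a caller with access to the `eng-leads` channel sees the newer Slack decision outrank the code in main, flagged as a conflict with both citations. A caller without that access gets only the GitHub signal, and the private message never enters the packet.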

How Does the Context Engine Fit Into Your Agent Execution Pipeline?

Structure agent execution as a three-phase loop: Plan with Engine → Execute → Review with Engine. Before execution, the Context Engine delivers a research packet so the agent produces an org-aware plan. After execution, the engine evaluates the output against real patterns and current truth. The result is PRs that reviewers can 'nitpick and merge' instead of rejecting.
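
The three-phase loop can be sketched as a small orchestration function. This is an illustration under assumptions: `ResearchPacket`, `Review`, `run_task`, the retry limit, and the substring-based review check are hypothetical placeholders. A real engine would evaluate output against retrieved patterns, not string matches.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchPacket:
    task: str
    patterns: list = field(default_factory=list)  # conventions the output must follow


@dataclass
class Review:
    approved: bool
    violations: list


def review_with_engine(output, packet):
    # Placeholder check: every required pattern must appear in the output.
    missing = [p for p in packet.patterns if p not in output]
    return Review(approved=not missing, violations=missing)


def run_task(task, research, execute, max_attempts=2):
    packet = research(task)                        # Phase 1: plan with engine
    feedback = []
    for attempt in range(1, max_attempts + 1):
        output = execute(task, packet, feedback)   # Phase 2: execute
        review = review_with_engine(output, packet)  # Phase 3: review with engine
        if review.approved:
            break
        feedback = review.violations               # feed violations into the next run
    return output, review, attempt


# Hypothetical stubs standing in for the engine and the agent.
def stub_research(task):
    return ResearchPacket(task, patterns=["IntegrationFactory"])


def stub_agent(task, packet, feedback):
    if feedback:
        return "client = IntegrationFactory.create('stripe')"
    return "client = StripeClient()"
```

Calling `run_task("add stripe integration", stub_research, stub_agent)` returns an approved review on the second attempt, because the first output skips the factory pattern and the review feeds that violation back to the agent.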

The same engine also serves your Ask Engineering Slack bot, ticket enrichment, incident triage, and onboarding — compounding your platform investment.

What Should You Build First?

Start with the social graph and one high-value system of record (usually GitHub). Prove exhaustive retrieval on a single use case — like 'implement a new third-party integration' — and measure PR acceptance rate. Then expand to Slack and Jira. The goal is to move your team from stage (b) on the Context Ladder to stage (d) within a quarter.

Next step: Audit your systems of record this week and generate a social graph from your Git history. That data alone will show you how much context your agents are currently missing.
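
As a starting point for the social graph, here is a minimal sketch that builds weighted collaboration edges and a rough ownership map from commit records. The record shape (`author`, `co_authors`, `reviewers`, `paths`) and the rule that the top-level directory names the service are assumptions for illustration. In practice the records would come from your Git history plus your hosting provider's review data.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical commit records; names and paths are invented.
COMMITS = [
    {"author": "ana", "co_authors": ["bo"], "reviewers": ["cy"],
     "paths": ["billing/api.py", "billing/invoice.py"]},
    {"author": "bo", "co_authors": [], "reviewers": ["ana"],
     "paths": ["billing/models.py", "auth/token.py"]},
    {"author": "cy", "co_authors": [], "reviewers": ["ana"],
     "paths": ["auth/login.py", "auth/session.py"]},
]


def build_social_graph(commits):
    """Return (edges, owners): collaboration weights and likely service owners."""
    edges = Counter()               # (engineer_a, engineer_b) -> shared commits
    touches = defaultdict(Counter)  # service -> author -> files changed
    for commit in commits:
        people = {commit["author"], *commit["co_authors"], *commit["reviewers"]}
        # Everyone involved in the same change is linked to everyone else.
        for a, b in combinations(sorted(people), 2):
            edges[(a, b)] += 1
        for path in commit["paths"]:
            service = path.split("/")[0]
            touches[service][commit["author"]] += 1
    # Assumed heuristic: the most frequent author of a service owns it.
    owners = {svc: counts.most_common(1)[0][0] for svc, counts in touches.items()}
    return edges, owners


def collaborators(engineer, edges):
    """Engineers directly connected to the given engineer."""
    return {p for pair in edges if engineer in pair for p in pair} - {engineer}
```

Even this crude version shows which engineers and services a query should be scoped to, which is the pivot role the graph plays in the engine.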

// FREQUENTLY ASKED QUESTIONS

How many engineers do I need before a Context Engine is worth building?

Any team where agents produce PRs that get rejected benefits from a Context Engine. However, the data governance and permission-scoping components become non-negotiable at 20+ engineers. For smaller teams, you might start with exhaustive retrieval over GitHub alone and add permission scoping as you grow.

Should the Context Engine be a separate service or part of the agent framework?

Build it as a separate infrastructure service. The Context Engine should be agent-framework-agnostic — it delivers research packets via API to whatever agent (Claude, Cursor, Codex, custom) is executing. This separation lets you swap agent frameworks without rebuilding context infrastructure, and it lets the same engine serve non-agent surfaces like Slack bots and ticket enrichment.
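
One way to keep the engine framework-agnostic is a plain JSON contract that any agent or bot can call. The sketch below is an assumption-laden illustration: the field names, the `lookup` placeholder, and the `handle_request` function are hypothetical, and the handler is left independent of any transport so it could sit behind whatever API layer you already run.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PacketRequest:
    task: str
    requester: str       # identity used for permission scoping
    max_items: int = 5   # crude cap on packet size


@dataclass
class ResearchPacket:
    task: str
    findings: list = field(default_factory=list)  # {"claim": ..., "citation": ...}


def lookup(task, requester):
    # Placeholder for the real engine; returns invented findings.
    return [{"claim": "Integrations are created via the shared factory",
             "citation": "hypothetical PR reference"}]


def handle_request(body: str) -> str:
    """Take a JSON request body and return a JSON research packet."""
    req = PacketRequest(**json.loads(body))
    findings = lookup(req.task, req.requester)[: req.max_items]
    return json.dumps(asdict(ResearchPacket(req.task, findings)))
```

Because the contract is only JSON in and JSON out, the same handler can serve a coding agent, a Slack bot, or a ticket-enrichment job without knowing which one is calling.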

How do I get buy-in from leadership to invest in a Context Engine?

Measure the doom loop cost: track how many hours engineers spend correcting agent output, how many PR review cycles agent code requires, and the token cost of repeated failed runs. Compare this to the cost of building the engine. Most platform leads find that a single senior engineer spends 30-50% of their time babysitting agents — that cost alone justifies the infrastructure investment.
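
The comparison can be reduced to simple arithmetic. The sketch below assumes a cost model of people hours plus token spend, and every number in the usage example is a placeholder to replace with your own measurements, not data from the framework.

```python
from dataclasses import dataclass


@dataclass
class DoomLoopInputs:
    correction_hours_per_week: float       # time spent fixing agent output
    extra_review_cycles_per_week: int      # review rounds caused by agent PRs
    hours_per_review_cycle: float
    failed_run_token_cost_per_week: float  # spend on runs that were thrown away
    loaded_hourly_rate: float


def weekly_cost(i: DoomLoopInputs) -> float:
    people_hours = (i.correction_hours_per_week
                    + i.extra_review_cycles_per_week * i.hours_per_review_cycle)
    return people_hours * i.loaded_hourly_rate + i.failed_run_token_cost_per_week


def payback_weeks(build_cost: float, i: DoomLoopInputs, expected_reduction: float) -> float:
    """Weeks until the engine pays for itself, given an assumed cost reduction (0..1)."""
    saved_per_week = weekly_cost(i) * expected_reduction
    return build_cost / saved_per_week if saved_per_week > 0 else float("inf")


# Placeholder figures for illustration only.
example = DoomLoopInputs(
    correction_hours_per_week=16,
    extra_review_cycles_per_week=10,
    hours_per_review_cycle=0.5,
    failed_run_token_cost_per_week=200,
    loaded_hourly_rate=100,
)
```

With these placeholder inputs, `weekly_cost(example)` is 2300, and a build cost of 46000 with an assumed 50% reduction pays back in 40 weeks. The value of the exercise lies in the measured inputs, so track them for a few weeks before presenting the case.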
