How AI Developers Architect Self-Improving Agent Loops
For AI developers and automation engineers building agent systems · Based on the Lewis Jackson Self-Improving Trading Agent Framework
// TL;DR
For AI developers and automation engineers, the Lewis Jackson framework is a reference architecture for building agents with built-in self-improvement loops. The key patterns: a oneshot prompt that orchestrates multi-phase setup end-to-end, the Hermes agent as a persistent learning brain with weekly review cadences, single-variable scientific method iteration for clean attribution, goal polarity (success/failure definitions) as the optimization compass, and Railway cloud hosting for 24/7 reliability. These patterns generalize beyond trading to any domain where an agent must iteratively improve its own behavior against measurable outcomes.
What architectural patterns make this self-improving agent work?
The Lewis Jackson framework implements five patterns that AI developers can study and apply to any autonomous agent system:
1. Oneshot Prompt Architecture: A single prompt orchestrates multi-phase deployment — environment detection, strategy onboarding, scaffolding, cloud deployment, and learning-agent installation. This reduces setup friction to one copy-paste action and ensures reproducibility across environments.
2. Goal Polarity: Every improvement cycle is oriented by two poles — a measurable success definition and a measurable failure definition. The agent doesn't optimize vaguely; it moves toward one pole and away from the other. Without both poles, the loop has no gradient.
3. Scientific Method Loop: One variable change per iteration cycle. This is the core insight that separates the framework from agents that change everything at once and learn nothing: when only one variable changes, any shift in outcomes can be attributed to that change. Clean attribution compounds; noisy signals don't.
4. Persistent Side-State: The Hermes-readable trade ledger and strategy document form a persistent state that survives across cycles. Hermes doesn't start fresh each week — it builds on accumulated evidence.
5. Separation of Observation and Action: The first cycle is always read-only. This pattern prevents a misconfigured agent from acting on misunderstood goals.
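As a minimal sketch of patterns 2 and 3, the Python below models goal polarity as two numeric poles and rejects any proposed configuration that changes more than one parameter. The class name `GoalPolarity`, the helper names, and the parameter dictionaries are hypothetical illustrations, not part of the framework itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoalPolarity:
    """Two measurable poles; names and values are illustrative assumptions."""
    success: float  # metric value that counts as full success
    failure: float  # metric value that counts as failure

    def position(self, observed: float) -> float:
        """Map an observed metric to [0, 1]: 0 at the failure pole, 1 at success."""
        span = self.success - self.failure
        if span == 0:
            raise ValueError("success and failure poles must differ")
        return max(0.0, min(1.0, (observed - self.failure) / span))


def changed_keys(old: dict, new: dict) -> list:
    """Return the sorted parameter names whose values differ between two configs."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))


def enforce_single_variable(old: dict, new: dict) -> str:
    """Accept a proposed config only if exactly one parameter changed."""
    diff = changed_keys(old, new)
    if len(diff) != 1:
        raise ValueError(f"expected exactly one change, got {diff}")
    return diff[0]
```

With both poles defined, `position` gives the loop a direction to move in, and `enforce_single_variable` keeps each cycle's result attributable to a single change.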
How does the Hermes agent differ from a standard LLM feedback loop?
A standard LLM feedback loop typically involves prompting a model, getting output, and manually deciding whether to adjust. Hermes is a persistent agent brain that owns the improvement process. It maintains score weights, reviews outcomes against defined goals, generates scored hypotheses, applies changes, and measures results — all autonomously on a weekly cadence.
The critical difference: Hermes natively learns from interactions without requiring manual retraining instructions. It's not a stateless call to an API — it's a persistent entity with accumulated context, structured ledger data, and ownership of portfolio mechanics. For AI developers, this is the difference between a function call and an agent with memory and agency.
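To make the function-call versus persistent-agent contrast concrete, here is a small sketch that assumes the ledger is stored as a JSON Lines file. The file format and the `pnl` field are assumptions made for illustration; the source does not specify how Hermes stores its ledger.

```python
import json
from pathlib import Path


def append_outcome(ledger_path: Path, record: dict) -> None:
    """Append one outcome record as a JSON line so history is never overwritten."""
    with ledger_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def load_ledger(ledger_path: Path) -> list:
    """Read every record written so far; an absent file means no history yet."""
    if not ledger_path.exists():
        return []
    with ledger_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def stateless_review(this_week: list) -> float:
    """A stateless call sees only the records passed in for this invocation."""
    return sum(r["pnl"] for r in this_week)


def persistent_review(ledger_path: Path) -> float:
    """A persistent reviewer sees everything accumulated across cycles."""
    return sum(r["pnl"] for r in load_ledger(ledger_path))
```

The stateless version can only judge the data it is handed, while the persistent version reads the full accumulated record each time it runs.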
How can I apply these patterns to non-trading agent systems?
The architecture generalizes cleanly. Consider any domain where:
- You have measurable outcomes (success and failure metrics)
- The agent can take actions that influence those outcomes
- You need iterative improvement, not one-shot optimization
Examples: a content-generation agent that improves engagement metrics, a customer-support agent that improves resolution rates, or a code-generation agent that improves test-pass rates. In each case, define the goal polarity, instrument a structured outcome ledger, enforce single-variable changes per cycle, and run on a fixed review cadence.
The oneshot prompt pattern also generalizes — packaging complex multi-step agent bootstrapping into a single reproducible prompt that handles environment detection and branching logic.
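For the code-generation example, one review cycle might look like the sketch below. It assumes a higher-is-better metric such as test-pass rate, and every name in it (`ReviewLoop`, `success_at`, `failure_at`, the config keys) is hypothetical. The first cycle only observes, and later cycles apply at most one (name, value) change.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReviewLoop:
    """Domain-agnostic review cycle; all names here are hypothetical."""
    config: dict
    success_at: float  # metric at or above this counts as success
    failure_at: float  # metric at or below this counts as failure
    history: list = field(default_factory=list)  # (cycle, config snapshot, metric, verdict)

    def review(self, metric: float, proposal: Optional[tuple] = None) -> str:
        cycle = len(self.history) + 1
        if metric >= self.success_at:
            verdict = "success"
        elif metric <= self.failure_at:
            verdict = "failure"
        else:
            verdict = "between"
        # Record the outcome against the config that produced it.
        self.history.append((cycle, dict(self.config), metric, verdict))
        # First cycle is observe-only: record the baseline, change nothing.
        if cycle == 1 or proposal is None:
            return f"cycle {cycle}: observed ({verdict})"
        key, value = proposal  # a single (name, value) pair: one variable per cycle
        self.config[key] = value
        return f"cycle {cycle}: set {key}={value} ({verdict})"
```

Calling `review` once per cadence period with the latest metric keeps the ledger, the polarity check, and the single-variable rule in one place.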
What does the deployment architecture look like technically?
The stack is:
- Claude Code as the orchestration terminal for initial setup
- Hermes agent as the persistent self-learning brain
- Railway.app as the 24/7 cloud host
- Strategy YAML as the source-of-truth configuration
- Hermes-readable trade ledger as the structured outcome data store
- Railway CLI for authenticated deployment and live updates
The onboarding flow detects OS and available runtimes (e.g., Node.js), forks into OS-specific instructions, scaffolds the complete file structure, installs Hermes, authenticates Railway, and deploys. Post-deployment, strategy updates are pushed via Railway CLI without manual redeployment.
For developers wanting to extend this, the key integration points are: the strategy YAML (where you'd add new parameters or scoring dimensions), the trade ledger format (where you'd add new outcome metrics), and the Hermes review cadence (which you can adjust for faster or slower iteration).
Explore the oneshot prompt pattern and Hermes integration by retrieving the latest prompt from the 01 Systems community and tracing its execution through Claude Code step by step.
// FREQUENTLY ASKED QUESTIONS
Can I replace Hermes with a different AI agent for the self-improvement loop?
Architecturally, yes — any agent capable of persistent state management, structured data analysis, hypothesis formation, and single-variable iteration could replace Hermes. However, Hermes is specifically designed to natively learn from interactions without manual retraining instructions. Replacing it requires you to build equivalent capabilities: ledger parsing, goal-oriented scoring, hypothesis generation, single-variable enforcement, and weekly cadence management. Hermes provides these out of the box.
How does the oneshot prompt handle different operating systems?
The oneshot prompt includes Phase 1 — Environment Check — where it detects your OS (Mac or Windows) and available runtimes like Node.js. Based on the detection result, it forks into OS-specific instruction paths for file creation, CLI commands, and Railway authentication. You confirm your environment when prompted. This branching logic is embedded in the prompt itself, making the same prompt reproducible across different developer setups.
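The prompt performs this check through the model rather than through a script, but the same branching can be sketched in Python using the standard library's `platform.system()` and `shutil.which()`. The step strings and the function names are illustrative assumptions, not the prompt's actual wording.

```python
import platform
import shutil


def detect_environment() -> dict:
    """Report the OS family and whether a `node` executable is on PATH."""
    system = platform.system()  # "Darwin" on macOS, "Windows" on Windows
    return {
        "os": {"Darwin": "mac", "Windows": "windows"}.get(system, "other"),
        "node_path": shutil.which("node"),  # None when `node` is not found on PATH
    }


def setup_steps(env: dict) -> list:
    """Pick an instruction path from the detected environment (step text is illustrative)."""
    steps = []
    if env["node_path"] is None:
        steps.append("install Node.js")
    if env["os"] == "mac":
        steps.append("follow the macOS instruction path")
    elif env["os"] == "windows":
        steps.append("follow the Windows instruction path")
    else:
        steps.append("ask the user to confirm their environment")
    return steps
```

Separating detection from step selection means the branching logic can be checked against a hand-written environment dictionary without running on each OS.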
Can I modify the weekly review cadence to something faster?
Yes, the weekly cadence is a default, not a hard constraint. For high-frequency strategies that generate hundreds of trades per day, a shorter cadence (e.g., every 3 days) may provide sufficient data. However, shorter cadences risk over-optimization — changing variables before enough data accumulates for statistically meaningful analysis. If running a secondary agent, offset its cadence by at least 3 days from Hermes to prevent simultaneous conflicting parameter updates.
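One way to guard against reviewing too early is to gate each review on both elapsed time and sample count, as in the sketch below. The `min_samples` threshold is an assumption you would choose yourself; the source gives no specific number. The 3-day minimum offset for a secondary agent follows the guidance above.

```python
from datetime import date, timedelta


def ready_for_review(sample_count: int, min_samples: int,
                     last_review: date, today: date, cadence_days: int) -> bool:
    """Review only when the cadence has elapsed AND enough outcomes have accumulated."""
    cadence_elapsed = (today - last_review).days >= cadence_days
    return cadence_elapsed and sample_count >= min_samples


def secondary_review_date(primary_review: date, offset_days: int = 3) -> date:
    """Schedule a secondary agent's review offset from the primary agent's review."""
    if offset_days < 3:
        raise ValueError("offset should be at least 3 days to avoid overlapping updates")
    return primary_review + timedelta(days=offset_days)
```

A shorter `cadence_days` then only takes effect when the ledger actually holds enough records to analyze.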