In the previous chapter, Context Reconstruction Engine, we gave Claude a "Context Window" so it remembers the history of your project.
Now Claude has a Memory. But having a memory doesn't mean it has discipline.
Large Language Models (LLMs) are designed to be helpful. They are "people pleasers." If you ask, "Did you fix the bug?", Claude might say, "Yes! I fixed it!" because it believes it did, even when the fix was hallucinated.
In professional software development, "I think I fixed it" isn't good enough. We need proof. We need rigorous testing, security checks, and code reviews before we accept a task as "Complete."
The Spec-Driven Agent Workflow changes how you interact with claude-pilot. Instead of a casual chat, it enforces a strict Standard Operating Procedure (SOP).
Think of it like a Construction Site:
The Builder is not allowed to leave until the Inspector signs off.
Imagine you want to add a Login page.
1. You write a plan.md file defining the Login feature.
2. Claude implements it.
3. Automatically, a second AI agent (The Inspector) wakes up, reviews the code, checks for security flaws, and forces Claude to fix them before telling you it's done.
Everything starts with a Specification (or "Spec"). This is a Markdown file that tracks the status of a feature.
# Plan: Add User Login
Status: PENDING
Iterations: 1
## Tasks
- [ ] Create login.html
- [ ] Implement auth.ts
- [ ] Write tests
This file is sacred. The agents read this to know exactly what "Done" looks like.
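To make the idea of a machine-readable spec concrete, here is a minimal sketch of how a script could read the plan shown above. The `Spec` class, `parse_spec`, and `is_done` are hypothetical names for illustration, and the rule that "done" means every checkbox is ticked is an assumption. The source does not show how claude-pilot itself parses the file.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Spec:
    title: str = ""
    status: str = ""
    iterations: int = 0
    # Each task is (is_checked, description).
    tasks: list[tuple[bool, str]] = field(default_factory=list)


def parse_spec(text: str) -> Spec:
    """Parse the plan layout from the example above (hypothetical helper)."""
    spec = Spec()
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# Plan:"):
            spec.title = line[len("# Plan:"):].strip()
        elif line.startswith("Status:"):
            spec.status = line.split(":", 1)[1].strip()
        elif line.startswith("Iterations:"):
            spec.iterations = int(line.split(":", 1)[1])
        else:
            match = re.match(r"- \[( |x|X)\] (.+)", line)
            if match:
                spec.tasks.append((match.group(1) != " ", match.group(2).strip()))
    return spec


def is_done(spec: Spec) -> bool:
    # Assumption: the feature is done only when every checkbox is ticked.
    return bool(spec.tasks) and all(checked for checked, _ in spec.tasks)


PLAN = """\
# Plan: Add User Login
Status: PENDING
Iterations: 1
## Tasks
- [ ] Create login.html
- [x] Implement auth.ts
- [ ] Write tests
"""

spec = parse_spec(PLAN)
remaining = [name for checked, name in spec.tasks if not checked]
print(spec.status, remaining)
```

Because the checklist is plain text, both a human and an agent can see at a glance which tasks still block completion.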
In this workflow, the main Claude agent acts like a Project Manager. It hires Sub-Agents to do specific jobs.
We use two specific sub-agents during verification:
These agents run in the background. They don't chat with you; they audit the code and write reports.
If an Inspector finds a blocking issue, it marks it as must_fix.
The workflow is a loop:
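One way to picture that loop is the sketch below. It is only an illustration under assumptions: `review` and `fix` are hypothetical callables standing in for the Inspector and the Builder, and `max_iterations` is an invented safety cap, not a documented claude-pilot setting.

```python
from typing import Callable

Issue = dict  # e.g. {"severity": "must_fix", "title": "..."}


def verify_until_clean(
    review: Callable[[], list[Issue]],
    fix: Callable[[Issue], None],
    max_iterations: int = 3,
) -> tuple[bool, int]:
    """Repeat review -> fix until no must_fix issue remains.

    Returns (passed, iterations_used).
    """
    for iteration in range(1, max_iterations + 1):
        blocking = [i for i in review() if i.get("severity") == "must_fix"]
        if not blocking:
            return True, iteration
        for issue in blocking:
            fix(issue)
    return False, max_iterations


# Toy stand-ins so the sketch runs on its own.
open_issues = [
    {"severity": "must_fix", "title": "Hardcoded Password"},
    {"severity": "should_fix", "title": "Slow query"},
]


def fake_review() -> list[Issue]:
    return list(open_issues)


def fake_fix(issue: Issue) -> None:
    open_issues.remove(issue)


passed, rounds = verify_until_clean(fake_review, fake_fix)
print(passed, rounds)  # True 2
```

The first round finds and fixes the blocking issue; the second round confirms nothing blocking is left, so the loop exits. Only must_fix issues keep the loop running in this sketch.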
Let's look at the choreography when you run the command /spec-verify.
The main agent (Pilot) uses the Task Tool to start the sub-agents. It creates them with run_in_background=true so they run in parallel.
This logic is defined in pilot/commands/spec-verify.md.
# Pseudo-code representation of the prompt sent to Claude
Task(
    subagent_type="pilot:spec-reviewer-quality",
    run_in_background=true,
    prompt="""
        Read 'pilot/rules/standards-python.md'.
        Review the changed files.
        Write findings to 'findings-quality.json'.
    """
)
Explanation:
The subagent_type tells the system which "Persona" to load. The spec-reviewer-quality persona is instructed to be mean: it looks for security holes and missing tests.
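The effect of launching reviewers in the background can be imitated in plain Python. The sketch below is an analogy, not the Task Tool itself: `run_reviewer` is a hypothetical stub, and the second persona name is a placeholder because only `pilot:spec-reviewer-quality` appears in the example above.

```python
from concurrent.futures import ThreadPoolExecutor


def run_reviewer(subagent_type: str) -> dict:
    """Hypothetical stand-in for one background Task call."""
    # A real reviewer would audit the changed files; this stub only
    # returns an empty report tagged with its persona name.
    return {"reviewer": subagent_type, "issues": []}


def run_reviewers_in_parallel(subagent_types: list[str]) -> list[dict]:
    # Submit every reviewer first, then wait, so none blocks another.
    with ThreadPoolExecutor(max_workers=len(subagent_types)) as pool:
        futures = [pool.submit(run_reviewer, name) for name in subagent_types]
        return [future.result() for future in futures]


reports = run_reviewers_in_parallel([
    "pilot:spec-reviewer-quality",
    "pilot:placeholder-second-reviewer",  # placeholder name, not from the source
])
print([report["reviewer"] for report in reports])
```

Submitting all reviewers before collecting any result is what makes the total wait roughly the time of the slowest reviewer rather than the sum of all of them.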
How does the Quality Agent know what to look for? It has a specific system prompt defined in pilot/agents/spec-reviewer-quality.md.
# Spec Reviewer - Quality
You verify code quality, security, and testing.
## Severity Levels
- **must_fix**: Security vulnerabilities, crashes, missing tests.
- **should_fix**: Performance issues, poor error handling.
## Output Format
You MUST write a JSON file containing your findings.
By forcing the output to be JSON, the Main Agent can easily read it and understand exactly what needs to be fixed.
The agent doesn't guess what "Good Code" is. It reads a rule file. For example, pilot/rules/standards-python.md:
## Python Standards
- MANDATORY: Use `uv` for package management.
- Unit tests MUST mock all external calls.
- No shell injection vulnerabilities.
If the agent sees pip install instead of uv pip install, it flags a must_fix issue immediately.
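The agent applies this rule by reading the prompt, not by running a script. Still, a deterministic analogue helps show what "flagging" means. The sketch below is an assumption-laden illustration: `find_bare_pip`, the regex, and the `"standards"` category are all hypothetical, and the issue fields are modelled on the JSON report format used in this chapter.

```python
import re

# Matches "pip install" unless it is directly preceded by "uv ".
BARE_PIP = re.compile(r"(?<!uv )\bpip install\b")


def find_bare_pip(path: str, text: str) -> list[dict]:
    """Return one must_fix issue per line that calls pip without uv."""
    issues = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if BARE_PIP.search(line):
            issues.append({
                "severity": "must_fix",
                "category": "standards",  # hypothetical category name
                "title": "Use uv instead of bare pip",
                "file": path,
                "line": lineno,
            })
    return issues


script = "uv pip install requests\npip install flask\n"
found = find_bare_pip("setup.sh", script)
print(found)
```

Only the second line of the sample script is reported, because the first already goes through `uv`.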
After the sub-agent finishes, it produces a file like this:
{
  "pass_summary": "Code logic is good, but security is loose.",
  "issues": [
    {
      "severity": "must_fix",
      "category": "security",
      "title": "Hardcoded Password",
      "file": "src/auth.py",
      "line": 42
    }
  ]
}
The Main Agent reads this file, sees the "must_fix," and automatically starts editing src/auth.py to remove the password. It doesn't even ask you. It just fixes it.
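Because the report is JSON, turning it into a to-do list takes only a few lines. The following is a minimal sketch of that triage step; `must_fix_by_file` is a hypothetical helper, and the real Main Agent does this through its prompt rather than through this exact code.

```python
import json
from collections import defaultdict

FINDINGS = """
{
  "pass_summary": "Code logic is good, but security is loose.",
  "issues": [
    {"severity": "must_fix", "category": "security",
     "title": "Hardcoded Password", "file": "src/auth.py", "line": 42}
  ]
}
"""


def must_fix_by_file(findings_json: str) -> dict[str, list[dict]]:
    """Group blocking issues by the file that has to be edited."""
    report = json.loads(findings_json)
    grouped = defaultdict(list)
    for issue in report.get("issues", []):
        if issue.get("severity") == "must_fix":
            grouped[issue["file"]].append(issue)
    return dict(grouped)


todo = must_fix_by_file(FINDINGS)
for path, issues in todo.items():
    for issue in issues:
        print(f"{path}:{issue['line']} {issue['title']}")
```

Grouping by file means each file can be opened once and all of its blocking issues handled together.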
This workflow shifts the burden of quality assurance from You to the System.
The spec-reviewer-quality agent never forgets a check, because every check is written into its prompt.
The Spec-Driven Agent Workflow transforms claude-pilot from a coding assistant into a coding team.
We have now covered the entire software stack of Claude Pilot: The Hooks, The Daemon, The Frontend, The Memory, The Context Engine, and The Workflow.
The only thing left is: How do we get this onto a user's computer?
In the final chapter, we will look at the Installer Framework that packages all this complexity into a single command.