Welcome to Chapter 6 of the pi-mono tutorial!
In the previous TUI Engine chapter, we built a visual interface to display the agent's output. However, as the agent works, it generates a massive amount of text: thousands of lines of code, logs, and conversation history.
This leads to a critical problem: AI models have a fixed memory limit. If the conversation grows too long, the API starts rejecting requests, or the model loses track of its earlier instructions.
In this chapter, we will introduce Context Compaction, the "Garbage Collector" for the agent's memory that allows it to run indefinitely.
Imagine a professor solving a massive math problem on a small chalkboard.
The Problem: If they erase the problem statement, they forget what they are solving. If they erase the variable definitions, the equation stops making sense.
The Solution: Before erasing, the professor writes a small Summary in the corner: "Solving for X, knowing that Y=5."
Context Compaction does exactly this for the AI. It watches the conversation length. When it gets "full," it takes the oldest messages, summarizes them into a concise note, and deletes the raw logs.
The context window is the limit of the AI's short-term memory, measured in "tokens" (for example, 128,000 tokens). If we exceed this, the API rejects our request.
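Exact token counts come from the model's tokenizer, but a common rough heuristic (used here purely as an illustrative assumption, not pi-mono's actual implementation) is about four characters per token:

```typescript
// Rough heuristic: ~4 characters per token for English text.
// Real counts come from the model's tokenizer; this is an approximation.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const CONTEXT_WINDOW = 128_000;
const used = estimateTokens("a".repeat(400_000)); // a 400k-character transcript
console.log(`Used roughly ${used} of ${CONTEXT_WINDOW} tokens`);
```

An approximation is good enough for deciding when to compact; being off by a few percent doesn't matter because we keep a safety margin anyway.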
We set a "Safety Margin." If the conversation fills 90% of the window, we trigger compaction. This ensures we never actually hit the hard limit.
We don't just delete old messages; we ask the AI to read them and write a summary. When compaction happens again later, the AI reads the old summary plus the new messages to create an updated summary. This creates an unbroken chain of memory.
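The chain idea can be sketched in a few lines. Here `summarize` is a trivial stand-in for the real LLM call, so treat this as an illustration of the data flow rather than the actual implementation:

```typescript
type Message = { role: string; content: string };

// Stand-in for the LLM: folds the previous summary plus the newly
// archived messages into one updated summary string.
function summarize(prevSummary: string, archived: Message[]): string {
  const notes = archived.map((m) => m.content).join("; ");
  return prevSummary ? `${prevSummary} | ${notes}` : notes;
}

let summary = "";
summary = summarize(summary, [{ role: "user", content: "Build a game engine" }]);
summary = summarize(summary, [{ role: "assistant", content: "Created engine.py" }]);
// Both facts survive even though the raw messages were deleted.
```

Each compaction folds the old summary into the new one, so no link in the chain is ever lost.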
Imagine you are building a complex game with the agent.
Without compaction, the AI would have forgotten Turn 1 (where the engine was built) because it fell off the memory cliff. With compaction, the AI sees a summary: "User built a game engine in Python. Key files are engine.py and player.py."
In pi-mono, compaction is handled automatically by the Agent Session, but it requires configuration settings.
We define when compaction should trigger.
// Define how memory is managed
const compactionSettings = {
enabled: true,
// Start compacting when we only have 16k tokens left
reserveTokens: 16384,
// After compacting, keep the last 20k tokens of raw text
keepRecentTokens: 20000
};
Explanation: reserveTokens is our safety buffer. keepRecentTokens ensures the AI still sees the exact text of the most recent interaction, so the conversation doesn't feel disjointed.
The system needs to constantly check if the "cup is full."
import { shouldCompact } from "./compaction";
// Check if current usage + safety buffer > limit
if (shouldCompact(currentTokens, contextWindow, settings)) {
console.log("Memory full! Triggering compaction...");
await session.runCompaction();
}
Explanation: This function is a simple boolean check. If it returns true, the session pauses the agent and starts the cleaning process.
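The tutorial doesn't show the body of `shouldCompact`; a minimal sketch consistent with the settings above (an assumption, not the exact pi-mono source) is:

```typescript
interface CompactionSettings {
  enabled: boolean;
  reserveTokens: number;
  keepRecentTokens: number;
}

// Trigger once usage crosses (contextWindow - reserveTokens).
export function shouldCompact(
  currentTokens: number,
  contextWindow: number,
  settings: CompactionSettings
): boolean {
  if (!settings.enabled) return false;
  return currentTokens >= contextWindow - settings.reserveTokens;
}
```

With the example settings (a 128,000-token window and a 16,384-token reserve), this fires at 111,616 tokens, close to the 90% mark described earlier.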
The compaction process is a multi-step operation involving the LLM itself.
First, the system finds a cut point that preserves the most recent keepRecentTokens of raw conversation. Everything older than that point is summarized and replaced with a single SummaryMessage.
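The whole cycle can be sketched end to end. Both helpers here are trivial stand-ins (assumptions) so the data flow stays visible; the real cut-point and summarization logic is covered in the sections that follow.

```typescript
type Msg = { role: string; content: string };

// Stand-ins so the sketch runs on its own: a fixed cut point and a
// fake summarizer. The real agent uses token counts and an LLM call.
const findCut = (msgs: Msg[]): number => Math.max(0, msgs.length - 2);
const summarizeStub = (msgs: Msg[]): string => `Summary of ${msgs.length} messages`;

function compactOnce(history: Msg[]): Msg[] {
  const cut = findCut(history);  // 1. choose a safe cut point
  if (cut === 0) return history; // nothing old enough to archive
  const summary = summarizeStub(history.slice(0, cut)); // 2. compress the old part
  // 3. splice: one summary message replaces everything above the cut
  return [{ role: "summary", content: summary }, ...history.slice(cut)];
}

const history: Msg[] = [
  { role: "user", content: "Build a game engine" },
  { role: "assistant", content: "Created engine.py" },
  { role: "user", content: "Add a player" },
  { role: "assistant", content: "Created player.py" },
];
const compacted = compactOnce(history);
// Four raw messages become one summary plus the two most recent messages.
```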
The logic resides in packages/coding-agent/src/core/compaction/compaction.ts. Let's break down the critical functions.
We can't just cut in the middle of a sentence or a tool result. We must find a valid boundary.
// Simplified logic from findCutPoint
export function findCutPoint(entries, settings): CutPointResult {
let tokens = 0;
// Walk backwards from the newest message
for (let i = entries.length - 1; i >= 0; i--) {
tokens += estimateTokens(entries[i]);
// Stop if we have saved enough "recent" memory
if (tokens >= settings.keepRecentTokens) {
return { firstKeptEntryIndex: i };
}
}
return { firstKeptEntryIndex: 0 };
}
Explanation: We count tokens from the bottom up. Once we have enough "recent context" to satisfy the user, everything above that index is marked for the shredder (summarization).
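To see the cut point in action, assume each entry carries a precomputed token estimate (the Entry shape here is an illustrative assumption, not the real pi-mono type):

```typescript
type Entry = { content: string; tokens: number };

const estimateTokens = (e: Entry): number => e.tokens;

function findCutPoint(entries: Entry[], settings: { keepRecentTokens: number }) {
  let tokens = 0;
  // Walk backwards from the newest message
  for (let i = entries.length - 1; i >= 0; i--) {
    tokens += estimateTokens(entries[i]);
    if (tokens >= settings.keepRecentTokens) {
      return { firstKeptEntryIndex: i };
    }
  }
  return { firstKeptEntryIndex: 0 };
}

const entries: Entry[] = [
  { content: "old plan", tokens: 5000 },
  { content: "tool log", tokens: 8000 },
  { content: "recent diff", tokens: 12000 },
  { content: "latest reply", tokens: 9000 },
];
// Backwards: 9000, then 9000 + 12000 = 21000 >= 20000, so the cut lands at index 2.
const cut = findCutPoint(entries, { keepRecentTokens: 20000 });
// Entries 0-1 go to the summarizer; entries 2-3 stay verbatim.
```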
We use a structured prompt to ensure the AI doesn't lose critical details like file names.
// The system prompt used to compress memory
const SUMMARIZATION_PROMPT = `
Summarize the conversation. Use this EXACT format:
## Goal
[What is the user trying to accomplish?]
## Progress
### Done
- [x] [Completed tasks]
## Key Decisions
- [Rationale for changes]
`;
Explanation: By forcing the AI to use a standard format (Goal, Progress, Decisions), we ensure that the summary is actually useful for the next compaction cycle.
This function actually calls the AI to perform the compression.
export async function generateSummary(messages, model, apiKey) {
// 1. Convert message objects to a string script
const conversationText = serializeConversation(messages);
// 2. Ask the AI to summarize it
const response = await completeSimple(model, {
systemPrompt: "You are a precise summarizer.",
messages: [{ role: "user", content: SUMMARIZATION_PROMPT + conversationText }]
});
return response.content; // The summary text
}
Explanation: We use serializeConversation to turn the complex JSON message objects into a readable script, then send it to the Unified AI Interface via completeSimple.
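serializeConversation itself isn't shown in the tutorial; a minimal sketch of what it might do (the exact output format is an assumption) looks like this:

```typescript
type ChatMessage = { role: string; content: string };

// Flatten structured message objects into a plain-text transcript the
// summarizer model can read. The real version also renders tool calls
// and results; this illustrative sketch handles only plain messages.
function serializeConversation(messages: ChatMessage[]): string {
  return messages
    .map((m) => `[${m.role.toUpperCase()}]: ${m.content}`)
    .join("\n");
}

const script = serializeConversation([
  { role: "user", content: "Add a player class" },
  { role: "assistant", content: "Created player.py" },
]);
// "[USER]: Add a player class\n[ASSISTANT]: Created player.py"
```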
A unique feature of pi-mono is that it tracks which files were modified in the summarized history.
// Inside the main compact function
const { readFiles, modifiedFiles } = computeFileLists(fileOps);
// Append file history to the text summary
summary += `\n\nModified Files: ${modifiedFiles.join(", ")}`;
Explanation: Even if the text summary is vague, the system explicitly appends a list of files touched. This cues the Agent to check those files if it needs context later.
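computeFileLists can be sketched as a simple partition over the recorded file operations; the FileOp shape below is an assumption for illustration:

```typescript
type FileOp = { type: "read" | "write" | "edit"; path: string };

// Split recorded file operations into files that were only read
// vs. files that were changed. (Illustrative sketch, not the real code.)
function computeFileLists(fileOps: FileOp[]) {
  const readFiles = new Set<string>();
  const modifiedFiles = new Set<string>();
  for (const op of fileOps) {
    if (op.type === "read") readFiles.add(op.path);
    else modifiedFiles.add(op.path);
  }
  // A file that was both read and modified counts only as modified.
  for (const f of modifiedFiles) readFiles.delete(f);
  return { readFiles: [...readFiles], modifiedFiles: [...modifiedFiles] };
}

const { readFiles, modifiedFiles } = computeFileLists([
  { type: "read", path: "engine.py" },
  { type: "edit", path: "player.py" },
  { type: "read", path: "player.py" },
]);
// readFiles: ["engine.py"], modifiedFiles: ["player.py"]
```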
Context Compaction is what differentiates a simple chatbot from a robust Agent. By automatically managing its own memory, the agent becomes capable of handling tasks that span hours or days.
In this chapter, we learned:
- Why a long-running agent eventually overflows the model's context window, and how a safety margin (reserveTokens) triggers compaction before the hard limit.
- How findCutPoint walks backwards to preserve the most recent keepRecentTokens of raw conversation.
- How a structured summarization prompt and generateSummary compress older history into a reusable summary.
- How the list of modified files preserves a concrete trail even when the text summary is vague.
We have now covered the core engine, the interface, the tools, the UI, and the memory. The final piece of the puzzle is how to let other developers add features to our agent without changing the core code.
Next Chapter: Extension System
Generated by Code IQ