Chapter 1: Context Engineering

Welcome to the first chapter of Agent Skills for Context Engineering. If you want to build AI agents that are reliable, smart, and cost-effective, you are in the right place.

The Problem: The Overwhelmed Workbench

Imagine a master carpenter working at a small workbench. If every tool, offcut, and blueprint in the shop is piled onto that bench at once, there is no room left to do the actual work.

Large Language Models (LLMs) are exactly like this carpenter. The "Context Window" (the amount of text the AI can read at once) is their workbench.

Context Engineering is the art of keeping that workbench clean. It isn't just about asking the AI nicely (Prompt Engineering); it is about curating, compressing, and organizing exactly what data sits on the workbench so the agent can solve the immediate task without distraction.

Use Case: The Forgetful Coder

Let's look at a classic problem. You are building a Coding Agent.

  1. The Naive Approach (Context Stuffing): You paste your entire project—100 files, 50,000 lines of code—into the chat.
  2. The Result: The AI forgets the rules you set at the beginning. It hallucinates functions that don't exist. This is called the "Lost-in-the-Middle" phenomenon: models attend most reliably to the beginning and end of the context and weakest to the middle. The AI has an "Attention Budget," and you just spent it all on noise.

In this chapter, we will build a simple Context Manager to solve this.

Key Concepts

1. The Attention Budget

The context window is a limited resource. Every word you send costs money and, more importantly, costs attention. The more text you provide, the less "brain power" the model allocates to each specific word.
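To make the budget concrete, here is a rough back-of-the-envelope sketch. The "about four characters per token" rule and the 40-characters-per-line figure are illustrative assumptions, not exact tokenizer behavior:

```python
def estimate_tokens(text: str) -> int:
    """Rough rule of thumb: ~4 characters per token for English text."""
    return len(text) // 4

# A short, focused query costs only a handful of tokens.
print(estimate_tokens("Fix the bug in auth.py"))  # 5

# Stuffing a 50,000-line project (assume ~40 characters per line)
# burns roughly half a million tokens of the attention budget.
project_chars = 50_000 * 40
print(project_chars // 4)  # 500000
```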

2. Signal vs. Noise

Context Engineering is the process of maximizing Signal and minimizing Noise.

Building a Basic Context Manager

Let's look at how we organize data for the AI. We don't just dump strings; we organize them into System Instructions, Relevant History, and the Current Task.

Step 1: The System Prompt (The Rules)

These are the permanent rules the agent must follow. They always stay on the workbench.

# This is the foundation of our context
system_prompt = {
    "role": "system",
    "content": "You are a helpful coding assistant. Answer briefly."
}

Explanation: This is the "Contract" that defines who the agent is. It never leaves the context.

Step 2: Managing History (The Moving Window)

We cannot keep every message forever. We need a way to add messages but ensure we don't overflow.

chat_history = []

def add_message(role, content):
    """Adds a message to our local history list."""
    msg = {"role": role, "content": content}
    chat_history.append(msg)

Explanation: We store the conversation in a simple list. As the user chats, this list grows.

Step 3: Curating the Context (The Selection)

This is where Context Engineering happens. We select only the most recent messages to send to the LLM.

def get_curated_context(limit=3):
    """Selects only the system prompt and last 3 messages."""
    # Always include the system prompt (High Signal)
    context = [system_prompt]
    
    # Grab only the recent history (preventing overflow)
    recent_history = chat_history[-limit:]
    
    return context + recent_history

Explanation: Even if chat_history has 500 messages, we only put the System Prompt and the last 3 messages on the "workbench." This keeps the AI focused.
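To see the curation in action, the snippet below repeats the definitions from the steps above (so it runs on its own) and simulates a 500-message conversation:

```python
# Definitions from the steps above, repeated so this snippet runs standalone.
system_prompt = {
    "role": "system",
    "content": "You are a helpful coding assistant. Answer briefly.",
}
chat_history = []

def get_curated_context(limit=3):
    """System prompt plus only the last `limit` messages."""
    return [system_prompt] + chat_history[-limit:]

# Simulate a long conversation of 500 messages.
for i in range(500):
    chat_history.append({"role": "user", "content": f"message {i}"})

context = get_curated_context()
print(len(context))             # 4 (system prompt + last 3 messages)
print(context[-1]["content"])   # message 499
```

No matter how long the conversation grows, the payload sent to the LLM stays the same size.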

Internal Implementation: How it Works

What happens under the hood when a user asks a question? The Context Manager acts as a gatekeeper. It doesn't let the raw data hit the LLM directly.

sequenceDiagram
    participant U as User
    participant CM as Context Manager
    participant LLM as AI Model
    U->>CM: "Fix the bug in auth.py"
    Note over CM: 1. Fetch System Prompt
    Note over CM: 2. Fetch 'auth.py' content
    Note over CM: 3. Ignore 'database.py' (Noise)
    CM->>LLM: Curated Context (System + File + Query)
    LLM->>CM: "Here is the fix..."
    CM->>U: Display Answer
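In code, the gatekeeper flow above might look like the following sketch. The curate_for_query helper and its filename-in-query relevance check are hypothetical stand-ins for real retrieval (embeddings, grep, etc.):

```python
def curate_for_query(query: str, files: dict) -> list:
    """Gatekeeper sketch: only files named in the query reach the LLM."""
    context = [{"role": "system", "content": "You are a helpful coding assistant."}]
    for name, content in files.items():
        # Crude relevance check: is this filename mentioned in the query?
        if name in query:
            context.append({"role": "system", "content": f"File {name}:\n{content}"})
    context.append({"role": "user", "content": query})
    return context

files = {
    "auth.py": "def login(): ...",
    "database.py": "def connect(): ...",
}
ctx = curate_for_query("Fix the bug in auth.py", files)
# database.py (noise) never reaches the model; only the system prompt,
# auth.py, and the query are on the workbench.
print(len(ctx))  # 3
```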

Deep Dive: Context Compression

Sometimes, simply cutting off old messages (as we did in get_curated_context) is too aggressive. We might lose important details from the start of the conversation.

Advanced Context Engineering uses Compression. Instead of deleting old messages, we summarize them.

The Compression Logic

We check if the history is getting too long. If it is, we take the oldest messages and turn them into a tiny summary.

def compress_history(history):
    """Simplifies old messages into a summary."""
    text_block = "\n".join([m['content'] for m in history])
    
    # In a real app, an LLM would generate this summary string
    summary = f"Summary of conversation: {text_block[:50]}..."
    
    return {"role": "system", "content": summary}

Explanation: We take a large block of text and crush it down. We retain the meaning (the signal) but remove the wordiness (the noise).
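Here is compress_history in use, with the definition repeated so the snippet runs standalone. The sample messages are invented for illustration:

```python
def compress_history(history):
    """Simplifies old messages into a summary."""
    text_block = "\n".join(m["content"] for m in history)
    # In a real app, an LLM call would generate this summary string.
    summary = f"Summary of conversation: {text_block[:50]}..."
    return {"role": "system", "content": summary}

old_messages = [
    {"role": "user", "content": "My project uses Python 3.12."},
    {"role": "assistant", "content": "Understood."},
]
msg = compress_history(old_messages)
print(msg["role"])     # system
print(msg["content"])  # starts with "Summary of conversation: My project..."
```

Two full messages collapse into one compact system message that preserves the key fact.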

Applying Compression

Now we update our retrieval logic to include this summary.

def get_smart_context():
    # 1. Oldest stuff becomes a summary
    summary_msg = compress_history(chat_history[:-3])
    
    # 2. Recent stuff stays verbatim
    recent_msgs = chat_history[-3:]
    
    # 3. Combine: System + Summary + Recent
    return [system_prompt, summary_msg] + recent_msgs

Explanation: The workbench now contains:

  1. The Rules (System Prompt)
  2. The Notes from the past (Summary)
  3. The immediate tools (Recent messages)

This is a perfectly engineered context!
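Putting it all together, this standalone sketch repeats the pieces above and shows the final workbench layout:

```python
system_prompt = {"role": "system", "content": "You are a helpful coding assistant."}
chat_history = [{"role": "user", "content": f"message {i}"} for i in range(10)]

def compress_history(history):
    text_block = "\n".join(m["content"] for m in history)
    # In a real app, an LLM call would generate this summary string.
    return {"role": "system", "content": f"Summary of conversation: {text_block[:50]}..."}

def get_smart_context():
    summary_msg = compress_history(chat_history[:-3])  # oldest 7 messages -> summary
    recent_msgs = chat_history[-3:]                    # newest 3 stay verbatim
    return [system_prompt, summary_msg] + recent_msgs

ctx = get_smart_context()
print(len(ctx))              # 5: rules + summary + 3 recent messages
print(ctx[1]["content"])     # the compressed "notes from the past"
print(ctx[-1]["content"])    # message 9
```

Ten messages of history arrive at the model as just five entries: the rules, the notes, and the immediate work.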

Why This Matters

By organizing the context this way, we mitigate the "Lost-in-the-Middle" problem: the model's attention is concentrated on a small, high-signal set of messages instead of being diluted across hundreds of stale ones.

Summary

In this chapter, you learned:

  1. The Workbench Analogy: Don't overload the AI's attention.
  2. Signal vs. Noise: Curate what you send.
  3. Basic Implementation: How to structure System Prompts, History, and Summaries.

But wait—if we have a massive project, how do we decide which specific pieces of information to load onto the workbench at the right time? We can't just summarize everything.

We need a way to reveal information only when it is needed.

Next Chapter: Agent Skill (Progressive Disclosure)


Generated by Code IQ