Chapter 2 · AGENTS

Chapter 2: Lead Agent & Orchestration

In Chapter 1: Frontend Workspace & AI Elements, we built the visual dashboardβ€”the "Cockpit" for our application. But a cockpit is useless without a pilot.

In this chapter, we will build the "Brain" of deer-flow: the Lead Agent.

The Motivation: The Conductor

Standard AI models are like solo musicians. They are talented, but if you ask them to play a symphony alone (research a topic, write code, run tests, and fix errors), they get overwhelmed and lose rhythm.

deer-flow uses an Orchestration pattern.

Central Use Case: "The Research Plan"

Let's stick to our example:

User: "Research the history of coffee and save it as a text file."

To answer this, the Lead Agent must execute a complex mental flow:

  1. Ingest: Read the user's request.
  2. Contextualize: Check memory (Did we talk about coffee before?).
  3. Plan: "I need to search the web first."
  4. Delegate: Activate the Researcher Skill.
  5. Review: Look at the search results.
  6. Act: Write the final file.

The Lead Agent manages this entire lifecycle using LangGraph and a powerful concept called Middleware.


Key Concept: The Middleware Architecture

Before the Lead Agent (the LLM) even sees your message, the data goes through a pipeline of "Middleware."

Think of these like filters on a camera lens.

Only after passing through these filters does the "light" (the message) hit the "sensor" (the Lead Agent).
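The filter idea can be sketched in plain Python (a hypothetical illustration, not deer-flow's actual middleware API): each middleware receives the conversation state, returns an enriched copy, and the agent only ever sees the final result.

```python
# Hypothetical sketch: middleware as a chain of state-transforming functions.
def attach_uploads(state: dict) -> dict:
    # Pretend the user attached a file to this thread.
    return {**state, "uploads": ["notes.txt"]}

def add_memory(state: dict) -> dict:
    # Pull in relevant long-term memory.
    return {**state, "memory": "User asked about tea last week."}

def summarize_history(state: dict) -> dict:
    # Condense older messages into a short summary.
    history = state.get("history", [])
    return {**state, "history_summary": f"{len(history)} earlier messages"}

def run_pipeline(state: dict, middlewares) -> dict:
    # The message passes through every "filter" before the agent sees it.
    for mw in middlewares:
        state = mw(state)
    return state

enriched = run_pipeline(
    {"message": "Research coffee", "history": ["hi", "hello"]},
    [attach_uploads, add_memory, summarize_history],
)
# The Lead Agent receives the enriched state, not the raw message.
```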

Visualizing the Flow

sequenceDiagram
    participant U as User
    participant MW as Middleware Layer
    participant LA as Lead Agent (Brain)
    participant SA as Skills (Tools)
    U->>MW: "Research coffee"
    note over MW: 1. Attach Uploads<br/>2. Add Memory<br/>3. Summarize History
    MW->>LA: Enriched Prompt
    note over LA: Thinking...<br/>"I need to search."
    LA->>SA: Call Tool (Search)
    SA-->>LA: Return Results
    LA-->>U: "Here is your report."

Implementation: Building the Lead Agent

Let's look at backend/src/agents/lead_agent/agent.py. This is where we assemble our conductor.

1. The Factory Function

We define a factory function, make_lead_agent, which creates our graph node by connecting the Model (the LLM) with its Tools and Middleware.

# src/agents/lead_agent/agent.py

def make_lead_agent(config: RunnableConfig):
    # 1. Get configuration (e.g., which AI model to use)
    model_name = config.get("configurable", {}).get("model_name")
    
    # 2. Create the Agent with the specific prompt and middleware list
    return create_agent(
        model=create_chat_model(name=model_name),
        tools=get_available_tools(model_name=model_name),
        middleware=_build_middlewares(config), # <--- The Magic happens here
        state_schema=ThreadState,
    )
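The configuration lookup follows LangChain's convention of nesting per-run options under a "configurable" key. A standalone sketch of that pattern (names and defaults here are hypothetical):

```python
# Hypothetical sketch of the factory pattern: read per-run options from
# config["configurable"], fall back to a default, return an assembled agent.
DEFAULT_MODEL = "deepseek-chat"  # assumed default, for illustration only

def make_agent(config: dict) -> dict:
    configurable = config.get("configurable", {})
    model_name = configurable.get("model_name") or DEFAULT_MODEL
    # A real factory would build a model client and tool list here.
    return {"model": model_name, "tools": ["search", "write_file"]}

agent = make_agent({"configurable": {"model_name": "deepseek-reasoner"}})
fallback = make_agent({})  # no override: falls back to the default model
```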

2. Building the Middleware Chain

The _build_middlewares function constructs the pipeline. Each middleware modifies the state before the Agent acts.

def _build_middlewares(config: RunnableConfig):
    middlewares = [
        ThreadDataMiddleware(),    # 1. Sets up thread IDs
        UploadsMiddleware(),       # 2. Handles file attachments
        SandboxMiddleware(),       # 3. Prepares the code execution sandbox
        DanglingToolCallMiddleware() # 4. Fixes broken tool calls from history
    ]

    # ... more logic ...
    
    return middlewares

Let's look at two specific middlewares that give the Lead Agent its "intelligence."

3. Summarization Middleware (Long-Term Focus)

If a conversation gets too long, the AI forgets the beginning. This middleware summarizes old messages so the Agent keeps the context without reading 100 pages of text.

def _create_summarization_middleware() -> SummarizationMiddleware | None:
    config = get_summarization_config()
    
    # If not enabled, skip it
    if not config.enabled:
        return None

    # Create middleware that condenses chat history
    return SummarizationMiddleware(
        model=config.model_name,
        trigger=config.trigger, # When to summarize (e.g., every 10 msgs)
        keep=config.keep        # How many recent msgs to keep raw
    )
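The trigger/keep behavior can be sketched with a small helper (hypothetical, not the real SummarizationMiddleware): once the history grows past the trigger length, everything except the last `keep` messages collapses into a single summary entry.

```python
def condense_history(messages: list[str], trigger: int = 10, keep: int = 4) -> list[str]:
    # Below the trigger, leave the history untouched.
    if len(messages) <= trigger:
        return messages
    old, recent = messages[:-keep], messages[-keep:]
    # In the real middleware an LLM writes the summary; here we just count.
    summary = f"[summary of {len(old)} earlier messages]"
    return [summary] + recent

history = [f"msg {i}" for i in range(12)]
condensed = condense_history(history)
# -> ["[summary of 8 earlier messages]", "msg 8", "msg 9", "msg 10", "msg 11"]
```

The agent keeps recent turns verbatim while older context survives only as a compact summary, which is what keeps long conversations within the model's context window.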

4. Todo List Middleware (Planning)

When the user asks for a complex task, we don't want the Agent to guess. We want a plan. The TodoListMiddleware injects a special system prompt telling the Agent: "If the task is complex, write a Todo list first."

def _create_todo_list_middleware(is_plan_mode: bool):
    if not is_plan_mode:
        return None

    # Injecting instructions on how to behave
    system_prompt = """
    You have access to the `write_todos` tool.
    CRITICAL RULES:
    - Mark todos as completed IMMEDIATELY after finishing each step.
    - Keep EXACTLY ONE task as `in_progress` at any time.
    """
    
    return TodoListMiddleware(system_prompt=system_prompt)
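The "exactly one in_progress" rule can be expressed as a small state update (a hypothetical helper mirroring the invariant the prompt asks the Agent to maintain with write_todos):

```python
def advance_todos(todos: list[dict]) -> list[dict]:
    # Mark the current in_progress task completed, then promote the next
    # pending task, so exactly one task is in_progress at any time.
    todos = [dict(t) for t in todos]  # work on a copy
    for t in todos:
        if t["status"] == "in_progress":
            t["status"] = "completed"
            break
    for t in todos:
        if t["status"] == "pending":
            t["status"] = "in_progress"
            break
    return todos

plan = [
    {"task": "Search the web", "status": "in_progress"},
    {"task": "Draft report", "status": "pending"},
    {"task": "Save file", "status": "pending"},
]
plan = advance_todos(plan)
# "Search the web" is now completed and "Draft report" is in_progress.
```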

Technical Deep Dive: The Thinking Process

When the Lead Agent receives the data from the middleware, it enters a "Thinking" loop (often called Chain of Thought).

Modern models (like the DeepSeek model supported in patched_deepseek.py) generate an internal monologue before answering.

Preserving Thoughts (patched_deepseek.py)

Standard libraries often strip away the "Thinking" part of the AI's response to save space. However, in deer-flow, we want to keep this logic so the Frontend can display it (as seen in Chapter 1).

# src/models/patched_deepseek.py

class PatchedChatDeepSeek(ChatDeepSeek):
    def _get_request_payload(self, input_, **kwargs):
        # 1. Get the standard payload (serialization drops the thoughts)
        payload = super()._get_request_payload(input_, **kwargs)

        # 2. Re-inject the "reasoning_content" (the thoughts) so the model
        # remembers what it just thought about in the next turn.
        # (Assumes the payload stores serialized messages under "messages".)
        payload_messages = payload["messages"]
        original_messages = self._convert_input(input_).to_messages()
        for payload_msg, orig_msg in zip(payload_messages, original_messages):
            reasoning = getattr(orig_msg, "reasoning_content", None)
            if reasoning is not None:
                payload_msg["reasoning_content"] = reasoning

        return payload
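The effect can be demonstrated without the DeepSeek client at all. Given the original message objects and the serialized payload dicts, copy reasoning_content back onto the matching entries (a simplified, hypothetical re-creation of what the patch does):

```python
class Msg:
    # Minimal stand-in for a chat message carrying optional "thoughts".
    def __init__(self, content, reasoning_content=None):
        self.content = content
        self.reasoning_content = reasoning_content

def reinject_reasoning(payload_messages: list[dict], original_messages: list) -> list[dict]:
    # Standard serialization dropped the thoughts; put them back.
    for payload_msg, orig_msg in zip(payload_messages, original_messages):
        if orig_msg.reasoning_content is not None:
            payload_msg["reasoning_content"] = orig_msg.reasoning_content
    return payload_messages

originals = [
    Msg("Research coffee"),
    Msg("Searching now.", reasoning_content="I should search the web first."),
]
payload = [
    {"role": "user", "content": "Research coffee"},
    {"role": "assistant", "content": "Searching now."},
]
payload = reinject_reasoning(payload, originals)
# The assistant's earlier reasoning now rides along in the next request.
```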

Summary

In this chapter, we defined the Lead Agent:

  1. The Conductor: It manages the flow, utilizing LangGraph to maintain state.
  2. Middleware: A series of "layers" (Summarization, Todo, Sandbox) that prepare the context before the Agent thinks.
  3. Thought Preservation: We ensure the Agent remembers its own reasoning steps.

The Lead Agent is now smart, capable of planning, and has memory. But currently, it has no hands. It can decide to "Research Coffee," but it doesn't know how to browse the web yet.

In the next chapter, we will give the Lead Agent its instruments.

Next Chapter: Skills & Capabilities System


Generated by Code IQ