Chapter 3 · CORE

Skills & Capabilities System

📄 03_skills___capabilities_system.md 🏷 Core

Chapter 3: Skills & Capabilities System

In Chapter 2: Lead Agent & Orchestration, we built the "Brain" of our system. We created a Lead Agent capable of thinking, planning, and maintaining context.

However, a brain without hands cannot interact with the world. If you ask the Lead Agent to "Analyze this spreadsheet" or "Scrape this website," it can describe how to do it, but it cannot actually execute the code to do it.

In this chapter, we give our Agent its "Hands." We call them Skills.

The Motivation: The "Matrix" Learning Style

In traditional software, adding a new feature usually means rewriting the core code of the bot. In deer-flow, we want a system that is modular and extensible.

Think of Skills like the learning pods in the movie The Matrix.

  1. The Agent realizes: "I need to fly a helicopter."
  2. It loads the "Helicopter Pilot Skill."
  3. It instantly knows the controls (parameters) and the maneuvers (scripts).

If you want to add a new capability (like sending emails or resizing images), you don't touch the Lead Agent's code. You simply drop a new Skill Folder into the project.

Central Use Case: "Analyze Sales Data"

Let's imagine the user uploads a file named sales_2024.csv and asks:

User: "What is the average revenue per region in this file?"

To solve this, the Lead Agent needs a Data Analysis Skill. It needs to know:

  1. When to use it (when users ask about data).
  2. How to use it (what arguments to pass).
  3. The Code to actually crunch the numbers.

The Anatomy of a Skill

In deer-flow, a Skill is not just a Python script. It is a package consisting of two parts:

  1. The Manual (SKILL.md): Instructions for the AI.
  2. The Tools (scripts/): Executable code for the computer.

Let's look at the folder structure:

skills/public/data-analysis/
├── SKILL.md           <-- The Manual for the AI
└── scripts/
    └── analyze.py     <-- The Python script that does the work

Part 1: The Manual (SKILL.md)

This is the interface. It tells the Lead Agent what this skill does. It uses Markdown because LLMs (Large Language Models) are very good at reading documentation.

Simplified SKILL.md:

---
name: data-analysis
description: Use this skill when the user uploads Excel or CSV files and wants to generate statistics or SQL queries.
---
# Data Analysis Manual

## Workflow
1. Inspect the file structure first.
2. Run an SQL query to get the answer.

## Usage
Run the script with: `python analyze.py --files <path> --sql <query>`

Explanation: When the system starts, it reads this file. It injects this "knowledge" into the Lead Agent. Now, when you ask about CSVs, the Agent thinks: "Aha! I have a manual for that."
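To make this concrete, here is a minimal sketch of how the front matter between the `---` fences might be extracted at startup. This is an illustrative, stdlib-only parser, not the actual deer-flow implementation:

```python
# Hypothetical sketch: parsing the YAML-style front matter of a SKILL.md file.
def parse_front_matter(text: str) -> dict:
    """Extract the key: value pairs between the leading '---' fences."""
    _, header, _body = text.split("---", 2)
    meta = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

skill_md = """---
name: data-analysis
description: Use this skill when the user uploads Excel or CSV files.
---
# Data Analysis Manual
"""

meta = parse_front_matter(skill_md)
print(meta["name"])  # data-analysis
```

The `name` and `description` recovered here are exactly what the Lead Agent later sees in its tool list.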

Part 2: The Tools (scripts/)

This is the engine. The Lead Agent cannot run the analysis itself (it's just a text generator). It delegates the heavy lifting to a script.

Simplified scripts/analyze.py:

import argparse

import duckdb
import pandas as pd

# 0. Read the arguments the Agent put on the command line
parser = argparse.ArgumentParser()
parser.add_argument("--files", required=True)
parser.add_argument("--sql", required=True)
args = parser.parse_args()

# 1. Load the file the AI told us to load
df = pd.read_csv(args.files)

# 2. Run the SQL query the AI wrote (DuckDB can query the DataFrame as "df")
result = duckdb.query(args.sql).df()

# 3. Print the result so the AI can read it
print(result.to_markdown())

Explanation: This script doesn't know about the User or the Chat. It just takes a file, runs a query, and outputs text.


How It Works: The Execution Flow

How do we connect the Brain (Chapter 2) to the Hands (Chapter 3)?

When the User asks "Analyze sales_2024.csv", the following sequence happens:

  1. Selection: The Lead Agent looks at its available tools and selects data-analysis.
  2. Parameterization: Based on the SKILL.md, the Agent figures out it needs to write an SQL query.
  3. Command Generation: The Agent constructs a terminal command.
  4. Execution: The system runs the command and captures the output.
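Step 4 can be sketched with Python's standard `subprocess` module. The `run_script` helper below is an illustrative assumption about how execution might work, not deer-flow's actual runner:

```python
import subprocess
import sys

# Hypothetical sketch of step 4: the system runs the generated command and
# captures whatever the script prints, so the Lead Agent can read it.
def run_script(command, timeout=60):
    """Run a skill script and return its output as text."""
    result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        # Feed the error text back to the Agent so it can self-correct
        return f"Script failed:\n{result.stderr}"
    return result.stdout

# Usage: a trivial command stands in for `python analyze.py --files ... --sql ...`
output = run_script([sys.executable, "-c", "print('region | avg_revenue')"])
print(output.strip())
```

Capturing stderr as well as stdout matters: a failed script's error message becomes input for the Agent's next attempt.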

Sequence Diagram

sequenceDiagram
    participant LA as Lead Agent
    participant S as Skill System
    participant PY as Python Script
    LA->>LA: User wants analysis.
    LA->>LA: I see "data-analysis" in my tool list.
    LA->>LA: Reading SKILL.md... I need to run a query.
    LA->>S: Execute Tool: "data-analysis"
    Note right of LA: args: { sql: "SELECT AVG(rev)..." }
    S->>PY: Run `python analyze.py --sql ...`
    PY-->>S: Return Result Table (Markdown)
    S-->>LA: Here is the data.
    LA-->>User: "The average revenue is..."

Internal Implementation: Loading Skills

How does the backend find these skills? We use a Dynamic Loader.

Instead of hard-coding imports like import data_analysis, we scan the skills directory at startup. Because nothing in the core logic names individual skills, you can add new ones without touching that code, though a restart is usually required before new definitions are picked up.

1. The Discovery Loop

We look for every folder that contains a SKILL.md.

# src/skills/loader.py (Simplified)

import os

def load_skills(directory: str):
    skills = []

    # 1. Loop through every folder in skills/public
    for folder in os.listdir(directory):
        skill_path = os.path.join(directory, folder)

        # 2. Check if SKILL.md exists
        if os.path.exists(f"{skill_path}/SKILL.md"):
            # 3. Parse the file and add it to the list
            skills.append(parse_skill_markdown(skill_path))

    return skills

Explanation: This function acts like a librarian walking through the aisles, noting down every book (Skill) available on the shelf.
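What does `parse_skill_markdown` hand back? A plausible shape is a small record combining the front matter with the file paths. The `Skill` dataclass below is an assumption for illustration; the real deer-flow structure may differ:

```python
from dataclasses import dataclass

# Hypothetical shape of what parse_skill_markdown might return.
@dataclass
class Skill:
    name: str          # from the YAML front matter
    description: str   # shown to the LLM when it picks a tool
    path: str          # folder containing SKILL.md
    script_path: str   # the executable the skill delegates to

skill = Skill(
    name="data-analysis",
    description="Analyze CSV/Excel files with SQL.",
    path="skills/public/data-analysis",
    script_path="skills/public/data-analysis/scripts/analyze.py",
)
print(skill.name)
```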

2. Converting to AI Tools

Once we have the Skill definition, we need to convert it into a format the LLM (like DeepSeek or GPT) understands. We use the LangChain tool definition format.

# src/skills/converter.py (Simplified)

from langchain_core.tools import StructuredTool

def create_tool_from_skill(skill):
    return StructuredTool.from_function(
        # The name the AI sees (e.g., "data_analysis")
        name=skill.name,

        # The description from the YAML front matter
        description=skill.description,

        # The function to call when the AI chooses this tool
        func=lambda **args: run_script(skill.script_path, args),
    )

Explanation: We wrap the raw script execution in a standardized "Tool" object. The LLM sees this object as a function it can call, complete with a description of what it does.
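At runtime, the model replies with a tool name plus arguments, and the system dispatches to the matching function. The sketch below shows that dispatch step without the LangChain machinery; the tool table and `tool_call` payload are illustrative, not deer-flow's actual wire format:

```python
# Hypothetical dispatch sketch: look up the tool the LLM chose and call it.
tools = {
    "data_analysis": lambda **args: f"ran analyze.py with sql={args['sql']!r}",
}

# What a tool call from the model might look like (names are illustrative)
tool_call = {"name": "data_analysis", "args": {"sql": "SELECT AVG(revenue) FROM df"}}

result = tools[tool_call["name"]](**tool_call["args"])
print(result)
```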

3. Creating New Skills (init_skill.py)

To make it easy for developers to add new capabilities, deer-flow includes a helper script. You don't need to manually create folders and files.

Command:

python skills/public/skill-creator/scripts/init_skill.py my-new-skill --path skills/public

What it does:

  1. Creates skills/public/my-new-skill.
  2. Creates a template SKILL.md (The Manual).
  3. Creates a scripts/ folder with a dummy Python script.

This standardization ensures every skill looks the same, making the system cleaner.
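The three steps above amount to a small scaffolding routine. Here is a minimal sketch of what `init_skill.py` might do internally; the actual template contents in deer-flow will differ:

```python
from pathlib import Path

# Hypothetical sketch of the scaffolding performed by init_skill.py.
def init_skill(name: str, base: str = "skills/public") -> Path:
    root = Path(base) / name
    (root / "scripts").mkdir(parents=True, exist_ok=True)

    # A template Manual with placeholder front matter
    (root / "SKILL.md").write_text(
        f"---\nname: {name}\ndescription: TODO\n---\n# {name} Manual\n"
    )

    # A dummy script the developer replaces with real logic
    (root / "scripts" / "run.py").write_text('print("TODO: implement skill")\n')
    return root

# Usage (creates folders on disk):
# init_skill("my-new-skill")
```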


Why This Architecture?

Separating the Definition (Markdown) from the Execution (Code) has massive benefits:

  1. Language Agnostic: Your script can be in Python, Node.js (chart-visualization skill), or Bash. The Lead Agent doesn't care; it just reads the Markdown and runs the command.
  2. Self-Correction: If the script fails (e.g., "Syntax Error"), the output is sent back to the Lead Agent. The Agent reads the error, thinks "Whoops, I wrote bad SQL," and tries again with a fixed query.
  3. Safety: We can control exactly what tools are available. We aren't letting the AI write arbitrary Python code and run it; we are letting it call specific scripts we designed.
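The self-correction behavior in point 2 is essentially a retry loop. The sketch below illustrates the idea with stub functions (`ask_llm` and `run_sql_script` are stand-ins, not real deer-flow code):

```python
# Hypothetical sketch of the self-correction loop described above.

def ask_llm(question, previous_error=None):
    # Stub: the "model" writes broken SQL first, then fixes it after seeing the error.
    if previous_error is None:
        return "SELECT region, AVG(revenue FROM df"   # missing ')'
    return "SELECT region, AVG(revenue) FROM df GROUP BY region"

def run_sql_script(sql):
    # Stub: "fails" when parentheses are unbalanced, mimicking a syntax error.
    if sql.count("(") != sql.count(")"):
        return False, "Syntax Error near 'FROM'"
    return True, "region | avg_revenue"

def run_with_retries(question, max_attempts=3):
    error = None
    for _ in range(max_attempts):
        sql = ask_llm(question, previous_error=error)
        ok, output = run_sql_script(sql)
        if ok:
            return output
        error = output  # the failure becomes context for the next attempt
    return f"Gave up after {max_attempts} attempts: {error}"

print(run_with_retries("average revenue per region"))
```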

Summary

In this chapter, we equipped our Lead Agent with Skills.

Now our Agent knows how to analyze data, but running complex scripts (like scraping 100 websites or processing big data) can take time. If the Lead Agent runs this script directly, the chat interface will freeze until the script finishes.

To solve this, we need to execute these skills in the background.

Next Chapter: Sub-Agent Execution System


Generated by Code IQ