Chapter 5 · CORE

Chapter 5: Sandboxed Environment

In Chapter 4: Sub-Agent Execution System, we built a team of Sub-Agents capable of writing code and executing complex tasks in the background.

But this creates a terrifying problem.

If an AI Agent decides to "clean up disk space" by running a destructive command like rm -rf /, and that Agent is running directly on your laptop, it can delete your files.

In this chapter, we introduce the Sandboxed Environment. This is the safety layer that ensures deer-flow is a helpful assistant, not a digital hazard.

The Motivation: The "Bio-Secure Lab"

Imagine a scientist studying a dangerous virus. They don't do it at their kitchen table. They work in a Bio-Secure Lab with thick glass walls and robotic arms.

The Scientist = The Lead Agent (The Brain).
The Lab = The Sandbox.
The Robotic Arms = The Sandbox Interface.

The Scientist can see inside and manipulate things using the arms, but nothing inside the lab can escape to infect the outside world.

Central Use Case: "The Infinite Loop"

User: "Write a Python script that counts to infinity."

If you run this locally:

The script starts and never stops.
It pins a CPU core at 100%.
Your computer slows to a crawl until you find and kill the process, or force-restart.

With a Sandbox:

The Agent starts the script inside a Docker Container.
The Container hits 100% CPU.
The Sandbox Controller notices the timeout, kills the container, and reports back: "The script timed out."
Your main computer remains completely unaffected.

Key Concept: The `Sandbox` Abstraction

In deer-flow, we don't hard-code "Docker" everywhere. Instead, we define a generic interface (a standard set of controls) that every secure environment must implement.

This is defined in backend/src/sandbox/sandbox.py.

The Remote Control

Think of the Sandbox class as a universal remote control for our "Bio-Secure Lab."

Simplified Interface (sandbox.py):

from abc import ABC, abstractmethod

class Sandbox(ABC):
    @abstractmethod
    def execute_command(self, command: str) -> str:
        """Run a terminal command (e.g., 'ls', 'python main.py')"""
        pass

    @abstractmethod
    def write_file(self, path: str, content: str) -> None:
        """Create a file inside the box"""
        pass

    @abstractmethod
    def read_file(self, path: str) -> str:
        """Read a file from inside the box"""
        pass

Explanation: The Lead Agent uses these three buttons. It doesn't care whether the sandbox is a Docker container, a Kubernetes Pod, or a Firecracker microVM. It just knows it can Execute, Read, and Write.
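Because the Lead Agent depends only on these three methods, any class that implements them can be plugged in. The sketch below illustrates that with a hypothetical in-memory `FakeSandbox` test double (not part of deer-flow) that records commands instead of running them. The `Sandbox` ABC is repeated so the block stands alone, and `run_script` is a hypothetical agent-side helper.

```python
from abc import ABC, abstractmethod


class Sandbox(ABC):
    @abstractmethod
    def execute_command(self, command: str) -> str: ...

    @abstractmethod
    def write_file(self, path: str, content: str) -> None: ...

    @abstractmethod
    def read_file(self, path: str) -> str: ...


class FakeSandbox(Sandbox):
    """Hypothetical test double: keeps files in a dict and never runs anything."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}
        self.commands: list[str] = []

    def execute_command(self, command: str) -> str:
        self.commands.append(command)  # record the command instead of executing it
        return f"(fake) ran: {command}"

    def write_file(self, path: str, content: str) -> None:
        self.files[path] = content

    def read_file(self, path: str) -> str:
        return self.files[path]


def run_script(sandbox: Sandbox, path: str, source: str) -> str:
    """Hypothetical agent-side helper: talks only to the interface."""
    sandbox.write_file(path, source)
    return sandbox.execute_command(f"python {path}")


box = FakeSandbox()
print(run_script(box, "/workspace/hello.py", "print('hi')"))
# → (fake) ran: python /workspace/hello.py
```

Swapping in a Docker-, Kubernetes-, or microVM-backed subclass would not change a single line of `run_script`.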

How It Works: The Lifecycle of a Hazardous Task

Let's see what happens when the Lead Agent wants to run that dangerous code.

Sequence Diagram

sequenceDiagram
    participant LA as Lead Agent
    participant S as Sandbox Interface
    participant D as Docker Container
    LA->>S: write_file("virus.py", "delete everything")
    S->>D: [Inject File into /workspace/virus.py]
    LA->>S: execute_command("python virus.py")
    Note right of D: Script runs isolated.<br/>It deletes files inside the container.<br/>Your host OS is safe.
    S->>D: [Capture Output]
    D-->>S: "Files deleted."
    S-->>LA: Command finished.
    Note right of LA: The Agent thinks it worked.<br/>But no real damage was done.

Internal Implementation: Under the Hood

How do we actually implement this "Lab"? deer-flow uses container technology.

1. The Container Factory

When the system starts up (or when a task begins), we spin up a lightweight Linux environment.

In our docker-compose-dev.yaml, we use a specific image for this:

# Inside docker-compose-dev.yaml
provisioner:
  environment:
    - SANDBOX_IMAGE=all-in-one-sandbox:latest

This image is a stripped-down Linux OS containing:

Python
Node.js
Common libraries (pandas, NumPy)
Nothing from your host: no access to your passwords, photos, or system keys.
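Starting one of these boxes with the Docker SDK for Python (the `docker` package) might look like the sketch below. It assumes one long-lived container per chat thread, kept alive with `tail -f /dev/null` so commands can be exec'd into it later. The helper names `sandbox_run_options` and `start_sandbox` are hypothetical. The image tag and the `/root/workspace` mount target come from the `docker-compose-dev.yaml` snippets in this chapter, not from deer-flow's provisioner code.

```python
import os


def sandbox_run_options(thread_dir: str) -> dict:
    """Build keyword arguments for docker-py's client.containers.run()."""
    return {
        # Same image the compose file passes in via SANDBOX_IMAGE
        "image": os.environ.get("SANDBOX_IMAGE", "all-in-one-sandbox:latest"),
        # A do-nothing foreground process keeps the container alive for later exec calls
        "command": ["tail", "-f", "/dev/null"],
        "detach": True,
        # Bind-mount only this thread's folder (an absolute host path) as the workspace
        "volumes": {thread_dir: {"bind": "/root/workspace", "mode": "rw"}},
        "working_dir": "/root/workspace",
    }


def start_sandbox(client, thread_dir: str) -> str:
    """Start a sandbox container and return its ID.

    `client` is expected to be docker.from_env() or an object with the same API.
    """
    container = client.containers.run(**sandbox_run_options(thread_dir))
    return container.id
```

With a running Docker daemon, `start_sandbox(docker.from_env(), "/abs/path/to/threads/<thread-id>")` returns the ID that a `DockerSandbox` can later look up.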

2. Executing Commands

When the Agent calls execute_command("python script.py"), we don't run os.system(), which would execute the command directly on your host. Instead, we ask the Docker API to run it inside the container.

Simplified Logic (Conceptual):

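import docker

docker_client = docker.from_env()  # talks to the local Docker daemon

class DockerSandbox(Sandbox):
    # write_file() and read_file() omitted for brevity
    def __init__(self, container_id: str):
        self.id = container_id

    def execute_command(self, command: str) -> str:
        # We tell Docker to run this inside the specific container ID
        container = docker_client.containers.get(self.id)
        # This runs INSIDE the box, not on the host ("sh -c" keeps pipes working)
        result = container.exec_run(["sh", "-c", command])
        # exec_run returns (exit_code, output); output is raw bytes
        return result.output.decode("utf-8", errors="replace")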

Explanation: The exec_run function acts like a portal. It teleports the command into the container, runs it there, and teleports the text output back.
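`exec_run` returns an `ExecResult` holding an `exit_code` and an `output`. With `demux=True`, `output` becomes a `(stdout, stderr)` pair of bytes, either of which may be `None`, so the agent can tell a failing script from a successful one. A sketch under that assumption; `run_in_container` and the dict it returns are hypothetical, not deer-flow's actual result format.

```python
def run_in_container(container, command: str) -> dict:
    """Run a shell command inside `container` (a docker-py Container)."""
    # demux=True splits the output into (stdout_bytes, stderr_bytes)
    result = container.exec_run(["sh", "-c", command], demux=True)
    stdout, stderr = result.output

    def _text(data):
        # A stream that produced nothing comes back as None
        return (data or b"").decode("utf-8", errors="replace")

    return {
        "exit_code": result.exit_code,
        "stdout": _text(stdout),
        "stderr": _text(stderr),
    }
```

Returning the exit code and stderr separately lets the Lead Agent decide whether to retry, fix the script, or report a failure.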

3. File System Isolation (Mounting)

The Agent needs to see files, but only specific files. We use Volume Mounting.

Imagine a hotel room (The Sandbox).

The hotel room is empty.
The Agent brings a suitcase (The Workspace Volume).
The Agent can mess up the room or the suitcase, but they cannot access the hotel safe (Your Host System).

In docker-compose-dev.yaml:

volumes:
  - ${DEER_FLOW_ROOT}/backend/.deer-flow/threads:/root/workspace

Host Path: ${DEER_FLOW_ROOT}/.../threads (A specific folder on your disk).
Container Path: /root/workspace (Where the Agent thinks it lives).

If the Agent runs rm -rf /root/workspace, it can only delete the files in that thread folder (which does live on your disk); your actual project code and everything else outside the mount stay untouched.
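Because /root/workspace is a bind mount, write_file and read_file could in principle operate on the host side of the mount without exec'ing into the container. If they do, they must refuse paths that escape the thread folder (such as `../../.ssh/id_rsa`), or the hotel safe is open again. A minimal sketch assuming that design; `to_host_path` and `WORKSPACE` are hypothetical names, and deer-flow's real file helpers may work differently.

```python
from pathlib import Path, PurePosixPath

WORKSPACE = PurePosixPath("/root/workspace")  # container-side mount point


def to_host_path(thread_dir: str, container_path: str) -> Path:
    """Map a path inside the sandbox to the matching file in the host thread folder."""
    p = PurePosixPath(container_path)
    if not p.is_absolute():
        p = WORKSPACE / p  # relative paths are taken relative to the workspace
    try:
        relative = p.relative_to(WORKSPACE)  # lexical check: must start with the mount point
    except ValueError:
        raise PermissionError(f"{container_path} is outside the workspace") from None

    root = Path(thread_dir).resolve()
    target = (root / relative).resolve()  # collapses ".." and follows symlinks
    if target != root and root not in target.parents:
        raise PermissionError(f"{container_path} escapes the workspace")
    return target
```

The second check matters: `/root/workspace/../../etc/passwd` passes the lexical prefix test but resolves outside the thread folder.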

Safety Features

The Sandbox isn't just about file isolation; it enforces behavior limits.

1. Timeouts

In config.yaml, we define how long a command is allowed to run.

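# Simplified Logic
import concurrent.futures

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def execute_with_timeout(container, command: str, timeout_s: float = 30) -> str:
    # docker-py's exec_run() has no timeout parameter, so wait on it from a worker thread
    future = _pool.submit(container.exec_run, ["sh", "-c", command])
    try:
        # Allow 30 seconds max by default
        result = future.result(timeout=timeout_s)
        return result.output.decode("utf-8", errors="replace")
    except concurrent.futures.TimeoutError:
        container.kill()  # Emergency Stop: ends every process in the box
        return "Error: Execution took too long."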
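The kill-on-timeout idea is the same one Python's `subprocess` module applies locally: `subprocess.run(..., timeout=...)` kills the child and raises `TimeoutExpired` once the limit passes. The sketch below applies it to this chapter's "count to infinity" script. It runs on the host, so it demonstrates only the timeout, not the isolation; `run_with_timeout` is a hypothetical helper.

```python
import subprocess
import sys

INFINITE_COUNTER = "i = 0\nwhile True:\n    i += 1\n"


def run_with_timeout(source: str, timeout_s: float) -> str:
    """Run Python source in a child process, killing it after timeout_s seconds."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # on expiry the child is killed, then TimeoutExpired is raised
        )
        return done.stdout
    except subprocess.TimeoutExpired:
        return "Error: Execution took too long."


print(run_with_timeout(INFINITE_COUNTER, timeout_s=1))
# → Error: Execution took too long.
print(run_with_timeout("print('done')", timeout_s=5), end="")
# → done
```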

2. Network Restrictions (Optional)

We can configure the sandbox to have no internet access. This prevents a malicious script (or a confused AI) from uploading your data to an external server.
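With the Docker SDK for Python, one way to do this is to create the container with `network_mode="none"`, which leaves it with only a loopback interface. A sketch assuming containers are created through docker-py's `containers.run()`; `offline_run_options` is a hypothetical helper, and deer-flow's own switch for this may look different.

```python
def offline_run_options(base_options: dict) -> dict:
    """Return a copy of containers.run() kwargs with networking disabled."""
    options = dict(base_options)  # don't mutate the caller's dict
    # "none": loopback only, so there is no route to any outside server
    options["network_mode"] = "none"
    return options


# Usage with a docker-py client (requires a running Docker daemon):
#   import docker
#   client = docker.from_env()
#   client.containers.run(**offline_run_options({
#       "image": "all-in-one-sandbox:latest",
#       "command": ["tail", "-f", "/dev/null"],
#       "detach": True,
#   }))
```

Because the restriction is applied when the container is created, nothing the script does inside the box can turn networking back on.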

Summary

In this chapter, we secured our system.

The Sandbox is a "Bio-Secure Lab" for AI code.
The Interface (Sandbox class) provides a standard remote control (Read, Write, Execute).
The Implementation uses Docker/Containers to ensure that if the AI deletes files or crashes the system, it only destroys a disposable box, not your computer.

Now we have a system that can:

Talk to users (Frontend).
Plan tasks (Lead Agent).
Use tools (Skills).
Execute code safely (Sandbox).

But there is one problem left. If the AI learns something important about you (e.g., "The user hates Java"), it forgets it as soon as the chat history gets too long. We need a way to store facts permanently.

Next Chapter: Long-Term Memory Updater

Generated by Code IQ
