Chapter 2: The Cognitive Brain

In the previous chapter, Agent Adapters, we built the physical "body" for our AI. We gave it a Minecraft skin, a Discord account, or a Telegram bot token.

But right now, that body is empty. If a zombie walks up to our Minecraft bot, the bot will just stand there and get eaten. It has eyes (data) and hands (API connections), but it lacks the will to use them.

In this chapter, we will build The Cognitive Brain. This is the "Prefrontal Cortex" of airi. It transforms raw data into intelligent action.

The Motivation: From Chatbot to Agent

Most AI chatbots work like this:

  1. User types "Hello".
  2. AI replies "Hi there".

Airi is different. It is an Agent. It lives in a continuous loop.

  1. Event: A zombie appears 5 blocks away.
  2. Thought: "I am low on health. I should run."
  3. Action: The AI writes code to make the character sprint away.

We need a system that doesn't just output text, but actually writes and executes computer code to control its body.
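This perceive, think, act cycle can be sketched in a few lines of TypeScript. All names here (`nextEvent`, `think`, `act`) are illustrative, not airi's actual API; the real loop lives in the Brain class we dissect below.

```typescript
type BotEvent = { type: string; payload: unknown }

// A minimal agent loop: runs until `nextEvent` resolves to null.
async function agentLoop(
  nextEvent: () => Promise<BotEvent | null>, // perception: wait for the world
  think: (e: BotEvent) => Promise<string>,   // ask the LLM for code
  act: (code: string) => Promise<void>,      // execute that code
): Promise<number> {
  let turns = 0
  while (true) {
    const event = await nextEvent() // 1. Event: a zombie appears
    if (event === null)
      return turns                  // world shut down, stop thinking
    const code = await think(event) // 2. Thought: "I should run"
    await act(code)                 // 3. Action: run the generated code
    turns++
  }
}
```

The key difference from a chatbot is the third step: `act` does not print text, it executes it.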

Key Concepts

To understand the Brain, think of a pilot in a cockpit.

1. Perception (The Dials)

The Brain receives a constant stream of "Events." In Minecraft, this might be chat_message, entity_spawn, or health_change. This is the raw sensory input.
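In TypeScript, such a stream is naturally modeled as a discriminated union of event shapes. The payload fields below are illustrative, not airi's exact types:

```typescript
// Illustrative event shapes; airi's real BotEvent type may differ.
type BotEvent =
  | { type: 'chat_message'; payload: { username: string; message: string } }
  | { type: 'entity_spawn'; payload: { entity: string; distance: number } }
  | { type: 'health_change'; payload: { health: number; previous: number } }

// Narrowing on `type` gives each branch the right payload shape.
function describe(event: BotEvent): string {
  switch (event.type) {
    case 'chat_message':
      return `${event.payload.username} said "${event.payload.message}"`
    case 'entity_spawn':
      return `${event.payload.entity} appeared ${event.payload.distance} blocks away`
    case 'health_change':
      return `health changed from ${event.payload.previous} to ${event.payload.health}`
  }
}
```

Functions like `describe` matter because, as we will see, the Brain must turn every event into text before the LLM can reason about it.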

2. Context (The Flight Log)

The Brain maintains a short-term memory. It remembers the conversation history and what it just did. If it just tried to open a door and failed, the context reminds it: "Door is locked."
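A minimal version of such a memory is a bounded list of recent entries, rendered as text so it can be pasted into the prompt. This is a sketch; airi's real context system is richer:

```typescript
// A bounded short-term memory: keeps only the N most recent entries.
class ContextBuffer {
  private entries: string[] = []

  constructor(private capacity: number) {}

  remember(entry: string): void {
    this.entries.push(entry)
    if (this.entries.length > this.capacity)
      this.entries.shift() // forget the oldest entry
  }

  // Rendered as plain text so it can go straight into the LLM prompt.
  view(): string {
    return this.entries.join('\n')
  }
}
```

After a failed interaction, something like `ctx.remember('Tried to open door: locked')` keeps that fact visible in the next thought, so the AI does not try the same door forever.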

3. The REPL (The Controls)

This is the superpower of airi. REPL stands for Read-Eval-Print Loop. Instead of the AI saying "I want to jump," the AI outputs actual JavaScript code: await bot.jump(). The Brain executes this code immediately.
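The core trick can be demonstrated with JavaScript's `AsyncFunction` constructor: the model's text output becomes the body of a function we construct and await, with only a chosen set of globals in scope. This is a simplified sketch of the idea, not airi's actual REPL:

```typescript
// AsyncFunction is not a global; it is reached via an async function's prototype.
const AsyncFunction = Object.getPrototypeOf(async () => {}).constructor

// Evaluate a string of code, exposing only the names in `globals`.
async function evaluate(
  code: string,
  globals: Record<string, unknown>,
): Promise<unknown> {
  const names = Object.keys(globals)
  const values = names.map(n => globals[n])
  // The code string becomes the function body, so `await` works inside it.
  const fn = new AsyncFunction(...names, code)
  return fn(...values)
}
```

So if the LLM outputs `await jump()`, and `jump` is one of the exposed globals, the character actually jumps. A production REPL would also need sandboxing, validation, and timeouts on top of this.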

How to Use: Installing the Brain

You generally don't "call" the Brain directly. Instead, you plug it into your Agent Adapter.

In the Minecraft example (services/minecraft/src/main.ts), we load the brain as a plugin.

import { CognitiveEngine } from './cognitive'

// ... inside the main function ...

// 1. Connect to the Brain server
const airiClient = new Client({ /* config */ })

// 2. Load the Brain into the Bot
await bot.loadPlugin(CognitiveEngine({ airiClient }))

Explanation: The loadPlugin call starts the heartbeat. The CognitiveEngine takes over the bot's nervous system, listening for events and preparing to issue commands.

Internal Implementation: The Thinking Loop

What happens when a zombie appears? Let's trace the thought process.

The Flow of Consciousness

sequenceDiagram
    participant World as Minecraft World
    participant Percept as Perception System
    participant Brain as Cognitive Brain
    participant LLM as AI Model (GPT/Claude)
    World->>Percept: Zombie approaches!
    Percept->>Brain: Event: { type: 'danger', source: 'zombie' }
    Note right of Brain: Brain adds "Context"<br/>(Health: Low, Inventory: Sword)
    Brain->>LLM: Here is the situation. What do I do?
    LLM->>Brain: Execute code: await runAway()
    Brain->>World: Executes JavaScript to move player

Deep Dive: Inside the Code

The logic resides primarily in services/minecraft/src/cognitive/conscious/brain.ts. Let's break down the massive file into bite-sized logic chunks.

1. Processing Events

The Brain sits idle until an event arrives. It puts events in a queue so it doesn't get overwhelmed.

// derived from Brain.processEvent method
private async processEvent(bot: MineflayerWithAgents, event: BotEvent): Promise<void> {
  // 1. Check if we are paused (sleeping)
  if (this.paused) return

  // 2. Create a "Turn ID" to track this specific thought
  const turnId = ++this.turnCounter

  // 3. Build the prompt for the AI; `contextView` is the short-term
  //    memory snapshot described in the Context section
  const userMessage = this.buildUserMessage(event, contextView)
  
  // ... continue to LLM ...
}

Explanation: This is the start of a "Cognitive Cycle." The brain checks if it's awake, assigns a number to the current thought (Turn ID), and prepares the data.
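The queueing mentioned above can be sketched as a serialized worker: events pile up in an array while a single drain loop handles them one at a time. This is an illustrative sketch, not the actual implementation:

```typescript
// Serializes event handling: new events wait while one is being processed,
// so the brain never thinks about two events at once.
class EventQueue<T> {
  private queue: T[] = []
  private draining = false

  constructor(private handle: (event: T) => Promise<void>) {}

  push(event: T): void {
    this.queue.push(event)
    void this.drain() // kick the worker; no-op if already running
  }

  private async drain(): Promise<void> {
    if (this.draining)
      return // a drain loop is already running
    this.draining = true
    while (this.queue.length > 0) {
      const event = this.queue.shift()!
      await this.handle(event) // one cognitive cycle at a time
    }
    this.draining = false
  }
}
```

This is why the Turn ID is useful: even under a burst of events, each thought gets a stable number it can be tracked by.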

2. The Prompt (What the AI sees)

The buildUserMessage function constructs a text description of the world for the Large Language Model (LLM).

// derived from Brain.buildUserMessage
private buildUserMessage(event: BotEvent, contextView: string): string {
  const parts: string[] = []

  // Describe the event: "Player Bob said Hello"
  parts.push(`[EVENT] ${event.type}: ${JSON.stringify(event.payload)}`)

  // Describe the world: "I see a tree. I have an axe."
  parts.push(contextView)

  return parts.join('\n\n')
}

Explanation: The AI cannot "see" the screen. It relies on this text description. If this text is inaccurate, the AI will hallucinate.
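To make this concrete, here is a standalone version of the same logic fed a sample event and a made-up context string, showing the exact text the LLM receives:

```typescript
type BotEvent = { type: string; payload: unknown }

// Standalone copy of the prompt builder from the snippet above.
function buildUserMessage(event: BotEvent, contextView: string): string {
  const parts: string[] = []
  parts.push(`[EVENT] ${event.type}: ${JSON.stringify(event.payload)}`)
  parts.push(contextView)
  return parts.join('\n\n')
}

const prompt = buildUserMessage(
  { type: 'entity_spawn', payload: { entity: 'zombie', distance: 5 } },
  'Health: 4/20. Inventory: iron sword. Nearby: oak trees.',
)
// prompt is now:
// [EVENT] entity_spawn: {"entity":"zombie","distance":5}
//
// Health: 4/20. Inventory: iron sword. Nearby: oak trees.
```

Everything the model knows about the world is in that string, which is why keeping `contextView` accurate matters so much.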

3. The Decision (The REPL)

This is the most critical part. The LLM returns a string of text. The Brain treats this text as JavaScript code.

// derived from Brain.processEvent execution phase
try {
  // 1. The LLM gives us text (e.g., "await chat('hello')")
  const codeToEvaluate = result 

  // 2. We execute that text as real code
  const runResult = await this.repl.evaluate(
    codeToEvaluate,
    availableActions, 
    runtimeGlobals 
  )

  // 3. Log the result (Success or Failure)
  this.deps.logger.log(`Executed actions: ${runResult.actions.length}`)
} catch (err) {
  // If the AI wrote bad code, we catch the error here
  console.error('Brain: Failed to execute decision', err)
}

Explanation: Most chatbots stop after step 1. Airi goes to step 2.

The "Action" Functions

The Brain writes code, but what functions can it call? These are defined as Skills or Actions.

For example, gatherWood is a complex action defined in services/minecraft/src/skills/actions/gather-wood.ts (shown here in simplified form):

export async function gatherWood(bot: Mineflayer, count: number) {
  for (let i = 0; i < count; i++) {
    // Logic to find the nearest log block
    const woodBlock = bot.findBlock({ matching: block => block.name.includes('log') })
    if (!woodBlock)
      return // no trees nearby, give up

    // Logic to walk to it
    await goToPosition(bot, woodBlock.position)

    // Logic to break it
    await breakBlockAt(bot, woodBlock.position)
  }
}

The Brain simply writes await gatherWood(5), and this complex script runs. This is handled by the Native Capabilities Bridge, which we will cover later.
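Conceptually, that bridge pre-binds the bot into each skill and collects the results into the globals object the REPL exposes, which is why model-written code can call gatherWood(5) without ever mentioning the bot. A sketch with hypothetical names:

```typescript
type Action = (...args: any[]) => Promise<unknown>

// Hypothetical sketch: wrap each skill so the bot is supplied automatically,
// producing the globals object handed to the REPL.
function buildRuntimeGlobals(
  bot: unknown,
  skills: Record<string, (bot: unknown, ...args: any[]) => Promise<unknown>>,
): Record<string, Action> {
  const globals: Record<string, Action> = {}
  for (const [name, skill] of Object.entries(skills)) {
    // Pre-bind the bot as the first argument of every skill.
    globals[name] = (...args) => skill(bot, ...args)
  }
  return globals
}
```

With this in place, the string `await gatherWood(5)` returned by the LLM resolves to a real, bot-bound function call inside the REPL.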

Summary

The Cognitive Brain is the loop that turns Perception into Action.

  1. It waits for events (via Agent Adapters).
  2. It converts the world state into text.
  3. It asks an LLM for a solution.
  4. It executes the LLM's response as JavaScript code.

Now that our AI is thinking and acting, we need a way to see what it is thinking in real-time.

Next Chapter: The "Stage" (Visual Presentation Layer)

