Chapter 6: Native Capabilities Bridge

In the previous chapter, Sensory Audio Processing (Hearing), we gave airi the ability to listen to your voice. Before that, in The "Stage" (Visual Presentation Layer), we gave her a face.

But currently, airi is like a "Brain in a Jar." She lives inside a web browser window. She can chat with you, but she cannot interact with your computer: she cannot see your other open windows, she cannot click your mouse, and she cannot tap directly into your hardware.

In this chapter, we will build the Native Capabilities Bridge. This is the nervous system that connects the digital brain to the physical hardware of your computer.

The Motivation: Escaping the Browser Sandbox

Web browsers (like Chrome or the one embedded in Airi) are designed to be Sandboxed. This means a website is strictly forbidden from touching your operating system for security reasons. A website cannot say: "Move the user's mouse to the left."

The Problem: We want airi to be an assistant, not just a chatbot.

  1. Use Case: You say, "Airi, look at this error in my code editor."
  2. Requirement: Airi needs to take a screenshot of your entire screen, not just her own window.
  3. Restriction: Standard web pages cannot do this automatically.

The Solution: We create a Bridge.

Key Concepts

To understand this system, think of a Mech Suit.

1. The Pilot (The Web Frontend)

This is the HTML/JS interface you see. It is smart but weak. It decides what to do (e.g., "I want to click that button").

2. The Mech (The Native Process)

This is the application wrapper around the browser. It is strong but needs instructions. airi uses two technologies for this role: Tauri (a Rust-based shell, which powers the native plugins we will read below) and Electron (a Node.js-based shell, which runs the desktop app's main process).

3. IPC (Inter-Process Communication)

This is the dashboard control panel. When the Pilot pushes a button, a signal travels down a wire to the Mech's hydraulic arms. In coding terms, the Frontend sends a message (Event) to the Backend, and the Backend executes the system command.
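The dashboard analogy can be sketched as a toy command registry. This is not Tauri's real transport (which crosses a process boundary and serializes everything); it is a minimal in-memory model of the invoke-by-name pattern, with hypothetical names like `registerCommand`:

```typescript
// Toy model of the invoke/handle pattern. The "backend" registers named
// command handlers; the "frontend" invokes them by string name and gets
// a Promise back, just like Tauri's invoke().

type Handler = (args?: Record<string, unknown>) => Promise<unknown>

const commands = new Map<string, Handler>()

// Backend side: register a command, like #[tauri::command] does in Rust.
function registerCommand(name: string, handler: Handler): void {
  commands.set(name, handler)
}

// Frontend side: invoke a command by name.
async function invoke(name: string, args?: Record<string, unknown>): Promise<unknown> {
  const handler = commands.get(name)
  if (!handler)
    throw new Error(`Unknown command: ${name}`)
  return handler(args)
}

// Example: a fake "start_pass_through" command.
registerCommand('start_pass_through', async () => 'pass-through enabled')
```

The important property is that the Pilot only knows the string name; everything behind the name can change without touching the frontend.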

How to Use: "Ghost Mode" (Pass-Through)

Let's look at a cool feature called Window Pass-Through. Sometimes, you want Airi to float on your screen like a hologram. You want to be able to click through her to the window behind her.

This requires native OS commands. Here is how the Frontend asks for this superpower.

The Command (Frontend)

The web page calls a function exposed by the bridge.

// Inside a Vue component or logic file
import { invoke } from '@tauri-apps/api/core'

async function enableGhostMode() {
  // We send a command string to the Rust backend
  await invoke('start_pass_through')
  
  console.log('Clicks now go through the window!')
}

Explanation: The invoke function is the magic telephone. We dial 'start_pass_through'. We don't need to know how it works, just that the backend handles it.

Internal Implementation: The Architecture

How does a JavaScript command turn into a Windows or macOS system call?

The Command Flow

```mermaid
sequenceDiagram
  participant UI as Web Frontend (Pilot)
  participant Bridge as IPC Bridge
  participant Rust as Rust Plugin (Mech)
  participant OS as Operating System
  UI->>Bridge: invoke('start_tracing_cursor')
  Bridge->>Rust: Calls start_tracing_cursor() function
  loop Every 32ms
    Rust->>OS: Where is the mouse?
    OS-->>Rust: X: 500, Y: 300
    Rust->>UI: Emit event 'cursor-position' {x: 500, y: 300}
  end
```
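The loop in that diagram can be modeled in a few lines. This sketch uses synchronous "ticks" and hypothetical names (`pollOnce`, `onCursorPosition`) instead of a real 32 ms timer, just to show the poll-then-emit shape:

```typescript
// One "tick" of the cursor-tracing loop: ask the OS where the mouse is,
// then broadcast the answer to every frontend listener.

type CursorEvent = { x: number, y: number }
type Listener = (e: CursorEvent) => void

const listeners: Listener[] = []

// Frontend side: subscribe to 'cursor-position' events.
function onCursorPosition(fn: Listener): void {
  listeners.push(fn)
}

// Backend side: one polling tick.
function pollOnce(queryOs: () => CursorEvent): void {
  const pos = queryOs()                 // Rust -> OS: "Where is the mouse?"
  for (const fn of listeners) fn(pos)   // Rust -> UI: emit 'cursor-position'
}

// Usage: a fake OS that reports a fixed position.
const seen: CursorEvent[] = []
onCursorPosition(e => seen.push(e))
pollOnce(() => ({ x: 500, y: 300 }))
```

In the real app the tick runs on a timer in Rust, but the data flow is the same: query, then emit upward.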

Deep Dive 1: The Rust Plugin (The Muscle)

Let's look at crates/tauri-plugin-window-pass-through-on-hover/src/lib.rs. This is where the raw power lives.

Unlike JavaScript, Rust can talk directly to the operating system (Windows API or macOS Cocoa).

// This function is callable from JavaScript
#[tauri::command]
async fn start_pass_through<R: Runtime>(
  window: tauri::Window<R>
) -> Result<(), String> {
  
  // Call the helper that speaks "Operating System" language
  // This changes the window style to be transparent to clicks
  set_pass_through_enabled(&window, true).map_err(|e| {
    log::error!("Failed: {e}");
    e.to_string() // convert the OS-level error into a String for the frontend
  })
}

Explanation:

  1. #[tauri::command]: This label tells the system "Expose this function to the frontend."
  2. set_pass_through_enabled: This is a lower-level function (specific to Windows or Mac) that changes the window flags.
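To make "changes the window flags" concrete, here is an illustrative dispatch table. The flag names are real OS concepts (the `WS_EX_TRANSPARENT` extended style on Windows, `NSWindow`'s `setIgnoresMouseEvents:` on macOS, the X11 input shape on Linux), but exactly how airi's helper calls them is an assumption; this TypeScript stand-in only shows the per-platform branching:

```typescript
// Hypothetical sketch: each OS has its own way to make a window
// ignore clicks. We return a description string instead of making
// the real system call.

type Platform = 'windows' | 'macos' | 'linux'

function passThroughCall(platform: Platform, enabled: boolean): string {
  switch (platform) {
    case 'windows':
      // Windows: toggle the WS_EX_TRANSPARENT extended window style.
      return `SetWindowLongPtr(GWL_EXSTYLE, WS_EX_TRANSPARENT=${enabled})`
    case 'macos':
      // macOS: NSWindow has a dedicated ignores-mouse-events switch.
      return `NSWindow.setIgnoresMouseEvents(${enabled})`
    case 'linux':
      // Linux/X11: shrink the input shape so clicks fall through.
      return `XShapeCombineRectangles(ShapeInput, empty=${enabled})`
  }
}
```

This is exactly why the logic lives in Rust: each branch is a different native API that a sandboxed web page is never allowed to touch.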

Deep Dive 2: Global Input Listening (Feeling the World)

Airi can also "feel" when you type or move your mouse, even when her window is minimized. This uses a library called rdev.

From crates/tauri-plugin-rdev/src/lib.rs:

// Start a background thread so the blocking OS hook doesn't freeze the app
std::thread::spawn(move || {
  // 'listen' blocks this thread and hooks into the OS global event loop
  let result = listen(move |event: Event| {

    // Convert the OS event kind to a Web-style event name
    let event_name = match event.event_type {
      EventType::KeyPress(_) => "keydown",
      EventType::MouseMove { .. } => "mousemove",
      _ => return, // ignore every other kind of event
    };

    // Send the news up to the Frontend
    let _ = app.emit(event_name, &event);
  });

  // 'listen' only returns if the hook could not be installed
  if let Err(e) = result {
    log::error!("global input listener failed: {e:?}");
  }
});

Explanation:

  1. std::thread::spawn: We create a background thread (not a separate process) so the blocking listener doesn't freeze the app.
  2. listen: This connects to the global OS hook. It intercepts every mouse move on your computer.
  3. app.emit: This broadcasts the event back to the web page. Now the Cognitive Brain knows you are active!
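The match expression at the heart of this snippet translates cleanly to the frontend language. Here it is mirrored in TypeScript to make the filtering explicit (the event kind names are simplified stand-ins for rdev's variants):

```typescript
// Translate raw OS event kinds into DOM-style event names,
// dropping the kinds we don't forward (like `_ => return` in Rust).

type OsEventType = 'KeyPress' | 'KeyRelease' | 'MouseMove' | 'ButtonPress'

function toWebEventName(t: OsEventType): string | null {
  switch (t) {
    case 'KeyPress': return 'keydown'
    case 'MouseMove': return 'mousemove'
    default: return null // ignored: never emitted to the frontend
  }
}
```

Filtering at this layer matters: the OS hook fires for every input on the machine, and forwarding only what the frontend needs keeps the bridge quiet.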

Deep Dive 3: The Universal Tool Connector (MCP)

Airi uses the Model Context Protocol (MCP) to connect to external tools (like a database or a file explorer). This acts like a universal USB port.

From crates/tauri-plugin-mcp/src/lib.rs:

#[tauri::command]
async fn call_tool(
  state: State<'_, Mutex<McpState>>,
  name: String,
  args: Option<Map<String, Value>>,
) -> Result<CallToolResult, String> {
  // 1. Get the connected tool client (fail gracefully if none is connected)
  let state = state.lock().await;
  let client = state.client.as_ref().ok_or("No MCP client connected")?;

  // 2. Send the command to the external tool
  let result = client
    .call_tool(CallToolRequestParam { name, arguments: args })
    .await;

  // 3. Return the result (or the error) to the AI
  result.map_err(|e| e.to_string())
}

Explanation: This allows the AI to say "List Files in Folder X." The request goes from JS -> Rust -> External Tool process, and the result flows all the way back.
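The round trip can be mocked with a plain map of async functions. These shapes are illustrative, not the real MCP wire format; the point is that the "tool" is just a named function looked up by string, mirroring `CallToolRequestParam { name, arguments }`:

```typescript
// Minimal mock of the JS -> Rust -> tool round trip.

type ToolArgs = Record<string, unknown>
type ToolFn = (args: ToolArgs) => Promise<string[]>

const tools = new Map<string, ToolFn>()

// The "external tool process", e.g. a file explorer.
tools.set('list_files', async args => [`${args.folder}/a.txt`, `${args.folder}/b.txt`])

// The Rust-side call_tool command, reduced to a lookup + forward.
async function callTool(name: string, args: ToolArgs): Promise<string[]> {
  const tool = tools.get(name)
  if (!tool) throw new Error(`No such tool: ${name}`)
  return tool(args) // result flows back: tool -> Rust -> JS -> AI
}
```

Because every tool is addressed the same way, the AI never needs tool-specific plumbing; it only needs a name and an arguments object.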

Wiring it all up: The Main Process

Finally, all these superpowers need to be registered when the app starts. This happens in the Electron main file: apps/stage-tamagotchi/src/main/index.ts.

app.whenReady().then(async () => {
  // 1. Setup the screen capture capability
  initScreenCaptureForMain()

  // 2. Setup the "Server Channel" (Another bridge type)
  const serverChannel = injeca.provide('modules:channel-server', 
    () => setupServerChannelHandlers()
  )

  // 3. Create the actual windows
  setupMainWindow(dependsOn)
})

Explanation: This acts as the bootloader. It initializes the "nervous system" before the "eyes" (windows) even open.
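The ordering guarantee is the important part of this bootloader. Here is a stripped-down sketch of the idea (the names are illustrative, not airi's actual module system): capabilities register themselves, then startup runs every initializer in order before the windows open.

```typescript
// Tiny "bootloader": run every capability's init in registration order.

const initOrder: string[] = []
const initializers: Array<[string, () => void]> = []

function provide(name: string, init: () => void): void {
  initializers.push([name, init])
}

function bootstrap(): void {
  for (const [name, init] of initializers) {
    init()
    initOrder.push(name)
  }
}

provide('screen-capture', () => {})
provide('channel-server', () => {})
provide('main-window', () => {}) // windows come last, after the "nervous system"
bootstrap()
```

If a window opened before its capabilities existed, its first `invoke` calls would fail, which is why the registration order above is not cosmetic.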

Summary

The Native Capabilities Bridge breaks the walls of the web browser.

  1. It uses IPC (Inter-Process Communication) to send messages between the UI and the System.
  2. It uses Rust plugins for high-performance tasks like global mouse tracking and input simulation.
  3. It allows airi to become a true desktop agent that can see your screen, click through windows, and control external tools via MCP.

Now we have a fully functional body. We have senses, muscles, a brain, and a face. But how do we know what is going on inside that complex brain when things go wrong?

Next Chapter: Introspection & Debugging System


Generated by Code IQ