Welcome to the "Brain" of Jarvis!
In the previous chapter, Hybrid Transcription Engine, we successfully turned your voice into text. Whether you use the cloud or a local model, Jarvis now receives a string of text like: "Open Spotify and play my discovery weekly."
But right now, that is just text. Jarvis doesn't know that "Open" is a command or that "Spotify" is an app. It's just a string of characters.
In this chapter, we will build the Unified AI Agent. This is the intelligence layer that takes that text, thinks about it, and decides whether to simply talk back to you or to perform an action on your computer.
Imagine you have a Smart Home with devices from different brands: a Samsung TV, Philips lights, and a Sony speaker. You do not want three different remote controls. You want one Universal Remote.
We face a similar challenge with AI models: OpenAI, Google Gemini, and local models served by Ollama all speak different APIs. We don't want to rewrite our app every time we switch models. We want a "Universal Remote" architecture.
Furthermore, we need to distinguish between Chatting and Doing.
In programming, an "Interface" is like a contract. We create a strict rule that says: "Any AI model we use MUST have a function called generateText."
This means the rest of our app doesn't care if it's talking to Google Gemini or a local Llama model; it just calls the function defined in the contract.
The Unified Agent is the manager. It sits between the user and the raw AI models. It holds the "Memory" of the conversation and manages the "Tools" available to the system.
Standard AI just predicts the next word. "Function Calling" is a special feature where the AI can respond with a structured data packet (JSON) saying: "I want to use the 'open_app' tool with the argument 'Spotify'."
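For instance, a tool-call packet might look like the object below (field names vary from provider to provider; this example simply shows the idea of a structured reply instead of prose):

```typescript
// A hypothetical tool-call packet, as the agent might receive it.
const response = {
  type: 'tool_call',
  toolName: 'open_app',
  toolArgs: { target: 'Spotify' }
};

// A plain chat reply would instead look like:
// { type: 'text', text: 'Hello! How can I help?' }
```

Because the reply is structured data rather than free text, our code can check `type` and branch: run a tool, or just speak the answer.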
Here is how the Agent decides what to do:
1. Send the user's text to the AI, along with the list of available tools.
2. If the AI replies with a tool call, execute that tool and return its result.
3. Otherwise it is just a chat, so return the AI's text directly.
Let's build this system from the bottom up.
We define what an AI Provider looks like. This code lives in src/core/llm-provider.ts.
// src/core/llm-provider.ts
// The "Universal Remote" definition

// What a tool looks like to the AI
export interface ToolDefinition {
  name: string;
  description: string;
}

// The AI answers with plain text, or a request to use a tool
export type LLMResponse =
  | { type: 'text'; text: string }
  | { type: 'tool_call'; toolName: string; toolArgs: Record<string, unknown> };

export interface LLMProvider {
  readonly name: string;
  // 1. Simple chat
  generateText(prompt: string): Promise<string>;
  // 2. Smart thinking (decision making)
  callWithTools(prompt: string, tools: ToolDefinition[]): Promise<LLMResponse>;
}
Explanation: Every AI service we add later (OpenAI, Gemini, Ollama) must implement these two methods. If they do, they can plug into Jarvis immediately.
Now let's implement the contract for our local model using Ollama. This allows us to run intelligence offline.
This creates src/core/providers/ollama-provider.ts.
// src/core/providers/ollama-provider.ts
import { LLMProvider } from '../llm-provider';

export class OllamaProvider implements LLMProvider {
  readonly name = 'Ollama';

  async generateText(prompt: string): Promise<string> {
    // We talk to the local Ollama server via HTTP
    const response = await fetch('http://127.0.0.1:11434/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: 'llama3.2',
        messages: [{ role: 'user', content: prompt }],
        stream: false // Ask for one complete reply instead of a token stream
      })
    });
    const data = await response.json();
    return data.message.content; // The AI's reply
  }

  // callWithTools is implemented similarly, sending tool definitions to Ollama
}
Beginner Note: fetch is the standard way JavaScript sends data to servers. Even though Ollama is on your computer, it acts like a mini-web-server listening on port 11434.
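To sketch how callWithTools could work: Ollama's chat endpoint accepts a tools array, and when the model decides to use one, the reply carries a tool_calls list inside the message. The helper below (parseOllamaReply is our own illustrative name, and the reply shape is an assumption about Ollama's format, not code from the tutorial) shows how that raw reply could be translated into our LLMResponse shape:

```typescript
// Assumed shape of the reply from Ollama's /api/chat (sketch only)
type OllamaMessage = {
  content: string;
  tool_calls?: { function: { name: string; arguments: Record<string, unknown> } }[];
};

// Hypothetical helper: turn Ollama's raw message into our LLMResponse shape
function parseOllamaReply(message: OllamaMessage) {
  const call = message.tool_calls?.[0];
  if (call) {
    // The model asked to run a tool
    return {
      type: 'tool_call' as const,
      toolName: call.function.name,
      toolArgs: call.function.arguments
    };
  }
  // No tool call: just a chat reply
  return { type: 'text' as const, text: message.content };
}

// Example: a reply where the model asked to use a tool
const parsed = parseOllamaReply({
  content: '',
  tool_calls: [{ function: { name: 'open_app', arguments: { target: 'Spotify' } } }]
});
```

Keeping this translation in one small function means the rest of Jarvis never sees Ollama's raw format, only our own LLMResponse.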
Before the AI can use tools, we need to define them. We use a Registry. Think of this as a toolbox where we label every tool so the AI knows what they do.
Located in src/tools/tool-registry.ts.
// src/tools/tool-registry.ts
export class ToolRegistry {
  private tools = new Map<string, { description: string; func: Function }>();

  register(name: string, description: string, func: Function) {
    this.tools.set(name, { description, func });
  }

  // The agent sends this list to the AI so it knows which tools exist
  getDefinitions() {
    return [...this.tools.entries()].map(
      ([name, { description }]) => ({ name, description })
    );
  }

  async execute(name: string, args: any) {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);
    return await tool.func(args); // Actually run the code
  }
}
We populate this registry with actual code:
// src/tools/tool-registry.ts (Example Usage)
const registry = new ToolRegistry();

// Teach Jarvis how to open apps
registry.register(
  'open_app',
  'Opens an application on the computer',
  async ({ target }) => {
    // The actual system command to open an app (`open -a` is macOS-specific)
    require('child_process').exec(`open -a "${target}"`);
    return `Opened ${target}`;
  }
);
Finally, we combine everything in src/agents/unified-agent.ts. This is the code that orchestrates the flow.
// src/agents/unified-agent.ts
import { LLMProvider } from '../core/llm-provider';
import { ToolRegistry } from '../tools/tool-registry';

export class UnifiedAgent {
  constructor(private provider: LLMProvider, private tools: ToolRegistry) {}

  async processQuery(userInput: string): Promise<string> {
    // 1. Ask the AI: "Here is what the user said, and here are my tools. What should I do?"
    const response = await this.provider.callWithTools(
      userInput,
      this.tools.getDefinitions()
    );

    // 2. Did the AI decide to use a tool?
    if (response.type === 'tool_call') {
      // 3. Yes! Execute the tool (e.g., Open Calculator)
      const result = await this.tools.execute(
        response.toolName,
        response.toolArgs
      );
      return result;
    }

    // 4. No, it's just a chat. Return the text.
    return response.text;
  }
}
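To watch the whole loop run without a live model (and without actually opening anything), here is a self-contained sketch with mock stand-ins; mockProvider, mockTools, and demo are illustrations of ours, not part of the Jarvis codebase. In the real app, you would pass an OllamaProvider and the real ToolRegistry instead:

```typescript
// Stand-ins for illustration: same shapes as the real provider and registry
type Reply =
  | { type: 'text'; text: string }
  | { type: 'tool_call'; toolName: string; toolArgs: { target: string } };

const mockProvider = {
  name: 'Mock',
  async generateText(prompt: string): Promise<string> {
    return `You said: ${prompt}`;
  },
  // Always decides to open Spotify, so we can watch the tool path run
  async callWithTools(prompt: string, tools: unknown[]): Promise<Reply> {
    return { type: 'tool_call', toolName: 'open_app', toolArgs: { target: 'Spotify' } };
  }
};

const mockTools = {
  getDefinitions() {
    return [{ name: 'open_app', description: 'Opens an application' }];
  },
  async execute(name: string, args: { target: string }): Promise<string> {
    return `Opened ${args.target}`; // No real side effects in this sketch
  }
};

// The same flow as UnifiedAgent.processQuery
async function demo(): Promise<string> {
  const response = await mockProvider.callWithTools('Open Spotify', mockTools.getDefinitions());
  if (response.type === 'tool_call') {
    return mockTools.execute(response.toolName, response.toolArgs);
  }
  return response.text;
}
```

Because the mock satisfies the same contract as OllamaProvider, swapping in the real provider changes nothing about the agent logic. That is the "Universal Remote" payoff.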
Let's look at a concrete example of how data flows when you speak to Jarvis now. Suppose you ask about your computer's status: the transcription engine hands the agent your words as text, the AI replies with a tool call such as get_system_info, the registry executes it, and the tool's result comes back as Jarvis's spoken answer.
In this chapter, we built the Unified AI Agent.
Now Jarvis has Ears (Chapter 2), Voice-to-Text (Chapter 3), and a Brain (Chapter 4).
However, we have different parts of the app running in different places (native C++ threads, background Node.js processes, and the frontend window). How do we keep them all in sync?
In the next chapter, we will learn how to send messages between these separate parts without the app freezing.
Next Chapter: IPC & State Management
Generated by Code IQ