Welcome to the Remote project!
Imagine you are building a robot that lives in a secure room miles away. You are sitting at your laptop at home. You need a way to tell the robot what to do ("Pick up the box") and a way for the robot to tell you what is happening ("I picked it up").
In our project, the "robot" is a remote AI agent running in a secure container, and your laptop is the user interface.
Remote Session Orchestration is the "Project Manager" that sits in the middle. It handles the phone lines, ensuring your instructions get to the agent and the agent's replies get back to you.
Connecting to a remote AI isn't as simple as opening a single pipe. We have two distinct needs:
Without a central manager, your code would be a mess of loose wires. The RemoteSessionManager ties everything together into one clean package.
The RemoteSessionManager class is the heart of this system. It initializes the connection and stays alive as long as you are talking to the AI. It remembers who you are (Authentication) and which session you are looking at (Session ID).
To make the experience smooth, the manager uses two different methods of communication simultaneously:
Let's look at how to set up this manager in your application.
First, we need to pack our "luggage"βthe configuration data required to travel to the remote server.
import { createRemoteSessionConfig } from './RemoteSessionManager';
// Setup the credentials and target session
const config = createRemoteSessionConfig(
'session-uuid-123', // The specific room ID
() => 'my-secret-token', // Your access pass
'org-uuid-456' // Your organization ID
);
Explanation: We create a config object that holds the "address" of the remote session and the keys to get in.
The manager needs to know what to do when it hears from the AI. We provide "callbacks"βfunctions that run when specific events happen.
// What should happen when the AI speaks?
const callbacks = {
onConnected: () => console.log('We are live!'),
onMessage: (msg) => {
// The AI sent a text message or code update
console.log('AI said:', msg);
},
// The AI wants permission to do something risky
onPermissionRequest: (req, id) => {
console.log('AI wants to use a tool:', req);
}
};
Explanation: Here we define our reactions. When the AI sends a message, we log it. When the AI asks for permission (like "Can I delete this file?"), we get notified.
Now we put it all together and tell the manager to connect.
import { RemoteSessionManager } from './RemoteSessionManager';
// Initialize the manager
const manager = new RemoteSessionManager(config, callbacks);
// Open the connection
manager.connect();
Explanation: manager.connect() starts the "Ears" (WebSocket). It connects to the server and waits for the onConnected callback to fire.
When the user types something in the UI, we use the manager to send it.
// The user typed "Hello!"
const userContent = { type: 'text', text: 'Hello!' };
// Send it to the remote session
await manager.sendMessage(userContent);
Explanation: This uses the "Mouth" (HTTP). We don't wait for a reply here; the reply will come later through the onMessage callback we defined earlier.
How does the RemoteSessionManager actually juggle these tasks?
Here is a high-level view of what happens when a user interacts with the session.
Let's look at the actual code inside RemoteSessionManager.ts to see how it handles incoming data.
The manager receives a raw message from the WebSocket and decides if it's a simple chat message or a system request.
// Inside RemoteSessionManager.ts
private handleMessage(message): void {
// Is the AI asking for permission?
if (message.type === 'control_request') {
this.handleControlRequest(message);
return;
}
// Is it a normal chat message?
if (isSDKMessage(message)) {
this.callbacks.onMessage(message);
}
}
Explanation: This acts like a mail sorter. If the message is a "control request" (like asking for permission), it goes to a special handler. If it's standard content (text, code), it goes to the user interface.
To understand the specific format of these messages, we will explore the Remote Control Protocol in the next chapter.
The manager relies on a helper class for the actual WebSocket connection.
// Inside connect()
this.websocket = new SessionsWebSocket(
this.config.sessionId,
this.config.orgUuid,
this.config.getAccessToken,
wsCallbacks // Pass our internal handlers
);
void this.websocket.connect();
Explanation: The RemoteSessionManager doesn't know *how to speak WebSocket itself; it delegates that to SessionsWebSocket. We will cover how we keep this connection alive in Resilient WebSocket Transport.*
One of the most critical roles of the Orchestrator is handling "Control Requests." This happens when the AI wants to run a command on the remote machine.
// Inside handleControlRequest()
if (inner.subtype === 'can_use_tool') {
// Store the request so we can answer it later
this.pendingPermissionRequests.set(request_id, inner);
// Ask the UI to show a "Yes/No" prompt
this.callbacks.onPermissionRequest(inner, request_id);
}
Explanation: The manager pauses the AI's action, stores the request ID, and asks the user interface to display a prompt. The AI will wait until the user responds.
The Remote Session Orchestration layer is the glue that holds the application together. It provides a simple interface for the UI to send commands and receive updates, hiding the complexity of managing HTTP requests and WebSocket streams.
Now that we understand how the data moves, we need to understand what that data actually looks like.
Let's move on to Remote Control Protocol to see the language our agent speaks.
Generated by Code IQ