In the previous chapter, Remote Session Orchestration, we built the "Manager" that handles the phone lines between you and the remote AI agent.
But just having a connection isn't enough. We need to agree on a set of rules for controlling the agent.
Imagine you are texting a friend. You send messages back and forth asynchronously. This is the Chat Stream.
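Now, imagine your friend is driving your car. Before a risky maneuver they ask for your go-ahead and wait for your answer, and if something looks dangerous you shout "Stop!" and expect them to brake right away.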
These interactions are different from chat. They are Control Signals. They require the agent to pause, wait for a specific ID-matched response, or halt execution immediately. This is what the Remote Control Protocol handles.
When the Agent needs permission (e.g., to run a terminal command), it sends a control_request. This request has a unique ID (like a ticket number). The Agent freezes until it receives a control_response with that exact matching ticket number.
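The ticket-number idea can be sketched with two message shapes and a matching check. The interfaces below are assumptions inferred from the field names this chapter uses (type, request_id, request, response); the real protocol types may carry more fields, and the helper name answers is hypothetical.

```typescript
// Hypothetical shapes, inferred from the fields shown in this chapter.
interface ControlRequest {
  type: 'control_request';
  request_id: string; // the ticket number
  request: { tool_name: string };
}

interface ControlResponse {
  type: 'control_response';
  response: {
    subtype: 'success';
    request_id: string; // must echo the ticket number
    response: { behavior: 'allow' | 'deny' };
  };
}

// A response answers a request only when the ticket numbers are identical.
function answers(req: ControlRequest, res: ControlResponse): boolean {
  return res.response.request_id === req.request_id;
}

const req: ControlRequest = {
  type: 'control_request',
  request_id: 'req-1',
  request: { tool_name: 'list_files' },
};

const res: ControlResponse = {
  type: 'control_response',
  response: {
    subtype: 'success',
    request_id: 'req-1',
    response: { behavior: 'allow' },
  },
};

console.log(answers(req, res));
```

A response with any other request_id would not unfreeze this particular request.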
Sometimes the user needs to force the Agent to stop thinking or working. This is a one-way signal sent from the User to the Agent to cancel the current operation.
Let's look at how to handle these control signals in your application code.
The Agent wants to use a tool (like listing files). It sends a request, and your app needs to show a "Yes/No" popup.
When setting up the Manager (from Chapter 1), we define onPermissionRequest.
// Inside your callbacks definition
onPermissionRequest: (request, requestId) => {
  console.log(`Agent wants to: ${request.tool_name}`);
  // Save this ID! We need it to reply.
  showUserPrompt(requestId, request.tool_name);
}
Explanation: The Manager unpacks the complex protocol message and gives you just what you need: the request details and the requestId ticket number.
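One way to "save this ID" on the UI side is to keep a small map of open prompts. This is only a sketch: showUserPrompt is your own function, so its body here is an assumption, and openPrompts is a hypothetical name.

```typescript
// Hypothetical UI-side state: prompts currently on screen, keyed by ticket number.
const openPrompts = new Map<string, string>();

// Records the request ID and returns the text a dialog could display.
function showUserPrompt(requestId: string, toolName: string): string {
  openPrompts.set(requestId, toolName);
  return `Allow the agent to run "${toolName}"? (request ${requestId})`;
}

const promptText = showUserPrompt('req-7', 'list_files');
console.log(promptText);
```

When the user clicks a button, the handler can look up the ID in openPrompts and pass it to respondToPermissionRequest.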
The user clicks "Yes" in your UI. You must send a response back using the same ticket number.
// User clicked "Allow"
const response = {
  behavior: 'allow',
  updatedInput: {} // You can even edit the input here!
};
// Send the approval back to the manager
manager.respondToPermissionRequest(savedRequestId, response);
Explanation: We tell the manager to find the request with savedRequestId and mark it as allowed. The Agent will now resume working.
The user clicks "No".
// User clicked "Deny"
const response = {
  behavior: 'deny',
  message: 'User rejected this command.'
};
manager.respondToPermissionRequest(savedRequestId, response);
Explanation: We send a denial. The Agent will receive an error on its side saying the user refused the action.
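The allow and deny objects above can be modeled as one discriminated union, so the compiler catches a deny without a message or an allow without updatedInput. The type name PermissionDecision and the helper decide are hypothetical; only the field names come from the snippets above.

```typescript
// Hypothetical union covering the two response shapes shown above.
type PermissionDecision =
  | { behavior: 'allow'; updatedInput: Record<string, unknown> }
  | { behavior: 'deny'; message: string };

// Builds the object to hand to respondToPermissionRequest.
function decide(
  allowed: boolean,
  reason = 'User rejected this command.',
): PermissionDecision {
  return allowed
    ? { behavior: 'allow', updatedInput: {} }
    : { behavior: 'deny', message: reason };
}

console.log(decide(true));
console.log(decide(false));
```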
If the Agent is generating a long response or running a script you don't like, you can hit the "Stop" button.
// User clicked "Stop" / Ctrl+C
manager.cancelSession();
Explanation: This sends an interrupt signal. It doesn't need a ticket number; it applies to whatever is happening right now.
How does the system distinguish a simple chat message from a high-priority control signal?
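Here is how a Permission Request travels through the system: the Agent sends a control_request, the Manager stores it under its request_id and calls onPermissionRequest, your UI collects the user's answer, and respondToPermissionRequest sends back a control_response carrying the same request_id so the Agent can resume.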
Let's look at RemoteSessionManager.ts to see how it separates these messages.
When a message arrives from the WebSocket, the Manager checks its type.
// RemoteSessionManager.ts
private handleMessage(message): void {
  // Priority 1: Control Requests (Permissions)
  if (message.type === 'control_request') {
    this.handleControlRequest(message);
    return;
  }
  // Priority 2: Standard Chat
  if (isSDKMessage(message)) {
    this.callbacks.onMessage(message);
  }
}
Explanation: This is the traffic cop. If the packet says control_request, it goes to the special handler. Any other message that passes the isSDKMessage check is treated as standard chat data.
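The routing decision can be tried in isolation with a small standalone function. This is a sketch under assumptions: isChatMessage is a stand-in for isSDKMessage (whose real check is not shown here), and the 'assistant' and 'user' type names are illustrative guesses.

```typescript
type Incoming = { type: string; [key: string]: unknown };

// Stand-in for isSDKMessage; the accepted type names are assumptions.
function isChatMessage(m: Incoming): boolean {
  return m.type === 'assistant' || m.type === 'user';
}

// Control requests are checked first, then chat; anything else is dropped.
function route(m: Incoming): 'control' | 'chat' | 'ignored' {
  if (m.type === 'control_request') return 'control';
  if (isChatMessage(m)) return 'chat';
  return 'ignored';
}

console.log(route({ type: 'control_request', request_id: 'req-1' }));
console.log(route({ type: 'assistant', text: 'Hello' }));
console.log(route({ type: 'keep_alive' }));
```

Checking the control type first means a permission request is never mistaken for chat, even if the chat check is permissive.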
We can't just send a response into the void; we need to know which request we are answering. The Manager keeps a list.
// RemoteSessionManager.ts
private handleControlRequest(request): void {
  const { request_id, request: inner } = request;
  // Store the request in memory
  this.pendingPermissionRequests.set(request_id, inner);
  // Notify the UI
  this.callbacks.onPermissionRequest(inner, request_id);
}
Explanation: We use a Map (a key-value store) to hold the request. The key is the ID. We won't delete this entry until the user responds.
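The lifecycle of an entry in that Map can be sketched as a tiny class: add on arrival, remove when the user answers. PendingRequests and its method names are hypothetical, and deleting on response is an assumption based on the explanation above rather than the Manager's actual code.

```typescript
// Hypothetical wrapper around the pending-request Map.
class PendingRequests<T> {
  private readonly pending = new Map<string, T>();

  // Called when a control_request arrives.
  add(id: string, request: T): void {
    this.pending.set(id, request);
  }

  // Called when the user responds: returns the request and forgets it.
  // Returns undefined for an unknown or already-answered ID.
  take(id: string): T | undefined {
    const request = this.pending.get(id);
    this.pending.delete(id);
    return request;
  }

  get size(): number {
    return this.pending.size;
  }
}

const tickets = new PendingRequests<{ tool_name: string }>();
tickets.add('req-1', { tool_name: 'list_files' });
console.log(tickets.size);
```

Returning undefined for an unknown ID gives the caller a chance to ignore a stale or duplicate answer instead of sending a response nobody is waiting for.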
When you call respondToPermissionRequest, the Manager wraps your answer in the correct protocol JSON.
// RemoteSessionManager.ts
const response = {
  type: 'control_response',
  response: {
    subtype: 'success',
    request_id: requestId, // The Ticket Number
    response: {
      behavior: result.behavior // 'allow' or 'deny'
    },
  },
};
this.websocket?.sendControlResponse(response);
Explanation: The Agent is expecting this exact nested JSON structure. If the structure is wrong, the Agent will keep waiting forever.
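Because a malformed response can leave the Agent waiting, it may help to check the shape before sending. The following is a minimal sketch, not the Manager's implementation: buildControlResponse and isWellFormedControlResponse are hypothetical helpers that mirror only the fields shown above.

```typescript
// Builds the nested envelope using only the fields this chapter shows.
function buildControlResponse(requestId: string, behavior: 'allow' | 'deny') {
  return {
    type: 'control_response' as const,
    response: {
      subtype: 'success' as const,
      request_id: requestId,
      response: { behavior },
    },
  };
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null;
}

// Checks each level of nesting before the message goes on the wire.
function isWellFormedControlResponse(v: unknown): boolean {
  if (!isRecord(v) || v['type'] !== 'control_response') return false;
  const outer = v['response'];
  if (!isRecord(outer) || outer['subtype'] !== 'success') return false;
  if (typeof outer['request_id'] !== 'string') return false;
  const inner = outer['response'];
  return isRecord(inner) && (inner['behavior'] === 'allow' || inner['behavior'] === 'deny');
}

const envelope = buildControlResponse('req-1', 'allow');
console.log(JSON.stringify(envelope));
console.log(isWellFormedControlResponse(envelope));
```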
The Remote Control Protocol is the safety layer of our application. It allows synchronous, high-stakes interactions like tool permissions to happen safely alongside the asynchronous chat stream.
By using Request IDs, we ensure that when you say "Yes," you are approving exactly the action the Agent is currently asking about, keeping the human and the AI perfectly in sync.
Now that we can talk to the agent and control it, we face a new problem: How do we know what the Agent's computer looks like? Is a file open? Is the terminal running?
Let's explore how we mirror the remote computer's status in Synthetic State Bridging.