In the previous chapter, Tool Design (The Contract), we built a dashboard of buttons (Tools) for our agent. We taught it how to check the weather or search a database.
However, just because an agent can push a button doesn't mean it should.
Imagine you hire an assistant and say, "Book me a flight to Paris."
Most basic LLM implementations are Reactive. They see a prompt and immediately rush to generate the final answer or call a tool. This leads to hallucinations, wasted money on API calls, and silly mistakes.
We need to force the agent to "show its work" before it acts. We call this Interleaved Thinking.
Interleaved Thinking is the process where the Model explicitly outputs a "Thought" block before it outputs an "Action" or "Answer."
Think of a mathematician solving a hard problem. They don't just stare at the page and write "42." They scribble in the margins:
By writing this down, the mathematician (and the Agent) can catch their own errors before they commit to an answer.
Instead of a straight line, our agent now moves in a loop.
Modern models (like MiniMax M2.1 or reasoning-heavy models) support this natively. They separate the Thinking (Internal Monologue) from the Content (External Speech).
When the model responds, it doesn't just give us text. It gives us a structure.
# Conceptual response structure
response = {
    "thinking": "The user wants weather. I need to convert 'The Big Apple' to 'New York'.",
    "tool_call": "get_weather(city='New York')",
    "content": "",  # Empty, because we are calling a tool
}
We need to display the "Thinking" to the developer (for debugging) but hide it from the final user (to keep the interface clean).
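One way to do this is to send the two fields to different places. The sketch below assumes the conceptual dictionary shape shown above; `render_for_user` and the logger name `agent.debug` are hypothetical names chosen for illustration, and the sample content string is made up.

```python
import logging

# Developer-facing channel. Configure handlers/levels as your app requires.
logger = logging.getLogger("agent.debug")


def render_for_user(response: dict) -> str:
    """Return only the text the end user should see."""
    thinking = response.get("thinking")
    if thinking:
        # The thought goes to the debug log and never into the returned string.
        logger.debug("THOUGHT: %s", thinking)
    return response.get("content", "")


# Illustrative response in the conceptual format used in this chapter.
response = {
    "thinking": "The user wants weather. I need to convert 'The Big Apple' to 'New York'.",
    "tool_call": None,
    "content": "It is sunny in New York.",
}
visible = render_for_user(response)
```

Keeping the split in a single function means the user interface cannot leak the internal monologue by accident, while developers can still turn on debug logging to read it.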
This is where most beginners fail.
If the agent thinks "I need to check the database," and you execute the tool, you must feed that thought back into the agent's memory for the next turn.
If you don't save the thought, the agent forgets why it called the tool. It wakes up with a database result in its hand and no idea what it was looking for.
# WRONG: We only save the tool result
messages.append({"role": "tool", "content": "Database Result: 50 users"})
# RIGHT: We save the thought AND the result
messages.append({
    "role": "assistant",
    "thinking": "I will query the database for active users.",  # <--- CRITICAL
    "tool_call": "query_db()",
})
messages.append({"role": "tool", "content": "Database Result: 50 users"})
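The rule above can be wrapped in a small helper so the thought and the result are always saved together. This is a minimal sketch that uses the conceptual message format from this chapter, not any specific provider's API; `record_tool_turn` and `orphan_tool_results` are hypothetical helper names, and the user question is an invented example.

```python
def record_tool_turn(messages, thinking, tool_call, result):
    """Append the assistant's thought and tool call, then the tool result."""
    messages.append({
        "role": "assistant",
        "thinking": thinking,
        "tool_call": tool_call,
    })
    messages.append({"role": "tool", "content": result})
    return messages


def orphan_tool_results(messages):
    """Return indexes of tool results not preceded by an assistant thought."""
    orphans = []
    for i, msg in enumerate(messages):
        if msg.get("role") != "tool":
            continue
        prev = messages[i - 1] if i > 0 else {}
        if prev.get("role") != "assistant" or not prev.get("thinking"):
            orphans.append(i)
    return orphans


history = [{"role": "user", "content": "How many active users do we have?"}]
record_tool_turn(
    history,
    thinking="I will query the database for active users.",
    tool_call="query_db()",
    result="Database Result: 50 users",
)
```

Running `orphan_tool_results` over a history before sending it back to the model is a cheap way to catch the "WRONG" pattern during development.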
Let's look at how we code this loop. We are essentially building a REPL (Read-Eval-Print Loop) for the agent.
We check if the agent wants to perform an action. If yes, we run it, append the result, and ask the agent again.
def run_agent_loop(user_query):
    # `llm` and `execute_tool` are assumed to be defined elsewhere.
    messages = [{"role": "user", "content": user_query}]

    while True:
        # 1. Ask the LLM
        response = llm.generate(messages)

        # 2. Print the "Thinking" (Show your work)
        if response.thinking:
            print(f"🧠 THOUGHT: {response.thinking}")

        # 3. If it wants to use a tool, do it
        if response.tool_call:
            print(f"⚙️ ACTION: {response.tool_call.name}")
            result = execute_tool(response.tool_call)

            # 4. CRITICAL: Add the Thought + Tool Call + Result to history
            messages.append(response.full_message_object)
            messages.append({"role": "tool", "content": result})
        else:
            # No tool call? We are done.
            return response.content
Explanation:
while True creates the loop.
We print response.thinking so we can see the agent planning.
Let's see why this is powerful.
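A reactive agent would jump straight to a single call such as search("CEO of Apple age"). An agent that interleaves thinking breaks the problem into steps: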
1. Thought: "I need to find the CEO."
2. Action: search("current CEO of Apple").
3. Result: "Tim Cook."
4. Thought: "Now I need Tim Cook's birth date to calculate age."
5. Action: search("Tim Cook birth date").
6. Result: "November 1, 1960."
7. Thought: "Now I calculate 2024 - 1960."
8. Answer: "Tim Cook is the CEO and he is 64 years old."
By interleaving Thought -> Action -> Thought -> Action, the agent can chain simple steps into complex reasoning.
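The chain above can be replayed end to end without a real model. The sketch below is a self-contained simulation under stated assumptions: `ScriptedLLM`, `Response`, `ToolCall`, and `FAKE_SEARCH` are hypothetical stand-ins that return the canned thoughts and results from the walkthrough, the user question is assumed, and the `max_turns` guard is an addition that the loop shown earlier does not have.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolCall:
    name: str
    query: str


@dataclass
class Response:
    thinking: str
    tool_call: Optional[ToolCall] = None
    content: str = ""


class ScriptedLLM:
    """Stand-in for a real model: replays canned responses in order."""

    def __init__(self, script):
        self._script = iter(script)

    def generate(self, messages):
        return next(self._script)


# Canned search results, copied from the walkthrough above.
FAKE_SEARCH = {
    "current CEO of Apple": "Tim Cook.",
    "Tim Cook birth date": "November 1, 1960.",
}


def execute_tool(call):
    return FAKE_SEARCH[call.query]


def run_agent_loop(llm, user_query, max_turns=10):
    messages = [{"role": "user", "content": user_query}]
    for _ in range(max_turns):
        response = llm.generate(messages)
        if response.tool_call is None:
            return response.content, messages
        result = execute_tool(response.tool_call)
        # Keep the thought with the tool call so the next turn has context.
        messages.append({
            "role": "assistant",
            "thinking": response.thinking,
            "tool_call": f'{response.tool_call.name}("{response.tool_call.query}")',
        })
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not finish within max_turns")


llm = ScriptedLLM([
    Response("I need to find the CEO.", ToolCall("search", "current CEO of Apple")),
    Response("Now I need Tim Cook's birth date to calculate age.",
             ToolCall("search", "Tim Cook birth date")),
    Response("Now I calculate 2024 - 1960.",
             content="Tim Cook is the CEO and he is 64 years old."),
])

answer, history = run_agent_loop(llm, "How old is the CEO of Apple?")
```

After the run, `history` holds the user message followed by two thought/result pairs, which is the record the model would need on each later turn to remember why it searched.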
In this chapter, you learned:
We have a problem though. As our agent "thinks" and loops, the chat history grows very fast. Thoughts, tool calls, JSON results, error messages... our Context Window is filling up with implementation details.
If the agent works for an hour, it will forget the user's name. We need a better way to store information than just a linear list of messages.
Next Chapter: Structured Memory Systems