In Chapter 2: External Tool Integration, we gave our Agent "hands" to search the internet for live data (like stock prices).
But what if the answer isn't on the public internet? What if you need the AI to answer questions about your company's private 500-page employee handbook or a specific legal contract?
You can't just copy-paste 500 pages into the chat prompt. It's too expensive, and the model might get confused.
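To see why, here is a rough back-of-the-envelope calculation. Every number below (words per page, tokens per word, price per token) is an assumption chosen purely for illustration:

```python
# Rough estimate of what stuffing a 500-page handbook into a prompt costs.
# All numbers here are illustrative assumptions, not real pricing.
pages = 500
words_per_page = 400          # assumed density for a typical document
tokens_per_word = 1.3         # common rule of thumb for English text

total_tokens = int(pages * words_per_page * tokens_per_word)
print(f"~{total_tokens:,} tokens per question")

# At a hypothetical $2.50 per million input tokens:
cost_per_question = total_tokens / 1_000_000 * 2.50
print(f"~${cost_per_question:.2f} every single time you ask a question")
```

And that cost repeats on every question, because the whole document would ride along in every prompt.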
The solution is RAG (Retrieval-Augmented Generation).
Imagine you are taking an open-book test about a history book: instead of memorizing every page, you flip to the relevant section when a question comes up. That is what RAG does. It turns the AI from a Memorizer into a Researcher.
RAG might sound technical, but it works much like a library: store the book on the shelves once, then fetch only the pages you need for each question.
We will use the framework Agno (as seen in our project file agentic_rag_with_o-3-mini_and_duckduckgo/app.py) to build this.
First, we need a place to store our "chunks" of text. We use a Vector Database (like Milvus).
from agno.vectordb.milvus import Milvus
from agno.embedder.openai import OpenAIEmbedder
# Define where to store the data
vector_db = Milvus(
    collection="rag_documents_openai",
    uri="http://localhost:19530",
    embedder=OpenAIEmbedder(),
)
- Milvus: The database software.
- embedder: The tool that converts text into numbers (vectors).

Next, we load a PDF into that database. This step automatically handles Chunking (cutting text) and Embedding (converting to numbers).
from agno.knowledge.pdf import PDFKnowledgeBase
# Connect a PDF file to our Vector DB
knowledge_base = PDFKnowledgeBase(
    path="company_manual.pdf",
    vector_db=vector_db,
)
# This triggers the "Reading" process
knowledge_base.load(recreate=True)
- PDFKnowledgeBase: A helper class that knows how to read PDFs.
- .load(): This is where the magic happens. It reads the file, splits it, and saves it to the database.
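Under the hood, the splitting step works roughly like this. This is a minimal sketch of the chunking idea, not Agno's actual implementation, and the chunk sizes are arbitrary:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks so no fact is cut in half at a boundary."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step back by `overlap` to keep context
    return chunks

document = "A" * 1200  # stand-in for text extracted from a PDF
pieces = chunk_text(document)
print(len(pieces), [len(p) for p in pieces])
```

The overlap matters: without it, a sentence that straddles a chunk boundary would be unsearchable, because neither half contains the full fact.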
Finally, we create the Agent. Notice we pass the knowledge parameter.
from agno.agent import Agent
from agno.models.openai import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    knowledge=knowledge_base,
    search_knowledge=True,  # Enable RAG features
    instructions=["Always search the knowledge base first."],
)
agent.print_response("What is the policy on remote work?", stream=True)
What just happened? Instead of stuffing the whole PDF into the prompt, the agent searched the knowledge_base for the chunks most relevant to the question and used only those to write its answer.
You might be wondering: how does the computer know which paragraph is relevant?
It uses Vector Embeddings. An embedding model turns text into a list of coordinates (numbers).
- "Apple" → [0.9, 0.1, 0.2]
- A related word → [0.85, 0.1, 0.2] (close to Apple)
- An unrelated word → [0.1, 0.9, 0.8] (far away)

When you ask "What fruit is red?", the system converts your question into numbers and looks for the closest matching numbers in the database.
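"Closest" is usually measured with cosine similarity. Here is a minimal sketch using the toy vectors above (the vectors themselves are made up; real embedding models produce hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Score of 1.0 means the vectors point the same way; near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

apple     = [0.9, 0.1, 0.2]
related   = [0.85, 0.1, 0.2]  # toy vector for a related word
unrelated = [0.1, 0.9, 0.8]   # toy vector for an unrelated word

sim_related = cosine_similarity(apple, related)      # close to 1.0
sim_unrelated = cosine_similarity(apple, unrelated)  # much lower
print(round(sim_related, 3), round(sim_unrelated, 3))
```

The vector database does exactly this comparison, just optimized to search millions of vectors quickly.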
Here is the flow when you ask a question: your question is converted into a vector, the database returns the chunks whose vectors are closest, and the model writes an answer using those chunks as context.
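That flow can be sketched end to end with a toy in-memory database. The chunks, vectors, and `embed` function below are all made up for illustration; this is not Agno's internals:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    vector: list[float]

# Toy "vector DB": two pre-embedded chunks with made-up 2D vectors.
db = [
    Chunk("Remote work is allowed two days per week.", [0.9, 0.1]),
    Chunk("The cafeteria opens at 8am.", [0.1, 0.9]),
]

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model.
    return [1.0, 0.0] if "work" in text.lower() else [0.0, 1.0]

def search(query_vec: list[float], top_k: int = 1) -> list[Chunk]:
    # Rank chunks by dot product with the query vector, highest first.
    return sorted(
        db,
        key=lambda c: sum(a * b for a, b in zip(c.vector, query_vec)),
        reverse=True,
    )[:top_k]

question = "What is the policy on remote work?"
context = "\n".join(c.text for c in search(embed(question)))
print(context)  # only the remote-work chunk is retrieved
```

In a real system, `context` would then be pasted into the model's prompt so the answer is grounded in the retrieved text.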
In our project file agentic_rag_with_o-3-mini_and_duckduckgo/app.py, we combine RAG with the Web Search we learned in Chapter 2.
This creates a "Super Agent" that checks internal documents first, and if the answer isn't there, it checks the internet.
# From agentic_rag_with_o-3-mini_and_duckduckgo/app.py
from agno.agent import Agent
from agno.tools.duckduckgo import DuckDuckGoTools

# Instructions tell the agent the order of operations
instructions = [
    "1. Knowledge Base Search:",
    "   - ALWAYS start by searching the knowledge base",
    "2. External Search:",
    "   - If knowledge base yields insufficient results, use duckduckgo_search",
]

agent = Agent(
    model=model,
    knowledge=knowledge_base,
    tools=[DuckDuckGoTools()],  # Tool from Chapter 2
    instructions=instructions,
)
This is the power of Agentic RAG. It's not just looking up text; it's making a decision: "Do I have this info in my files? No? Okay, I'll Google it."
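That decision can be pictured as a simple fallback. The sketch below uses stub search functions, not Agno's actual implementation, just to make the control flow concrete:

```python
def knowledge_base_search(question: str) -> list[str]:
    # Stub: pretend our internal docs only cover HR policy.
    if "remote" in question.lower():
        return ["Remote work is allowed two days per week."]
    return []

def web_search(question: str) -> list[str]:
    # Stub standing in for the DuckDuckGo tool.
    return [f"Web result for: {question}"]

def agentic_answer(question: str) -> str:
    # 1. Always check internal documents first.
    results = knowledge_base_search(question)
    source = "knowledge base"
    # 2. Fall back to the web only if the documents come up empty.
    if not results:
        results = web_search(question)
        source = "web"
    return f"[{source}] {results[0]}"

print(agentic_answer("What is the remote work policy?"))
print(agentic_answer("Who won the World Cup in 2022?"))
```

The real agent makes this choice itself by reasoning over its instructions, rather than following a hard-coded `if`, which is what makes it "agentic."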
In this chapter, we learned:
- Why large private documents can't simply be pasted into a prompt.
- How a Vector Database (Milvus) stores text as embeddings.
- How PDFKnowledgeBase chunks, embeds, and loads a document.
- How Agentic RAG lets the agent choose between internal documents and web search.
We now have a single agent that can use tools and read documents. But in a large software system, one agent isn't enough. You might need a team of agents working together.
Next Step: Multi-Agent Orchestration
Generated by Code IQ