Welcome back! In the previous chapter, Memorize Pipeline (The Digestion System), we taught our AI how to "eat" raw information and digest it into organized facts (Items) and topics (Categories).
Now, the database is full of memories. But how do we get them out?
If you ask your AI, "What did I say about my allergies?", it needs a strategy to find that specific fact without reading every single document in its brain. This strategy is called the Retrieve Pipeline, or the Recall System.
Imagine walking into a massive library and asking a librarian: "What is the main ingredient in that recipe I liked?"
A bad librarian (simple retrieval) would run down every aisle, pulling every book off the shelf, looking for the word "recipe." This takes forever and is very tiring.
A smart librarian (memU's approach) checks the catalog first: identify the right section, then the right shelf, and only pulls a book off the shelf if the catalog entry isn't enough.
This "Lazy Search" strategy saves time (latency) and money (token costs).
memU searches in layers. It starts high-level and only drills down if necessary.
Users often talk in fragments ("What about the other one?"). The pipeline looks at your chat history and rewrites the query into a complete, standalone question before it starts searching.
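A sketch of what that rewrite step can look like in practice. The prompt wording and the helper name `build_rewrite_prompt` are illustrative, not memU's actual internals:

```python
def build_rewrite_prompt(chat_history: list[str], fragment: str) -> str:
    """Ask an LLM to turn a fragmentary follow-up into a standalone query."""
    history = "\n".join(chat_history)
    return (
        "Rewrite the user's last message as a complete, standalone question.\n"
        f"Chat history:\n{history}\n"
        f"Last message: {fragment}\n"
        "Standalone question:"
    )

prompt = build_rewrite_prompt(
    ["User: I'm allergic to peanuts.", "AI: Noted!"],
    "What did I say about that?",
)
# Sent to an LLM, this might come back as something like:
# "What did the user say about their peanut allergy?"
```

The rewritten query, not the raw fragment, is what flows into the search steps below.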
At every step, the AI pauses and asks itself: "Do I have enough information to answer the user?"
Using the retrieval system is simple because the MemoryService (The Central Brain) wraps all the complexity for you.
```python
# The service handles the entire pipeline automatically
response = await service.retrieve(
    query="What is the deadline for the project?",
    user_id="user_123",
)

print(f"Answer: {response.answer}")
# Output: Answer: The deadline is next Friday.
```
The result isn't just text; it tells you where it found the answer.
```python
# You can see exactly which memories were used
print(response.retrieved_items)
# Output: [{"summary": "Deadline set for Friday", "score": 0.89}, ...]
```
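Because each retrieved item carries a similarity score, you can post-filter the results yourself. A minimal sketch, assuming `retrieved_items` is a list of dicts with `summary` and `score` keys as in the output above (the helper name is hypothetical):

```python
def confident_items(retrieved_items: list[dict], threshold: float = 0.75) -> list[dict]:
    """Keep only memories whose similarity score clears the threshold,
    strongest matches first."""
    hits = [item for item in retrieved_items if item["score"] >= threshold]
    return sorted(hits, key=lambda item: item["score"], reverse=True)

items = [
    {"summary": "Deadline set for Friday", "score": 0.89},
    {"summary": "Team lunch on Monday", "score": 0.41},
]
print(confident_items(items))
# → [{'summary': 'Deadline set for Friday', 'score': 0.89}]
```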
How does this work in code? Just like the Memorize Pipeline, the Retrieve Pipeline is a list of steps (a Workflow).
Let's look at src/memu/app/retrieve.py. This file defines the steps the "Librarian" takes.
The code defines more than one search workflow. Here is the simplified "RAG" workflow:
```python
def _build_rag_retrieve_workflow(self):
    return [
        WorkflowStep(step_id="route_intention", ...),            # 1. Rewrite query
        WorkflowStep(step_id="route_category", ...),             # 2. Find folder
        WorkflowStep(step_id="sufficiency_after_category", ...), # 3. Check if done
        WorkflowStep(step_id="recall_items", ...),               # 4. Find facts (if needed)
        WorkflowStep(step_id="sufficiency_after_items", ...),    # 5. Check if done
        WorkflowStep(step_id="recall_resources", ...),           # 6. Find raw docs (if needed)
        WorkflowStep(step_id="build_context", ...),              # 7. Finalize answer
    ]
```
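To make the early-exit behaviour concrete, here is a hypothetical runner (not memU's actual executor) that walks such a step list and jumps straight to the final context-building step once a sufficiency check reports "enough":

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

@dataclass
class WorkflowStep:
    """Illustrative step type: each handler takes the shared state dict and returns it."""
    step_id: str
    handler: Callable[[dict], Awaitable[dict]]

async def run_workflow(steps: list[WorkflowStep], state: dict) -> dict:
    for step in steps:
        state = await step.handler(state)
        # Sufficiency steps set state["enough"]; if true, skip the remaining
        # recall steps and go directly to the last (context-building) step.
        if step.step_id.startswith("sufficiency") and state.get("enough"):
            return await steps[-1].handler(state)
    return state

def make_handler(name: str, enough: bool = False):
    async def handler(state: dict) -> dict:
        state.setdefault("visited", []).append(name)
        if enough:
            state["enough"] = True
        return state
    return handler

async def demo():
    steps = [
        WorkflowStep("route_category", make_handler("route_category")),
        WorkflowStep("sufficiency_after_category", make_handler("sufficiency", enough=True)),
        WorkflowStep("recall_items", make_handler("recall_items")),
        WorkflowStep("build_context", make_handler("build_context")),
    ]
    state = await run_workflow(steps, {})
    print(state["visited"])
    # → ['route_category', 'sufficiency', 'build_context']

asyncio.run(demo())
```

Note that `recall_items` never runs: the sufficiency check short-circuited the drill-down, which is exactly the "Lazy Search" saving described earlier.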
This is the coolest part of the system. It's an AI judging itself.
In the function _decide_if_retrieval_needed, the system sends a prompt to the LLM:
```python
async def _decide_if_retrieval_needed(self, query, retrieved_content, ...):
    # We ask the LLM: "Here is what we found so far. Is it enough?"
    prompt = f"""
    User Question: {query}
    What we found: {retrieved_content}
    Decision: [ENOUGH] or [NEED_MORE]
    """
    response = await client.chat(prompt)
    # The model answers with a bracketed tag, so check for the tag
    # rather than an exact string match.
    return "NEED_MORE" in response
```
How does it find the right folder? It uses Embeddings (vectors).
```python
async def _rag_route_category(self, state, ...):
    # 1. Turn the user's question into a list of numbers (a vector)
    query_vector = await embed_client.embed(state["active_query"])

    # 2. Compare that vector against all Category descriptions
    hits = await self._rank_categories_by_summary(
        query_vector,
        top_k=3,
        ...
    )

    # 3. Store the best-matching folders in the state
    state["category_hits"] = hits
    return state
```
This is extremely fast. It can scan thousands of categories in milliseconds.
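Under the hood, "comparing numbers" typically means cosine similarity. A minimal, dependency-free sketch of that ranking; the toy 2-D vectors and the helper names are illustrative, not memU's actual internals:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_categories(query_vector, categories, top_k=3):
    """Score every (name, vector) pair and keep the top_k best matches."""
    scored = [(name, cosine(query_vector, vec)) for name, vec in categories]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Toy 2-D "embeddings"; real embeddings have hundreds of dimensions
categories = [("health", [1.0, 0.1]), ("work", [0.1, 1.0]), ("travel", [0.7, 0.7])]
print(rank_categories([0.9, 0.2], categories, top_k=2))
```

Because this is plain arithmetic (no LLM call), scanning thousands of category vectors really does take milliseconds.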
In this chapter, we learned that retrieval in memU is not a brute-force search. It is a smart, layered investigation.
So, we have digested memories (Chapter 3) and we can recall them (Chapter 4). But where do these files actually live on your computer? Is it a database? A JSON file?
It's time to open the filing cabinet and look at the Storage Layer.
Next Chapter: Storage Layer (The Filing Cabinet)