
Chapter 4: Retrieve Pipeline (The Recall System)

Welcome back! In the previous chapter, Memorize Pipeline (The Digestion System), we taught our AI how to "eat" raw information and digest it into organized facts (Items) and topics (Categories).

Now, the database is full of memories. But how do we get them out?

If you ask your AI, "What did I say about my allergies?", it needs a strategy to find that specific fact without reading every single document in its brain. This strategy is called the Retrieve Pipeline, or the Recall System.

The Motivation: The Smart Librarian

Imagine walking into a massive library and asking a librarian: "What is the main ingredient in that recipe I liked?"

A bad librarian (simple retrieval) would run down every aisle, pulling every book off the shelf, looking for the word "recipe." This takes forever and is very tiring.

A smart librarian (memU's approach) does this:

  1. Clarifies: "Which recipe? The pasta one from last week?" (Query Rewriting)
  2. Checks Sections: Walks to the "Food & Cooking" section first. (Category Search)
  3. Checks Sufficiency: "I found a note on 'Spicy Pasta'. Is that enough?" (Sufficiency Check)
  4. Digs Deeper: If that note is too vague, only then do they open the specific cookbook. (Item/Resource Search)

This "Lazy Search" strategy saves time (latency) and money (token costs).

Key Concept: Progressive Drilling

memU searches in layers. It starts high-level and only drills down if necessary.

graph TD
    Q[User Query] --> L1[1. Search Categories]
    L1 --> Check1{Found Answer?}
    Check1 -- Yes --> Stop[Return Answer]
    Check1 -- No --> L2[2. Search Memory Items]
    L2 --> Check2{Found Answer?}
    Check2 -- Yes --> Stop
    Check2 -- No --> L3[3. Search Raw Resources]
    L3 --> Stop
    style Q fill:#ffcc80
    style L1 fill:#a5d6a7
    style L2 fill:#90caf9
    style L3 fill:#ffab91
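The layered flow in the diagram can be sketched as a simple loop. This is a minimal illustration, not memU's actual code; `search_layer` and `is_sufficient` are hypothetical callbacks standing in for the real search and sufficiency-check steps.

```python
# Minimal sketch of progressive drilling (helper names are hypothetical).
LAYERS = ["categories", "items", "resources"]

def progressive_retrieve(query, search_layer, is_sufficient):
    """Search each layer in order, stopping as soon as results suffice."""
    found = []
    for layer in LAYERS:
        found.extend(search_layer(layer, query))
        if is_sufficient(query, found):
            break  # stop drilling: cheaper layers already answered the query
    return found

# Usage: a fake corpus where the answer lives at the "items" layer
corpus = {
    "categories": ["Food & Cooking"],
    "items": ["Spicy Pasta: main ingredient is chili"],
    "resources": ["full cookbook text ..."],
}
hits = progressive_retrieve(
    "main ingredient?",
    search_layer=lambda layer, q: corpus[layer],
    is_sufficient=lambda q, found: any("ingredient" in f for f in found),
)
```

Because the sufficiency check fires at the "items" layer, the expensive "resources" layer is never touched.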

1. Contextual Rewriting

Users often talk in fragments. A question like "What about the deadline?" doesn't say which deadline, so searching on it directly would return noise.

The pipeline looks at your chat history and rewrites the query into a complete, self-contained question before it starts searching.
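To make the idea concrete, here is a toy illustration of contextual rewriting. The real pipeline asks an LLM to do this; the deterministic heuristic below is only a stand-in for the concept.

```python
# Toy illustration of contextual rewriting. memU uses an LLM for this step;
# this deterministic heuristic just shows the idea.
def rewrite_query(fragment, chat_history):
    """Make a vague fragment self-contained using the last chat turn."""
    vague_words = {"it", "that", "this", "one"}
    tokens = fragment.lower().replace("?", "").replace(".", "").split()
    if any(tok in vague_words for tok in tokens):
        last_turn = chat_history[-1]  # assume the prior turn names the topic
        return f"{fragment} (context: {last_turn})"
    return fragment

history = ["We discussed the pasta recipe from last week."]
rewritten = rewrite_query("What is the main ingredient in it?", history)
print(rewritten)
```

The rewritten query now carries the topic ("the pasta recipe"), so a vector search has something meaningful to match against.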

2. Sufficiency Checks

At every step, the AI pauses and asks itself: "Do I have enough information to answer the user?"


How to Use It

Using the retrieval system is simple because the MemoryService (The Central Brain) wraps all the complexity for you.

Basic Retrieval

# The service handles the entire pipeline automatically
response = await service.retrieve(
    query="What is the deadline for the project?",
    user_id="user_123"
)

print(f"Answer: {response.answer}")
# Output: Answer: The deadline is next Friday.

Understanding the Response

The result isn't just text; it tells you where it found the answer.

# You can see exactly which memories were used
print(response.retrieved_items)
# Output: [{"summary": "Deadline set for Friday", "score": 0.89}, ...]

Internal Implementation: The Assembly Line

How does this work in code? Just like the Memorize Pipeline, the Retrieve Pipeline is a list of steps (a Workflow).

1. The RAG Recipe

Let's look at src/memu/app/retrieve.py. This file defines the steps the "Librarian" takes.

There are two ways to search:

  1. RAG (Retrieval-Augmented Generation): Uses fast math (vectors) to find similar text.
  2. LLM (Model-based): Uses the AI to reason about which folder to open. (Slower, but smarter).

Here is the simplified "RAG" workflow from the code:

def _build_rag_retrieve_workflow(self):
    return [
        WorkflowStep(step_id="route_intention", ...), # 1. Rewrite Query
        WorkflowStep(step_id="route_category", ...),  # 2. Find Folder
        WorkflowStep(step_id="sufficiency_after_category", ...), # 3. Check if done
        WorkflowStep(step_id="recall_items", ...),    # 4. Find Facts (if needed)
        WorkflowStep(step_id="sufficiency_after_items", ...),    # 5. Check if done
        WorkflowStep(step_id="recall_resources", ...),# 6. Find Raw Docs (if needed)
        WorkflowStep(step_id="build_context", ...),   # 7. Finalize Answer
    ]
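A workflow like this only saves work if the runner can short-circuit: once a sufficiency step declares the context complete, the remaining recall steps should be skipped. The sketch below illustrates that pattern with made-up step handlers; it is not memU's actual runner API.

```python
# Sketch of a workflow runner that honors sufficiency checks
# (step and handler names are illustrative, not memU's actual API).
import asyncio

async def run_workflow(steps, state):
    """Run steps in order; once a sufficiency step marks the state as
    sufficient, remaining recall_* steps are skipped."""
    for step_id, handler in steps:
        if state.get("sufficient") and step_id.startswith("recall_"):
            continue  # already have enough context; skip deeper searches
        state = await handler(state)
    return state

async def fake_recall_items(state):
    state.setdefault("found", []).append("item-level fact")
    return state

async def mark_sufficient(state):
    state["sufficient"] = True  # pretend the LLM judged: [ENOUGH]
    return state

async def fake_recall_resources(state):
    state.setdefault("found", []).append("raw resource")
    return state

steps = [
    ("recall_items", fake_recall_items),
    ("sufficiency_after_items", mark_sufficient),
    ("recall_resources", fake_recall_resources),
]
state = asyncio.run(run_workflow(steps, {}))
print(state["found"])  # -> ['item-level fact']
```

The raw-resource step never runs, which is exactly the latency and token saving the "Lazy Search" strategy is after.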

2. The Sufficiency Check (The Brain)

This is the coolest part of the system. It's an AI judging itself.

In the function _decide_if_retrieval_needed, the system sends a prompt to the LLM:

async def _decide_if_retrieval_needed(self, query, retrieved_content, ...):
    # We ask the LLM: "Here is what we found so far. Is it enough?"
    prompt = f"""
    User Question: {query}
    What we found: {retrieved_content}
    
    Decision: [ENOUGH] or [NEED_MORE]
    """
    
    response = await client.chat(prompt)
    # Keep retrieving only if the model answered NEED_MORE
    return "NEED_MORE" in response

3. The Category Router

How does it find the right folder? It uses Embeddings (vectors).

async def _rag_route_category(self, state, ...):
    # 1. Turn the user's question into a list of numbers (Vector)
    query_vector = await embed_client.embed(state["active_query"])
    
    # 2. Compare numbers against all Category descriptions
    hits = await self._rank_categories_by_summary(
        query_vector, 
        top_k=3, 
        ...
    )
    
    # 3. Store the best matching folders in the state
    state["category_hits"] = hits
    return state

Because this step is pure vector math with no LLM call, it is extremely fast: it can rank thousands of categories in milliseconds.
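Under the hood, ranking categories typically means comparing the query vector against each category's summary vector with cosine similarity. The sketch below shows that math with tiny hand-made vectors; the function name and vectors are illustrative, not memU's internals.

```python
# Rough sketch of ranking categories by embedding similarity.
# In memU this role is played by _rank_categories_by_summary; here the
# vectors are hand-made 2D toys instead of real embeddings.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_categories(query_vector, category_vectors, top_k=3):
    scored = [(name, cosine(query_vector, vec))
              for name, vec in category_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

categories = {
    "Food & Cooking": [0.9, 0.1],
    "Work Projects": [0.1, 0.9],
}
top = rank_categories([0.8, 0.2], categories, top_k=1)
print(top[0][0])  # -> Food & Cooking
```

Real embeddings have hundreds or thousands of dimensions, but the comparison is the same: the category whose summary vector points in the most similar direction wins.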

Summary

In this chapter, we learned that retrieval in memU is not a brute-force search. It is a smart, layered investigation.

  1. Hierarchy: It checks broad Categories first, then specific Items.
  2. Rewriting: It fixes vague user questions using chat history.
  3. Sufficiency: It stops searching the moment it has the answer, saving speed and cost.

So, we have digested memories (Chapter 3) and we can recall them (Chapter 4). But where do these files actually live on your computer? Is it a database? A JSON file?

It's time to open the filing cabinet and look at the Storage Layer.

Next Chapter: Storage Layer (The Filing Cabinet)


Generated by Code IQ