Welcome back! In the previous chapter, MemoryService (The Central Brain), we set up the central manager that coordinates everything.
Now, we need to teach this brain how to learn.
If you hand a 50-page PDF or a 10-minute audio recording to an AI, it can't just "remember" it all instantly. It needs to process it. It needs to break it down.
We call this the Memorize Pipeline, but a better name might be the Digestion System.
Imagine you read a biography about Steve Jobs. Years later, you don't remember every single sentence (the raw data). You remember specific facts (he founded Apple), events (the iPhone launch), and traits (he wore turtlenecks).
If memU just saved the raw text of every chat you ever had, your database would be huge, slow, and expensive to search.
Instead, the Memorize Pipeline distills each input into a small set of structured facts before anything is stored.
Let's visualize the journey of a piece of information, like a User saying: "I'm allergic to peanuts and I have a flight tomorrow."
The first step, ingestion, simply fetches the file. Whether it's a local file path or a URL, the pipeline grabs the raw data.
Next, preprocessing ensures everything becomes text: audio, for example, is transcribed into a textual representation.
Extraction is the most intelligent part. We send the text to an LLM with specific instructions (prompts) asking it to pull out discrete facts.
Finally, the system calculates vectors (embeddings) for these new facts, finds the right MemoryCategories (like "Medical" or "Travel"), and saves them.
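Finding the "right" category can be pictured as a nearest-neighbor match between embeddings. The following is a toy sketch with made-up three-dimensional vectors and hypothetical helper names; a real embedding model produces hundreds of dimensions, and memU's actual matching logic may differ:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest_category(item_vec: list[float], category_vecs: dict[str, list[float]]) -> str:
    """Pick the category whose embedding is closest to the new fact's."""
    return max(category_vecs, key=lambda name: cosine(item_vec, category_vecs[name]))

# Toy embeddings for two folders.
categories = {
    "Medical": [0.9, 0.1, 0.0],
    "Travel":  [0.1, 0.9, 0.1],
}
fact_vec = [0.8, 0.2, 0.1]  # e.g. "User is allergic to peanuts"
print(nearest_category(fact_vec, categories))  # → Medical
```

The peanut-allergy fact lands in "Medical" because its vector points in nearly the same direction as that folder's vector.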
You don't need to manually run these steps. The MemoryService wraps them all into one simple command.
# You just tell it WHAT to eat and HOW to eat it.
result = await service.memorize(
    resource_url="./recordings/meeting.mp3",
    modality="audio",
)

# Output is a summary of what was learned
print(result["items"])
# Output: [{"summary": "Project deadline is Friday", "type": "event"}, ...]
Behind the scenes, the pipeline automatically transcribed the MP3, found the deadline, and saved it.
Let's look under the hood at src/memu/app/memorize.py. This file defines the "Workflow."
memU uses a concept called a WorkflowStep. The pipeline is just a list of steps executed in order.
In _build_memorize_workflow, you can see the exact recipe:
def _build_memorize_workflow(self) -> list[WorkflowStep]:
    return [
        WorkflowStep(step_id="ingest_resource", ...),        # 1. Fetch file
        WorkflowStep(step_id="preprocess_multimodal", ...),  # 2. Handle audio/video
        WorkflowStep(step_id="extract_items", ...),          # 3. Ask LLM for facts
        WorkflowStep(step_id="categorize_items", ...),       # 4. Save to DB
        WorkflowStep(step_id="persist_index", ...),          # 5. Update folder summaries
    ]
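To make the recipe concrete, here is a minimal, hypothetical sketch of what a step abstraction and its runner could look like. memU's real WorkflowStep has more fields, and the two toy steps below only stand in for the actual handlers:

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable

@dataclass
class WorkflowStep:
    """A named stage: an async function that reads and mutates shared state."""
    step_id: str
    run: Callable[[dict[str, Any]], Awaitable[None]]

async def run_workflow(steps: list[WorkflowStep]) -> dict[str, Any]:
    """Execute the steps strictly in order, threading one state dict through all of them."""
    state: dict[str, Any] = {}
    for step in steps:
        await step.run(state)
    return state

# Two toy steps standing in for "ingest_resource" and "extract_items".
async def ingest(state): state["text"] = "I'm allergic to peanuts"
async def extract(state): state["items"] = [state["text"]]

state = asyncio.run(run_workflow([
    WorkflowStep("ingest_resource", ingest),
    WorkflowStep("extract_items", extract),
]))
print(state["items"])  # → ["I'm allergic to peanuts"]
```

The key design point is that each step only sees the shared state dict, so steps can be added, removed, or reordered without touching each other.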
The extract_items step is where the intelligence lives. It calls _generate_structured_entries, which selects specific prompts (instructions) based on what you want to remember (profiles, events, etc.).
async def _generate_entries_from_text(self, resource_text, ...):
    # 1. Prepare the prompts (e.g., "Extract user profile info")
    prompts = [
        self._build_memory_type_prompt(mtype, resource_text, ...)
        for mtype in memory_types
    ]

    # 2. Ask the LLM (one request per memory type, run in parallel)
    responses = await asyncio.gather(
        *[client.chat(p) for p in prompts]
    )

    # 3. Parse the XML/JSON responses into Python objects
    return self._parse_structured_entries(memory_types, responses)
Why XML?
If you look at the prompt files (like src/memu/prompts/memory_type/profile.py), you'll see we ask the LLM to output data in tags like <memory><content>...</content></memory>. This makes it much easier for the code to read the result reliably than free-form text.
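To see why tagged output is easy to read reliably, here is an illustrative parser for the <memory><content>...</content></memory> convention. The regex approach is a sketch, not memU's actual parsing code:

```python
import re

def parse_memories(llm_output: str) -> list[str]:
    """Pull the text out of every <memory><content>...</content></memory> block."""
    return re.findall(
        r"<memory>\s*<content>(.*?)</content>\s*</memory>",
        llm_output,
        re.DOTALL,
    )

response = """
<memory><content>User is allergic to peanuts</content></memory>
<memory><content>User has a flight tomorrow</content></memory>
"""
print(parse_memories(response))
# → ['User is allergic to peanuts', 'User has a flight tomorrow']
```

With free-form text, any filler the model adds ("Sure, here are the facts...") corrupts the result; with tags, everything outside the tags is simply ignored.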
Once we have the facts (MemoryItems), we need to put them in folders. This happens in _memorize_categorize_items.
async def _memorize_categorize_items(self, state, ...):
    # For every fact the LLM found...
    for entry in extracted_entries:
        # 1. Create the MemoryItem in the database
        item = store.memory_item_repo.create_item(
            summary=entry.content,
            embedding=...,  # Calculated automatically
        )
        # 2. Link it to the correct Category (folder)
        for cat_id in entry.category_ids:
            store.category_item_repo.link_item_category(item.id, cat_id)
There is one final, crucial step: persist_index.
If you add a note saying "I love sushi" to the "Food" folder, the "Food" folder's description needs to be updated.
async def _update_category_summaries(self, updates, ...):
    # 1. Grab the current folder summary
    original_summary = category.summary

    # 2. Ask the LLM: "Combine the old summary with these new facts"
    prompt = self._build_category_summary_prompt(
        category=category,
        new_memories=new_items,
    )

    # 3. Save the new, updated folder description
    new_summary = await client.chat(prompt)
    store.memory_category_repo.update_category(category.id, summary=new_summary)
This ensures that when we search later, the high-level folders accurately describe what is inside them.
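As an illustration of the "combine old and new" idea, here is a hypothetical sketch of how such a merge prompt could be assembled. The function name and wording are invented stand-ins, not memU's actual _build_category_summary_prompt:

```python
def build_category_summary_prompt(category_name: str, old_summary: str, new_facts: list[str]) -> str:
    """Assemble a prompt asking the LLM to fold new facts into an existing folder summary."""
    facts = "\n".join(f"- {f}" for f in new_facts)
    return (
        f"You maintain the summary for the '{category_name}' memory folder.\n"
        f"Current summary:\n{old_summary}\n\n"
        f"New facts to fold in:\n{facts}\n\n"
        "Rewrite the summary so it covers both the old and the new information."
    )

prompt = build_category_summary_prompt(
    "Food",
    "The user enjoys Italian food.",
    ["I love sushi"],
)
print("I love sushi" in prompt)  # → True
```

The LLM's reply replaces the folder's description, so the "Food" summary now mentions sushi without losing the earlier note about Italian food.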
In this chapter, we explored the Memorize Pipeline: ingesting a resource, converting it to text, extracting structured facts with an LLM, embedding and categorizing those facts, and keeping the folder summaries up to date.
Now your AI has a belly full of organized information. But what good is knowledge if you can't use it?
In the next chapter, we will learn how the AI searches through this filing cabinet to answer questions.
Next Chapter: Retrieve Pipeline (The Recall System)
Generated by Code IQ