Chapter 1: Hierarchical Data Model (The Memory File System)

Welcome to the first chapter of the memU tutorial! If you are building an AI application, you probably want it to remember things.

Most simple AI memory systems use a "flat list." Imagine throwing thousands of sticky notes into a giant pile. When you ask a question, the AI has to dig through the entire pile to find the right note. It's messy, slow, and often inaccurate.

memU takes a different approach. It organizes memory like a computer file system (folders and files). This allows your AI to "browse" topics before diving into details, making retrieval smarter and cheaper.

The Motivation: A Tale of Two Topics

Imagine you are chatting with your AI assistant about two very different things:

  1. Your Coffee Preferences (You love dark roast, hate sugar).
  2. Project Apollo (A work project due next Friday).

If you ask, "What is the status?", a "flat" memory system might get confused. It scans everything. It might find a note saying "Coffee status: empty" alongside "Project status: pending".

memU solves this by grouping these facts into hierarchies, so coffee facts and project facts live in separate folders.

When you ask about work, the AI knows to look in the Work Projects folder, ignoring your coffee habits entirely.

The 3 Main Layers

In memU, we don't just store text; we store structured objects. Here are the three core building blocks:

graph TD
    R[1. Resources] -->|Extracts| I[2. Memory Items]
    C[3. Memory Categories] -->|Groups| I
    style R fill:#ffecb3,stroke:#ffb74d
    style I fill:#b3e5fc,stroke:#4fc3f7
    style C fill:#dcedc8,stroke:#aed581

1. Resources (The Source Material)

Think of a Resource as the raw source file. This could be a chat log, a PDF document, an image, or a URL. It is the "proof" that something happened.

2. MemoryItems (The Facts)

A MemoryItem is a specific fact or insight extracted from a Resource. Think of this as a specific file inside a folder.

3. MemoryCategories (The Folders)

A MemoryCategory is a high-level topic that groups related items together.


How It Works in Code

Let's look at how memU defines these structures in Python. These models form the backbone of the system.

The Memory Item

The MemoryItem is the most important unit. It holds the summary of the fact and the mathematical "embedding" (vector) that lets the AI search for it.

From src/memu/database/models.py:

class MemoryItem(BaseRecord):
    resource_id: str | None  # Links back to the source (The Book)
    memory_type: str         # e.g., "preference", "event", "fact"
    summary: str             # The actual memory content
    embedding: list[float] | None = None # Vector for search
    happened_at: datetime | None = None
    extra: dict[str, Any] = {} # Custom data

The Category

The MemoryCategory helps the AI browse. Instead of scanning 10,000 items, it might first scan 50 categories to find the right topic.

class MemoryCategory(BaseRecord):
    name: str              # e.g., "Coffee Preferences"
    description: str       # e.g., "Details about what the user drinks"
    embedding: list[float] | None = None
    summary: str | None = None
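
The "browse first, then dive in" idea can be sketched in a few lines. This is not memU's retrieval code: the dictionaries, the toy 2-D embeddings, and the browse_then_search helper are all assumptions made up for this example.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy data: real embeddings have hundreds of dimensions, not two.
categories = {
    "cat-coffee": {"name": "Coffee Preferences", "embedding": [1.0, 0.0]},
    "cat-apollo": {"name": "Project Apollo", "embedding": [0.0, 1.0]},
}
items_by_category = {
    "cat-coffee": [
        {"summary": "User loves dark roast", "embedding": [0.9, 0.1]},
        {"summary": "User hates sugar", "embedding": [0.8, 0.3]},
    ],
    "cat-apollo": [
        {"summary": "Project Apollo is due next Friday", "embedding": [0.1, 0.9]},
        {"summary": "Project status: pending", "embedding": [0.5, 0.5]},
    ],
}


def browse_then_search(query_embedding: list[float]) -> tuple[str, list[str]]:
    # Step 1: compare the query against the (few) categories.
    best_id = max(
        categories,
        key=lambda cid: cosine(query_embedding, categories[cid]["embedding"]),
    )
    # Step 2: rank only the items inside the winning category.
    ranked = sorted(
        items_by_category[best_id],
        key=lambda it: cosine(query_embedding, it["embedding"]),
        reverse=True,
    )
    return categories[best_id]["name"], [it["summary"] for it in ranked]


topic, facts = browse_then_search([0.2, 0.9])  # a "work-like" query vector
print(topic, facts)
```

The coffee items are never scored at all, which is where the savings come from when there are thousands of items.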

The Relationship (Linking Files to Folders)

How do we put a MemoryItem inside a MemoryCategory? We use a linking table called CategoryItem.

class CategoryItem(BaseRecord):
    item_id: str      # The ID of the specific fact
    category_id: str  # The ID of the folder it belongs to

This flexible design means one specific memory (like "I have a meeting at 2 PM") can live in multiple categories (e.g., both "Calendar" and "Project Apollo").
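
Here is a small sketch of that many-to-many relationship. CategoryItemSketch mirrors the two fields shown above, but the IDs and the two lookup dictionaries are hypothetical and only meant to show how links can be read in both directions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryItemSketch:
    item_id: str      # The ID of the specific fact
    category_id: str  # The ID of the folder it belongs to


# The meeting memory is linked to two folders at once.
links = [
    CategoryItemSketch(item_id="item-meeting", category_id="cat-calendar"),
    CategoryItemSketch(item_id="item-meeting", category_id="cat-apollo"),
    CategoryItemSketch(item_id="item-deadline", category_id="cat-apollo"),
]

# Build lookups in both directions from the link rows.
items_in_category = defaultdict(list)
categories_of_item = defaultdict(list)
for link in links:
    items_in_category[link.category_id].append(link.item_id)
    categories_of_item[link.item_id].append(link.category_id)

print(categories_of_item["item-meeting"])
print(items_in_category["cat-apollo"])
```

Because the link lives in its own record, adding an item to another category never requires copying or editing the item itself.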


Internal Implementation: The Data Flow

When memU processes information, it doesn't just save data; it transforms it.

Here is what happens when you feed data into this hierarchical model. This process is managed by the Memorize Pipeline (which we will cover in Chapter 3), but it's important to visualize the data structure now.

sequenceDiagram
    participant Source as Raw Input
    participant Extractor as AI Processor
    participant DB as Data Storage
    Source->>Extractor: "I need dark roast coffee."
    Note right of Extractor: 1. Creates Resource record
    Extractor->>Extractor: Extracts Fact: "User likes dark roast"
    Note right of Extractor: 2. Creates MemoryItem
    Extractor->>Extractor: Identifies Topic: "Preferences"
    Note right of Extractor: 3. Creates MemoryCategory
    Extractor->>DB: Saves Resource, Item, Category, & Link
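
The same flow can be walked through with plain dictionaries. This is only a sketch of the data shapes, not the Memorize Pipeline: memorize_sketch, the store layout, and the "content" key are assumptions, and the fact and topic are passed in by hand where memU would have an AI processor extract them.

```python
from uuid import uuid4


def memorize_sketch(store, raw_text, fact, topic, memory_type="fact"):
    """Save one input as Resource -> MemoryItem -> MemoryCategory -> link."""
    # 1. Keep the raw input as a Resource record.
    resource = {"id": uuid4().hex, "content": raw_text}
    store["resources"].append(resource)

    # 2. Save the extracted fact as a MemoryItem pointing back to the Resource.
    item = {
        "id": uuid4().hex,
        "resource_id": resource["id"],
        "memory_type": memory_type,
        "summary": fact,
    }
    store["items"].append(item)

    # 3. Reuse the category if the topic already exists, otherwise create it.
    category = next((c for c in store["categories"] if c["name"] == topic), None)
    if category is None:
        category = {"id": uuid4().hex, "name": topic}
        store["categories"].append(category)

    # 4. Link the item to the category.
    link = {"item_id": item["id"], "category_id": category["id"]}
    store["links"].append(link)
    return link


store = {"resources": [], "items": [], "categories": [], "links": []}
first = memorize_sketch(
    store, "I need dark roast coffee.", "User likes dark roast",
    "Preferences", "preference",
)
second = memorize_sketch(
    store, "No sugar for me, thanks.", "User dislikes sugar",
    "Preferences", "preference",
)
print(len(store["items"]), "items in", len(store["categories"]), "category")
```

Two inputs produce two Resources and two MemoryItems, but only one "Preferences" category, because the second call reuses the existing folder.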

Scoped Models (User Separation)

You might wonder: What if I have multiple users?

memU uses a clever trick called "Scoped Models." If you look at src/memu/database/models.py, you'll see a function called build_scoped_models.

def build_scoped_models(user_model: type[BaseModel]):
    # Merges your User definition with the Resource definition
    resource_model = merge_scope_model(
        user_model, Resource, name_suffix="Resource"
    )
    # ... repeats for Items and Categories
    return resource_model, ...

This simply means every Resource, MemoryItem, and MemoryCategory automatically gets the fields of your user model (like user_id) attached to it. Because every record carries its owner, queries can be filtered by user, so User A never accidentally sees the memories inside User B's folders.
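
To illustrate the idea of merging a scope into a record type, here is a sketch built on standard-library dataclasses. It is not how memU implements merge_scope_model: UserScope, ResourceSketch (including its url field), and merge_scope_sketch are hypothetical names invented for this example.

```python
from dataclasses import dataclass, fields, make_dataclass


@dataclass
class UserScope:
    # Your own user definition: whatever identifies "whose memory is this?"
    user_id: str


@dataclass
class ResourceSketch:
    # Hypothetical stand-in for Resource; the real model has its own fields.
    url: str


def merge_scope_sketch(scope_cls, record_cls, name_suffix):
    """Build a new dataclass holding the scope fields plus the record fields."""
    merged = [(f.name, f.type) for f in fields(scope_cls)]
    merged += [(f.name, f.type) for f in fields(record_cls)]
    return make_dataclass(f"{scope_cls.__name__}{name_suffix}", merged)


ScopedResource = merge_scope_sketch(UserScope, ResourceSketch, "Resource")

records = [
    ScopedResource(user_id="alice", url="chat-1"),
    ScopedResource(user_id="bob", url="chat-2"),
]

# Every query can now filter on the scope field.
alice_only = [r for r in records if r.user_id == "alice"]
print([r.url for r in alice_only])
```

The point of generating the merged type up front is that no record can be created without its owner field, so per-user filtering is always possible.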

Summary

In this chapter, we learned that memU doesn't use a messy flat list. It uses a Hierarchical Data Model:

  1. Resources: The raw evidence (chat logs, documents, images, URLs).
  2. MemoryItems: The actual facts (files).
  3. MemoryCategories: The topics (folders).

This structure provides the foundation for the AI to "think" in an organized way.

But who actually manages these files and folders? Who creates them and searches them? That is the job of the MemoryService, the central brain of the operation.

Next Chapter: MemoryService (The Central Brain)


Generated by Code IQ