Chapter 5: Storage Layer (The Filing Cabinet)

Welcome back! In the previous chapter, Retrieve Pipeline (The Recall System), we learned how the AI acts as a librarian to find information.

But a librarian is useless without a library.

If you turn off your computer, what happens to the memories the AI just learned? If they live only in variables in the Python process's memory (RAM), they vanish like writing wiped off a whiteboard. We need a permanent place to keep them: a Filing Cabinet.

This is the job of the Storage Layer.

The Motivation: Whiteboards vs. Filing Cabinets

When building an app, you face a dilemma:

Testing is hard: Setting up a real database server just to test a 5-line script is annoying.
Production is hard: A simple text file isn't fast enough when you have 1,000 users.

You need a system that behaves like a whiteboard for testing (fast, temporary) and like a steel filing cabinet for production (permanent, secure).

memU solves this using an Abstraction Layer. The rest of the app (the Brain, the Pipelines) doesn't know how data is saved. It just says, "Save this." The Storage Layer handles the details.
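
To make the idea concrete, here is a minimal sketch of an abstraction layer built only on the Python standard library. The names MemoryStore, InMemoryStore, SQLiteFileStore, and remember are hypothetical stand-ins for illustration, not memU's actual classes.

```python
import sqlite3
from typing import Protocol


class MemoryStore(Protocol):
    """Hypothetical interface: the only thing the rest of the app sees."""

    def save(self, text: str) -> None: ...
    def all(self) -> list[str]: ...


class InMemoryStore:
    """The whiteboard: a plain list that vanishes when the process exits."""

    def __init__(self) -> None:
        self._items: list[str] = []

    def save(self, text: str) -> None:
        self._items.append(text)

    def all(self) -> list[str]:
        return list(self._items)


class SQLiteFileStore:
    """The filing cabinet: rows written to a SQLite database."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS memories (content TEXT NOT NULL)"
        )
        self._conn.commit()

    def save(self, text: str) -> None:
        self._conn.execute("INSERT INTO memories (content) VALUES (?)", (text,))
        self._conn.commit()

    def all(self) -> list[str]:
        rows = self._conn.execute("SELECT content FROM memories ORDER BY rowid")
        return [row[0] for row in rows]


def remember(store: MemoryStore, text: str) -> None:
    """Caller code: it says "save this" and never asks how."""
    store.save(text)
```

Because remember depends only on the interface, passing it an InMemoryStore in a test and a SQLiteFileStore in production requires no change to the calling code.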

Key Concepts

The Storage Layer is built on three main ideas.

1. The Factory (The Builder)

Imagine you are hiring a construction crew. You tell them: "Build me a storage unit."

If you say "I'm just testing," they build a Tent (In-Memory).
If you say "I need this forever," they build a Concrete Vault (SQLite or PostgreSQL).

This switch happens automatically based on your configuration.

2. The Repositories (The Drawers)

Inside your filing cabinet, you don't throw everything into one pile. You have labeled drawers:

Resource Repo: For raw files (PDFs, Transcripts).
Item Repo: For specific facts (The "files").
Category Repo: For the folders/topics.
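
The three drawers can be pictured with a small sketch like the one below. It uses plain dataclasses and dictionaries; the class names, the url field, and the file_item and source_of helpers are all assumptions made for illustration and are not taken from memU's code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Resource:
    id: str
    url: str  # where the raw file lives (hypothetical field)


@dataclass
class Item:
    id: str
    summary: str
    resource_id: Optional[str] = None  # link back to the source file


@dataclass
class Category:
    id: str
    name: str
    item_ids: list[str] = field(default_factory=list)


@dataclass
class Cabinet:
    """Three separate drawers, each a dict keyed by record id."""

    resources: dict[str, Resource] = field(default_factory=dict)
    items: dict[str, Item] = field(default_factory=dict)
    categories: dict[str, Category] = field(default_factory=dict)

    def file_item(self, item: Item, category_id: str) -> None:
        """Store the item in its drawer and register it under a category."""
        self.items[item.id] = item
        self.categories[category_id].item_ids.append(item.id)

    def source_of(self, item_id: str) -> Optional[Resource]:
        """Follow an item's link back to the raw resource it came from."""
        item = self.items[item_id]
        if item.resource_id is None:
            return None
        return self.resources.get(item.resource_id)


cabinet = Cabinet()
cabinet.resources["r1"] = Resource("r1", "chat_transcript.txt")
cabinet.categories["c1"] = Category("c1", "preferences")
cabinet.file_item(Item("i1", "likes tea", resource_id="r1"), "c1")
```

Keeping each record type in its own drawer means a fact can always be traced back to the file it came from without mixing the two kinds of data.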

3. Vector Storage (The Index)

To let the AI search by meaning rather than by exact words, we store "Vectors" (lists of numbers that represent the meaning of a piece of text).

Simple Store (SQLite): It calculates similarity using basic math (slow for millions of items, fine for personal apps).
Advanced Store (Postgres): It uses a specialized PostgreSQL extension (such as pgvector) for much faster vector search at scale.
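
The "basic math" approach can be sketched as a brute-force scan. This example assumes cosine similarity as the scoring function, which the chapter does not confirm; the function names and the toy vectors are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(
    query: list[float], stored: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Score every stored vector against the query and keep the best k."""
    scored = [
        (item_id, cosine_similarity(query, vector))
        for item_id, vector in stored.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings"; real ones have hundreds of dimensions.
stored = {
    "likes_tea": [1.0, 0.0],
    "likes_coffee": [0.8, 0.6],
    "owns_a_bike": [0.0, 1.0],
}
best = top_k([1.0, 0.0], stored, k=2)
```

Every query touches every stored vector, so the cost grows linearly with the number of items. That is the trade-off the simple store accepts, and the one an indexed store is meant to avoid.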

How to Use It

As a user of memU, you rarely touch the database code directly. You simply tell the MemoryService which "Builder" to use.

Configuration Example

Here is how you swap the backend in your code:

from memu.app import MemoryService

# Backend settings nest under "metadata_store", the key the factory reads.

# OPTION A: The Tent (In-Memory)
# Great for unit tests. Data is lost when the script ends.
service_test = MemoryService(
    database_config={"metadata_store": {"provider": "inmemory"}}
)

# OPTION B: The Vault (SQLite)
# Great for local apps. Data is saved to a file.
service_prod = MemoryService(
    database_config={
        "metadata_store": {
            "provider": "sqlite",
            "dsn": "sqlite:///my_memory.db",
        }
    }
)

By changing the provider setting (plus a dsn for file-backed stores), you swap the storage engine without rewriting any other code.

Internal Implementation: Under the Hood

How does memU pull off this magic trick? Let's look at the flow when you start the app.

sequenceDiagram
    participant User
    participant Factory as Database Factory
    participant SQL as SQLite Store
    participant Repo as Item Repository
    User->>Factory: "Give me a SQLite database"
    Factory->>SQL: Initialize(connection_string)
    SQL->>SQL: Create Tables (if missing)
    SQL->>Repo: Set up Drawers (Repositories)
    SQL-->>User: Returns ready-to-use Database Object

1. The Factory Pattern

The magic starts in src/memu/database/factory.py. Its build_database function acts as the traffic controller.

def build_database(config, user_model):
    # 1. Check what the user wants
    provider = config.metadata_store.provider
    
    # 2. Return the correct storage engine
    if provider == "inmemory":
        return build_inmemory_database(...)
        
    elif provider == "sqlite":
        # Only imports SQLite code if actually needed!
        from memu.database.sqlite import build_sqlite_database
        return build_sqlite_database(...)

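Why do we import inside the if? Each backend can pull in its own dependencies. If you only use the in-memory store, the app should not crash at import time because a database driver you never use (say, a PostgreSQL driver) is not installed. Deferring the import until the branch runs keeps those dependencies optional.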
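
The deferred-import idea can be tried in isolation with a sketch like this. The build_store function is hypothetical and much simpler than memU's factory: a plain dict stands in for the in-memory store, and the third-party psycopg driver stands in for whatever the PostgreSQL backend actually requires.

```python
def build_store(provider: str, dsn: str = ":memory:"):
    """Hypothetical factory: each branch imports only what that backend needs."""
    if provider == "inmemory":
        return {}  # a plain dict stands in for the in-memory store

    if provider == "sqlite":
        import sqlite3  # loaded only when this branch runs

        return sqlite3.connect(dsn)

    if provider == "postgres":
        # psycopg is a third-party driver. If it is not installed, the
        # ImportError surfaces here, and only for users who asked for Postgres.
        import psycopg

        return psycopg.connect(dsn)

    raise ValueError(f"Unknown provider: {provider!r}")
```

Calling build_store("inmemory") or build_store("sqlite") works on a machine without psycopg installed, because the import statement for it is never reached.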

2. The Database Class (The Cabinet)

Let's look at src/memu/database/sqlite/sqlite.py. Its SQLiteStore class manages the connection and the repositories.

class SQLiteStore(Database):
    def __init__(self, dsn, ...):
        self._sessions = SQLiteSessionManager(dsn=dsn)
        
        # 1. Ensure the physical tables exist
        self._create_tables()

        # 2. Open the specific "Drawers"
        self.memory_item_repo = SQLiteMemoryItemRepo(...)
        self.resource_repo = SQLiteResourceRepo(...)
        
        # 3. Load cache (for speed)
        self.load_existing()

The SQLiteStore is the boss. It holds the connection to the file (my_memory.db) and owns the repositories.
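
The startup order (create tables, then load what is already on disk) can be reproduced with the standard sqlite3 module. MiniStore below is a hypothetical, heavily simplified stand-in: it borrows the method names _create_tables and load_existing from the snippet above, but its table layout and cache are assumptions.

```python
import os
import sqlite3
import tempfile


class MiniStore:
    """Hypothetical store: create tables, then warm a cache from existing rows."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._create_tables()
        self.cache: dict[int, str] = {}
        self.load_existing()

    def _create_tables(self) -> None:
        # IF NOT EXISTS makes this safe to run on every startup.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS items "
            "(id INTEGER PRIMARY KEY, summary TEXT NOT NULL)"
        )
        self._conn.commit()

    def load_existing(self) -> None:
        # Read rows left behind by earlier runs into memory.
        for item_id, summary in self._conn.execute("SELECT id, summary FROM items"):
            self.cache[item_id] = summary

    def add(self, summary: str) -> int:
        cursor = self._conn.execute(
            "INSERT INTO items (summary) VALUES (?)", (summary,)
        )
        self._conn.commit()
        self.cache[cursor.lastrowid] = summary
        return cursor.lastrowid

    def close(self) -> None:
        self._conn.close()


# Simulate a restart: write with one store, then open a fresh one on the same file.
path = os.path.join(tempfile.mkdtemp(), "my_memory.db")
first = MiniStore(path)
first.add("likes tea")
first.close()
second = MiniStore(path)
```

The second store starts with the row written by the first one already in its cache, which is the difference between the filing cabinet and the whiteboard.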

3. The Models (The Forms)

Before we save anything, we need to know what the data looks like. memU uses Pydantic models to define the shape of the data.

In src/memu/database/models.py, we define the "Form" every memory must fill out:

from datetime import datetime

class MemoryItem(BaseRecord):
    # Every memory has these fields
    id: str
    summary: str             # The content
    embedding: list[float]   # The vector
    created_at: datetime

    # Links to other tables
    resource_id: str | None

Because we use this standard model, the rest of the app doesn't care whether the underlying store is in-memory, SQLite, or PostgreSQL. It just sends a MemoryItem.

4. The Repository (The Drawer)

When you want to find an item, you ask the repository. The repository translates your request into the specific language of the database (e.g., SQL queries).

Conceptual code for a Repository:

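class SQLiteMemoryItemRepo:
    def __init__(self, session):
        # An open database session (conceptual: handed over by the store)
        self.session = session

    def create_item(self, summary, embedding):
        # 1. Convert the Python values into a SQL row object
        row = MemoryItemSQL(summary=summary, embedding=embedding)

        # 2. Stage the row, then commit to write it to the database file
        self.session.add(row)
        self.session.commit()
        return row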
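
For a version you can run without memU installed, here is a sketch of the same translation step using the standard sqlite3 module. Item is a plain dataclass standing in for the Pydantic MemoryItem model, ItemRepo and its table layout are hypothetical, and storing the embedding as JSON text is an assumption rather than memU's documented column format.

```python
import json
import sqlite3
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """Stand-in for the MemoryItem model (a dataclass here, not Pydantic)."""

    id: str
    summary: str
    embedding: list[float]


class ItemRepo:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS items "
            "(id TEXT PRIMARY KEY, summary TEXT, embedding TEXT)"
        )

    def create_item(self, summary: str, embedding: list[float]) -> Item:
        item = Item(id=uuid.uuid4().hex, summary=summary, embedding=embedding)
        # SQLite has no list type, so the vector is serialized to JSON text.
        self._conn.execute(
            "INSERT INTO items (id, summary, embedding) VALUES (?, ?, ?)",
            (item.id, item.summary, json.dumps(item.embedding)),
        )
        self._conn.commit()
        return item

    def get_item(self, item_id: str) -> Optional[Item]:
        row = self._conn.execute(
            "SELECT id, summary, embedding FROM items WHERE id = ?", (item_id,)
        ).fetchone()
        if row is None:
            return None
        # Translate the SQL row back into the shared Python model.
        return Item(id=row[0], summary=row[1], embedding=json.loads(row[2]))


repo = ItemRepo(sqlite3.connect(":memory:"))
saved = repo.create_item("likes tea", [0.1, 0.2])
loaded = repo.get_item(saved.id)
```

Callers only ever see Item objects going in and coming out. The SQL text and the JSON encoding stay inside the repository, so a different backend could replace it without touching the callers.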

Summary

In this chapter, we explored the Storage Layer:

Persistence: It moves data from RAM to permanent storage.
Flexibility: The Factory Pattern lets us switch between "In-Memory" (for testing) and "SQLite/Postgres" (for real usage) instantly.
Organization: It uses Repositories to keep Resources, Items, and Categories in their own lanes.

Now we have a Brain (Service), a way to learn (Memorize), a way to remember (Retrieve), and a place to keep it all (Storage).

But there is one final piece of the puzzle. How do we string all these steps together? How do we ensure that "Step A" passes data correctly to "Step B"?

We need an assembly line manager.

Next Chapter: Workflow Pipeline Engine (The Assembly Line)

Generated by Code IQ
