Welcome to the final chapter of the memU tutorial!
In the previous chapter, Workflow Pipeline Engine (The Assembly Line), we built a factory that processes information step-by-step.
However, there is one problem. Our factory speaks Python, but the AI models (like GPT-4, Claude, or Llama) speak different "languages" (APIs).
If you hardcode "OpenAI" into your pipeline, you are trapped. If you later want to switch to a cheaper local model, you have to rewrite your entire app.
We need a universal adapter. We need the LLM Client Wrapper.
Imagine you are traveling around the world. Every country has a different electrical outlet, but a single universal travel adapter lets the same plug work everywhere.
The LLM Client Wrapper is that adapter.
It sits between memU and the outside world. The rest of the system just says "Chat with this", and the Wrapper figures out the technical details of talking to OpenAI, Anthropic, or a local server.
The Wrapper doesn't just send text. It standardizes four distinct "senses" for the AI: Chat (text), Vision (images), Transcribe (audio), and Embed (vector embeddings).
This is the most powerful feature of the Wrapper.
Imagine you want to know how much money you are spending on AI. You could modify every single function in your code to print the cost... or you could just place a "toll booth" inside the Wrapper.
This toll booth is called an Interceptor. It lets you hook into the process before or after the AI speaks.
You typically access the wrapper through the MemoryService (The Central Brain).
You can use the client to perform different tasks without worrying about API headers or JSON formats.
```python
# 1. Access the client from the service
client = service.llm_client

# 2. Chat (Text)
response = await client.chat("Why is the sky blue?")

# 3. Vision (Images)
# The wrapper handles opening the file and encoding it
desc = await client.vision("Describe this", "photo.jpg")

# 4. Transcribe (Audio)
text = await client.transcribe("meeting.mp3")
```
Let's say you want to print the token usage (cost) every time the AI answers. You don't need to change your business logic. You just register a hook.
```python
# Define a function to run AFTER the AI responds
async def log_cost(context, request, response, usage):
    print("--- Call Finished ---")
    print(f"Model: {context.model}")
    print(f"Tokens Used: {usage.total_tokens}")

# Register it with the service
service.intercept_after_llm_call(log_cost)

# Now, whenever you use the service, it prints the cost!
await service.retrieve("Hello", "user_1")
```
Output:

```
--- Call Finished ---
Model: gpt-4o-mini
Tokens Used: 154
```
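Interceptors can also carry state. Here is a hedged sketch of an after-hook that accumulates total spend across calls; it follows the `(context, request, response, usage)` signature shown above, but the `Usage` class is an illustrative stand-in, not memU's actual type.

```python
from dataclasses import dataclass

# Illustrative stand-in for the usage object an after-hook receives.
@dataclass
class Usage:
    input_tokens: int
    output_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

@dataclass
class CostTracker:
    """Accumulates token usage across every LLM call it observes."""
    total: int = 0
    calls: int = 0

    # Matches the after-hook signature used in the log_cost example.
    async def __call__(self, context, request, response, usage):
        self.calls += 1
        self.total += usage.total_tokens

tracker = CostTracker()
# service.intercept_after_llm_call(tracker)  # register it like log_cost above
```

Because the tracker is just an object with state, you can read `tracker.total` at any point to see your cumulative token spend without touching any business logic.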
How does the wrapper handle different providers returning different data formats? It converts everything into a Standard View.
For example, where OpenAI reports completion_tokens and Anthropic reports output_tokens, memU standardizes both to output_tokens.
The _invoke Method (The Traffic Controller)
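The normalization idea can be sketched in a few lines. This is an illustrative implementation, not memU's actual code: the `StandardUsage` dataclass and the alias table are assumptions built from the provider field names mentioned above.

```python
from dataclasses import dataclass

@dataclass
class StandardUsage:
    """One standard view, regardless of which provider answered."""
    input_tokens: int
    output_tokens: int
    total_tokens: int

# Provider-specific field names mapped to the standard ones.
# (OpenAI uses prompt/completion_tokens; Anthropic uses input/output_tokens.)
ALIASES = {
    "input_tokens": ("input_tokens", "prompt_tokens"),
    "output_tokens": ("output_tokens", "completion_tokens"),
}

def normalize_usage(raw: dict) -> StandardUsage:
    """Map a provider-specific usage dict onto the standard view."""
    def pick(field: str) -> int:
        for name in ALIASES[field]:
            if name in raw:
                return raw[name]
        return 0
    inp, out = pick("input_tokens"), pick("output_tokens")
    return StandardUsage(inp, out, raw.get("total_tokens", inp + out))
```

With this in place, every interceptor downstream can rely on `output_tokens` existing, no matter which provider produced the response.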
Let's look at src/memu/llm/wrapper.py. The heart of the system is the _invoke method. Every call (Chat, Vision, Embed) goes through this function.
```python
async def _invoke(self, kind, call_fn, ...):
    # 1. Run "Before" Interceptors
    await self._run_before(interceptors, context, request_view)

    try:
        # 2. Call the actual AI (OpenAI, Local, etc.)
        result = await call_fn()
    except Exception as e:
        # 3. Handle Errors (and run Error Interceptors)
        await self._run_on_error(...)
        raise e

    # 4. Standardize the result (and calculate tokens)
    usage = self._extract_usage(result)

    # 5. Run "After" Interceptors
    await self._run_after(interceptors, ..., usage)
    return result
```
This design ensures that no AI call ever happens unmonitored. You can always track, log, or modify data flowing in and out of your system.
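To make the pattern concrete, here is a self-contained, runnable sketch of the `_invoke` idea. The class and hook names are illustrative, not memU's actual internals; the point is the shape: every call is sandwiched between before/after hooks, with error hooks on the failure path.

```python
import asyncio

class MonitoredClient:
    """Minimal sketch of the _invoke pattern: no call happens unmonitored."""

    def __init__(self):
        self.before, self.after, self.on_error = [], [], []

    async def _invoke(self, kind, call_fn):
        # 1. Run "before" hooks
        for hook in self.before:
            await hook(kind)
        try:
            # 2. Call the actual provider
            result = await call_fn()
        except Exception as exc:
            # 3. Run error hooks, then re-raise
            for hook in self.on_error:
                await hook(kind, exc)
            raise
        # 4. Run "after" hooks with the result
        for hook in self.after:
            await hook(kind, result)
        return result

# Usage: attach hooks, then route any call through _invoke.
log = []
client = MonitoredClient()

async def note_before(kind):
    log.append(("before", kind))

async def note_after(kind, result):
    log.append(("after", kind, result))

client.before.append(note_before)
client.after.append(note_after)

async def fake_call():
    return "hi"

result = asyncio.run(client._invoke("chat", fake_call))
```

After running, `log` holds `("before", "chat")` followed by `("after", "chat", "hi")`, showing that the hooks bracket the call in order.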
Remember the LLM Profiles from Chapter 2? This is where they are used.
In src/memu/app/service.py, the service decides which wrapper to build based on your config:
```python
def _init_llm_client(self, config):
    if config.client_backend == "sdk":
        # Use official OpenAI Python Library
        return OpenAISDKClient(...)
    elif config.client_backend == "httpx":
        # Use raw HTTP requests (good for local models)
        return HTTPLLMClient(...)
```
This Factory Pattern means memU is future-proof. If a new AI provider comes out tomorrow, we just add a new backend class, and your code remains unchanged.
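One common way to make such a factory extensible is a registry of backend classes. The sketch below is illustrative, not memU's actual code (memU dispatches with the `if`/`elif` shown above); `register_backend` and `EchoClient` are invented for the example.

```python
class BaseLLMClient:
    """Shared interface every backend must implement."""
    def chat(self, prompt: str) -> str:
        raise NotImplementedError

# Registry mapping backend names to client classes.
BACKENDS: dict[str, type[BaseLLMClient]] = {}

def register_backend(name: str):
    """Class decorator: adding a provider is one decorated class."""
    def decorator(cls):
        BACKENDS[name] = cls
        return cls
    return decorator

@register_backend("echo")
class EchoClient(BaseLLMClient):
    """Toy backend that just echoes the prompt."""
    def chat(self, prompt: str) -> str:
        return f"echo: {prompt}"

def init_llm_client(backend: str) -> BaseLLMClient:
    """Factory: look the backend up by name and instantiate it."""
    return BACKENDS[backend]()
```

With a registry like this, supporting a new provider means writing one class and one decorator line; no call site ever changes.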
Congratulations! You have completed the memU architecture tutorial.
We have built a complete mental model of an advanced AI memory system, from the central service all the way down to the LLM wrapper.
You are now ready to build intelligent, memory-augmented applications that go far beyond simple chatbots. Good luck!
Generated by Code IQ