In the previous chapter, Semantic Search & RAG, we manually glued together a database and a language model to create a knowledgeable chatbot. It worked, but writing all that "glue code" manually gets messy very quickly.
What if you want your AI to remember your name? What if you want it to browse the web and do math in the same conversation?
This is where LangChain comes in.
Think of building a software application like building a house.
LangChain Orchestration is a framework that helps you chain together prompts, models, and tools to build complex applications easily.
The most basic building block is a Chain. A chain connects a specific Input $\rightarrow$ Prompt $\rightarrow$ Model $\rightarrow$ Output.
You can also connect multiple chains together, creating an assembly line where the output of one step becomes the input of the next.
Let's create a pipeline that:
First, let's load our model (just like in previous chapters):
from langchain.llms import LlamaCpp
# Load the local model
llm = LlamaCpp(
model_path="Phi-3-mini-4k-instruct-q4.gguf",
n_ctx=2048,
verbose=False
)
Now, let's create two separate chains using PromptTemplate.
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
# Chain 1: Generate a Title
prompt1 = PromptTemplate(
template="Write a creative title for a story about {topic}.",
input_variables=["topic"]
)
chain_title = LLMChain(llm=llm, prompt=prompt1, output_key="title")
# Chain 2: Write the Story (uses the 'title' from Chain 1)
prompt2 = PromptTemplate(
template="Write a very short story with the title: {title}",
input_variables=["title"]
)
chain_story = LLMChain(llm=llm, prompt=prompt2, output_key="story")
Finally, we glue them together using the pipe | operator. This tells LangChain: "Run the title chain, then pass the result to the story chain."
from langchain.schema.runnable import RunnableSequential
# Create the assembly line
full_chain = chain_title | chain_story
# Run it!
result = full_chain.invoke({"topic": "a robot who loves gardening"})
print(result['story'])
Output: The model will generate a title (e.g., "The Iron Thumb") and then immediately write a story about it. You didn't have to copy-paste the title manually!
By default, LLMs are stateless. This means they have amnesia. If you say "Hi, I'm Bob," and then ask "What is my name?", the model will say "I don't know."
To fix this, LangChain provides Memory components. Memory acts as a logbook that records the conversation and feeds it back to the model every time you speak.
We use ConversationBufferMemory, which simply stores every message sent back and forth.
from langchain.memory import ConversationBufferMemory
# Create the memory object
memory = ConversationBufferMemory(memory_key="chat_history")
# Create a prompt that creates space for history
template = """
History: {chat_history}
User: {input_text}
AI:"""
prompt = PromptTemplate(
template=template,
input_variables=["input_text", "chat_history"]
)
Now we create a chain that includes this memory logbook.
# Create a chain with memory attached
conversation = LLMChain(
llm=llm,
prompt=prompt,
memory=memory
)
# Turn 1
conversation.invoke({"input_text": "My name is Alice."})
# Turn 2
response = conversation.invoke({"input_text": "What is my name?"})
print(response['text'])
Output:
"Your name is Alice."
The orchestration layer automatically inserted the previous interaction into the prompt before sending it to the LLM.
Chains are rigid (A $\rightarrow$ B $\rightarrow$ C). Agents are flexible.
An Agent is an LLM that has access to a toolkit. When you ask a question, the Agent decides:
It creates a "Reasoning Loop" to solve the problem step-by-step.
LangChain agents often use a logic called ReAct (Reason + Act).
Calculator tool with input 135 * 48.6480.Here is how you set up a simple agent with math skills:
from langchain.agents import load_tools, initialize_agent, AgentType
# Load standard tools (e.g., a math helper)
tools = load_tools(["llm-math"], llm=llm)
# Initialize the Agent
agent = initialize_agent(
tools,
llm,
agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
verbose=True # This lets us see the thinking process!
)
Now, let's ask it a math question.
agent.run("What is 10 raised to the power of 2.5?")
Output (simplified):
> Entering chain...
Thought: I need to calculate a power.
Action: Calculator
Action Input: 10**2.5
Observation: 316.227766
Thought: I have the answer.
Final Answer: 316.227766
The LLM didn't guess. It realized it needed a tool, used it, read the result, and gave you the correct answer.
How does the Agent actually "use" a tool? It is not magic; it is just text processing.
The Agent is running a continuous while loop. It generates text until it encounters a specific keyword (like Action:). LangChain pauses the LLM, runs the corresponding Python function (the tool), and pastes the output back into the prompt for the LLM to read.
LangChain acts as the glue that turns a raw Language Model into a functioning application.
However, all this functionality comes at a cost: Speed and Size. Running complex chains with large models can be slow and require expensive hardware. To make these models run on consumer laptops or smaller devices, we need to make them smaller.
Next Step: Learn how to shrink models without losing intelligence in Quantization.
Generated by Code IQ