Welcome back!
In Chapter 4: Crawl Orchestration, we built a powerful engine to manage thousands of scrapers like a factory line.
But sometimes, you don't need a factory. Sometimes, you have a very smart "Brain" (an Artificial Intelligence like Claude, GPT-4, or Llama) that needs "Eyes" to see the world.
Standard AI models are like encyclopedias locked in a room without internet access. They know everything about the past but nothing about what is happening on a website right now.
In this chapter, we introduce AI Integration via MCP. This acts as a bridge, allowing an AI agent to "drive" Scrapling's vehicles to fetch and read web pages automatically.
Imagine you are chatting with an AI assistant and you ask: "Can you summarize the latest article on this news site?"
Without Scrapling, the AI says: "I cannot browse the live web."
With Scrapling's MCP Integration, the flow changes:

1. You ask the AI to read a page.
2. The AI calls a Scrapling tool with the URL.
3. Scrapling fetches the page and converts it to Markdown.
4. The AI reads the Markdown and answers you.

We don't need to write a scraper for every site. We just give the AI the tools to fetch data itself.
MCP stands for Model Context Protocol. Think of it as a "Universal USB Cable" for AI.
When we turn on this integration, Scrapling exposes its internal Fetchers (which we learned about in Chapter 1: Fetchers Interface) as tools the AI can call.
The AI gets access to three main capabilities:
- `get`: The Motorbike. Fast, for simple HTML pages.
- `fetch`: The Van. Opens a browser for dynamic sites.
- `stealthy_fetch`: The Spy Car. Solves Cloudflare/Captchas automatically.

Crucially, Scrapling doesn't just give the AI raw HTML (which is messy). It automatically converts the page to Markdown, which LLMs understand perfectly.
You don't typically write Python scripts to use the MCP server. Instead, you run the server, and your AI Client connects to it.
However, to see how easy it is to start, here is how you launch it within Python:
```python
from scrapling import ScraplingMCPServer

# Initialize the server wrapper
server = ScraplingMCPServer()

if __name__ == "__main__":
    # Start the server over a Standard IO pipe.
    # This allows AI apps to talk to it via the command line.
    server.serve(http=False, host="localhost", port=8000)
```
What happens here? The script starts and waits silently. It doesn't print anything because it's listening for digital signals from an AI agent on your computer.
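To connect an AI client, you point it at the script. For Claude Desktop, that means adding an entry to its `claude_desktop_config.json`. The server name `scrapling`, the file name `server.py`, and the path below are illustrative; adjust them to wherever you saved the launch script:

```json
{
  "mcpServers": {
    "scrapling": {
      "command": "python",
      "args": ["/path/to/server.py"]
    }
  }
}
```

Once the client restarts, the Scrapling tools appear in its tool list automatically.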
If you were to connect this to an AI Agent (like Claude Desktop), you would simply tell the AI:
> "Use the `stealthy_fetch` tool to read `https://example.com`."
The AI would automatically format a JSON request, send it to Scrapling, wait for the Markdown, and then read it.
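Under the hood, MCP messages travel as JSON-RPC. A tool call like the one above looks roughly like this on the wire (simplified; the exact fields are defined by the MCP specification):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "stealthy_fetch",
    "arguments": { "url": "https://example.com" }
  }
}
```

You never write this JSON by hand; the AI client generates it from your plain-English request.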
How does Scrapling translate a robot request into a browser action?
It uses a translation layer. The AI sends a text request (JSON), Scrapling interprets it, picks the right Fetcher, and cleans up the result.
Let's look inside scrapling/core/ai.py to see how the server is defined. It acts as a wrapper around the Scrapling library.
First, it defines the structure of the data it returns to the AI (ResponseModel). The AI needs to know the status (Did it work?) and the content.
```python
# Simplified from scrapling/core/ai.py
from pydantic import BaseModel

class ResponseModel(BaseModel):
    """What we send back to the AI"""
    status: int
    content: list[str]  # The page text or markdown
    url: str
```
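To see the shape of this payload without pulling in Pydantic, here is a stdlib-only approximation using a dataclass. This is a sketch for illustration; the real class is the Pydantic model above:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class ResponseModel:
    """Stdlib stand-in for the Pydantic model: same three fields."""
    status: int
    content: list[str]  # The page text or markdown
    url: str

# Build the structure the AI receives after a successful fetch
resp = ResponseModel(status=200, content=["# Example Domain"], url="https://example.com")
payload = json.dumps(asdict(resp))
print(payload)
```

The AI reads this JSON, checks `status`, and then works with the Markdown in `content`.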
Next, it registers the functions. Here is how the stealthy_fetch tool is exposed. Notice how it takes the complex arguments we learned in Chapter 1 and exposes them to the AI.
```python
# Simplified from scrapling/core/ai.py
class ScraplingMCPServer:
    @staticmethod
    async def stealthy_fetch(url: str, extraction_type="markdown", **kwargs):
        """
        AI Tool: Fetches high-security pages.
        """
        # 1. Call the Spy Car (StealthyFetcher)
        page = await StealthyFetcher.async_fetch(url, **kwargs)

        # 2. Convert HTML to friendly Markdown
        #    (uses the Adaptive Parser logic internally)
        content = Convertor.extract(page, type=extraction_type)

        # 3. Return structured data
        return ResponseModel(status=page.status, content=content, url=page.url)
```
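Scrapling's `Convertor` handles the HTML-to-Markdown step. Its internals are beyond this chapter, but a toy version built on Python's stdlib `HTMLParser` shows the core idea (it only handles headings and plain text, purely for illustration):

```python
from html.parser import HTMLParser

class TinyMarkdown(HTMLParser):
    """Toy HTML-to-Markdown converter: headings and text only."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            # <h2> becomes "## ", and so on
            self._prefix = "#" * int(tag[1]) + " "

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.blocks.append(self._prefix + text)
            self._prefix = ""

    def markdown(self):
        return "\n\n".join(self.blocks)

converter = TinyMarkdown()
converter.feed("<h1>Example Domain</h1><p>This domain is for examples.</p>")
print(converter.markdown())
# → # Example Domain
#
#   This domain is for examples.
```

The real converter handles links, tables, nested tags, and malformed HTML, but the principle is the same: keep the text and structure, drop the markup noise.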
Finally, the serve method packages these functions using the FastMCP library, which handles the communication protocol.
```python
def serve(self, http: bool, host: str, port: int):
    # Create the MCP application
    server = FastMCP(name="Scrapling")

    # Add the tools so the AI knows they exist
    server.add_tool(self.get)
    server.add_tool(self.fetch)
    server.add_tool(self.stealthy_fetch)

    # Start listening
    server.run()
```
You might wonder why we convert the HTML to Markdown before giving it to the AI.
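To make the waste concrete, compare the raw HTML of a single sentence with its Markdown equivalent (made-up strings, but representative of real pages):

```python
html = '<div class="article-body"><p><span class="highlight">Scrapling</span> saves tokens.</p></div>'
markdown = "Scrapling saves tokens."

# The tags and attributes outweigh the actual text several times over
print(len(html), len(markdown))
```

On real pages the ratio is far worse: navigation bars, scripts, and styling can bury a few paragraphs of text under hundreds of kilobytes of markup.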
Raw HTML is cluttered with tags like `<div>` and `<span>` and attributes like `class="xyz"`. This noise wastes the AI's memory (context window). Markdown keeps the text and structure while dropping the clutter.

In this chapter, you learned:
- MCP (Model Context Protocol) lets an AI agent call Scrapling's Fetchers as tools.
- Scrapling exposes `get`, `fetch`, and `stealthy_fetch` as capabilities for the AI.
- Pages are converted to Markdown so the AI can read them efficiently.

You have now mastered the art of fetching data, parsing it, automating it, and even connecting it to Artificial Intelligence.
However, all these fetchers rely on a browser engine to work. In the final chapter, we will take a deeper look at the engine itself: the session that keeps track of your cookies, headers, and identity.
Next Chapter: Browser Session Engine
Generated by Code IQ