Welcome to Scrapling! If you are looking to extract data from the web, the very first step is actually getting the web page.
In this chapter, we will explore the Fetchers Interface. This is the toolset Scrapling uses to drive to a website and bring back the content.
Imagine the internet as a landscape with different types of terrain.
If you try to drive a motorbike through a muddy swamp, you'll get stuck. If you drive a tank to the grocery store, it's inefficient.
Scrapling solves this by providing three specific "vehicles" (Fetchers) that share the same steering wheel (Interface).
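Fetcher (The Motorbike): fast, lightweight HTTP requests for static pages, much like a command-line tool such as curl.
DynamicFetcher (The Heavy-Duty Van): a full browser for JavaScript-heavy pages.
StealthyFetcher (The Spy Car): a disguised browser for getting past anti-bot defenses.
Let's say we want to scrape product prices. We will see how to switch vehicles depending on the target.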
Static pages: the Motorbike (Fetcher)
If the website is simple HTML (the price is in the source code), use the basic Fetcher. It is lightweight and doesn't need to open a browser window.
from scrapling import Fetcher
# The Motorbike: Fast and direct
response = Fetcher.get('https://books.toscrape.com/')
print(response.status) # 200
print("Page downloaded successfully!")
What happened here? We sent a standard HTTP GET request, and the server replied with HTML text. We didn't render images or run JavaScript, so the request finished quickly.
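To see why a plain GET is enough only when the data is already in the source, here is a small standard-library sketch. The two HTML strings are made-up stand-ins: one page ships its price in the markup, and the other ships a "Loading..." placeholder that JavaScript would fill in later. Without running JavaScript, only the first page yields a price. `PriceFinder` and `find_prices` are hypothetical helpers for illustration, not part of Scrapling.

```python
from html.parser import HTMLParser


class PriceFinder(HTMLParser):
    """Collects the text of every element whose class list includes 'price'."""

    def __init__(self):
        super().__init__()
        self._depth = 0  # > 0 while we are inside a .price element
        self.prices = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1  # a nested tag inside a .price element
        elif "price" in classes:
            self._depth = 1
            self.prices.append("")

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.prices[-1] += data.strip()


def find_prices(html):
    finder = PriceFinder()
    finder.feed(html)
    return finder.prices


# Made-up pages: one has the price in its HTML, the other is a JavaScript shell.
STATIC_PAGE = '<html><body><h1>A Book</h1><p class="price">£51.77</p></body></html>'
JS_SHELL = '<html><body><div id="app">Loading...</div><script src="/app.js"></script></body></html>'

print(find_prices(STATIC_PAGE))  # ['£51.77']
print(find_prices(JS_SHELL))     # []
```

The empty result for the shell page is exactly the situation the next section addresses.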
JavaScript pages: the Van (DynamicFetcher)
Now, imagine a site where the price doesn't show up until a spinning loading icon finishes. The basic Fetcher would only see the "Loading..." text. We need the "Van" that can wait for the specific element to appear.
from scrapling import DynamicFetcher
# The Van: Loads the full browser engine
response = DynamicFetcher.fetch(
'https://web-scraping.dev/product/1',
headless=True, # Run hidden in background
wait_selector='.price' # Wait for price to appear
)
print("Dynamic content loaded!")
What happened here?
Scrapling launched a real browser (Chromium, driven by Playwright) in the background, navigated to the URL, executed the page's JavaScript, waited for the .price element to exist, and then grabbed the rendered HTML.
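Conceptually, `wait_selector` boils down to "keep checking until the element exists or a timeout expires." The sketch below shows that idea in plain Python. It is not how Scrapling or Playwright implement waiting (Playwright waits inside the browser itself); `wait_for` and `FakePage` are hypothetical names, and `FakePage.has_price` stands in for a selector check against a live page.

```python
import time
from typing import Callable


def wait_for(check: Callable[[], bool], timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Call `check` repeatedly until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakePage:
    """Stand-in for a page whose JavaScript inserts .price after a few checks."""

    def __init__(self, ready_after: int):
        self.ticks = 0
        self.ready_after = ready_after

    def has_price(self) -> bool:
        self.ticks += 1
        return self.ticks >= self.ready_after


page = FakePage(ready_after=3)
print(wait_for(page.has_price, timeout=1.0))  # True, on the third check
```

Returning `False` on timeout (instead of raising) keeps the sketch simple; a real scraper would usually treat a timeout as an error worth reporting.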
Protected pages: the Spy Car (StealthyFetcher)
Finally, suppose you try to scrape a high-security site and get a "403 Forbidden" error or a "Verify you are human" screen. The previous fetchers fail because they look like bots. Enter the Spy Car.
from scrapling import StealthyFetcher
# The Spy Car: Bypasses defenses
response = StealthyFetcher.fetch(
'https://nopecha.com/demo/cloudflare',
solve_cloudflare=True, # Solve Cloudflare Turnstile/interstitial challenges
headless=False # Let's watch it work!
)
print("Protection bypassed!")
What happened here? Scrapling launched a specially patched browser that masks the fingerprints that usually give automation away (for example, by adding noise to canvas output and sending realistic browser headers). When it detected a Cloudflare checkpoint, it worked through the challenge, such as clicking the verification box, before returning the page to you.
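Before reaching for the Spy Car, it helps to notice when a lighter fetcher has been blocked. Here is a hypothetical heuristic built on the two symptoms described above: a 403 status or a "Verify you are human" page. The extra status codes and marker phrases are assumptions, and real challenge pages vary, so treat this as a starting point rather than a reliable detector.

```python
# Assumption: statuses often returned for bot blocks, rate limits, or challenge pages.
BLOCK_STATUSES = {403, 429, 503}

# Assumption: phrases that commonly appear on challenge pages (may cause false positives).
CHALLENGE_MARKERS = (
    "verify you are human",
    "just a moment",
    "checking your browser",
)


def looks_blocked(status: int, html: str) -> bool:
    """Heuristic: True if the response looks like a block or challenge page."""
    if status in BLOCK_STATUSES:
        return True
    text = html.lower()
    return any(marker in text for marker in CHALLENGE_MARKERS)


print(looks_blocked(403, ""))                                     # True
print(looks_blocked(200, "<title>Verify you are human</title>"))  # True
print(looks_blocked(200, '<p class="price">£10</p>'))             # False
```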
The beauty of Scrapling is that you can swap your vehicle without learning how to drive again. All Fetchers produce a Response object.
This Response object is what allows us to find data, which we will cover in the Adaptive Parser chapter.
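Because every fetcher hands back the same kind of object, the code that consumes a response does not need to care which vehicle fetched it. The sketch below escalates from the cheapest option to the most capable one until a response passes a check. To keep it runnable offline, the fetchers are stand-in functions and `SimpleResponse` is a hypothetical placeholder for Scrapling's Response; with Scrapling itself you would pass callables such as `lambda u: Fetcher.get(u)` or `lambda u: StealthyFetcher.fetch(u, solve_cloudflare=True)` instead.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class SimpleResponse:
    """Hypothetical stand-in for Scrapling's Response (only the fields used here)."""
    status: int
    html: str


FetchFn = Callable[[str], SimpleResponse]


def fetch_with_escalation(
    url: str,
    fetchers: Sequence[Tuple[str, FetchFn]],
    is_ok: Callable[[SimpleResponse], bool],
) -> Tuple[Optional[str], Optional[SimpleResponse]]:
    """Try each (name, fetch) pair in order and return the first acceptable response."""
    last = None
    for name, fetch in fetchers:
        last = fetch(url)
        if is_ok(last):
            return name, last
    return None, last  # nothing passed; hand back the last attempt for debugging


# Stand-ins that simulate the three scenarios from this chapter.
def motorbike(url): return SimpleResponse(200, "<div>Loading...</div>")
def van(url): return SimpleResponse(403, "Forbidden")
def spy_car(url): return SimpleResponse(200, '<p class="price">£10</p>')


def has_price(resp): return resp.status == 200 and 'class="price"' in resp.html


name, resp = fetch_with_escalation(
    "https://example.com/product",
    [("Fetcher", motorbike), ("DynamicFetcher", van), ("StealthyFetcher", spy_car)],
    has_price,
)
print(name)  # StealthyFetcher
```

Ordering the list by cost means the heavier browser-based fetchers only run when the cheap request was not enough.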
How does Scrapling manage these complex browser interactions so simply?
When you call fetch, Scrapling acts as a manager. It prepares the configuration, selects the engine, and handles the cleanup.
Internally, StealthyFetcher builds upon the DynamicFetcher. It intercepts the browser creation process to inject "stealth" scripts.
Here is a simplified look at how the StealthyFetcher decides to handle a request. It checks your arguments (like solve_cloudflare) and configures a StealthySession.
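# Simplified internal logic representation (not Scrapling's actual source)
from scrapling.fetchers import StealthySession
class StealthyFetcher:
    @classmethod
    def _prepare_config(cls, kwargs):
        # Hypothetical helper for this sketch: pass the options through unchanged
        return dict(kwargs)
    @classmethod
    def fetch(cls, url, **kwargs):
        # 1. Configure the stealth options
        config = cls._prepare_config(kwargs)
        # 2. Start a session (The Engine); the browser is closed when the block exits
        with StealthySession(**config) as session:
            # 3. Go to the URL and return the result
            return session.fetch(url)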
The StealthySession (covered in more detail in the Browser Session Engine chapter) does the heavy lifting: it keeps the browser open and manages cookies.
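This section says the session keeps the browser open, manages cookies, and handles cleanup. In Python, that lifecycle maps naturally onto the context-manager protocol. Here is a self-contained sketch using a hypothetical `FakeBrowserSession`: it "launches" on enter, keeps a cookie jar across requests, and always closes on exit, even when a request raises. It mirrors the shape of `with StealthySession(...) as session:` but none of its internals; the cookie value is made up.

```python
from typing import Dict, List


class FakeBrowserSession:
    """Hypothetical stand-in that mimics a browser session's lifecycle."""

    def __init__(self, **config):
        self.config = config
        self.is_open = False
        self.cookies: Dict[str, str] = {}
        self.visited: List[str] = []

    def __enter__(self):
        self.is_open = True  # a real session would launch the browser here
        return self

    def __exit__(self, exc_type, exc, tb):
        self.is_open = False  # cleanup runs on normal exit and on errors
        return False  # do not swallow exceptions

    def fetch(self, url: str) -> str:
        if not self.is_open:
            raise RuntimeError("session is closed")
        self.visited.append(url)
        # Pretend the first page sets a cookie that later requests reuse.
        self.cookies.setdefault("session_id", "abc123")
        return f"<html>{url}</html>"


with FakeBrowserSession(headless=True) as session:
    session.fetch("https://example.com/page/1")
    session.fetch("https://example.com/page/2")

print(session.visited)  # ['https://example.com/page/1', 'https://example.com/page/2']
print(session.is_open)  # False
```

Reusing one session for several pages avoids paying the browser start-up cost on every request, which is the main reason to open a session yourself instead of calling a one-shot fetch repeatedly.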
In this chapter, you learned that Fetchers are your vehicles for retrieving data:
Fetcher: the fast Motorbike for static pages.
DynamicFetcher: the Van for JavaScript apps.
StealthyFetcher: the Spy Car for bypassing anti-bot defenses.
Now that you have fetched the web page and have a Response object, you need to extract the specific data (text, links, images) from it.
Generated by Code IQ