Welcome to Chapter 7! In the previous chapter, quiz-app, we built a tool to test our knowledge. We ensured our brains were ready.
Now, we are finally ready to do Real Machine Learning.
Up until now, we have been setting up our environment, installing tools, and learning history. In this chapter, we open the folder 2-Regression to write our first true AI model.
Imagine you are a pumpkin farmer.
Regression is the fancy word for "finding the relationship between things."
The 2-Regression directory contains the code that turns your historical data into a mathematical "Crystal Ball" that can predict numbers.
Inside this folder, we aren't writing magic spells; we are drawing lines.
In Regression, we usually talk about two variables:
- X (Feature): The thing we know (e.g., Pumpkin Size).
- y (Label): The thing we want to predict (e.g., Price).

Imagine taking a piece of spaghetti and throwing it onto a graph of your data.
Once the spaghetti (the line) is glued down, we can pick a new size on the graph, look at where the line is, and guess the price.
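That glued-down spaghetti is just an equation: price = slope × size + intercept. Here is a minimal sketch of "reading the price off the line," using made-up numbers for the slope and intercept (not real pumpkin data):

```python
# The "line" is the equation: price = slope * size + intercept.
# These numbers are hypothetical, chosen only for illustration.
slope = 0.02      # dollars added per unit of size
intercept = 1.50  # base price of a size-0 pumpkin

def predict_price(size):
    """Read the price off the line for a given size."""
    return slope * size + intercept

print(predict_price(300))  # 0.02 * 300 + 1.50 -> prints 7.5
```

Training a regression model is just the process of finding good values for those two numbers automatically.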
The 2-Regression folder is a collection of Notebooks (remember notebook.ipynb?).
To "use" this folder, you open the notebooks and run the code cells. Let's look at the core workflow used in this chapter.
First, we need to load our "Pumpkin Sales Ledger" (a CSV file) into Python using a library called Pandas.
import pandas as pd
# Load the data file from the folder
pumpkins = pd.read_csv('../data/US-pumpkins.csv')
# Show the first 5 rows so we can check it
# This is like opening the ledger book
print(pumpkins.head())
What happens: A table of data appears on your screen, and you can check whether columns such as "City", "Price", and "Size" exist.
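Before training, we need to carve the table into our X (feature) and y (label). A minimal sketch, using a tiny stand-in DataFrame since the real CSV's column names may differ from the "Size" and "Price" assumed here:

```python
import pandas as pd

# A tiny stand-in for the pumpkin ledger; the real CSV's columns
# may be named differently, so "Size" and "Price" are assumptions.
pumpkins = pd.DataFrame({
    'Size':  [200, 300, 450, 500],
    'Price': [5.0, 6.5, 9.0, 10.0],
})

# X must be 2-D (a table of features); y is 1-D (the labels).
X = pumpkins[['Size']]   # double brackets keep X as a DataFrame
y = pumpkins['Price']    # single brackets give a Series

print(X.shape, y.shape)  # (4, 1) (4,)
```

The double brackets matter: scikit-learn expects X to be two-dimensional (rows of samples, columns of features), even when there is only one feature.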
This is the moment we have been waiting for. We use Scikit-learn to train the robot.
from sklearn.linear_model import LinearRegression
# 1. Create the empty robot brain
model = LinearRegression()
# 2. Train the robot (Fit the line to the data)
# X_train = Sizes, y_train = Prices (we will see how to make
# these "train" piles with train_test_split later in this chapter)
model.fit(X_train, y_train)
# 3. Ask the robot a question
prediction = model.predict([[450]]) # Size 450
print(f"Predicted Price: ${prediction[0]:.2f}")
Output:
Predicted Price: $8.50
Explanation:
- LinearRegression(): We create a new, empty model. It knows nothing.
- fit(): We force the model to look at our training data. It adjusts its internal math to find the "Line of Best Fit."
- predict(): Now that it has learned, we ask: "How much is a size 450 pumpkin?" It calculates the answer based on the line it drew.
What actually happens when we type model.fit()? Does the computer actually draw a line?
Mathematically, yes. It calculates a slope (how steep the line is) and an intercept (where the line starts).
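You can actually inspect that slope and intercept after training: scikit-learn stores them in `coef_` and `intercept_`. A small sketch on toy data that lies exactly on a known line (the numbers are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data lying exactly on the line: price = 0.02 * size + 1.5
X = np.array([[100], [200], [300], [400]])
y = 0.02 * X.ravel() + 1.5

model = LinearRegression()
model.fit(X, y)

# The "line" the model drew is just these two numbers:
print(model.coef_[0])    # slope     -> approximately 0.02
print(model.intercept_)  # intercept -> approximately 1.5

# predict() is just slope * size + intercept:
manual = model.coef_[0] * 450 + model.intercept_
print(manual, model.predict([[450]])[0])  # both approximately 10.5
```

So `fit()` is not magic: it is the search for the slope and intercept that make the line pass as close as possible to every data point.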
In the 2-Regression lessons, you will learn a critical concept: Splitting.
If we show the robot all the answers, it might just memorize them (cheating). To prevent this, we hide some data.
from sklearn.model_selection import train_test_split
# Split our data into two piles: Study Material and Exam Questions
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Train ONLY on the Study Material
model.fit(X_train, y_train)
# Test on the Exam Questions
# (for regression, .score() returns R-squared, not accuracy)
score = model.score(X_test, y_test)
print(f"R² Score: {score}")
Explanation:
- train_test_split: Randomly shuffles our pumpkin data and cuts it into two piles.
- fit(X_train...): The robot only sees the study pile.
- score(X_test...): We see how well the robot predicts prices for pumpkins it has never seen before.

You might be thinking, "Why don't I just use Excel?"
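The whole split-train-score loop can be run end to end. This sketch uses synthetic pumpkin data (sizes and prices generated from a noisy line, purely for illustration) so it is self-contained:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Invent 50 pumpkins whose prices follow a noisy line
# (illustrative data, not the real US-pumpkins.csv).
rng = np.random.default_rng(0)
sizes = rng.uniform(100, 600, size=50)
prices = 0.02 * sizes + 1.5 + rng.normal(0, 0.2, size=50)

X = sizes.reshape(-1, 1)  # 2-D feature matrix
y = prices

# Hide 20% of the data as "Exam Questions"
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)              # study pile only

print(round(model.score(X_test, y_test), 3))  # R² close to 1.0
```

Because the robot never saw the test pile during `fit()`, a high score here means it genuinely learned the line rather than memorizing the answers.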
In this chapter, we explored 2-Regression. We learned that:

- Regression means finding the relationship between a feature (X) and a label (y), i.e., drawing a Line of Best Fit.
- LinearRegression() learns that line with fit() and answers questions with predict().
- train_test_split hides some data so we can honestly score the model on pumpkins it has never seen.
We now have a working mathematical brain in our notebook. But right now, only we can use it. How do we share this prediction tool with the rest of the world?
We need to build a website around it.