Welcome to Chapter 12! In the previous chapter, 6-NLP, we learned how to teach computers to read and understand human language. We dealt with words, sentences, and emotions.
But there is one dimension we haven't touched yet: Time.
In all our previous lessons, the order of data didn't matter much. A photo of a cat is a photo of a cat, whether you took it yesterday or today. But in the real world, when something happens is often just as important as what happens.
This brings us to the folder 7-TimeSeries.
Imagine you are the manager of a local electricity power plant.
You notice a pattern: People use more electricity in the Winter (heating) and Summer (AC), but less in the Spring.
Time Series Forecasting is the technique of predicting future events by analyzing the trends of the past. It is the closest thing we have to a mathematical time machine.
In 2-Regression, we predicted pumpkin prices based on size. The order didn't matter. In Time Series, the order is everything. Today's temperature depends heavily on yesterday's temperature.
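A quick way to see this dependence in numbers is lag-1 autocorrelation: the correlation between each value and the value that came right before it. Here is a minimal sketch using made-up daily temperatures (the numbers are invented for illustration):

```python
import pandas as pd

# Made-up daily temperatures (illustrative values only)
temps = pd.Series([20.0, 21.0, 21.5, 22.0, 21.0, 20.5, 20.0, 19.5])

# Lag-1 autocorrelation: how strongly today's value tracks yesterday's.
# A value near 1 means strong dependence on the previous day.
print(temps.autocorr(lag=1))
```

For smooth data like temperatures, this number is clearly positive; for a shuffled photo-of-a-cat dataset, it would hover near zero.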
Every Time Series is a mix of three ingredients:
Trend: the general direction of the data over a long period.
Seasonality: patterns that repeat at regular intervals.
Noise: the random messiness that we can't predict.
To use the 7-TimeSeries folder, we usually use a library called statsmodels alongside our usual pandas. The most common tool for beginners is a model called ARIMA.
Computers love numbers, but they often struggle with dates (Is it Day/Month or Month/Day?). We must first teach the computer that our data is a Time Series.
import pandas as pd
# 1. Load the data
df = pd.read_csv('electricity_usage.csv')
# 2. Convert the text column to real Dates
df['Date'] = pd.to_datetime(df['Date'])
# 3. Make the Date the "Index" (the spine of the book)
df.set_index('Date', inplace=True)
print(df.head())
Explanation: By setting the index to the Date, we tell Python: "Don't treat these as row numbers (1, 2, 3). Treat them as timeline points (Jan 1, Jan 2, Jan 3)."
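Once the Date is the index, pandas unlocks time-aware tricks like slicing by date and resampling. The snippet below is a small illustration with synthetic hourly data (the column name Usage matches the example above, but the values are invented):

```python
import pandas as pd

# Synthetic hourly readings over two days (invented numbers)
idx = pd.date_range("2024-01-01", periods=48, freq="h")
df = pd.DataFrame({"Usage": range(48)}, index=idx)

# Slice by a partial date string: all 24 readings from Jan 1
jan_first = df.loc["2024-01-01"]

# Resample: collapse the hourly readings into one daily average
daily = df["Usage"].resample("D").mean()
print(daily)
```

Neither of these operations would work on plain row numbers; they only exist because the index is a timeline.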
ARIMA stands for AutoRegressive Integrated Moving Average. Don't worry about the long name; think of it as a machine that looks at its own past to guess its future.
from statsmodels.tsa.arima.model import ARIMA
# 1. Initialize the Time Machine
# The order=(1,1,1) are settings knobs (p,d,q) we tune later
model = ARIMA(df['Usage'], order=(1,1,1))
# 2. Train the machine on history
model_fit = model.fit()
# 3. Predict the next day's usage
forecast = model_fit.forecast(steps=1)
print(f"Tomorrow's predicted usage: {forecast.iloc[0]}")
Output:
Tomorrow's predicted usage: 450.2 kWh
Explanation:
ARIMA(...): We create the model.
.fit(): The model looks at the "wiggle" of the line in the past.
.forecast(steps=1): It extends that line by one step into the future.

How does the computer separate the "Trend" from the "Season"? It uses a process called Decomposition.
Imagine you are listening to a song. Your brain separates the Lyrics (Trend) from the Beat (Seasonality). The computer does the same with data.
In the 7-TimeSeries lessons, you will encounter a difficult concept called Stationarity.
Most statistical models (like ARIMA) struggle to predict data if the mean (average) keeps changing.
The standard fix is called Differencing: instead of predicting the Price ($100, $110, $120), we predict the Change (+$10, +$10, +$10).
The list [10, 10, 10] is flat and easy to predict!
# Create a new column showing only the change from yesterday
# This removes the "Trend" and makes the data stationary
df['stationary_data'] = df['Usage'].diff()
# Look at the first few rows
print(df['stationary_data'].head())
Explanation:
.diff(): Subtracts yesterday's value from today's value, leaving only the day-to-day change.

You might think Time Series is only for Stock Brokers, but it is everywhere.
In this chapter, we explored 7-TimeSeries. We learned that:
In Time Series data, the order of observations is everything, because each value depends on the values before it.
A series can be broken down into Trend, Seasonality, and Noise through Decomposition.
Models like ARIMA need Stationary data, and differencing with .diff() is one way to achieve it.
We have covered almost every type of static and historical data. But what if we want to build a robot that learns by doing? What if we want to train an AI to play a video game, where it fails, tries again, and gets better?
This requires a completely different approach called Reinforcement Learning.