Welcome to Chapter 13! In the previous chapter, 7-TimeSeries, we learned how to predict the future by analyzing historical data. We looked at static logs to guess tomorrow's weather or stock prices.
But what if you aren't just watching the world? What if you are acting in it?
In the previous chapters, we gave the computer the answers (Labels) to study. But in real life, nobody gives you an answer key. You have to try things, fail, and try again.
This brings us to the folder 8-Reinforcement.
Imagine you want to teach a computer to play a video game, like Super Mario Bros.
You could try to hand-code a rule like "if enemy_is_here then jump" for every possible situation on the screen, but that quickly becomes impossible. Instead, you let the computer play the game.
Over time, the computer figures out: "Hey, jumping makes my score go up, and running into things makes my score go down." It learns by doing.
Reinforcement Learning (RL) is different from everything else we have learned. It is a loop of interaction between an Agent and an Environment, described by four key terms:

Agent: This is the learner. It is the entity making decisions.
Environment: This is the game or the maze. It is everything outside the agent.
Action: This is what the Agent does (e.g., Move Left, Move Right, Jump).
Reward: This is the feedback, a number that tells the Agent how good its last Action was.
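This loop can be sketched in plain Python. The `ToyEnvironment` class below and its reward numbers are made up purely for illustration (real libraries provide much richer environments):

```python
import random

class ToyEnvironment:
    """A made-up environment: the Agent must guess a hidden number (0 or 1)."""
    def __init__(self):
        self.secret = random.choice([0, 1])

    def step(self, action):
        # Reward is +1 for a correct guess, -1 otherwise
        reward = 1 if action == self.secret else -1
        self.secret = random.choice([0, 1])  # start a new round
        return reward

env = ToyEnvironment()
total_reward = 0
for _ in range(10):
    action = random.choice([0, 1])   # the Agent picks an Action
    reward = env.step(action)        # the Environment returns a Reward
    total_reward += reward           # the Agent tallies its feedback

print("Total reward after 10 random steps:", total_reward)
```

The Agent here acts randomly; the whole point of RL is to replace that random choice with something that learns from the rewards.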
To use this folder, we typically use a Python library called Gymnasium (formerly known as OpenAI Gym). It provides standard "Environments" (games) for our agents to play in.
Let's pretend we are training a robot to balance a pole on a cart (a classic problem called "CartPole").
import gymnasium as gym
# 1. Create the environment
# "render_mode" lets us see the game on screen
env = gym.make("CartPole-v1", render_mode="human")
# 2. Reset the world to start a new game
# "state" tells us where the pole is right now
state, info = env.reset()
print("Game Started!")
Explanation: We don't need to code the physics of gravity or the graphics of the cart. The library handles the "World." We just need to control the "Agent."
Before the robot learns, it knows nothing. It acts like a baby, flailing its arms randomly.
# 1. Pick a random move
# 0 = Push Left, 1 = Push Right
action = env.action_space.sample()
# 2. Tell the environment to perform that move
observation, reward, terminated, truncated, info = env.step(action)
print(f"I took action: {action}")
print(f"I got reward: {reward}")
Output:
I took action: 1
I got reward: 1.0
Explanation:
env.step(action): This is the magic command. We send an action to the game.
reward: The game tells us if that was a good move. In CartPole, you get +1 point for every timestep the pole stays upright.
In a real RL script, we put this inside a loop.
# Play for 1000 steps
for _ in range(1000):
    # Take a random action (because we haven't learned yet!)
    action = env.action_space.sample()

    # Step forward
    observation, reward, terminated, truncated, info = env.step(action)

    # If the pole falls over, restart the game
    if terminated or truncated:
        observation, info = env.reset()

# Close the window when we are done
env.close()
Explanation: Right now, the robot is just guessing. To make it smart, we need a way for it to remember which actions led to rewards.
How does the Agent actually learn? It creates a Policy.
A Policy is a strategy. It maps a State (Situation) to an Action.
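For example, a hand-written policy for CartPole might simply push the cart toward the side the pole is leaning. This is an illustrative sketch, not a learned policy; the state layout (cart position, cart velocity, pole angle, pole angular velocity) follows CartPole's observation format:

```python
def simple_policy(state):
    """A hand-coded strategy: push toward the side the pole leans."""
    cart_pos, cart_vel, pole_angle, pole_vel = state
    # 0 = Push Left, 1 = Push Right
    return 1 if pole_angle > 0 else 0

# If the pole tilts right (positive angle), push right to catch it
print(simple_policy((0.0, 0.0, 0.05, 0.0)))   # -> 1
print(simple_policy((0.0, 0.0, -0.05, 0.0)))  # -> 0
```

RL's job is to discover a policy like this automatically, instead of us writing it by hand.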
One of the simplest ways to implement this "Brain" is using a technique called Q-Learning.
Imagine a giant cheat sheet (a table): one row for every State, one column for every Action, and each cell holding a score for how good that Action is in that State.
The Agent looks at the table to decide what to do. Initially, the table is empty (all zeros).
import numpy as np

# Create a generic Q-Table with zeros
# Imagine 10 possible states and 2 actions
q_table = np.zeros((10, 2))

# ... After taking an action and getting a reward ...
state, action, reward, next_state = 3, 1, 1.0, 4  # example numbers for illustration
learning_rate = 0.1  # how big each update step is
discount = 0.99      # how much we value future rewards

# Update the Cheat Sheet:
# Old Value + Learning Rate * (Reward + Future Guess - Old Value)
old_value = q_table[state, action]
future_guess = discount * np.max(q_table[next_state])
new_value = old_value + learning_rate * (reward + future_guess - old_value)
q_table[state, action] = new_value  # ideally, the score for that action goes up
Explanation: This code represents the "Memory" update. Over many games, the scores in the table drift toward the true value of each action, and the Agent simply picks the highest-scoring action in its current row.
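To see the update in action, here is a compact end-to-end sketch on a made-up "corridor" environment: 5 states in a row, and the Agent earns a reward only by reaching the right end. The environment and all hyperparameters here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 2  # actions: 0 = left, 1 = right
q_table = np.zeros((n_states, n_actions))
learning_rate, discount, epsilon = 0.1, 0.9, 0.2

for episode in range(500):
    state = 0
    while state != n_states - 1:
        # Explore sometimes, otherwise follow the cheat sheet
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(q_table[state]))

        next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
        reward = 1.0 if next_state == n_states - 1 else 0.0

        # The Q-Learning update from above
        old_value = q_table[state, action]
        future_guess = discount * np.max(q_table[next_state])
        q_table[state, action] = old_value + learning_rate * (reward + future_guess - old_value)
        state = next_state

# After training, the greedy policy in every non-terminal state should be "right"
print(np.argmax(q_table[:-1], axis=1))
```

Notice that nobody told the Agent "go right"; the table figured it out from rewards alone.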
Reinforcement Learning is one of the most exciting frontiers of AI. It allows computers to solve problems we don't know the answer to yet.
In this chapter, we explored 8-Reinforcement. We learned that:

An Agent learns by interacting with an Environment, taking Actions and collecting Rewards.
Gymnasium provides ready-made environments like CartPole so we can focus on the Agent.
A Policy is the Agent's strategy, mapping each State to an Action.
Q-Learning stores that strategy as a table of scores, updated after every move.
We have now covered the entire spectrum of Machine Learning! We have set up tools, visualized data, predicted numbers, classified images, grouped music, read text, forecasted time, and taught robots to play games.
So... what now?
How do we take these cool experiments and actually use them in our daily lives or careers?
Generated by Code IQ