Chapter 15 ยท CORE

solution/R/

๐Ÿ“„ 15_solution_r_.md ๐Ÿท Core

Chapter 15: solution/R/

Welcome to Chapter 15! In the previous chapter, 9-Real-World, we finished our journey through the core Machine Learning curriculum. We learned how to build, deploy, and use models in the real world using Python.

But Python isn't the only language spoken in the land of Data Science.

Just like in the human world, where you might say "Hello" in English or "Hola" in Spanish, in Data Science, you can write code in Python or R.

This brings us to the folder solution/R/.

Motivation: The Second Language

Imagine you are a chef.

The solution/R/ directory contains the exact same lessons you just learned, but translated into the R Programming Language.

Why does this exist? While Python is great for general programming (building apps, websites, and scripts), R was invented specifically by statisticians for statistics. If you go into academic research or specialized data analysis, you will see R everywhere.

Key Concepts: R vs. Python

If you open this folder, you will see files that look slightly different from what we used before.

1. R Markdown (.Rmd)

In notebook.ipynb, we used Jupyter Notebooks. The R equivalent is R Markdown.

2. Assignment (<-)

This is the first thing that confuses beginners.

3. The Tidyverse

In Python, we used Pandas to manipulate data. In R, we use a collection of tools called the Tidyverse. It is designed to be very easy to read, almost like English sentences.

How to Use This Abstraction

To use the files in solution/R/, you typically need a specific tool called RStudio. It is the "Command Center" for R, similar to how VS Code is for Python.

Step 1: Loading Data (Comparison)

Let's see how the code looks different for the exact same task: Loading our Pumpkin Data.

Python Version (What you know):

import pandas as pd
df = pd.read_csv("pumpkins.csv")
print(df.head())

R Version (What is in this folder):

library(tidyverse)

# The arrow puts data into 'df'
df <- read_csv("pumpkins.csv")

# Show the first few rows
head(df)

Explanation: They look almost identical! read_csv does the same thing in both languages. The main difference is the arrow <-.

Step 2: Plotting Data

R is famous for its beautiful graphs using a tool called ggplot2.

# Start a plot using the pumpkin data
ggplot(data = df, aes(x = Size, y = Price)) + 
  
  # Draw dots (points)
  geom_point()

Output: A scatter plot appears showing the relationship between Size and Price.

Explanation:

The Internal Structure: Under the Hood

How does the project organize these two languages? They run on parallel tracks.

When you look at the solution/R/ folder, you will see subfolders that match the main chapters perfectly.

sequenceDiagram participant Student participant Main as Python Folder participant Alt as Solution/R Folder Student->>Main: "I want to learn Regression" Main-->>Student: "Here is a Jupyter Notebook (.ipynb)" Student->>Alt: "I want to do it in R" Alt-->>Student: "Here is an R Markdown file (.Rmd)" Note right of Alt: The Math (Linear Algebra) is identical. Note right of Alt: Only the syntax changes.

Breakdown of the Map

Deep Dive: Solving Regression in R

Let's revisit the Pumpkin Price prediction problem from Chapter 7, but solve it the "Statistician's Way" using R.

In Python, we used Scikit-learn. In R, regression is built into the core language.

# 1. Train the model (Linear Model = lm)
# Formula: "Price depends on Size" (Price ~ Size)
model <- lm(Price ~ Size, data = df)

# 2. Look at the results
summary(model)

Output: R produces a very detailed statistical report, showing "P-values," "Residuals," and "F-statistics."

Explanation:

Making a Prediction

Just like in Python, we can ask our R model to guess the future.

# Create a new pumpkin of size 450
new_pumpkin <- data.frame(Size = 450)

# Predict the price
predict(model, new_pumpkin)

Output:

       1 
8.504 

Explanation: The result is $8.50, just like our Python model! Math is universal, even if the programming language changes.

Why this matters for Beginners

You might ask, "I just spent 14 chapters learning Python. Why should I care about R?"

  1. Bilingualism: Knowing two coding languages makes you much more hireable.
  2. Visualization: Many experts believe R's ggplot2 creates better, publication-quality graphs than Python's tools.
  3. Specific Jobs: If you want to work in Biology, Psychology, or rigorous Statistics, R is often the preferred language.

Conclusion

In this chapter, we explored solution/R/. We learned that:

We have now seen how to do Machine Learning in English (Python) and French (R). But what if you want to read this tutorial in actual human languages, like Spanish, Chinese, or Hindi?

Global knowledge should be accessible to everyone, regardless of what language they speak.

Next Chapter: translations


Generated by Code IQ