In the previous chapter, Python Setup, we set up a "kitchen" for cooking data using Python.
However, not every chef uses the same knives. In the world of Data Science, there is another powerful language designed specifically for statistics and data visualization: R.
If you have chosen the R track for this curriculum, this chapter is your starting point. It will guide you through installing the necessary tools to run the code in the lessons.
While Python is a "general-purpose" language (used for websites, games, and data), R was born in the world of statistics. It is like a specialized surgical tool designed expressly for analyzing data.
Let's revisit our goal: predicting pumpkin prices.
The Data: pumpkins.csv.
The Goal: Configure your computer so you can write R code to visualize this data effortlessly.
The Solution: We need to install the R Language (the engine), RStudio (the dashboard), and the Tidyverse (the accessories).
Setting up R is slightly different from Python. Here are the three main components you need to understand.
R is the computer language itself. It performs the calculations.
RStudio is the interface you will actually look at. It provides windows for your code, your files, your plots, and your data history.
R comes with basic tools, but they can be clunky. We use a collection of modern packages called the Tidyverse (for data handling) and Tidymodels (for Machine Learning).
To solve our use case, follow these three steps.
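First, install R. Go to the CRAN (Comprehensive R Archive Network) website, cloud.r-project.org, then download and run the installer for your operating system.
Next, install RStudio. Go to the Posit website, posit.co/download/rstudio-desktop/, then download and install RStudio Desktop.
Finally, install the packages. Open RStudio. You will see a "Console" window. We need to install the tools we will use in almost every lesson.
Copy and paste these commands into the Console and press Enter: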
# Install the core collections of tools
install.packages("tidyverse")
install.packages("tidymodels")
# Install specific tools for our lessons
install.packages(c("here", "skimr", "janitor"))
Explanation: install.packages() tells R to go to the internet (CRAN), find these tools, and download them to your computer. You only need to do this once.
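Because installation is a one-time step, it can help to check what is already on your computer before downloading anything. The following is a minimal sketch, not part of the original lessons: `missing_packages()` is a hypothetical helper name, and the `interactive()` guard is an assumption made here so the snippet never downloads anything when run as a script.

```r
# Packages named in this chapter
required <- c("tidyverse", "tidymodels", "here", "skimr", "janitor")

# Hypothetical helper: return the entries of `wanted` that are not in `installed`
missing_packages <- function(wanted, installed = rownames(installed.packages())) {
  setdiff(wanted, installed)
}

to_install <- missing_packages(required)

if (length(to_install) == 0) {
  message("All required packages are already installed.")
} else {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Only download when a person is at the Console (assumption of this sketch)
  if (interactive()) install.packages(to_install)
}
```

Running this a second time after a successful install should report that nothing is missing, which is one way to confirm the step worked.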
In Lesson Structure, we talked about Notebooks. In R, we use a special file format called R Markdown (.Rmd).
An R Markdown file mixes text and code chunks. It allows you to write a report and run code in the same document.
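To make the "text plus code chunks" idea concrete, here is a small sketch that builds a bare-bones .Rmd file from R itself and prints it. The title, the sentence of text, and the temporary file location are all made up for illustration; a real lesson file will contain much more.

```r
# Three backticks mark the start and end of a code chunk in R Markdown
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",
  "title: \"Pumpkin Notes\"",              # made-up title
  "output: html_document",
  "---",
  "",
  "This sentence is ordinary text in the report.",
  "",
  paste0(fence, "{r}"),                    # chunk opens
  "1 + 1",                                 # R code inside the chunk
  fence                                    # chunk closes
)

# Write the skeleton to a temporary file (location chosen only for this sketch)
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_path)

# Print the file so the text and the code chunk can be seen together
cat(readLines(rmd_path), sep = "\n")
```

The block between the `---` lines is the document header, the plain sentence is report text, and the fenced part is a chunk that RStudio can run in place.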
To use the tools you just installed, you must "load" them at the start of every file.
# This goes at the top of your code
library(tidyverse)
library(tidymodels)
# Now we can read data!
print("Libraries loaded successfully.")
Explanation: library() is like taking a tool out of the box. You have to do this every time you start a new session.
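One way to see what "taking a tool out of the box" means is to look at R's search path before and after a `library()` call. This sketch uses `tools`, a package that ships with R, purely so it runs without installing anything; the same check should apply to `tidyverse` once it is installed.

```r
# `tools` ships with R but is normally not attached in a fresh session
before <- "package:tools" %in% search()

library(tools)

after <- "package:tools" %in% search()

cat("Attached before library():", before, "\n")
cat("Attached after library(): ", after, "\n")
```

Restarting R clears the attached packages again, which is why the `library()` lines go at the top of every file.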
Here is how the "Pumpkin" code looks in R using the Tidyverse.
# Read the pumpkin data
pumpkins <- read_csv("data/US-pumpkins.csv")
# Show the first few rows
glimpse(pumpkins)
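Explanation: read_csv() comes from the readr package in the Tidyverse. It is typically faster than base R's read.csv() and returns a tibble, a modern version of the data frame. glimpse() prints one line per column, showing its name, its type, and its first few values.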
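To compare the two loaders without needing the real dataset, the sketch below writes a tiny CSV to a temporary file and reads it both ways. The three columns and two rows are made up for illustration and are not taken from the pumpkin data; the readr part only runs if that package is installed.

```r
# Made-up rows for illustration only; NOT the real pumpkin data
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("city,variety,price",
    "Boston,PIE TYPE,15.5",
    "Chicago,HOWDEN TYPE,120"),
  tmp
)

# Base R loader: returns a plain data frame
base_df <- read.csv(tmp)
str(base_df)

# Tidyverse loader: returns a tibble (skipped if readr is not installed)
if (requireNamespace("readr", quietly = TRUE) &&
    requireNamespace("tibble", quietly = TRUE)) {
  tidy_df <- readr::read_csv(tmp)
  tibble::glimpse(tidy_df)
}
```

Both calls read the same file; the difference is in the object you get back and in how it is summarized.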
When you type library(tidyverse), what happens? How does R know where to look?
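R relies on a network of servers called CRAN, but only at install time. install.packages() downloads a package from CRAN and saves it on your computer; library() then loads that saved copy, so it needs no internet connection.
If you ever run into trouble where code works on one computer but not another, R has a built-in diagnostic tool: sessionInfo().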
Run this command in your console:
# Check what is currently running
sessionInfo()
It will output text looking like this (simplified):
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur
attached base packages:
[1] stats graphics grDevices utils datasets
other attached packages:
[1] tidymodels_0.1.3 tidyverse_1.3.1
Explanation: This tells you exactly which version of R and which package versions are currently active. This is crucial when troubleshooting or reporting bugs, as described in Contribution Guidelines.
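If you only need a few of these details, for example to paste into a bug report, they can be pulled out individually. This is a small sketch of one way to do it; the choice of fields and the output layout are assumptions made here, not a required format.

```r
info <- sessionInfo()

r_version <- info$R.version$version.string   # the "R version ..." line
platform  <- info$platform
attached  <- names(info$otherPkgs)           # NULL if no extra packages are attached

cat("R:        ", r_version, "\n")
cat("Platform: ", platform, "\n")
cat("Attached: ",
    if (length(attached)) paste(attached, collapse = ", ") else "(none)",
    "\n")

# Version of a single package; `stats` is used because it ships with R
cat("stats:    ", as.character(packageVersion("stats")), "\n")
```

Replacing `"stats"` with `"tidyverse"` in the last line should report the installed Tidyverse version once that package is present.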
You might be wondering, "Do I need to do both?"
No. The Repository Structure is designed so that lessons often have parallel tracks.
If you see a .ipynb file, it is for Python.
If you see a .Rmd file, it is for R.
You simply open the file that matches the language you installed.
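The extension rule can be written as a tiny function. In this sketch, `track_for()` is a hypothetical helper and the file names are made up for illustration; they are not actual files from the repository.

```r
# Hypothetical helper: map a lesson file to its language track by extension
track_for <- function(path) {
  ext <- tolower(tools::file_ext(path))
  if (ext == "rmd") {
    "R"
  } else if (ext == "ipynb") {
    "Python"
  } else {
    "other"
  }
}

# Made-up file names for illustration
lesson_files <- c("notebook.ipynb", "lesson_1.Rmd", "README.md")
vapply(lesson_files, track_for, character(1))
```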
In this chapter, we set up the R Environment: we installed the R language and RStudio, installed the Tidyverse and Tidymodels packages, and learned to load them with library().
Now that your computer is ready to speak the language of data (whether Python or R), we are almost ready to start the lessons. But first, let's look at how we built the interactive tool that tests your knowledge.
Next Chapter: Quiz Application Development