Welcome to the final chapter!
In the previous chapter, Model Serving, we turned our model into a web service. We made it possible for users to send requests and get predictions.
However, right now, everything is running on your laptop (or a single Google Colab notebook).
We need to move out of your "Home Kitchen" and rent a "Factory." This chapter covers Infrastructure & Deployment.
Building software locally is like cooking at home. It's great for experiments, but you can't feed a whole city from your kitchen. You need a Factory (The Cloud).
But you can't just tell the cloud "Here is my code." You need to provide a Blueprint.
In this project, we define these blueprints using YAML files (simple text configuration files).
First, we need to rent the machines. In the cloud (AWS, GCP, etc.), machines come in different sizes.
We define this in deploy/cluster_compute.yaml.
# deploy/cluster_compute.yaml
cloud: aws
region: us-west-2
head_node_type:
  instance_type: m5.2xlarge  # A standard CPU machine for the Manager
worker_node_types:
  - instance_type: g5.4xlarge  # A powerful GPU machine for the Workers
    min_workers: 1  # Hire at least 1 worker
    max_workers: 4  # Hire up to 4 if we get busy
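The `min_workers`/`max_workers` lines bound autoscaling: the platform picks a worker count based on demand, but never outside that range. A minimal sketch of the clamping idea (the function and the `tasks_per_worker` heuristic are illustrative, not the real autoscaler):

```python
# Illustrative sketch of how min_workers/max_workers bound scaling decisions.
MIN_WORKERS = 1  # from min_workers in cluster_compute.yaml
MAX_WORKERS = 4  # from max_workers in cluster_compute.yaml

def desired_workers(pending_tasks: int, tasks_per_worker: int = 10) -> int:
    """Clamp a demand-driven worker count to the configured bounds."""
    demand = -(-pending_tasks // tasks_per_worker)  # ceiling division
    return max(MIN_WORKERS, min(MAX_WORKERS, demand))

print(desired_workers(0))    # quiet cluster -> 1 (never below min_workers)
print(desired_workers(25))   # moderate load -> 3
print(desired_workers(500))  # heavy load   -> 4 (capped at max_workers)
```

Whatever the load, the cluster never shrinks below 1 worker or grows beyond 4, which keeps costs predictable.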
The "g5" in the instance name means it has a GPU (Graphics Processing Unit), which is essential for fast Deep Learning.

Imagine renting a factory, but the machines arrive empty. No Windows, no Python, no libraries. We need to install everything.
We define this in deploy/cluster_env.yaml. This tells the cloud exactly how to set up the environment before our code runs.
# deploy/cluster_env.yaml
# Start with a standard Ray image (has Python & Ray pre-installed)
base_image: anyscale/ray:2.7.0-py310-cu118
# Install system tools
debian_packages:
  - curl
# Install our specific Python libraries
post_build_cmds:
  - pip install -r requirements.txt
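A misconfigured environment usually only fails once the cluster is already running (and billing). A quick sanity check before submitting can catch the obvious gaps; this sketch assumes the YAML has already been parsed into a dict (e.g. with PyYAML), and the `validate_env` helper is hypothetical:

```python
# Sanity-check a cluster environment config before submitting it.
env_config = {
    "base_image": "anyscale/ray:2.7.0-py310-cu118",
    "debian_packages": ["curl"],
    "post_build_cmds": ["pip install -r requirements.txt"],
}

def validate_env(config: dict) -> list[str]:
    """Return a list of problems found in the environment config."""
    problems = []
    if not config.get("base_image"):
        problems.append("missing base_image")
    cmds = config.get("post_build_cmds", [])
    if not any("pip install" in c for c in cmds):
        problems.append("no pip install step: Python deps won't be installed")
    return problems

print(validate_env(env_config))  # -> [] (config is complete)
```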
Our requirements.txt lists the Python libraries we need: torch, pandas, fastapi, etc.

Now we have the machines (Hardware) and the tools (Software). Finally, we need to tell them what to do.
Do we want to Train? Tune? Serve? We define this in deploy/jobs/workloads.yaml.
# deploy/jobs/workloads.yaml
name: madewithml-production
project_id: my_project_id
# Link to the configs we defined above
cluster_env: madewithml-cluster-env
compute_config: madewithml-cluster-compute
# The actual command to run
entrypoint: python madewithml/train.py
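Notice that the job config doesn't repeat any hardware or software details: it only references the other two configs by name, and the platform resolves them at submit time. A sketch of that linking idea (the dict shape mirrors the YAML above and is illustrative):

```python
# The job spec ties the pieces together by *name*, not by repeating them.
job_spec = {
    "name": "madewithml-production",
    "project_id": "my_project_id",
    "cluster_env": "madewithml-cluster-env",        # software (cluster_env.yaml)
    "compute_config": "madewithml-cluster-compute", # hardware (cluster_compute.yaml)
    "entrypoint": "python madewithml/train.py",     # what to actually run
}

# Before submitting, confirm every required reference is present.
required = {"name", "cluster_env", "compute_config", "entrypoint"}
missing = required - job_spec.keys()
assert not missing, f"job spec incomplete: {missing}"
print("job spec OK:", job_spec["name"])
```

Because the configs are linked by name, you can swap the hardware (say, bigger GPUs) without touching the job definition at all.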
What happens when we actually press the "Deploy" button (or run the command)?
1. The platform reads the Compute Config and physically turns on the requested servers in a data center (e.g., AWS us-west-2).
2. It reads the Cluster Env and automatically runs pip install ... to set up the software.
3. The entrypoint command runs. The code executes exactly as it did on your laptop, but now it has access to massive power.

We don't usually write Python code to deploy infrastructure. Instead, we use Command Line Interface (CLI) tools provided by our platform (like Ray or Anyscale).
To send our training job to the cloud, we run a command in our terminal.
# Submit the job using the configuration file
anyscale job submit --config-file deploy/jobs/workloads.yaml
Once the job is running, we can check on it without logging into the machine.
# Check the status of the job
anyscale job status --name madewithml-production
Result: It might say PENDING (machines are turning on), RUNNING (training in progress), or SUCCEEDED.
For Model Serving (the web API), we want the job to run forever, not just finish and stop. We use a "Service" instead of a "Job."
The configuration is almost identical, but the command changes:
# Deploy the web service
anyscale service deploy deploy/services/serve_config.yaml
If a machine crashes, the Cloud Manager immediately replaces it with a new one, so our website never goes down.
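Under the hood, this is a reconciliation loop: the manager compares the replicas that are actually alive against the number requested and launches replacements for any that crashed. A minimal, purely illustrative sketch (all names are hypothetical):

```python
# Reconciliation sketch: backfill crashed replicas up to the desired count.
def reconcile(alive: list[str], desired: int) -> list[str]:
    """Return the replica set after replacing any crashed replicas."""
    replacements = [f"replica-new-{i}" for i in range(desired - len(alive))]
    return alive + replacements

# Two of three replicas crashed; the manager immediately backfills them.
print(reconcile(["replica-a"], desired=3))
# -> ['replica-a', 'replica-new-0', 'replica-new-1']
```

Real systems like Ray Serve or Kubernetes run this compare-and-repair cycle continuously, which is what turns "a machine crashed" into a non-event for your users.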
Congratulations! You have completed the Made With ML journey.
Let's recap what you have built:
You have gone from a messy CSV file to a production-grade Machine Learning system running in the cloud. You are no longer just training models; you are building ML Systems.
Happy Building!