Welcome to Chapter 7!
In the previous chapter, Model Evaluation, we audited our model. We graded it on a test set and confirmed that it is accurate enough to be useful.
However, right now, our model is "trapped" inside a Python script. If a web developer wants to use our model for their website, they can't. They don't know how to run Python scripts or load PyTorch weights.
We need to turn our model into a Service.
Imagine a fast-food restaurant. Customers never walk into the kitchen; they pull up to a drive-thru window, place an order, and receive a finished meal.
Model Serving is the process of building this Drive-Thru: users never touch our Python code or model weights, they just send a request and get a prediction back.
We want to create a web address (URL) like `http://localhost:8000/predict`. A user sends a request with a JSON payload like `{"title": "Intro to CNNs", "description": "..."}` and receives a response like `{"prediction": "computer-vision"}`.
To do this, we use two tools: FastAPI (to define the web API) and Ray Serve (to run and scale the model behind it).
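Before writing any server code, it helps to pin down this request contract. A minimal sketch using only the standard library (the field names come from the example above):

```python
import json

# JSON body a client would POST to /predict/
payload = {"title": "Intro to CNNs", "description": "Processing images with layers."}
body = json.dumps(payload)

# The server decodes the same bytes back into a dict
decoded = json.loads(body)
print(decoded["title"])  # Intro to CNNs
```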
First, we define the application. This is like setting up the sign at the drive-thru that says "Order Here."
```python
from fastapi import FastAPI

# Define the application
app = FastAPI(
    title="Made With ML",
    description="Classify machine learning projects.",
    version="0.1",
)
```
We need a class to hold our model. In previous chapters, we used TorchPredictor. Here, we wrap it in a Ray Serve Deployment.
This class initializes the model once (when the server starts) so it doesn't have to reload the heavy weights for every single customer request.
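The load-once pattern is easy to see in isolation. Here is a minimal, framework-free sketch (the `load_model` function and the toy "model" are hypothetical stand-ins, not part of the real codebase): the expensive load runs once at startup, and every request reuses it.

```python
LOAD_COUNT = 0  # track how many times the "heavy" load runs

def load_model():
    """Hypothetical stand-in for loading heavy model weights."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    # Toy "model": classifies any text mentioning images as computer vision
    return lambda text: "computer-vision" if "image" in text else "other"

class ModelService:
    def __init__(self):
        # Load once, at startup -- not once per request
        self.model = load_model()

    def predict(self, text: str) -> str:
        return self.model(text)

service = ModelService()
predictions = [service.predict("image layers"), service.predict("word embeddings")]
print(predictions, LOAD_COUNT)
```

No matter how many predictions we serve, `LOAD_COUNT` stays at 1; Ray Serve gives us the same guarantee per replica.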
```python
import ray
from ray import serve

from madewithml import predict

# This decorator transforms the class into a scalable service
@serve.deployment(num_replicas=1, ray_actor_options={"num_cpus": 1})
@serve.ingress(app)
class ModelDeployment:
    def __init__(self, run_id: str, threshold: float = 0.9):
        # Load the best model from our previous training run
        best_checkpoint = predict.get_best_checkpoint(run_id=run_id)
        # Load the predictor (Brain + Preprocessor)
        self.predictor = predict.TorchPredictor.from_checkpoint(best_checkpoint)
        self.threshold = threshold
```
- `@serve.deployment`: tells Ray "this class will handle web requests."
- `@serve.ingress(app)`: connects our FastAPI app to this class.
- `num_replicas`: the number of copies of the model to run. We start with 1; if we get too popular, we can increase it to open more "Drive-Thru windows."
Now we define the logic for what happens when a request hits the /predict/ URL.
We add a safety check here. If the model is only 50% sure, we shouldn't tell the user "It's Computer Vision." We should say "Other" or "Unsure." This makes our application safer.
```python
    # Request is imported via: from starlette.requests import Request
    @app.post("/predict/")
    async def _predict(self, request: Request):
        # 1. Read the JSON data from the user
        data = await request.json()

        # 2. Convert JSON to the format our Predictor expects
        sample_ds = ray.data.from_items([{
            "title": data.get("title", ""),
            "description": data.get("description", ""),
        }])

        # 3. Make the prediction
        results = predict.predict_proba(ds=sample_ds, predictor=self.predictor)
```
We don't just return the raw result. We apply business logic.
```python
        # 4. Apply custom logic (The "Confidence Check")
        for i, result in enumerate(results):
            pred = result["prediction"]
            prob = result["probabilities"]
            # If confidence is too low (e.g., < 90%), return "other"
            if prob[pred] < self.threshold:
                results[i]["prediction"] = "other"

        return {"results": results}
```
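The confidence check can be factored into a small, standalone helper. This sketch (the name `apply_threshold` is ours, not from the codebase) runs the same logic on plain dictionaries:

```python
def apply_threshold(results, threshold=0.9):
    """Replace low-confidence predictions with 'other'."""
    for result in results:
        pred = result["prediction"]
        prob = result["probabilities"]
        # Confidence is the probability assigned to the predicted label
        if prob[pred] < threshold:
            result["prediction"] = "other"
    return results

results = apply_threshold([
    {"prediction": "computer-vision", "probabilities": {"computer-vision": 0.98, "nlp": 0.02}},
    {"prediction": "nlp", "probabilities": {"nlp": 0.55, "computer-vision": 0.45}},
])
print([r["prediction"] for r in results])  # ['computer-vision', 'other']
```

The 0.98 prediction passes the 0.9 threshold untouched, while the 0.55 prediction is downgraded to "other".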
What happens when a user clicks "Submit" on our website? The browser POSTs a JSON request to `/predict/`, our `_predict` method runs the four steps above, and the JSON response travels back to the user.
The full implementation is in madewithml/serve.py. It combines everything we've discussed.
To start the restaurant, we need a main block that binds everything together.
```python
# madewithml/serve.py
import argparse

import ray
from ray import serve

if __name__ == "__main__":
    # Get arguments (like which Run ID to use)
    parser = argparse.ArgumentParser()
    parser.add_argument("--run_id", help="run ID to use for serving.")
    args = parser.parse_args()

    # Start Ray
    ray.init()

    # Launch the deployment
    serve.run(ModelDeployment.bind(run_id=args.run_id))
```
Once the script is running (e.g., `python madewithml/serve.py --run_id <RUN_ID>`), the API is live! We can test it using a tool like curl (command line) or Python.
Input (Terminal):
```bash
curl -X POST "http://127.0.0.1:8000/predict/" \
     -H "Content-Type: application/json" \
     -d '{"title": "Intro to CNNs", "description": "Processing images with layers."}'
```
Output:
```json
{
  "results": [
    {
      "prediction": "computer-vision",
      "probabilities": {
        "computer-vision": 0.98,
        "nlp": 0.01,
        "mlops": 0.01
      }
    }
  ]
}
```
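A client consuming this response only needs standard JSON parsing. A minimal sketch (standard library only; the response string is the example output above, hard-coded):

```python
import json

# Example response body, as returned by the /predict/ endpoint above
response_text = """{
  "results": [
    {
      "prediction": "computer-vision",
      "probabilities": {"computer-vision": 0.98, "nlp": 0.01, "mlops": 0.01}
    }
  ]
}"""

data = json.loads(response_text)
top = data["results"][0]
label = top["prediction"]
confidence = top["probabilities"][label]
print(label, confidence)  # computer-vision 0.98
```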
We have successfully built a Model Serving layer.
Now, anyone in the world (or at least on our network) can use our machine learning model just by sending a web request.
However, right now this is running on your laptop. If you close your laptop, the service dies. To make this a real product, we need to move it to the cloud.
Next Step: Infrastructure & Deployment