Welcome to Chapter 5! In the previous chapter, Hyperparameter Tuning, we acted as scientists. We ran experiments to find the best possible settings for our model.
Now, we have a winner. We have a trained "Chef" (Model) and a specific set of "Kitchen Tools" (Preprocessor) that work perfectly together.
But a trained model sitting on a hard drive is useless. We need to use it. This chapter covers Inference: the process of using a trained model to make predictions on new, unseen data.
Imagine a sci-fi universal translator: alien speech goes in one side, and English comes out the other. An inference pipeline is the same idea for our project: raw project text goes in, and a predicted tag comes out.
To build this pipeline, we need to load two things we created in previous chapters: the fitted preprocessor and the trained model checkpoint. We call these Artifacts.
Crucial Point: We must use the exact same preprocessor during inference that we used during training. If "data" was ID 492 during training, it must be ID 492 now.
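To see why this matters, here is a minimal sketch (not the project's real preprocessor) of what goes wrong when a vocabulary is rebuilt at inference time instead of reused. The vocabularies and IDs below are illustrative:

```python
# Minimal sketch: why the SAME vocabulary must be reused at inference time.
train_vocab = {"data": 492, "model": 17}  # built during training
fresh_vocab = {"model": 0, "data": 1}     # rebuilt from scratch -> different IDs!

def encode(text, vocab):
    # Map each known word to its integer ID
    return [vocab[w] for w in text.split() if w in vocab]

print(encode("data model", train_vocab))  # [492, 17] -- what the model learned
print(encode("data model", fresh_vocab))  # [1, 0]    -- meaningless to the trained model
```

The trained model's weights are tied to the training-time IDs, so feeding it IDs from a different vocabulary produces garbage predictions.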
We ran many experiments in the last chapter. We need to find the specific "Run ID" that performed the best. We use MLflow (our experiment tracker) to find this.
import mlflow

def get_best_run_id(experiment_name, metric, mode):
    # Search all runs in our experiment, sorted by the chosen metric.
    # mode is "ASC" (lower is better, e.g. loss) or "DESC" (higher is better).
    sorted_runs = mlflow.search_runs(
        experiment_names=[experiment_name],
        order_by=[f"metrics.{metric} {mode}"],
    )
    # Pick the top one
    run_id = sorted_runs.iloc[0].run_id
    return run_id
Explanation: This function looks through our experiment logs and sorts them by the chosen metric (for us, validation loss in ascending order). It grabs the ID of the winner.
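The sorting logic is easy to get wrong, so here is a hypothetical stand-in for `mlflow.search_runs` using plain Python: a list of run records sorted ascending by validation loss, mirroring what `get_best_run_id` does (the run IDs and metric values are made up):

```python
# Hypothetical stand-in for mlflow.search_runs: pick the run with the
# lowest validation loss ("ASC" ordering), mirroring get_best_run_id.
runs = [
    {"run_id": "a1", "metrics.val_loss": 0.42},
    {"run_id": "b2", "metrics.val_loss": 0.31},
    {"run_id": "c3", "metrics.val_loss": 0.55},
]
sorted_runs = sorted(runs, key=lambda r: r["metrics.val_loss"])  # ascending
best_run_id = sorted_runs[0]["run_id"]
print(best_run_id)  # b2
```

If the metric were accuracy instead of loss, we would sort descending, which is exactly what the `mode` parameter controls.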
Once we have the ID, we need to download the files (the Checkpoint).
import mlflow
from ray.train import Result

def get_best_checkpoint(run_id):
    # Find where the files are stored for this specific run
    artifact_uri = mlflow.get_run(run_id).info.artifact_uri
    # Load the results from that folder
    results = Result.from_path(artifact_uri)
    # Return the best checkpoint saved during that training run
    return results.best_checkpoints[0][0]
Explanation: A "Checkpoint" is just a folder containing our model file (model.pt) and our configuration (args.json).
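To make the "checkpoint is just a folder" idea concrete, here is a sketch that builds a toy checkpoint directory with the two files named above and reloads the config from it. The file contents are stand-ins, and the exact layout of a real Ray checkpoint may differ:

```python
import json
import tempfile
from pathlib import Path

# Sketch: a "checkpoint" is just a directory of files. We mimic the
# layout described above (model.pt + args.json) with stand-in content.
ckpt_dir = Path(tempfile.mkdtemp())
(ckpt_dir / "model.pt").write_bytes(b"<model weights would go here>")
(ckpt_dir / "args.json").write_text(json.dumps({"lr": 1e-4, "num_epochs": 10}))

# Later, inference code reloads the config from the same folder
args = json.loads((ckpt_dir / "args.json").read_text())
print(args["lr"])  # 0.0001
```

Because a checkpoint is just files on disk, it can be copied, versioned, and loaded on a completely different machine than the one that trained it.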
We need a clean way to bundle the Preprocessor and the Model together. We create a class called TorchPredictor.
This class is responsible for two things: applying the preprocessor to incoming data and running the model on the result.
class TorchPredictor:
    def __init__(self, preprocessor, model):
        self.preprocessor = preprocessor
        self.model = model
        self.model.eval()  # Important: switch to inference mode!

    def __call__(self, batch):
        # Allow the class to be called like a function
        results = self.model.predict(batch)
        return {"output": results}
Explanation: model.eval() is critical. It tells PyTorch "Do not learn right now, just predict." If you forget this, your predictions might be inconsistent.
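Here is a toy analogy (pure Python, not real PyTorch) of what `eval()` changes: a dropout-style layer randomly zeroes values while its training flag is on, so identical inputs can give different outputs, and becomes a deterministic no-op once `eval()` flips the flag off. The class name and structure are invented for illustration:

```python
import random

# Toy analogy (not real PyTorch): a "dropout" layer behaves differently
# depending on the training flag, which is what model.eval() toggles.
class ToyDropout:
    def __init__(self, p=0.5):
        self.p = p
        self.training = True  # modules start in training mode

    def eval(self):
        self.training = False

    def __call__(self, x):
        if self.training:
            # Randomly zero values -> two identical inputs can give
            # different outputs. Bad for inference!
            return [v if random.random() > self.p else 0.0 for v in x]
        return x  # eval mode: deterministic, dropout is a no-op

layer = ToyDropout()
layer.eval()
print(layer([1.0, 2.0, 3.0]))  # [1.0, 2.0, 3.0] -- stable predictions
```

Real PyTorch layers such as dropout and batch normalization behave analogously, which is why forgetting `model.eval()` leads to inconsistent predictions.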
Now we can put it all together. We take a raw string, process it, and get a result.
import ray

# Create a sample input
title = "Transfer learning with transformers"
description = "Using BERT for classification."
sample_ds = ray.data.from_items([{"title": title, "description": description}])

# Use the predictor
results = predict_proba(ds=sample_ds, predictor=predictor)

# View result
print(results)
Output:
[
{
"prediction": "nlp",
"probabilities": {
"computer-vision": 0.02,
"nlp": 0.95,
"mlops": 0.03
}
}
]
What happens inside predict_proba? Let's trace the flow of data step by step.
We define this logic in madewithml/predict.py. Let's look at the key function predict_proba.
We can't just feed the string to the model. We must use the preprocessor we loaded from the checkpoint.
# Inside predict_proba()
# 1. Get the preprocessor from the loaded predictor
preprocessor = predictor.get_preprocessor()
# 2. Transform the new raw data into numbers
# This applies the SAME cleaning and tokenization as training
preprocessed_ds = preprocessor.transform(ds)
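To make the fit/transform pattern tangible, here is a hypothetical minimal preprocessor (the real one also cleans and tokenizes text, and the class and attribute names here are illustrative). `fit` learns the vocabulary and label mappings once during training; `transform` reuses them, which is exactly why the preprocessor must be saved alongside the model:

```python
# Hypothetical minimal preprocessor illustrating the fit/transform pattern.
class MiniPreprocessor:
    def fit(self, texts, labels):
        # Learn mappings ONCE, during training
        words = sorted({w for t in texts for w in t.split()})
        self.word_to_id = {w: i for i, w in enumerate(words)}
        classes = sorted(set(labels))
        self.class_to_index = {c: i for i, c in enumerate(classes)}
        self.index_to_class = {i: c for c, i in self.class_to_index.items()}
        return self

    def transform(self, texts):
        # Reuse the SAME vocabulary every time -> same IDs as during training
        return [[self.word_to_id[w] for w in t.split() if w in self.word_to_id]
                for t in texts]

pre = MiniPreprocessor().fit(["bert for nlp"], ["nlp"])
print(pre.transform(["bert for nlp"]))  # [[0, 1, 2]]
```

Note that `index_to_class` lives on the preprocessor; we will need it later to turn the model's numeric output back into a tag name.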
We pass the numbers to the model.
# 3. Run the model
# map_batches applies the predictor to the data efficiently
outputs = preprocessed_ds.map_batches(predictor.predict_proba)
# 4. Extract the raw probability arrays
y_prob = np.array([d["output"] for d in outputs.take_all()])
Result: y_prob is now a list of numbers like [0.05, 0.90, 0.05]. The model is 90% sure it's Class Index 1. But what is "Index 1"?
The model only knows numbers. Humans want tags. We need to map 1 back to "nlp".
results = []
for prob in y_prob:
    # Find the index with the highest score
    tag_index = prob.argmax()
    # Look up the name in the preprocessor's dictionary
    tag = preprocessor.index_to_class[tag_index]
    results.append({"prediction": tag})
Explanation: index_to_class is a dictionary saved inside the preprocessor (e.g., {0: "computer-vision", 1: "nlp"}). We use it to translate the math back into English.
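Here is a self-contained version of that decoding loop you can run directly. The `index_to_class` mapping uses the example classes from this chapter (the exact index assignment is illustrative), and argmax is done in plain Python instead of NumPy:

```python
# Self-contained version of the decoding loop above.
# The index -> tag mapping is illustrative.
index_to_class = {0: "computer-vision", 1: "nlp", 2: "mlops"}
y_prob = [[0.02, 0.95, 0.03], [0.80, 0.15, 0.05]]

results = []
for prob in y_prob:
    # argmax without numpy: index of the highest probability
    tag_index = max(range(len(prob)), key=prob.__getitem__)
    results.append({"prediction": index_to_class[tag_index]})

print(results)  # [{'prediction': 'nlp'}, {'prediction': 'computer-vision'}]
```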
We have successfully built the Inference Pipeline.
We created a system that:
1. Finds the best run in MLflow and downloads its checkpoint.
2. Bundles the preprocessor and model together in a TorchPredictor.
3. Transforms raw text with the exact same preprocessor used in training.
4. Runs the model and translates its numeric output back into human-readable tags.
Now we can classify any machine learning project description instantly!
But how do we know if our model is actually good? Is 90% accuracy enough? Does it fail on specific types of tags? To answer this, we need to perform a thorough audit.
Next Step: Model Evaluation