In the previous chapter, GPT Architecture, we achieved a massive milestone: we assembled the entire Generative Pre-trained Transformer. We connected the embeddings, stacked the blocks, and attached the output head.
But just because we built a car doesn't mean it drives.
Imagine you have just finished assembling a complex LEGO car. It looks perfect on the outside. But did you connect the steering wheel to the tires? Did you put the batteries in correctly?
In AI Engineering, we face the same uncertainty. We need to answer three critical questions before we try to train this model on millions of books:

1. **Plumbing:** Does data flow through the model and come out in the right shape?
2. **Size:** How many parameters does the model actually contain?
3. **Learning:** Can the model update its weights to reduce the loss?

This chapter is about writing the Final Inspection test suite.
The first test is simple. We feed the model a sequence of numbers (tokens) and ensure it outputs a prediction for every single token.
If we feed in 10 words, we expect 10 predictions. Each prediction should contain a score for every word in our vocabulary.
```python
import torch
from tinytorch import GPT, GPTConfig

def test_gpt_output_shape():
    # 1. Setup a small model
    config = GPTConfig(vocab_size=100, n_embd=32, n_layer=2)
    model = GPT(config)

    # 2. Create dummy input: Batch=1, Sequence Length=5
    idx = torch.tensor([[1, 5, 2, 9, 3]])

    # 3. Get predictions
    logits = model(idx)

    # 4. Check shape: [Batch, Time, Vocab] -> [1, 5, 100]
    assert logits.shape == (1, 5, 100)
    print("✅ Plumbing Check Passed: Output shape is correct.")
```
Explanation:

- `vocab_size=100` keeps the model tiny so the test runs fast.
- The expected output shape is `1 x 5 x 100`: one batch, five positions, and a score for each of the 100 vocabulary words.

GPT-3 has 175 billion parameters. Our model will be smaller, but we need to know exactly how big it is. This test ensures we aren't accidentally creating extra layers or missing layers.
We iterate through every "tensor" in the model (weights and biases) and count how many numbers are inside them.
```python
def test_parameter_count():
    # 1. Setup specific config
    config = GPTConfig(vocab_size=100, n_embd=64, n_layer=2, n_head=2)
    model = GPT(config)

    # 2. Count parameters manually:
    # sum up the number of elements (numel) in each weight
    total_params = sum(p.numel() for p in model.parameters())

    # 3. Print for inspection
    print(f"Total Parameters: {total_params}")

    # 4. Sanity check: Should be > 0
    assert total_params > 0
    print("✅ Weight Check Passed.")
```
Explanation:

- `p.numel()`: Returns the total number of items in a tensor (e.g., a 10x10 matrix has 100 elements).
- Advanced exercise: compute the expected count by hand (roughly `12 * n_layer * n_embd^2` for the transformer blocks, plus the embedding tables) and assert equality.

This is the most powerful test in your toolkit.
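That hand calculation can be sketched in a few lines. This is a minimal estimate, not the exact count: it assumes the standard GPT block layout (attention weights ≈ `4 * n_embd^2`, MLP with a 4x expansion ≈ `8 * n_embd^2`) and ignores biases and LayerNorms, so `tinytorch`'s real total will be slightly higher.

```python
def approx_param_count(vocab_size: int, n_embd: int, n_layer: int,
                       block_size: int = 0) -> int:
    """Rough analytic parameter count for a GPT-style model.

    Ignores biases and LayerNorm parameters. block_size > 0 adds a
    learned positional embedding table, if the model uses one.
    """
    blocks = 12 * n_layer * n_embd ** 2   # attention (4x) + MLP (8x) per layer
    token_emb = vocab_size * n_embd       # token embedding table
    pos_emb = block_size * n_embd         # positional embeddings (optional)
    return blocks + token_emb + pos_emb

# For the config used above (vocab_size=100, n_embd=64, n_layer=2):
print(approx_param_count(100, 64, 2))  # 12*2*4096 + 6400 = 104704
```

Comparing this number against `total_params` tells you immediately whether a layer went missing.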
Before training on Wikipedia (which takes days), we train the model on one single batch of data for 10 steps.
Here is what happens during this test: we use a simple optimizer and try to force the model to memorize a random sequence.
Step 1: Setup

```python
def test_model_learning():
    # 1. Setup Model
    config = GPTConfig(vocab_size=100, n_embd=32, n_layer=2)
    model = GPT(config)

    # 2. Create a dummy input and a dummy target
    # We want the model to predict 'target' from 'input'
    input_ids = torch.randint(0, 100, (1, 8))
    target_ids = torch.randint(0, 100, (1, 8))
```
Step 2: The Training Loop

```python
    # 3. Create a basic optimizer
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    # 4. Track the loss
    initial_loss = None
    for _ in range(10):  # Run 10 training steps
        optimizer.zero_grad()      # Reset gradients
        logits = model(input_ids)  # Forward pass

        # Calculate loss (Cross Entropy)
        # Reshape logits to [Batch*Time, Vocab] for PyTorch
        B, T, V = logits.shape
        loss = torch.nn.functional.cross_entropy(
            logits.view(B * T, V),
            target_ids.view(B * T)
        )
```
Step 3: Update and Verify

```python
        # Record first loss
        if initial_loss is None:
            initial_loss = loss.item()

        # Backward pass (Calculate gradients)
        loss.backward()

        # Update weights
        optimizer.step()

    # 5. Verdict: Did we improve?
    final_loss = loss.item()
    print(f"Start Loss: {initial_loss:.4f}, End Loss: {final_loss:.4f}")
    assert final_loss < initial_loss
    print("✅ Learning Check Passed: Loss is decreasing.")
```
Explanation:

- If `final_loss < initial_loss`, it means the "brain" is physically capable of updating itself.

We combine all our checks into one executable block.
```python
if __name__ == "__main__":
    print("🚀 Starting GPT Systems Check...")
    test_gpt_output_shape()
    test_parameter_count()
    test_model_learning()
    print("🎉 All Systems Go! The GPT model is ready for launch.")
```
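If the learning check ever fails, a useful follow-up diagnostic is to confirm that gradients actually reach every parameter: a parameter whose `.grad` is still `None` after `backward()` is disconnected from the loss. Below is a minimal sketch of that idea using a tiny stand-in `nn.Sequential` model (the check itself is model-agnostic, so the same loop works on our GPT).

```python
import torch
import torch.nn as nn

# Stand-in model: any nn.Module works here
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

x = torch.randn(2, 8)
loss = model(x).sum()   # any scalar loss will do for this check
loss.backward()

# Every parameter should have received a gradient tensor
for name, p in model.named_parameters():
    assert p.grad is not None, f"No gradient reached {name}!"
print("Gradient flow check passed.")
```

A `None` gradient usually points at a wiring bug: a sub-module that is constructed but never called in `forward`, or a tensor detached from the graph.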
We have successfully verified our full GPT assembly.
We now have a working model. But "working" doesn't mean "efficient." A model might output the correct shape but take 10GB of RAM and run incredibly slowly.
Before we deploy this, we need to understand the costs of running it. How much memory does it need? How fast can it generate text?
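We can already make a back-of-envelope memory estimate from the parameter count alone: each fp32 weight takes 4 bytes, so the weights occupy roughly `total_params * 4` bytes. This sketch ignores activations, gradients, and optimizer state (AdamW's extra buffers roughly triple the training footprint), which the next chapter will account for properly.

```python
def param_memory_mb(total_params: int, bytes_per_param: int = 4) -> float:
    """Memory occupied by the weights alone, in megabytes (fp32 by default)."""
    return total_params * bytes_per_param / 1024 ** 2

# e.g. a 124M-parameter, GPT-2-sized model:
print(f"{param_memory_mb(124_000_000):.0f} MB")  # ~473 MB of fp32 weights
```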
In the next chapter, we will perform a systems-level analysis of our creation.
Next Step: Systems Analysis