Welcome back! In the previous chapters, we built the individual components of our testing engine: the Generator (the model under test), the Probe (the attacker), and the Detector (the judge).
Now, we face a logistics problem. We have all these tools, but who creates the workflow? Who makes sure the Probe actually talks to the Generator, and that the result actually goes to the Detector?
This is the job of the Harness.
Imagine you want to run a full security scan. You want to test 3 different attack types (Probes) against 1 model, and check the results with 2 different scanners (Detectors).
Without a Harness, you would have to write a script like this:
```python
# A manual, messy workflow
all_results = []
for probe in my_probes:
    # 1. Generate attacks
    attempts = probe.probe(my_generator)
    # 2. Judge attacks
    for attempt in attempts:
        for detector in my_detectors:
            score = detector.detect(attempt)
    # 3. Save results...
    # 4. Handle errors...
    # 5. Print progress bars...
```
This code is repetitive, hard to maintain, and prone to bugs. If you want to add a progress bar or save a log file, you have to rewrite everything.
In garak, the Harness is the Conductor of the orchestra.
It does not generate text itself. It does not judge text itself. Instead, it tells the other components when to start and stop. It manages the pipeline.
The Harness ensures that every probe runs against the model, that every result is handed to every detector for scoring, and that everything ends up in the report.
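The orchestration pattern itself is simple. Here is a hedged, toy sketch of the idea (the names `ToyProbe`, `ToyDetector`, and `ToyHarness` are invented for illustration and are not garak's real classes):

```python
# Toy sketch of the orchestration pattern (not garak's real code).
# ToyProbe, ToyDetector, and ToyHarness are illustrative names only.

class ToyProbe:
    def probe(self, model):
        # Ask the model to respond to one attack prompt
        return [{"prompt": "attack!", "output": model("attack!")}]

class ToyDetector:
    name = "toy"
    def detect(self, attempt):
        # Score 1.0 if the model complied, 0.0 otherwise
        return 1.0 if "OK" in attempt["output"] else 0.0

class ToyHarness:
    def run(self, model, probes, detectors):
        results = []
        for probe in probes:                    # 1. each attack type
            for attempt in probe.probe(model):  # 2. attack the model
                for d in detectors:             # 3. judge the result
                    attempt[d.name] = d.detect(attempt)
                results.append(attempt)
        return results

model = lambda prompt: "OK, here you go"  # stand-in for a real LLM
results = ToyHarness().run(model, [ToyProbe()], [ToyDetector()])
print(results[0]["toy"])  # → 1.0
```

The harness owns the loops; the probes, detectors, and model only ever do their one job.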
Typically, garak runs the harness for you automatically when you use the command line. However, seeing it in Python code helps us understand the architecture.
The Harness takes the "ingredients" (Model, Probes, Detectors) and cooks the meal (The Report).
First, we need to load the components we learned about in previous chapters.
```python
from garak.generators.openai import OpenAIGenerator
from garak.probes.encoding import InjectBase64
from garak.detectors.mitigation import MitigationBypass

# 1. Load the Model
model = OpenAIGenerator("gpt-3.5-turbo")

# 2. Load the Probe (Attacker)
probe = InjectBase64()

# 3. Load the Detector (Judge)
detectors = [MitigationBypass()]
```
Now, we create the Harness and tell it to run.
```python
from garak.harnesses.base import Harness

# Initialize the Orchestrator
harness = Harness()

# Run the workflow!
# The harness will coordinate the interaction between the objects.
harness.run(model, [probe], detectors, evaluator=None)
Note: In a real run, you would also pass an Evaluator (covered in Chapter 6) to calculate the final report.
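Conceptually, an evaluator just needs an `evaluate` method that summarizes a list of attempts. As a hedged stand-in (this is an invented class, not garak's real Evaluator API, which Chapter 6 covers):

```python
# Illustrative stand-in for an evaluator (not garak's real API).
class CountingEvaluator:
    def evaluate(self, attempts):
        # Count how many attempts received a score at all
        scored = [a for a in attempts if a.get("score") is not None]
        return f"{len(scored)}/{len(attempts)} attempts scored"

print(CountingEvaluator().evaluate([{"score": 0.0}, {"score": 1.0}, {}]))
# → 2/3 attempts scored
```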
When you call harness.run(), a specific sequence of events occurs. The Harness acts like a manager on a factory floor.
Here is the workflow, step by step:

1. **Validate**: check that you actually supplied at least one probe and one detector.
2. **Attack**: for each probe, call `probe.probe(model)` to generate attempts.
3. **Judge**: hand every attempt to every detector for scoring.
4. **Record**: write each attempt to the report file and ask the Evaluator to summarize.
Let's look at the actual code in garak/harnesses/base.py. This class controls the logic we described above.
The run method accepts the list of tools. It performs a check to make sure you actually provided probes and detectors.
```python
# garak/harnesses/base.py
def run(self, model, probes, detectors, evaluator, announce_probe=True):
    # Safety check: Do we have detectors?
    if not detectors:
        raise ValueError("No detectors, nothing to do")

    # Safety check: Do we have probes?
    if not probes:
        raise ValueError("No probes, nothing to do")
```
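Failing fast like this turns a silent no-op into a clear error. The same guard pattern, sketched in isolation so you can see it fire:

```python
# Sketch of the fail-fast guard pattern used by the harness.
def run_checks(probes, detectors):
    if not detectors:
        raise ValueError("No detectors, nothing to do")
    if not probes:
        raise ValueError("No probes, nothing to do")
    return True

try:
    run_checks(probes=["some_probe"], detectors=[])
except ValueError as e:
    print(e)  # → No detectors, nothing to do
```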
This is the heart of garak. The harness iterates over every probe you requested.
```python
    # Iterate through every probe in the list
    for probe in probes:
        # Check if the probe fits the model (e.g. text vs. image)
        if not self._modality_match(probe, model):
            continue

        # EXECUTE THE ATTACK
        # The probe talks to the model and returns 'attempts'
        attempt_results = probe.probe(model)
```
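The details of `_modality_match` live in garak itself; as a hedged sketch, assuming probes and models each declare a set of modalities they handle, the check could be as simple as a subset test:

```python
# Hypothetical sketch of a modality check: a probe fits a model only
# if the model accepts every modality the probe produces.
def modality_match(probe_modalities: set, model_modalities: set) -> bool:
    return probe_modalities <= model_modalities  # subset test

print(modality_match({"text"}, {"text"}))           # → True
print(modality_match({"text", "image"}, {"text"}))  # → False
```

A text-only model is skipped by an image probe instead of crashing mid-run.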
Once the probe finishes attacking, the Harness takes the results (attempt_results) and feeds them to the detectors.
```python
        # Iterate through detectors (Judges)
        for d in detectors:
            # Show a progress bar for the detection phase
            iterator = tqdm.tqdm(attempt_results)
            for attempt in iterator:
                # Ask the detector to score this attempt
                score = d.detect(attempt)
                # Save the score inside the attempt object
                attempt.detector_results[d.name] = score
```
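Because each detector files its scores under its own name, one attempt can carry verdicts from several judges at once. A toy illustration of the resulting structure (the detector names and scoring rules here are invented for the example):

```python
# Toy illustration: several detectors scoring one attempt.
# Detector names and scoring rules are invented for this example.
attempt = {"output": "I cannot help with that", "detector_results": {}}

detectors = {
    "refusal":  lambda out: 1.0 if "cannot" in out else 0.0,
    "toxicity": lambda out: 0.0,  # pretend this output is non-toxic
}

for name, detect in detectors.items():
    attempt["detector_results"][name] = detect(attempt["output"])

print(attempt["detector_results"])
# → {'refusal': 1.0, 'toxicity': 0.0}
```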
Finally, the Harness saves the data to a JSON file (the report) and asks the Evaluator to summarize the scores.
```python
        # Write results to the report file
        for attempt in attempt_results:
            _config.transient.reportfile.write(
                json.dumps(attempt.as_dict()) + "\n"
            )

        # Calculate final pass/fail metrics
        evaluator.evaluate(attempt_results)
```
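The report is a JSON Lines file: one JSON object per attempt, one object per line. A self-contained sketch of writing and re-reading such a file (the file name and attempt fields are illustrative):

```python
import json
import os
import tempfile

# Sketch: write attempts as JSON Lines, then read them back.
# File name and attempt fields are illustrative, not garak's schema.
attempts = [
    {"probe": "encoding.InjectBase64", "status": "complete"},
    {"probe": "encoding.InjectBase64", "status": "complete"},
]

path = os.path.join(tempfile.gettempdir(), "toy_report.jsonl")
with open(path, "w") as report:
    for attempt in attempts:
        report.write(json.dumps(attempt) + "\n")

with open(path) as report:
    loaded = [json.loads(line) for line in report]

print(len(loaded))  # → 2
```

One-object-per-line means a crashed run still leaves a valid, partially complete report behind.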
In this chapter, we saw how the Harness ties garak together through its run() method. Throughout the workflow, it passes around an object called an Attempt. The Probe creates it, the Generator fills it, and the Detector grades it.
This object is the "Folder" containing all the files for a specific test case. In the next chapter, we will open up this folder and see exactly what's inside.
Next Chapter: Attempt (Interaction Context)
Generated by Code IQ