Welcome back! In the previous chapter, Vulnerability Identification, our agent successfully identified a critical security hole (SQL Injection) in the target website.
We have a target. We know it is vulnerable. Now, we need to exploit it to prove the risk. But exploitation isn't just one big button press; it is a complex series of delicate steps.
Imagine you are planning a bank heist in a movie. You don't just run in screaming. You have a precise checklist:
If you skip step 1, you get caught. If you skip step 3, you get no gold.
Task Tracking is the agent's way of managing this checklist. Instead of trying to do everything at once, the agent breaks the complex job of "Hacking the Database" into four specific, manageable sub-tasks.
Our agent has confirmed that /api/order is vulnerable. To fully demonstrate the danger, it needs to perform the following Execution Chain:
users, admin).

This chapter teaches the agent how to create and manage this "Todo List."
In shannon, the agent uses a specialized tool to initialize this workflow. It takes the big vulnerability we found and generates the four specific steps.
First, we define the standard operating procedure for an SQL Injection attack.
# The standard workflow for SQL Injection
exploit_steps = [
    "step_1_confirm_vulnerability",
    "step_2_fingerprint_database",
    "step_3_enumerate_tables",
    "step_4_exfiltrate_data",
]
print("Exploitation Plan Created.")
Output: Exploitation Plan Created.
Now, we load these steps into the agent's tracking tool. This creates a formal "Todo List" in the agent's memory.
# Initialize the tracker with our specific steps
agent.tracker.initialize_tasks(exploit_steps)
# Verify the first task is ready
current_task = agent.tracker.get_current_task()
print(f"Current Objective: {current_task}")
Output:
Current Objective: step_1_confirm_vulnerability
As the agent works (which we will see in the next chapter), it updates this list. Let's simulate completing the first step.
# We pretend we successfully confirmed the bug
agent.tracker.mark_complete("step_1_confirm_vulnerability")
# The tracker automatically moves to the next step
next_task = agent.tracker.get_current_task()
print(f"New Objective: {next_task}")
Output:
New Objective: step_2_fingerprint_database
The agent effectively checked off the first item and automatically focused on the second.
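The same check-off-and-advance cycle generalizes to a loop. The sketch below is self-contained, so it uses a stand-in `MiniTracker` rather than the real `agent.tracker`. The `mark_complete` body, the `"COMPLETE"` status string, and the `run_step` helper are assumptions made for illustration, not shannon's actual code.

```python
class MiniTracker:
    """Stand-in for agent.tracker (hypothetical; mirrors the calls used above)."""

    def __init__(self, steps):
        self.queue = list(steps)
        self.status = {step: "PENDING" for step in self.queue}

    def get_current_task(self):
        # Return the first step that is still pending.
        for step in self.queue:
            if self.status[step] == "PENDING":
                return step
        return "ALL_TASKS_COMPLETE"

    def mark_complete(self, step):
        # Assumed behavior: flip the status so the next lookup skips this step.
        self.status[step] = "COMPLETE"


def run_step(step):
    # Hypothetical placeholder: a real agent would do the work for this step here.
    return True


def run_all(tracker):
    """Work through the list in order; return the steps in the order they finished."""
    finished = []
    while (task := tracker.get_current_task()) != "ALL_TASKS_COMPLETE":
        if not run_step(task):
            break  # stop instead of skipping ahead to a later step
        tracker.mark_complete(task)
        finished.append(task)
    return finished


exploit_steps = [
    "step_1_confirm_vulnerability",
    "step_2_fingerprint_database",
    "step_3_enumerate_tables",
    "step_4_exfiltrate_data",
]
tracker = MiniTracker(exploit_steps)
order = run_all(tracker)
print(order)
```

Because the loop always asks the tracker for the current task, the order of work comes from the list itself, not from the code that drives it.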
How does the agent manage this list internally? It uses a simple state machine.
The TaskTracker acts like a project manager holding a clipboard.
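To make "state machine" concrete: every item on the clipboard has a status, and only certain status changes are allowed. The sketch below is one guess at the shape of that machine. `PENDING` is the status string the tracker uses; the `COMPLETE` name, the `TRANSITIONS` table, and the `advance` helper are hypothetical.

```python
from enum import Enum


class Status(Enum):
    PENDING = "PENDING"    # the status string the tracker uses
    COMPLETE = "COMPLETE"  # assumed name for the finished state


# Allowed moves: a pending task may be completed; a completed task is final.
TRANSITIONS = {
    Status.PENDING: {Status.COMPLETE},
    Status.COMPLETE: set(),
}


def advance(current, target):
    """Return the new status, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target


state = advance(Status.PENDING, Status.COMPLETE)
print(state.name)
```

Keeping the allowed moves in one table means a finished task cannot quietly return to pending.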
Let's look at a simplified version of shannon/tools/task_tracker.py. It uses a Python list and a dictionary to track the status of every item.
class TaskTracker:
    def __init__(self):
        self.queue = []   # The list of steps
        self.status = {}  # The status of each step

    def initialize_tasks(self, steps):
        self.queue = steps
        # Set all tasks to 'PENDING' initially
        for step in steps:
            self.status[step] = "PENDING"

    def get_current_task(self):
        # Find the first task that isn't finished
        for step in self.queue:
            if self.status[step] == "PENDING":
                return step
        return "ALL_TASKS_COMPLETE"
Explanation:
initialize_tasks: Takes our list of 4 steps and saves them. It marks them all as waiting to be done.

get_current_task: Loops through the list from top to bottom. The first step it finds that hasn't been done yet is returned as the current objective. This ensures the agent never skips a step.

We are about to start the actual hacking (Execution). This involves running complex commands in the terminal.
If we tried to run the "Steal Data" command before we ran the "Find Database Version" command, the attack would likely fail because we wouldn't know the correct syntax for that database.
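One way to turn that ordering into a hard rule is to refuse any completion that is not the current task. This is a sketch under assumptions: `OrderedTracker` and the guard inside its `mark_complete` are hypothetical, and the real tracker may handle out-of-order calls differently.

```python
class OrderedTracker:
    """Hypothetical tracker that rejects out-of-order completions."""

    def __init__(self, steps):
        self.queue = list(steps)
        self.status = {step: "PENDING" for step in self.queue}

    def get_current_task(self):
        # Return the first step that is still pending.
        for step in self.queue:
            if self.status[step] == "PENDING":
                return step
        return "ALL_TASKS_COMPLETE"

    def mark_complete(self, step):
        if step not in self.status:
            raise KeyError(f"unknown step: {step!r}")
        current = self.get_current_task()
        if step != current:
            # Assumed policy: only the current task may be checked off.
            raise RuntimeError(
                f"cannot complete {step!r}; current task is {current!r}"
            )
        self.status[step] = "COMPLETE"


tracker = OrderedTracker([
    "step_1_confirm_vulnerability",
    "step_2_fingerprint_database",
    "step_3_enumerate_tables",
    "step_4_exfiltrate_data",
])

blocked = False
try:
    # Attempt to jump straight to exfiltration before fingerprinting.
    tracker.mark_complete("step_4_exfiltrate_data")
except RuntimeError as err:
    blocked = True
    print(f"Blocked: {err}")

print(tracker.get_current_task())
```

With this guard, a premature "exfiltrate" call fails loudly and the tracker stays on the first unfinished step.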
By enforcing this specific order via Task Tracking, we ensure the agent builds its knowledge logically:
Our agent is fully prepared.
The planning phase is over. It is time for action. The agent needs to execute real system commands to interact with the target server and perform these steps.
In the final chapter of this tutorial, we will learn how the agent uses the Bash Execution tool to run these commands and complete the mission.
Next Chapter: Tool Use - Bash Execution
Generated by Code IQ