Welcome to Chapter 4!
In the previous chapter (vcr_cassette_dir), we organized our file system so that every test has a dedicated place to store its recordings (cassettes).
Now we have a dedicated folder for our files, but we have a major security problem. When we record a conversation with an AI model (like GPT-4), the recording captures everything that was sent, including your private passwords and API keys.
The Use Case: Imagine you write a test that sends a request to OpenAI. To authenticate, your computer sends your secret API key in a header of the request:
Authorization: Bearer sk-real-secret-key-12345
The Disaster:
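The test saves the whole conversation, headers included, to cassette.yaml.
You commit cassette.yaml and push it to a public repository.
A stranger opens cassette.yaml, copies your key, and uses your credit card to generate millions of words.
The Solution: We need a "Censor" or a "Redactor." Before any text is written to a file, we need to find the secrets and scrub them out with a black marker.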
We want the saved file to look like this:
Authorization: AUTHORIZATION-XXX
The HEADERS_TO_FILTER concept is simply the list of header names that the Censor needs to look for, together with what to write in their place.
What is HEADERS_TO_FILTER?
It is a simple Python dictionary.
Keys: the names of the sensitive headers to look for (e.g., authorization, api-key).
Values: the placeholder text to write instead (e.g., AUTHORIZATION-XXX).
Think of this as a set of instructions you give to a security guard: "If you see a badge that says 'Authorization', put a sticker over the ID number that says 'XXX'."
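To make the security-guard analogy concrete, here is a minimal sketch of applying such a dictionary to the headers of a single request. The helper redact_headers is hypothetical (it is not the CrewAI function covered in the next chapter), the dictionary is a trimmed illustrative copy, and the sketch assumes headers arrive as a plain Python dict.

```python
# Trimmed, illustrative copy of the mapping (the real one is much longer).
HEADERS_TO_FILTER = {
    "authorization": "AUTHORIZATION-XXX",
    "cookie": "COOKIE-XXX",
    "x-api-key": "X-API-KEY-XXX",
}


def redact_headers(headers: dict[str, str], rules: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: return a copy of `headers` with sensitive values replaced."""
    redacted = {}
    for name, value in headers.items():
        # Header names are case-insensitive, so match on the lowercase name.
        # If no rule matches, the original value is kept unchanged.
        redacted[name] = rules.get(name.lower(), value)
    return redacted


request_headers = {
    "Authorization": "Bearer sk-real-secret-key-12345",
    "Content-Type": "application/json",
}
print(redact_headers(request_headers, HEADERS_TO_FILTER))
# {'Authorization': 'AUTHORIZATION-XXX', 'Content-Type': 'application/json'}
```

The .lower() call is the important design choice: HTTP header names are case-insensitive, and clients commonly send "Authorization" capitalized while the dictionary keys are lowercase.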
This dictionary is defined in conftest.py. It is quite long because CrewAI supports many different AI providers (OpenAI, Anthropic, Google, AWS, Azure), and they all name their keys differently.
We will break the code down into small categories to make it easy to read.
These are standard HTTP headers that many web services use for authentication and sessions.
HEADERS_TO_FILTER = {
"authorization": "AUTHORIZATION-XXX",
"cookie": "COOKIE-XXX",
"set-cookie": "SET-COOKIE-XXX",
"x-api-key": "X-API-KEY-XXX",
# ... logic continues ...
authorization: Used by OpenAI and many others to carry Bearer tokens.
cookie (and set-cookie): Carries session data that could let someone impersonate you.
x-api-key: A common convention for passing API keys.
OpenAI and Microsoft Azure have their own specific headers.
"openai-organization": "OPENAI-ORG-XXX",
"openai-project": "OPENAI-PROJECT-XXX",
"azureml-model-session": "AZUREML-MODEL-SESSION-XXX",
"x-ms-client-request-id": "X-MS-CLIENT-REQUEST-ID-XXX",
# ... logic continues ...
openai-organization: Identifies which organization account gets billed. We mask this to keep your org ID private.
x-ms-...: Microsoft (MS) headers that Azure uses for request tracking.
Other AI providers use different names for their keys. We must catch them all.
"x-goog-api-key": "X-GOOG-API-KEY-XXX",
"anthropic-organization-id": "ANTHROPIC-ORGANIZATION-ID-XXX",
"anthropic-ratelimit-tokens-remaining": "ANTHROPIC-RATELIMIT-XXX",
# ... logic continues ...
x-goog-api-key: Google's version of an API key.
anthropic-ratelimit...: Notice that we also filter rate-limit headers (how many tokens you have left). They describe your account's usage, so they are masked too.
Amazon Web Services (AWS) is very strict about security signatures.
"x-amz-date": "X-AMZ-DATE-XXX",
"amz-sdk-invocation-id": "AMZ-SDK-INVOCATION-ID-XXX",
"x-amzn-requestid": "X-AMZN-REQUESTID-XXX",
}
x-amz-date: AWS requests often require a precise timestamp, which is part of the request signature. We mask this so we don't have to worry about time zones or clock drift in our recordings.
You might remember this dictionary from Chapter 2: vcr_config. There, it is converted into a list format that VCR understands.
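Concretely, that list format is one (header_name, replacement) tuple per dictionary entry, which is the form VCR.py's filter_headers option accepts when you want a header's value swapped for a placeholder. A small sketch of the conversion, using a trimmed illustrative copy of the dictionary:

```python
# Trimmed, illustrative copy of the mapping.
HEADERS_TO_FILTER = {
    "authorization": "AUTHORIZATION-XXX",
    "x-api-key": "X-API-KEY-XXX",
}

# Same expression used in vcr_config: one (header_name, replacement) tuple per entry.
filter_headers: list[tuple[str, str]] = [(k, v) for k, v in HEADERS_TO_FILTER.items()]
print(filter_headers)
# [('authorization', 'AUTHORIZATION-XXX'), ('x-api-key', 'X-API-KEY-XXX')]

# list(HEADERS_TO_FILTER.items()) builds an identical list.
assert filter_headers == list(HEADERS_TO_FILTER.items())
```

Keeping the rules in a dictionary guarantees each header name appears only once, while the list of tuples is simply the shape VCR expects.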
Here is the snippet from Chapter 2 to refresh your memory:
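from typing import Any

import pytest

@pytest.fixture(scope="module")
def vcr_config(vcr_cassette_dir: str) -> dict[str, Any]:
    config = {
        # ... other config ...
        # One (header_name, replacement) tuple per entry in HEADERS_TO_FILTER
        "filter_headers": [(k, v) for k, v in HEADERS_TO_FILTER.items()],
    }
    return config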
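By passing this list to the config, we ensure that, for every test in CrewAI that uses VCR, these request headers are replaced with their placeholders before a cassette is ever written to disk. Note that filter_headers only rewrites request headers; response headers such as set-cookie need a separate response hook (VCR's before_record_response).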
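To see why the order matters (redact first, write second), here is a small simulation. write_recording and file_leaks_secret are hypothetical helpers standing in for VCR, and the file is JSON rather than VCR's default YAML only so the sketch needs nothing beyond the standard library. It writes a redacted recording to a temporary file and then checks that the raw key never reached disk.

```python
import json
import tempfile
from pathlib import Path

# Trimmed, illustrative copy of the mapping.
HEADERS_TO_FILTER = {"authorization": "AUTHORIZATION-XXX", "x-api-key": "X-API-KEY-XXX"}
SECRET = "sk-real-secret-key-12345"


def write_recording(path: Path, headers: dict[str, str]) -> None:
    """Hypothetical stand-in for VCR: redact the headers first, then write to disk."""
    safe = {name: HEADERS_TO_FILTER.get(name.lower(), value) for name, value in headers.items()}
    path.write_text(json.dumps({"request": {"headers": safe}}, indent=2))


def file_leaks_secret(path: Path, secret: str) -> bool:
    """Return True if the raw secret appears anywhere in the saved file."""
    return secret in path.read_text()


with tempfile.TemporaryDirectory() as tmp:
    cassette = Path(tmp) / "cassette.json"
    write_recording(cassette, {"Authorization": f"Bearer {SECRET}"})
    print(file_leaks_secret(cassette, SECRET))  # False
    print("AUTHORIZATION-XXX" in cassette.read_text())  # True
```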
In this chapter, we learned about HEADERS_TO_FILTER:
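What it is: a dictionary mapping sensitive header names to placeholder values.
Why we need it: to keep API keys, org IDs, and session data out of the cassette files we save and share.
How it is used: vcr_config passes it to VCR as filter_headers, so matching headers are saved as placeholders (like AUTHORIZATION-XXX).
However, a list of rules is just a piece of paper. It doesn't do anything by itself. We need an active worker to take these rules, inspect the outgoing network requests, and actually perform the replacement.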
In the next chapter, we will look at the function that applies these rules to outgoing requests.
Next Chapter: _filter_request_headers
Generated by Code IQ