Welcome to the final chapter of our beginner's guide to Botan!
In the previous chapter, Argon2 (Password Hashing), we learned about memory-hard functions and saw that Botan checks your CPU capabilities to select the most efficient code.
Throughout this series, we have repeatedly used the term SIMD (Single Instruction, Multiple Data). We know it makes SHA-2, ChaCha, and Argon2 fast. But writing SIMD code is notoriously difficult and messy.
In this chapter, we will look at Utils (SIMD Wrappers). This is the internal toolkit Botan developers use to write high-speed cryptography without losing their minds.
Standard programming is like a robot with one arm. If you want to add four pairs of numbers:
SIMD is like an octopus. It picks up all four pairs at once, adds them all in a single movement, and puts them down together.
To use the octopus (SIMD) in C++, you usually have to use Compiler Intrinsics. These are special, ugly commands that look like this:
// Raw AVX2 code to add numbers. Looks scary, right?
__m256i result = _mm256_add_epi32(vector_a, vector_b);
If you write code like this:
Botan wraps these ugly commands into clean C++ classes like SIMD_4x32. This allows developers to write math using standard operators (+, ^, |) while the library handles the messy hardware details in the background.
We want to perform a bitwise XOR operation on 4 integers at once. This is a very common operation in cryptography (like in the ChaCha cipher).
int), it holds a packet of 4, 8, or 16 numbers side-by-side.+ or ^) to call the correct intrinsics.Note: These classes are primarily used inside the library itself. As a user, you benefit from them automatically. However, understanding them explains how Botan achieves its speed.
Let's imagine we are writing a piece of the ChaCha cipher and want to XOR four numbers.
Instead of creating int, we create a SIMD_4x32. This represents a packet of four 32-bit integers.
// Hypothetical internal usage
#include <botan/internal/simd_32.h>
void calculate() {
// Load 4 specific numbers into one register
Botan::SIMD_4x32 packet_a(10, 20, 30, 40);
// Load 4 other numbers
Botan::SIMD_4x32 packet_b(5, 5, 5, 5);
}
Explanation: packet_a now holds {10, 20, 30, 40} inside a single CPU register.
Now we can treat these packets like normal numbers.
// Perform XOR on all 4 pairs simultaneously
// Ideally takes 1 CPU cycle!
Botan::SIMD_4x32 result = packet_a ^ packet_b;
Explanation:
All happening at the exact same moment.
Once the calculation is done, we usually dump the results back into a standard byte array.
std::vector<uint8_t> output(16);
// Write the contents of the register to memory
result.store_le(output.data());
Explanation: store_le (Store Little Endian) takes the data out of the special CPU register and writes it into standard RAM so the rest of the program can read it.
How does packet_a ^ packet_b know what to do?
Botan defines these classes using preprocessor directives (#ifdef). It checks what computer you are compiling on and swaps out the "guts" of the class.
Let's look at a simplified version of how Botan implements the ^ (XOR) operator inside these wrappers.
File: botan/src/lib/utils/simd/simd_32.h (Conceptual)
First, the class holds the raw data specific to the architecture:
class SIMD_4x32 {
private:
#if defined(BOTAN_TARGET_HAS_SSE2)
__m128i m_simd_data; // Intel-specific type
#elif defined(BOTAN_TARGET_HAS_NEON)
uint32x4_t m_simd_data; // ARM-specific type
#else
uint32_t m_simd_data[4]; // Fallback array
#endif
// ... constructors ...
};
Explanation: If you compile this on an Intel laptop, m_simd_data is a 128-bit Intel register. If you compile on an iPhone (ARM), it becomes a NEON register.
Next, the operator logic:
inline SIMD_4x32 operator^(const SIMD_4x32& a, const SIMD_4x32& b) {
#if defined(BOTAN_TARGET_HAS_SSE2)
// Use Intel Intrinsic
return SIMD_4x32(_mm_xor_si128(a.raw(), b.raw()));
#elif defined(BOTAN_TARGET_HAS_NEON)
// Use ARM Intrinsic
return SIMD_4x32(veorq_u32(a.raw(), b.raw()));
#endif
// ... fallback logic ...
}
Explanation:
operator^: This allows the developer to use the ^ symbol._mm_xor_si128: The actual command sent to an Intel CPU.veorq_u32: The actual command sent to an ARM CPU.
By writing code using SIMD_4x32, the cryptography implementations (like SHA-2 and ChaCha) become portable. They automatically "upgrade" themselves to use the fastest instructions available on whatever machine runs them.
Congratulations! You have completed the Botan Basics tutorial.
Let's recap what we've learned:
You are now equipped with the fundamental concepts to use Botan in your C++ projects. Secure coding!
Generated by Code IQ