In the previous chapter, Core Feature Tests, we tested the "Pillars" of ClickHouse, such as backups and security. We made sure the database keeps data safe.
Now, we move to the "Brain" of the database. When you type a SQL query, how does ClickHouse figure out how to execute it? It uses a component called the Query Analyzer.
ClickHouse is currently undergoing a massive brain transplant: we are replacing the old query planner with a powerful new Analyzer.
This chapter is about Analyzer Tests.
Imagine you have a translator who translates English to Spanish.
The Challenge: We are building the New Translator. But before we fire the Old Translator, we must verify that the New one is working perfectly. It must handle complex sentences (queries) that the old one couldn't, without breaking the simple ones.
Central Use Case: We want to run a complex SQL query that uses a Common Table Expression (CTE) and a Joinβstructures that require deep grammatical understanding of SQL. We want to ensure the New Analyzer calculates the result correctly.
This is the original code that processed SQL in ClickHouse for years. It was fast but struggled with very complex SQL standard features (like subqueries inside joins).
This is the new engine. It builds a complete "Semantic Graph" of the query. It knows that column id in the subquery is the same as column id in the main table.
Since the new brain is still experimental in some versions, we have a switch to turn it on:
SET allow_experimental_analyzer = 1;
Analyzer tests are technically very similar to Stateless Queries. They involve a .sql file and a .reference file. The key difference is that we explicitly force the new brain to be active.
We create a file in tests/queries/0_stateless/ (or a specific analyzer folder). The most important part is the first line.
-- tests/queries/0_stateless/01234_analyzer_test.sql
-- Turn on the NEW Analyzer
SET allow_experimental_analyzer = 1;
-- Also turn on new optimization support
SET allow_experimental_analyzer_glossary_optimization = 1;
Explanation: These settings tell ClickHouse: "Ignore the old logic. Use the new experimental logic for everything that follows."
The new analyzer shines with complex logic, like defining a temporary result (CTE) and joining it.
-- Define a "Common Table Expression" (CTE) called 'my_cte'
WITH my_cte AS (
SELECT number, number * 10 AS val FROM numbers(3)
)
-- Join the CTE with itself
SELECT t1.val, t2.val
FROM my_cte AS t1
JOIN my_cte AS t2 ON t1.number = t2.number;
Explanation:
WITH ...: Creates a temporary table my_cte containing {0, 0}, {1, 10}, {2, 20}.JOIN: Matches the rows based on the ID.
Just like in Chapter 6, we define the expected output in 01234_analyzer_test.reference.
0 0
10 10
20 20
Explanation: We expect the multiplication to work and the join to find the matching rows.
What actually happens when you turn that setting on? The entire path the query takes inside the C++ code changes.
allow_experimental_analyzer.Interpreter logic which does analysis and execution simultaneously (which is harder to optimize).Here is the flow:
The fork in the road usually happens in InterpreterSelectQuery.cpp (or similar interpreters).
// Simplified C++ Logic inside InterpreterSelectQuery.cpp
// Check the settings context
if (context->getSettings().allow_experimental_analyzer)
{
// Use the new "Brain"
auto analyzer = QueryAnalyzer(query, context);
// Analyze and build a plan
plan = analyzer.analyze();
}
else
{
// Use the old "Brain" (Legacy)
// Manually interpret the AST parts one by one
analyzeLegacy(query, context);
}
Explanation:
context holds the user settings (where we did SET allow... = 1).QueryAnalyzer. This class is responsible for the new, rigorous semantic checks.Sometimes, we want to see how the Analyzer understood the query. We can use a special command:
-- Debugging the Analyzer
EXPLAIN QUERY TREE SELECT 1 + 1;
Output (Simplified):
QUERY id: 0
PROJECTION
FUNCTION name: plus
CONSTANT value: 1
CONSTANT value: 1
Explanation: This output confirms that the Analyzer successfully broke the query down into a "Function" (plus) and "Constants". If the test fails, developers look at this tree to see where the logic broke.
Analyzer Tests are critical for the future of ClickHouse.
In this chapter, we learned about Analyzer Tests.
SET allow_experimental_analyzer = 1.So far, we have tested the database engine (Stateless, Integration, Analyzer). But a database often needs a manager to coordinate multiple servers. In ClickHouse, this manager is called ClickHouse Keeper.
In the next chapter, we will learn how to verify that the manager is doing its job correctly.
Next Chapter: ClickHouse Keeper Tests
Generated by Code IQ