Nikolai Kochetov, Yandex
ClickHouse developer
Independent execution steps
[Diagram: steps over column URL: filter URL != '', compute length(URL), aggregate with avg]
Chain (tree, graph) of steps with
In-memory execution (LocustDB)
Properties
Row by row execution (MySQL, Postgres)
Batch execution (MonetDB, ClickHouse)
Row by row execution
Batch execution
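The difference between the two execution models can be sketched with the URL example from the earlier slide (filter URL != '', length(URL), avg). This is a minimal illustration only, not ClickHouse code; the function names and the list-of-strings data layout are assumptions made for the sketch.

```python
from typing import Iterable, List, Optional


def row_by_row_avg_url_length(urls: Iterable[str]) -> Optional[float]:
    """Row by row: every step (filter, length, avg) runs once per row."""
    total = 0
    count = 0
    for url in urls:
        if url != "":          # filter step
            total += len(url)  # length step feeding the aggregate
            count += 1
    return total / count if count else None


def batch_avg_url_length(batches: Iterable[List[str]]) -> Optional[float]:
    """Batch: every step processes a whole chunk of the column at once."""
    total = 0
    count = 0
    for urls in batches:
        filtered = [u for u in urls if u != ""]  # filter the whole batch
        lengths = [len(u) for u in filtered]     # length over the whole batch
        total += sum(lengths)                    # update aggregate state
        count += len(lengths)
    return total / count if count else None
```

Both functions compute the same result; the batch version pays the per-step overhead once per chunk rather than once per row.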
Push strategy
ClickHouse: IBlockOutputStream
Pull strategy
ClickHouse: IBlockInputStream
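A rough sketch of the two strategies, assuming a pull stream exposes read() and a push stream exposes write(). The class names below are hypothetical and only loosely modeled on the IBlockInputStream / IBlockOutputStream idea; blocks are plain Python lists.

```python
class BlocksSource:
    """Pull side: the consumer asks for the next block; None means end of data."""

    def __init__(self, blocks):
        self._blocks = iter(blocks)

    def read(self):
        return next(self._blocks, None)


class FilterNonEmptyInput:
    """Pull-style step: reads from its child only when it is asked for data."""

    def __init__(self, child):
        self.child = child

    def read(self):
        block = self.child.read()
        if block is None:
            return None
        return [u for u in block if u != ""]


class CollectingOutput:
    """Push side: the producer hands blocks over as soon as it has them."""

    def __init__(self):
        self.blocks = []

    def write(self, block):
        self.blocks.append(block)


class FilterNonEmptyOutput:
    """Push-style step: transforms a block and pushes it to the next output."""

    def __init__(self, next_output):
        self.next_output = next_output

    def write(self, block):
        self.next_output.write([u for u in block if u != ""])


def pull_all(stream):
    """Drive a pull pipeline from its last step."""
    result = []
    while (block := stream.read()) is not None:
        result.append(block)
    return result
```

In the pull version the last step drives execution; in the push version the first step does.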
Push vs Pull
Insert query (into several partitions) - push
Select query (from several parts and order by) - pull
Merge parts - pull
Insert select: difficult case
Mixed strategy
Does current pipeline work well?
Can it work better?
New pipeline (in development)
SET experimental_use_processors = 1
Use processors pipeline
Pipeline is a directed graph
How to execute
Why it works
Processors
Ports
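One way to picture processors connected by ports is the toy executor below. It is a sketch under stated assumptions, not the ClickHouse implementation: the Status values, the single-slot Port, and the class names are all hypothetical, and the executor is a simple single-threaded loop.

```python
from enum import Enum


class Status(Enum):
    NEED_DATA = 1   # waiting for the input port to be filled
    PORT_FULL = 2   # waiting for the output port to be drained
    READY = 3       # work() can be called
    FINISHED = 4    # nothing more to do


class Port:
    """Single-slot connection between one output and one input."""

    def __init__(self):
        self.data = None
        self.finished = False


class Source:
    def __init__(self, chunks, out):
        self.chunks, self.out = list(chunks), out

    def prepare(self):
        if self.out.data is not None:
            return Status.PORT_FULL
        if not self.chunks:
            self.out.finished = True
            return Status.FINISHED
        return Status.READY

    def work(self):
        self.out.data = self.chunks.pop(0)


class Transform:
    def __init__(self, func, inp, out):
        self.func, self.inp, self.out = func, inp, out

    def prepare(self):
        if self.out.data is not None:
            return Status.PORT_FULL
        if self.inp.data is not None:
            return Status.READY
        if self.inp.finished:
            self.out.finished = True
            return Status.FINISHED
        return Status.NEED_DATA

    def work(self):
        chunk, self.inp.data = self.inp.data, None
        self.out.data = self.func(chunk)


class Sink:
    def __init__(self, inp):
        self.inp, self.result = inp, []

    def prepare(self):
        if self.inp.data is not None:
            return Status.READY
        return Status.FINISHED if self.inp.finished else Status.NEED_DATA

    def work(self):
        self.result.append(self.inp.data)
        self.inp.data = None


def execute(processors):
    """Visit processors until all are finished; run those that are ready."""
    active = list(processors)
    while active:
        progressed = False
        for proc in list(active):
            status = proc.prepare()
            if status is Status.READY:
                proc.work()
                progressed = True
            elif status is Status.FINISHED:
                active.remove(proc)
                progressed = True
        if not progressed:
            raise RuntimeError("pipeline is stuck")
```

Because each processor only looks at its own ports, the executor does not need to know whether data is being pushed or pulled; it just runs whatever is ready.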
How does ClickHouse execute queries in parallel?
Copy pipeline for each thread
Pull strategy (IBlockInputStream)
Query Pipeline
Part of pipeline is executed in single thread
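The "copy pipeline for each thread" idea can be sketched as follows: each thread runs its own copy of the same chain over its own share of the data, and the final step combines the partial results in a single thread. The function names and the data layout (a list of parts, each a list of blocks of URLs) are assumptions for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline_copy(part):
    """One copy of the chain (filter -> length -> partial aggregate) for one part."""
    total = 0
    count = 0
    for block in part:
        lengths = [len(u) for u in block if u != ""]
        total += sum(lengths)
        count += len(lengths)
    return total, count


def parallel_avg_url_length(parts, max_threads):
    # Each part is handled by an independent copy of the pipeline.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        partials = list(pool.map(run_pipeline_copy, parts))
    # The final merge of partial states runs in a single thread.
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None
```

The single-threaded merge at the end is the kind of step that limits parallelism when the pipeline is copied per thread.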
Graph traversal (Processors)
Query Pipeline
Right chain can be executed in 5 threads (best case)
Sometimes we need to change pipeline during execution
Use previous pipeline as example
Sort
stores all query data in memory
Set max_bytes_before_external_sort = <some limit>
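To illustrate why a limit like this changes the shape of the work at run time, here is a minimal external sort sketch: once the in-memory buffer exceeds the limit, sorted runs are spilled to temporary files, and an extra merge step over those runs becomes necessary. The limit here is a row count rather than bytes, and the function name and use of pickle are assumptions for the sketch, not how ClickHouse implements it.

```python
import heapq
import pickle
import tempfile


def external_sort(chunks, max_rows_in_memory):
    """Yield all values in sorted order, spilling sorted runs to temp files."""
    runs = []    # temporary files, each holding one sorted run
    buffer = []  # rows currently held in memory

    def spill():
        buffer.sort()
        run = tempfile.TemporaryFile()
        for value in buffer:
            pickle.dump(value, run)
        run.seek(0)
        runs.append(run)
        buffer.clear()

    for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) > max_rows_in_memory:
            spill()

    if not runs:
        # Everything fit in memory: a plain sort is enough.
        buffer.sort()
        yield from buffer
        return

    if buffer:
        spill()

    def read_run(run):
        try:
            while True:
                yield pickle.load(run)
        except EOFError:
            return

    # The merge step only exists if at least one run was spilled.
    yield from heapq.merge(*(read_run(run) for run in runs))
    for run in runs:
        run.close()
```

Whether the merge-from-disk step is needed is only known after data starts flowing, which is why the pipeline may have to be extended during execution.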
Will print pipeline in Graphviz format
digraph
{
n140638219161104[label="SourceFromStorage"];
n140638217764624[label="ExpressionTransform"];
n140638219121680[label="FilterTransform"];
n140638217764048[label="ExpressionTransform"];
n140638217755024[label="AggregatingTransform"];
n140638217763856[label="ExpressionTransform"];
n140638219121360[label="LimitsCheckingTransform"];
n140638142287888[label="ConvertingAggregatedToBlocksTransform"];
...
}
AST rewriting approach
Pipeline optimization approach
Manage quota for users
Common executor for multiple pipelines
Example set of similar queries
It’s possible to build a common pipeline for several queries
Idea
Features