Secrets of ClickHouse Performance

Author: Alexey Milovidov, 2017-12-06.

Secrets of ClickHouse
Performance

About Me

Alexey, ClickHouse developer.

What is ClickHouse?

ClickHouse is a distributed analytical column-oriented DBMS.


I will talk about several details of ClickHouse's operation
that I consider important.

How Data is Stored in Tables

How Data is Stored in Tables

ClickHouse is a modular system.

It supports various storage engines.

Storage engine is responsible for:

— storing, reading and writing data;
— synchronizing data access;
— data replication.

How Data is Stored in Tables

Examples of simple Storage Engines:

Memory, File, TinyLog.

If you want to add your own table engine to ClickHouse,
start by looking at the simplest ones.

How Data is Stored in Tables

Examples of specialized Storage Engines:

Distributed, Merge, View, Null.

How Data is Stored in Tables

For serious applications, use MergeTree family tables

How Data is Stored in MergeTree

Similar to LSM-Tree (HBase, Cassandra, RocksDB…), but worse.

The table consists of a set of data "parts".

Each part is lexicographically sorted
by the primary key value.

Different parts are completely independent
and may overlap in key values.

How Data is Stored in MergeTree

/opt/clickhouse/data/default/hits_single_partition#
du -h --max-depth=1 | sort -k2 -V
102G    .
40G     ./all_1_1026_4
47G     ./all_1027_2210_4
15G     ./all_2211_2612_4
1.1G    ./all_2613_2640_2
2.2M    ./all_2641_2641_0

How Data is Stored in MergeTree

Within one part, each column is stored in a separate file.

Column data is serialized values laid out sequentially,

then compressed in blocks. Block size before compression is 64 KB .. 1 MB.

How Data is Stored in MergeTree

all_1_1026_4# ls -1
AdvEngineID.bin
AdvEngineID.mrk
Age.bin
Age.mrk
BrowserCountry.bin
BrowserCountry.mrk
BrowserLanguage.bin
BrowserLanguage.mrk
checksums.txt
...
primary.idx

How Data is Stored in MergeTree

The entire part is sorted by primary key.

Data from each column has a corresponding order.

(The N-th row corresponds to the N-th values in each column file)

How Data is Stored in MergeTree

During SELECT, in the worst case, reading from all parts is performed.

Just in case, each part has a minmax index of the partitioning key.

During INSERT, the batch of inserted data is sorted and a new part is formed in the file system.

In the background, several parts are merged together into one larger part.

How the Index Works

How the Index Works

One index, called the "primary key".

The primary key does not have to be unique
- it's just a key for ordering data.
(clustered index)

The index is sparse.

Disadvantages of Sparse Index

Does not work well for point lookup queries.

A sparse index addresses not every record in the table,
but a range of several thousand records.

Instead of reading one row, you have to read at least a batch.

… and you also have to decompress compressed blocks.

Disadvantages of Sparse Index

Requires data ordering.

For one table — only one sparse index.

Instead of reading one row, you have to read at least a batch.

… and you also have to decompress compressed blocks.

Advantages of Sparse Index

Very compact, fits in memory.

Advantages of Sparse Index

Support for complex logical expressions.

— binary search with ternary logic and interval arithmetic.

Example. There is a complex condition
WHERE a AND (b OR NOT c)...

We sequentially split the set of all data
into range granules, where the condition is:

— always false;
— always true;
— can be both false and true.

Advantages of Sparse Index

The index works even with a condition on a non-prefix of the primary key.

Example. PK: (CounterID, Date).

WHERE Date BETWEEN '2017-11-20' AND '2017-11-29'

— will be able to filter ranges, if
there is a sufficiently long range for a fixed CounterID.

Advantages of Sparse Index

The index always works no worse than full scan.

Even if 95% of the data is selected by the index,
it's better than reading 100%.

Advantages of Sparse Index

Support for conditions with partially monotonic functions on columns.

Example. PK: Date

WHERE toDayOfMonth(Date) = 29

Advantages of Sparse Index

Example: parallelogram index analysis.

Vectorized Query Execution

(vectorized query execution)

How to Process Queries
in an Analytical DBMS

Two ways:

1. Query compilation.

Examples: MemSQL, Impala.

2. Vectorized processing.

Examples: kdb, VectorWise (Actian Vector).

Better — both together.

Vectorized Query Execution

ClickHouse — vectorized query execution

… with minor query compilation capabilities.

Data is processed in "blocks"
— small pieces of columns.

Each column in memory is a flat array (one or more).

(Structure of Arrays, see also Apache Arrow).

Block size is chosen small for cache locality.

Vectorized Query Execution

Advantages:

— no overhead for operation dispatching;

— possibilities for applying SIMD instructions;

— possibilities for applying operations to the entire array at once
(example — substring search in a string);

— good code isolation;

Vectorized Query Execution

Disadvantages:

— code bloat due to supporting specializations for different data types;

— overhead for reading and writing temporary data to cache;

— possibly lower instructions per clock
due to one operation in each loop;

— complexity with implementing short-circuit operations;

— complex interface for UDFs (user-defined functions);

GROUP BY Execution

GROUP BY Execution

Parallel aggregation.

Distributed aggregation.

External memory aggregation.

Incremental aggregation.

Community

Website: https://clickhouse.com/

Google groups: https://groups.google.com/forum/#!forum/clickhouse

Email: [email protected]

Telegram chat: https://telegram.me/clickhouse_en and https://telegram.me/clickhouse_ru (already 1100 participants)

GitHub: https://github.com/ClickHouse/ClickHouse/

Twitter: https://twitter.com/ClickHouseDB

+ meetups. Moscow, Saint Petersburg, Novosibirsk,
Yekaterinburg, Minsk, Berlin, Palo Alto, Nizhny Novgorod…
then Moscow again.