ClickHouse C++ Meetup

Author: Alexey Milovidov, 2018-05-17.

Overview of ClickHouse Internal Architecture

The Foundation of ClickHouse — Columns

On disk — columns.
Data is stored by columns.

In memory — columns.
Data is processed by columns.

How Columns Are Stored in Memory

As pieces of columns — for example, 65,536 elements.

The size of a piece — depends on many things.

For SELECT — see the max_block_size setting.

How Columns Are Stored in Memory

Represented as objects with IColumn interface.

Variants — ColumnVector<T>, ColumnString, ColumnArray...

ColumnVector<T> — almost like std::vector<T>.
But under the IColumn interface.
And instead of std::vector<T> — PODArray<T> (why?).

How Columns Are Stored in Memory

Previously it was std::vector. PODArray — just an optimization.

PODArray:
— no unnecessary memset;
— 15 bytes of padding at the end;
— uses an allocator interface different from std::allocator, which sometimes allows mremap.
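A minimal sketch of the PODArray idea (all names here are hypothetical, not the real ClickHouse class): for trivially-copyable types, resize() leaves new elements uninitialized instead of zeroing them, and the buffer is over-allocated by a fixed pad so that vectorized inner loops may safely read a few bytes past the logical end.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical simplification of the PODArray idea:
// - no memset on resize (new elements stay uninitialized);
// - fixed padding after the data, so SIMD loops can overread safely.
template <typename T>
struct PodArraySketch
{
    static constexpr size_t pad_bytes = 15;  // padding at the end, as in ClickHouse

    T * data = nullptr;
    size_t size = 0;
    size_t capacity = 0;

    void reserve(size_t n)
    {
        if (n <= capacity)
            return;
        T * new_data = static_cast<T *>(std::malloc(n * sizeof(T) + pad_bytes));
        if (data)
        {
            std::memcpy(new_data, data, size * sizeof(T));
            std::free(data);
        }
        data = new_data;
        capacity = n;
    }

    void resize(size_t n)  // note: no zero-initialization of the tail
    {
        reserve(n);
        size = n;
    }

    void push_back(const T & x)
    {
        if (size == capacity)
            reserve(capacity ? capacity * 2 : 4);
        data[size++] = x;
    }

    ~PodArraySketch() { std::free(data); }
};
```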

How Columns Are Stored in Memory

ColumnString — consists of two components:

1. Bytes laid out sequentially.
2. Offsets — the offset where string i + 1 begins (i.e., just past the end of string i).

Bytes:   h e l l o \0 w o r l d \0
Offsets: 6 12
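A sketch of this layout under simplified types (the real ColumnString uses PODArray-based containers): one contiguous byte buffer and, per row, the offset just past the end of that row's string.

```cpp
#include <string>
#include <vector>

// Simplified ColumnString layout: for "hello", "world"
// chars = "hello\0world\0", offsets = {6, 12}.
struct ColumnStringSketch
{
    std::vector<char> chars;      // all strings laid out sequentially, each '\0'-terminated
    std::vector<size_t> offsets;  // offsets[i] points just past string i

    void insert(const std::string & s)
    {
        chars.insert(chars.end(), s.begin(), s.end());
        chars.push_back('\0');
        offsets.push_back(chars.size());
    }

    std::string operator[](size_t i) const
    {
        size_t begin = i == 0 ? 0 : offsets[i - 1];
        return std::string(&chars[begin]);  // reads up to the terminating zero
    }
};
```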

How Columns Are Stored in Memory

ColumnConst

Built from one nested column containing a single value.

What the IColumn Interface Provides

Basic operations:
— cut — extract part of a column, to implement LIMIT;
— filter — to implement WHERE;
— compareAt, permute — to implement ORDER BY;
...

How Columns Are Stored in Memory

Ownership — using COWPtr<IColumn>.

Previously it was std::shared_ptr<IColumn>.

Almost All Operations Are Immutable

virtual Ptr filter(const Filter & filt, ssize_t result_size_hint) const = 0;

Instead of modifying content, they create
and return a new column object.

This is normal, as operations are "coarse-grained".

But there are also "fine-grained", mutating operations.
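The copy-on-write idea can be sketched on top of std::shared_ptr (the real COWPtr is an intrusive refcounted pointer, so this is only an illustration): a mutating operation first checks whether the column is shared, and copies it only if it is, so other readers are never affected.

```cpp
#include <memory>
#include <vector>

// Illustration of the COW principle behind COWPtr<IColumn>,
// using std::shared_ptr and a plain vector as the "column".
using ColumnData = std::vector<int>;
using ColumnPtr = std::shared_ptr<ColumnData>;

ColumnPtr mutate(ColumnPtr ptr)
{
    if (ptr.use_count() == 1)
        return ptr;                             // sole owner: safe to mutate in place
    return std::make_shared<ColumnData>(*ptr);  // shared: deep-copy first
}
```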

IColumn, What It's Responsible For:

— storing data in memory;
— common operations on columns.

IColumn, What It's Similar To:

— Apache Arrow;
— arrays in NumPy;
— arrays in APL, J, K.

Motivation

Isolate maximally efficient
inner loops from wrapper code.

Code doesn't have to be efficient as a whole.

Places worth optimizing should be easy to localize.

«vectorized engine»

Bonus:
— SIMD instructions;
— clever optimizations for homogeneous data
(IColumn::filter, LIKE function implementation);

Data Types:

IDataType

— binary serialization and deserialization to data streams;
— one column can be written to multiple physical streams, example: Array(Array(UInt8));
— serialization and deserialization in text form for different data formats;
— data type properties;
— completely immutable: std::shared_ptr<const IDataType>.
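A sketch of why one column maps to multiple physical streams: an Array(UInt8) column is written as two streams, the per-row array sizes (ClickHouse actually stores cumulative offsets) and the flattened element data; Array(Array(UInt8)) applies this recursively, yielding three streams.

```cpp
#include <cstdint>
#include <vector>

// Simplified serialization of an Array(UInt8) column into two streams.
struct ArrayStreams
{
    std::vector<uint64_t> sizes;  // stream 1: size of each array
    std::vector<uint8_t> data;    // stream 2: all elements, flattened
};

ArrayStreams serializeArrays(const std::vector<std::vector<uint8_t>> & rows)
{
    ArrayStreams streams;
    for (const auto & row : rows)
    {
        streams.sizes.push_back(row.size());
        streams.data.insert(streams.data.end(), row.begin(), row.end());
    }
    return streams;
}
```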

DataTypeUInt32   \ / ColumnUInt32
                  X
DataTypeDateTime / \ ColumnConst(ColumnUInt32)

Block

A piece of a table: a set of { ColumnPtr, DataTypePtr, name }

Data processing in the query execution pipeline
is performed on blocks.

... here there is an architectural error that will need to be fixed.

Block

Split into

Header: a set of { ColumnPtr, DataTypePtr, name }
Block: { size_t num_rows, std::vector<ColumnPtr>, properties... }
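A sketch of a Block as a set of (column, type, name) triples, with stand-in types for ColumnPtr and DataTypePtr; the "header" is the same structure with empty columns, describing a stream without carrying data.

```cpp
#include <memory>
#include <string>
#include <vector>

// Stand-ins: a shared vector for ColumnPtr, a string for DataTypePtr.
struct ColumnWithTypeAndName
{
    std::shared_ptr<std::vector<int>> column;
    std::string type;
    std::string name;
};

struct Block
{
    std::vector<ColumnWithTypeAndName> columns;

    size_t rows() const { return columns.empty() ? 0 : columns[0].column->size(); }

    // The header: same structure and names, but no rows.
    Block cloneEmpty() const
    {
        Block res;
        for (const auto & c : columns)
            res.columns.push_back({std::make_shared<std::vector<int>>(), c.type, c.name});
        return res;
    }
};
```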

Block Streams

IBlockInputStream: Block read();
IBlockOutputStream: void write(Block);

They implement:
— data formats (CSV, JSON, Native...)
— reading and writing to tables;
— transformations on data (Limit, Filter, Expression, ...)

Strictly typed — all blocks in a stream have the same data types
and the same constant values.

/** Get data structure of the stream. */
virtual Block getHeader() const = 0;

(Until very recently this method didn't exist.
Question — what does it make possible?)
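A sketch of the pull-style stream interface and one of its simplest transformations, a LIMIT stream (simplified types; the real interface returns an empty Block to signal end of stream, modeled here with std::optional):

```cpp
#include <optional>
#include <vector>

using Block = std::vector<int>;  // stand-in: a block of one int column

struct IBlockInputStream
{
    virtual std::optional<Block> read() = 0;
    virtual ~IBlockInputStream() = default;
};

// A source that serves pre-made blocks.
struct VectorSource : IBlockInputStream
{
    std::vector<Block> blocks;
    size_t pos = 0;
    explicit VectorSource(std::vector<Block> blocks_) : blocks(std::move(blocks_)) {}

    std::optional<Block> read() override
    {
        if (pos == blocks.size())
            return std::nullopt;
        return blocks[pos++];
    }
};

// Passes through at most `limit` rows, truncating the last block.
struct LimitStream : IBlockInputStream
{
    IBlockInputStream & input;
    size_t limit;
    LimitStream(IBlockInputStream & input_, size_t limit_) : input(input_), limit(limit_) {}

    std::optional<Block> read() override
    {
        if (limit == 0)
            return std::nullopt;
        auto block = input.read();
        if (!block)
            return std::nullopt;
        if (block->size() > limit)
            block->resize(limit);
        limit -= block->size();
        return block;
    }
};
```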

Table Engines

Interface — IStorage.
Implementations — StorageMemory, StorageMergeTree...
Class instances are tables.

virtual BlockInputStreams read(
    const Names & /*column_names*/,
    const SelectQueryInfo & /*query_info*/,
    const Context & /*context*/,
    QueryProcessingStage::Enum & /*processed_stage*/,
    size_t /*max_block_size*/,
    unsigned /*num_streams*/)

Why BlockInputStreams?

Query Execution Pipeline

It's just a BlockInputStream (for SELECT)
or BlockOutputStream (for INSERT),
which performs all necessary transformations
when calling the read or write method.

Question — what about the pipeline for INSERT SELECT?

Query Execution Pipeline

SELECT works on pull principle, INSERT on push principle.
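This also answers the INSERT SELECT question: a driver loop glues the pull side to the push side by reading blocks from the input stream and writing each to the output stream (ClickHouse has a copyData helper doing essentially this). A sketch with simplified stand-in types:

```cpp
#include <optional>
#include <vector>

using Block = std::vector<int>;

// Pull side: the SELECT pipeline.
struct Source
{
    std::vector<Block> blocks;
    size_t pos = 0;
    std::optional<Block> read()
    {
        if (pos == blocks.size())
            return std::nullopt;
        return blocks[pos++];
    }
};

// Push side: the INSERT pipeline.
struct Sink
{
    std::vector<int> written;
    void write(const Block & block)
    {
        written.insert(written.end(), block.begin(), block.end());
    }
};

// The driver loop gluing pull to push, as in INSERT SELECT.
void copyData(Source & from, Sink & to)
{
    while (auto block = from.read())
        to.write(*block);
}
```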

In fact, this is an architectural error.

— control flow is not controlled from outside;
— no ability to glue together the execution of multiple queries;
— more complex implementation of query cancellation, priorities and cooperative execution;
— hard to attach operations on the side of the pipeline — scalar subquery execution, window functions, etc.;
— no transparent access to algebraic properties of pipeline elements;
— hard to perform transformations on the pipeline;
— query execution plan is not available in declarative form;
— for distributed queries only a fragment of the execution plan is visible.

Lexer, Parser, AST

Parser — recursive descent parser, hand-written.

Ready-made SQL parsers:
— parser from PostgreSQL (library available);
— parser from sqlite;
— ready-made variants for ANTLR;
— Apache Calcite;

Features of ClickHouse parser:
— Nested columns;
— lambda functions;
— aliases and expressions anywhere in the query;

Query Analysis and Optimization

InterpreterSelectQuery
ExpressionAnalyzer

Mostly — rule-based optimizations:
— constant folding;
— merging identical expressions;
— removing unnecessary calculations;
— removing unnecessary columns;
...
— pushing ARRAY JOIN closer to the end;
— converting OR chains to IN;

Need to rewrite everything :)

Query Analysis and Optimization

Move individual optimizations into their own, separate places.

Make query optimizations easily pluggable.

Extract common interfaces for some types of optimizations.
(example — peephole optimizations)

Functions

Work on an entire block at once.
The code implements not one function application,
but an entire loop over arrays.

Inner loop (usually) is its own
for each combination of argument types.

Inside the loop (usually) there are no virtual calls,
type checks, unnecessary branches.

Functions

Example: for the addition operator there are

{ UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, Float64 }
✕
{ UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, Float64 }

combinations.

And also one of the arguments can be constant:
10 * 10 * 3 = 300 implementations.

Functions

Advantages:
— maximum specialization;
— (almost) maximum possible efficiency;
— sufficiently isolated code.

Disadvantages:
— inconvenient interface for implementation;
— bulky code (C++ templates);
— significant increase in binary size;
— no ability to optimize fused operations, example: x * y + z;

How to Execute Expressions Efficiently?

1. Vectorized engine.

Examples:
APL, A+, J (jd), K (kdb+, q)
Vectorwise (Actian Vector, VectorH), initial support in Hive

2. Runtime compilation of expressions.

Examples:
Impala, MemSQL, DBToaster

How to Execute Expressions Efficiently?

1. Vectorized engine.
2. Runtime compilation of expressions.

Best of all — both!

— SIMD;
— Instruction Level Parallelism;
— Out Of Order Execution;

https://github.com/ClickHouse/ClickHouse/pull/2277

Aggregate Functions

IAggregateFunction

create — initialize state
in a pre-prepared piece of memory;

update — update state with argument values;
merge — merge two states into one;

serialize, deserialize
— write to I/O stream (network, file, table)

insertResultInto — get final value.

Aggregate function state is a first class citizen in ClickHouse!
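A sketch of this lifecycle using avg as the example (free functions instead of the real IAggregateFunction virtual interface): the state lives in caller-provided memory, update folds in values, merge combines partial states, e.g. from different threads or servers, and insertResultInto produces the final value.

```cpp
#include <cstdint>

// Simplified state and lifecycle of an avg aggregate function.
struct AvgState
{
    double sum = 0;
    uint64_t count = 0;
};

void create(AvgState * place) { *place = AvgState{}; }  // init state in given memory

void update(AvgState & state, double value)
{
    state.sum += value;
    ++state.count;
}

void merge(AvgState & lhs, const AvgState & rhs)
{
    lhs.sum += rhs.sum;
    lhs.count += rhs.count;
}

double insertResultInto(const AvgState & state)
{
    return state.count ? state.sum / state.count : 0.0;
}
```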

I Want to Add Something New to ClickHouse

— regular function;
https://github.com/ClickHouse/ClickHouse/pull/1535
gcd, lcm — Maks Skorokhod

— aggregate function;
https://github.com/ClickHouse/ClickHouse/pull/2352
windowFunnel — Sundy Li

— table function;
https://github.com/ClickHouse/ClickHouse/pull/2164/
file — decaseal (Topvisor)

I Want to Add Something New to ClickHouse

— data format;
https://github.com/ClickHouse/ClickHouse/pull/1387
Cap'n'Proto — Marek Vavruša

— table engine;
https://github.com/ClickHouse/ClickHouse/pull/1331
Kafka — Marek Vavruša

— database engine;
https://github.com/ClickHouse/ClickHouse/pull/914
Dictionary — Nicolai Kochetov

I Want to Add Something New to ClickHouse

— external dictionary source;
https://github.com/ClickHouse/ClickHouse/pull/204
http, executable — Oleg Alexeenkov

— external dictionary layout;
https://github.com/ClickHouse/ClickHouse/pull/785
iptrie — Marek Vavruša

I Want to Add Something New to ClickHouse

— syntax construct;
https://github.com/ClickHouse/ClickHouse/pull/2134
parentheses for UNION ALL elements — zhang2014

— query section;
https://github.com/ClickHouse/ClickHouse/pull/293
LIMIT BY — Artemkin Pavel

— query type;
https://github.com/ClickHouse/ClickHouse/pull/1163
SYSTEM — Vitaliy Lyudvichenko

I Want to Add Something New to ClickHouse

— data type;
https://github.com/ClickHouse/ClickHouse/pull/945
UUID — Guillaume Tassery

— compression algorithm;
https://github.com/ClickHouse/ClickHouse/pull/1045
none — Paweł Róg

— server protocol;

Already 124 Contributors!


Web site: https://clickhouse.com/

Google groups: https://groups.google.com/forum/#!forum/clickhouse

Maillist: [email protected]

Telegram chat: https://telegram.me/clickhouse_ru (more than 1500 participants) and https://telegram.me/clickhouse_en

GitHub: https://github.com/ClickHouse/ClickHouse/

Twitter: https://twitter.com/ClickHouseDB

+ meetups. Moscow, St. Petersburg, Novosibirsk, Yekaterinburg, Minsk, Nizhny Novgorod, Berlin, Palo Alto, Beijing, Sunnyvale, San Francisco...