New features of ClickHouse

Author: Alexey Milovidov, 2019-09-18.

New features of ClickHouse

A random selection
of features that I remember

CONSTRAINTs for INSERT queries

CREATE TABLE hits
(
    URL String,
    Domain String,
    CONSTRAINT c_valid_url CHECK isValidUTF8(URL),
    CONSTRAINT c_domain CHECK Domain = domain(URL)
)

Checked on INSERT.
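For example, a row that satisfies both constraints from the definition above is accepted as usual (an illustrative insert):

INSERT INTO hits VALUES ('https://yandex.ru/', 'yandex.ru')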

Developer — Gleb Novikov, HSE Moscow

CONSTRAINTs for INSERT queries

INSERT INTO hits VALUES ('https://yandex.ru/', 'google.com')

Exception: Constraint `c_domain` for table `hits` is violated at row 1.
Expression: (Domain = domain(URL)).
Column values: URL = 'https://yandex.ru/', Domain = 'google.com'.

Not checked on background merges.

Not checked on ALTER UPDATE.

CONSTRAINTs for INSERT queries

ALTER TABLE hits ADD CONSTRAINT c_valid_url CHECK isValidUTF8(URL)

Existing data is not checked when CONSTRAINT is added.

ALTER TABLE hits DROP CONSTRAINT c_valid_url

Queries With Parameters

SELECT count() FROM test.hits WHERE CounterID = {id:UInt32} AND SearchPhrase = {phrase:String}

— substitutions have names and types;
— substituted on AST level.

TODO:
— substitutions for sets in the IN operator;
— substitutions for table, column, database names;
— support for custom server URLs with predefined queries.

Developer — Alexander Tretyakov, HSE Moscow

Queries With Parameters

$ curl "http://localhost:8123/" --data-binary \
    "SELECT count() FROM test.hits WHERE CounterID = {id:UInt32}"

Exception: Query parameter `id` was not set.

$ curl "http://localhost:8123/?param_id=34" --data-binary \
    "SELECT count() FROM test.hits WHERE CounterID = {id:UInt32}"

1962

$ curl "http://localhost:8123/?param_id=34;%20DROP%20DATABASE%20test" ...
    "SELECT count() FROM test.hits WHERE CounterID = {id:UInt32}"

Exception: Value 34; DROP DATABASE test cannot be parsed as UInt32
for query parameter 'id'

$ curl "http://localhost:8123/?param_id=34;%20DROP%20DATABASE%20test" ...
    "SELECT count() FROM test.hits WHERE CounterID = {id:String}"

Exception: Illegal types of arguments (UInt32, String) of function equals

New Formats

— Protobuf;

— Parquet;

— ORC (input-only) — ArenaData;

— Template, TemplateIgnoreSpaces.

Template Format

Allows defining a template for data formatting/parsing.

A template contains substitutions and delimiters.

Each substitution specifies a data escaping rule:
Quoted, Escaped, CSV, JSON, XML, Raw.

Website ${domain:Quoted} has ${count:Raw} pageviews.

You can specify a template for rows, a delimiter between rows,
and a template to wrap the result set.

Example: parsing web access logs.
Example: parsing deeply nested JSON.
Example: generating HTML right in ClickHouse.
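A minimal sketch of wiring this up (the file path is hypothetical; the row template is supplied via the format_template_row setting and contains the line shown above):

-- /tmp/row.tpl contains:  Website ${domain:Quoted} has ${count:Raw} pageviews.
SELECT domain(URL) AS domain, count() AS count
FROM test.hits
GROUP BY domain
SETTINGS format_template_row = '/tmp/row.tpl'
FORMAT Template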

Optimization of queries with ORDER BY

with respect to table sorting key

SELECT event_time, message
FROM system.text_log
ORDER BY event_date DESC, event_time DESC
LIMIT 10

┌──────────event_time─┬─message───────────────────────────────────────────┐
│ 2019-09-05 04:20:35 │ Renaming temporary part tmp_insert_201909_10_10_0 │
│ 2019-09-05 04:20:35 │ Flushing system log                                │
│ 2019-09-05 04:20:34 │ Flushing system log                                │
│ 2019-09-05 04:20:34 │ Renaming temporary part tmp_insert_201909_10_10_0 │
│ 2019-09-05 04:20:28 │ Renaming temporary part tmp_insert_201909_9_9_0 to│
│ 2019-09-05 04:20:28 │ Flushing system log                                │
│ 2019-09-05 04:20:27 │ Renaming temporary part tmp_insert_201909_9_9_0 to│
...
└─────────────────────┴───────────────────────────────────────────────────┘

10 rows in set. Elapsed: 0.291 sec.
Processed 42.21 million rows, 6.10 GB (145.13 million rows/s., 20.98 GB/s.)

Optimization of queries with ORDER BY

with respect to table sorting key

:) SET optimize_read_in_order = 1

:) SELECT event_time, message
   FROM system.text_log
   ORDER BY event_date DESC, event_time DESC
   LIMIT 10

┌──────────event_time─┬─message───────────────────────────────────────────┐
│ 2019-09-05 04:28:59 │ Merged 6 parts: from 201909_1_96_19 to 201909_101_│
│ 2019-09-05 04:28:59 │ Renaming temporary part tmp_merge_201909_1_101_20 │
│ 2019-09-05 04:28:59 │ Merge sorted 761 rows, containing 202 columns (202│
│ 2019-09-05 04:28:59 │ Reading 2 marks from part 201909_101_101_0, total │
│ 2019-09-05 04:28:59 │ Reading 2 marks from part 201909_100_100_0, total │
...
└─────────────────────┴───────────────────────────────────────────────────┘

10 rows in set. Elapsed: 0.012 sec.
Processed 54.41 thousand rows, 7.42 MB (4.68 million rows/s., 637.76 MB/s.)

Optimization of queries with ORDER BY

with respect to table sorting key

— ORDER BY table sorting key;

— ASC / DESC / mixed:

ORDER BY event_date DESC, event_time ASC

— or by the prefix:

ORDER BY event_date DESC

— or by expressions that are monotonic functions of them;

ORDER BY event_date DESC, -toUInt32(event_time)

— available from version 19.14;

Developers — Anton Popov, Yandex; Anastasiya Rodigina, HSE Moscow

Data Gaps Filling

SELECT EventDate, count()
FROM table
GROUP BY EventDate
ORDER BY EventDate

┌──EventDate─┬─count()─┐
│ 2019-09-01 │       5 │
│ 2019-09-02 │       3 │
│ 2019-09-04 │       4 │
│ 2019-09-05 │       1 │
└────────────┴─────────┘

Data Gaps Filling

SELECT EventDate, count()
FROM table
GROUP BY EventDate
ORDER BY EventDate WITH FILL

┌──EventDate─┬─count()─┐
│ 2019-09-01 │       5 │
│ 2019-09-02 │       3 │
│ 2019-09-03 │       0 │
│ 2019-09-04 │       4 │
│ 2019-09-05 │       1 │
└────────────┴─────────┘

Data Gaps Filling

WITH FILL — a modifier for an ORDER BY element;

WITH FILL FROM start

WITH FILL FROM start TO end

WITH FILL FROM start TO end STEP step

WITH FILL can be applied to any element of ORDER BY:

ORDER BY EventDate WITH FILL, EventTime WITH FILL STEP 3600
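A sketch combining the clauses (the date bounds here are only illustrative):

SELECT EventDate, count()
FROM table
GROUP BY EventDate
ORDER BY EventDate WITH FILL
    FROM toDate('2019-09-01') TO toDate('2019-09-08') STEP 1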

— available in version 19.14.

Developers — Anton Popov, Yandex; Dmitri Utkin, HSE Moscow

Data Masking in Logs

SELECT * FROM table WHERE Passport = 7609164285

<query_masking_rules>
    <rule>
        <regexp>Passport = \d+</regexp>
        <replace>Passport = ***</replace>
    </rule>
</query_masking_rules>

— suppresses sensitive data in logs and system tables.

Developer — Mikhail Filimonov, Altinity

Server Logs for Introspection

in system tables:

— system.query_log;

— system.query_thread_log;

— system.part_log;

— system.trace_log;

— system.text_log;

— system.metric_log;

Server Logs for Introspection

system.text_log

Now we write ClickHouse logs into ClickHouse!

DESCRIBE TABLE system.text_log

┌─name──────────┬─type───────────────────┐
│ event_date    │ Date                   │
│ event_time    │ DateTime               │
│ microseconds  │ UInt32                 │
│ thread_name   │ LowCardinality(String) │
│ thread_number │ UInt32                 │
│ os_thread_id  │ UInt32                 │
│ level         │ Enum8('Fatal' = 1, '...│
│ query_id      │ String                 │
│ logger_name   │ LowCardinality(String) │
│ message       │ String                 │
│ revision      │ UInt32                 │
│ source_file   │ LowCardinality(String) │
│ source_line   │ UInt64                 │
└───────────────┴────────────────────────┘
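For example, a quick look at log volume per level (an illustrative query over the columns above):

SELECT level, count()
FROM system.text_log
WHERE event_date = today()
GROUP BY level
ORDER BY count() DESC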

Server Logs for Introspection

system.metric_log

— for those who forgot to set up monitoring;

— records all ClickHouse metrics every second (by default).

SELECT
    toStartOfMinute(event_time) AS h,
    sum(ProfileEvent_UserTimeMicroseconds) AS user_time,
    bar(user_time, 0, 60000000, 80) AS bar
FROM system.metric_log
WHERE event_date = today()
GROUP BY h
ORDER BY h

Server Logs for Introspection

SELECT
    toStartOfMinute(event_time) AS h,
    sum(ProfileEvent_UserTimeMicroseconds) AS user_time,
    bar(user_time, 0, 60000000, 80) AS bar
FROM system.metric_log
WHERE event_date = today()
GROUP BY h
ORDER BY h

┌───────────────────h─┬─user_time─┬─bar───────────────────────────────────────────────┐
│ 2019-09-05 04:12:00 │         0 │                                                    │
│ 2019-09-05 04:13:00 │         0 │                                                    │
│ 2019-09-05 04:14:00 │    524000 │ ▋                                                  │
│ 2019-09-05 04:15:00 │  15880000 │ █████████████████████▏                             │
│ 2019-09-05 04:19:00 │  36724000 │ ████████████████████████████████████████████████▊ │
│ 2019-09-05 04:20:00 │  17508000 │ ███████████████████████▎                           │
│ 2019-09-05 04:21:00 │         0 │                                                    │
│ 2019-09-05 04:22:00 │         0 │                                                    │
│ 2019-09-05 04:23:00 │         0 │                                                    │
│ 2019-09-05 04:24:00 │         0 │                                                    │
│ 2019-09-05 04:25:00 │         0 │                                                    │
│ 2019-09-05 04:26:00 │         0 │                                                    │
│ 2019-09-05 04:27:00 │         0 │                                                    │
│ 2019-09-05 04:28:00 │         0 │                                                    │
│ 2019-09-05 04:29:00 │     80000 │                                                    │
│ 2019-09-05 04:30:00 │         0 │                                                    │
│ 2019-09-05 04:31:00 │         0 │                                                    │

Sampling Profiler on a Query Level

Records the code locations where the query was executing,
in every execution thread, sampled at a regular interval.

If a query is slow — where exactly (in the code) was it slow?

— where did a specific query spend its time?

— where was time spent for queries of some kind?

— where was time spent for queries of some user?

— where was time spent for queries, cluster-wide?

Developers — Nikita Lapkov, HSE Moscow; et al.

Sampling Profiler on a Query Level

1. Turn on one of the following settings (or both):

SET query_profiler_cpu_time_period_ns = 1000000;
SET query_profiler_real_time_period_ns = 1000000;

2. Run your queries.
Recorded samples will be saved into system.trace_log table.

event_date:    2019-09-05
event_time:    2019-09-05 05:47:44
revision:      54425
timer_type:    CPU
thread_number: 149
query_id:      b1d8e7f9-48d8-4cb3-a768-0a6683f6f061
trace:         [140171472847748,61781958,110943821,117594728,117595220,115654933,
               120321783,63251928,111161800,120329436,120331356,120308294,120313436,120319113,
               120143313,115666412,120146905,111013972,118237176,111013972,117990912,111013972,
               110986070,110986938,61896391,61897898,61887509,156206624,140171472807643]

Sampling Profiler on a Query Level

trace — an array of addresses in machine code (stack trace);

Translate address to function name:
— demangle(addressToSymbol(trace[1]))
Translate address to source file name and line number:
— addressToLine(trace[1])

* don't forget to install the clickhouse-common-static-dbg package

Example: top functions:

SELECT
    count(),
    demangle(addressToSymbol(trace[1] AS addr)) AS symbol
FROM system.trace_log
WHERE event_date = today()
GROUP BY symbol
ORDER BY count() DESC
LIMIT 10

Sampling Profiler on a Query Level

Example: top functions:

┌─count()─┬─symbol────────────────────────────────────────────────────────────────
│     517 │ void LZ4::(anonymous namespace)::decompressImpl<32ul, false>(char const*, char*, unsigned long)
│     480 │ void DB::deserializeBinarySSE2<4>(DB::PODArray<unsigned char, 4096ul, Allocator<false, false>, 15ul, 16ul>&, DB::PODArray<unsigned long, 4096ul, Allocator<false, false>, 15ul, 16ul>&, DB::ReadBuffer&, unsigned long)
│     457 │ DB::VolnitskyBase<true, true, DB::StringSearcher<true, true> >::search(unsigned char const*, unsigned long) const
│     270 │ read
│     163 │ void LZ4::(anonymous namespace)::decompressImpl<16ul, true>(char const*, char*, unsigned long)
│     130 │ void LZ4::(anonymous namespace)::decompressImpl<16ul, false>(char const*, char*, unsigned long)
│      58 │ CityHash_v1_0_2::CityHash128WithSeed(char const*, unsigned long, std::pair<unsigned long, unsigned long>)
│      44 │ void DB::deserializeBinarySSE2<2>(DB::PODArray<unsigned char, 4096ul, Allocator<false, false>, 15ul, 16ul>&, DB::PODArray<unsigned long, 4096ul, Allocator<false, false>, 15ul, 16ul>&, DB::ReadBuffer&, unsigned long)
│      37 │ void LZ4::(anonymous namespace)::decompressImpl<8ul, true>(char const*, char*, unsigned long)
│      32 │ memcpy
└─────────┴────────────────────────────────────────────────────────────────────────

Sampling Profiler on a Query Level

Example: top of contexts (stacks) for a query:

SELECT
    count(),
    arrayStringConcat(arrayMap(x -> concat(
        demangle(addressToSymbol(x)), '\n    ', addressToLine(x)),
        trace), '\n') AS sym
FROM system.trace_log
WHERE query_id = '1a1272b5-695a-4b17-966d-a1701b61b3eb'
  AND event_date = today()
GROUP BY trace
ORDER BY count() DESC
LIMIT 10
count(): 154
sym: DB::VolnitskyBase<true, true, DB::StringSearcher<true, true> >::search(unsigned char const*, unsigned long) const
    /opt/milovidov/ClickHouse/build_gcc9/dbms/programs/clickhouse
DB::MatchImpl<true, false>::vector_constant(DB::PODArray<unsigned char, 4096ul, Allocator<false, false>, 15ul, 16ul> const&, DB::PODArray<unsigned long, 4096ul, Allocator<false, false>, 15ul, 16ul> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, DB::PODArray<unsigned char, 4096ul, Allocator<false, false>, 15ul, 16ul>&)
    /opt/milovidov/ClickHouse/build_gcc9/dbms/programs/clickhouse
DB::FunctionsStringSearch<DB::MatchImpl<true, false>, DB::NameLike>::executeImpl(DB::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long)
    /opt/milovidov/ClickHouse/build_gcc9/dbms/programs/clickhouse
DB::PreparedFunctionImpl::execute(DB::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long, bool)
    /home/milovidov/ClickHouse/build_gcc9/../dbms/src/Functions/IFunction.cpp:464
DB::ExpressionAction::execute(DB::Block&, bool) const
    /usr/local/include/c++/9.1.0/bits/stl_vector.h:677
DB::ExpressionActions::execute(DB::Block&, bool) const
    /home/milovidov/ClickHouse/build_gcc9/../dbms/src/Interpreters/ExpressionActions.cpp:759
DB::FilterBlockInputStream::readImpl()
    /home/milovidov/ClickHouse/build_gcc9/../dbms/src/DataStreams/FilterBlockInputStream.cpp:84
DB::IBlockInputStream::read()
    /usr/local/include/c++/9.1.0/bits/stl_vector.h:108
DB::ExpressionBlockInputStream::readImpl()
    /home/milovidov/ClickHouse/build_gcc9/../dbms/src/DataStreams/ExpressionBlockInputStream.cpp:34
DB::IBlockInputStream::read()
    /usr/local/include/c++/9.1.0/bits/stl_vector.h:108
DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::thread(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long)
    /usr/local/include/c++/9.1.0/bits/atomic_base.h:419
ThreadFromGlobalPool::ThreadFromGlobalPool<void (DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::*)(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long), DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>*, std::shared_ptr<DB::ThreadGroupStatus>, unsigned long&>(void (DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::*&&)(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long), DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>*&&, std::shared_ptr<DB::ThreadGroupStatus>&&, unsigned long&)::{lambda()#1}::operator()() const
    /usr/local/include/c++/9.1.0/bits/shared_ptr_base.h:729
ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>)
    /usr/local/include/c++/9.1.0/bits/atomic_base.h:551
execute_native_thread_routine
    /home/milovidov/ClickHouse/ci/workspace/gcc/gcc-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:81
start_thread
    /lib/x86_64-linux-gnu/libpthread-2.27.so
clone
    /build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Tiered Storage

Example: store hot data on SSD and archive data on HDDs.

Multiple storage policies can be configured and used on a per-table basis.

Tiered Storage

Step 1: configure available disks (storage paths):

<disks>
    <fast_disk> <!-- disk name -->
        <path>/mnt/fast_ssd/clickhouse</path>
    </fast_disk>
    <disk1>
        <path>/mnt/hdd1/clickhouse</path>
        <keep_free_space_bytes>10485760</keep_free_space_bytes>
    </disk1>
    ...

Tiered Storage

Step 2: configure storage policies:

<policies>
    <ssd_and_hdd>
        <volumes>
            <hot>
                <disk>fast_ssd</disk>
                <max_data_part_size_bytes>1073741824</max_data_part_size_bytes>
            </hot>
            <cold>
                <disk>disk1</disk>
            </cold>
        </volumes>
        <move_factor>0.2</move_factor>
    </ssd_and_hdd>
    ...

Tiered Storage

Step 3: use the configured policy for your table:

CREATE TABLE table
(
    ...
)
ENGINE = MergeTree
ORDER BY ...
SETTINGS storage_policy = 'ssd_and_hdd'

Tiered Storage

The data will be moved between volumes automatically.

You can also do it manually:

ALTER TABLE table MOVE PART|PARTITION ... TO VOLUME|DISK ...
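For example, using the volume and disk names from the configuration above (the partition and part names are hypothetical):

ALTER TABLE table MOVE PARTITION '201909' TO VOLUME 'cold'
ALTER TABLE table MOVE PART '201909_1_1_0' TO DISK 'fast_disk'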

— available in 19.15 testing.

Table Function 'input'

For data transformation at INSERT time:

INSERT INTO table
SELECT *, domain(URL)
FROM input(TSV, 'URL String, ...')

You pipe your data into this query as with a usual INSERT.

Data is transformed on the fly by the SELECT expression.

Examples:
— calculate data for columns;
— skip unneeded columns;
— do filtering, aggregations, joins.
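A sketch of filtering and computing a column at INSERT time, assuming the documented input() signature where the input format is given by the FORMAT clause (table and columns are illustrative):

INSERT INTO hits
SELECT URL, domain(URL)
FROM input('URL String')
WHERE isValidUTF8(URL)
FORMAT TSV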

— available in 19.15 testing.

Adaptive Index Granularity

Automatically tunes index granularity if you have fat columns.

Everything works out of the box.
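Under the hood this is driven by the index_granularity_bytes MergeTree setting; a sketch showing it with its default value of 10 MB, for illustration only (table and columns are hypothetical):

CREATE TABLE t
(
    key UInt64,
    blob String
)
ENGINE = MergeTree
ORDER BY key
SETTINGS index_granularity_bytes = 10485760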

— available in 19.13.

Globs for File/URL/HDFS

SELECT * FROM file('logs*_{000..999}.tsv', TSV, 's String, x UInt8')

A set of files is processed in parallel.

— available in 19.14.

Official RPM packages

CentOS, RedHat... GosLinux.

Supported versions starting from CentOS 5.

sudo yum install yum-utils
sudo rpm --import \
    https://repo.yandex.ru/clickhouse/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo \
    https://repo.yandex.ru/clickhouse/rpm/stable/x86_64
sudo yum install clickhouse-server clickhouse-client

And more features...

Changing some table settings on the fly

A 'neighbor' function

OS scheduling priority

Table function VALUES

COLUMNS matcher

Live View

And more features...

Cascaded Materialized Views

The right logic of AND/OR with NULL

The right logic for 'join_use_nulls'

Correct memory accounting for queries

Auto update of DNS cache

Settings constraints

Row Level security

And more features...

Mutations no longer prevent merges from running

Parallel loading of data parts on startup

Optimization of GROUP BY performance

Speedup of math functions (from Yandex CTO)

Speedup of JSON functions

Dictionary loads no longer interfere with each other

Speedup of Merge tables

CREATE TABLE AS table_function()

Did you know?

— TTL for tables and columns;

— MySQL interface;

— Adaptive granularity;

— Secondary indices;

— Functions for text search;

— Functions for JSON;

— WITH ROLLUP / WITH CUBE;

— Aggregate functions for machine learning;

— ASOF JOIN;

September 2019

— DDL queries for dictionaries;

— Indexing by z-Order curve;

— S3 import/export;

— Parallel parsing of data formats;

— Speedup of INSERT with VALUES with expressions;

October 2019

— Initial implementation of RBAC;

— Initial implementation of Merge JOIN;

Autumn-Winter 2019

— More than initial implementation of RBAC;

— More than initial implementation of Merge JOIN;

— Workload management;

2020

Improvements for Replicated tables:

— Wait-free ALTER ADD/DROP;
— ALTER RENAME COLUMN;
— polymorphic data parts for MergeTree;
— better support for insertion with small batches;
— quorum INSERT without linearizability;
— INSERT without available quorum of ZooKeeper nodes;
— don't store parts info in ZooKeeper;
— provide an alternative to ZooKeeper;
— VFS and support for cold data on S3;
— proper support for JBOD;
— cluster auto rebalance;

2020

— ALTER MATERIALIZED VIEW;
— optimization of GROUP BY with respect to table sorting key;
— functions for time series analysis;
— functions for statistical tests;
— DISTINCT and ORDER BY modifiers for every aggregate function;
— a prototype of GPU offloading;
— background recompression of old data;
— minimal support for transactions on multiple reads/writes;
— replicated database engine;

2020, if we're lucky

UDFs in C++ and Rust.

Window functions.

Support for some classes of dependent subqueries.


Web site: https://clickhouse.com/

Mailing list: [email protected]

YouTube: https://www.youtube.com/c/ClickHouseDB

Telegram chat: https://telegram.me/clickhouse_ru, clickhouse_en

GitHub: https://github.com/ClickHouse/ClickHouse/

Twitter: https://twitter.com/ClickHouseDB

Google groups: https://groups.google.com/forum/#!forum/clickhouse