Superb New Features of ClickHouse

Author: Alexey Milovidov, 2019-09-06.

Superb New Features of ClickHouse

Random Set of Features
That I Remembered

CONSTRAINTS Support

CREATE TABLE hits
(
    URL String,
    Domain String,
    CONSTRAINT c_valid_url CHECK isValidUTF8(URL),
    CONSTRAINT c_domain CHECK Domain = domain(URL)
)

Data is validated on INSERT.

Developer: Gleb Novikov, HSE

CONSTRAINTS Support

INSERT INTO hits VALUES ('https://yandex.ru/', 'google.com')

Exception: Constraint `c_domain` for table `hits` is violated at row 1.
Expression: (Domain = domain(URL)).
Column values: URL = 'https://yandex.ru/', Domain = 'google.com'.

Not checked during background merges.

Not checked during ALTER UPDATE.

CONSTRAINTS Support

ALTER TABLE hits ADD CONSTRAINT c_valid_url CHECK isValidUTF8(URL)

When adding a CONSTRAINT, old data is not validated.

ALTER TABLE hits DROP CONSTRAINT c_valid_url

Parameterized Queries

SELECT count() FROM test.hits WHERE CounterID = {id:UInt32} AND SearchPhrase = {phrase:String}

— substitutions are named and typed;
— substituted at the level of the syntax tree.

TODO:
— substitutions for sets in the IN clause;
— substitutions for table, column, and database names;
— ability to define a URL with predefined queries.

Developer: Alexander Tretyakov, HSE

Parameterized Queries

$ curl "http://localhost:8123/" --data-binary \ "SELECT count() FROM test.hits WHERE CounterID = {id:UInt32}" Exception: Query parameter `id` was not set. $ curl "http://localhost:8123/?param_id=34" --data-binary \ "SELECT count() FROM test.hits WHERE CounterID = {id:UInt32}" 1962 $ curl "http://localhost:8123/?param_id=34;%20DROP%20DATABASE%20test" ... "SELECT count() FROM test.hits WHERE CounterID = {id:UInt32}" Exception: Value 34; DROP DATABASE test cannot be parsed as UInt32 for query parameter 'id' $ curl "http://localhost:8123/?param_id=34;%20DROP%20DATABASE%20test" ... "SELECT count() FROM test.hits WHERE CounterID = {id:String}" Exception: Illegal types of arguments (UInt32, String) of function equals

New Formats

— Protobuf;

— Parquet;

— ORC (input only), contributed by ArenaData;

— Template, TemplateIgnoreSpaces;

Template Format

Lets you define a template for formatting or parsing data.

Template contains delimiters and substitutions.

For each substitution, an escaping method is specified:
Quoted, Escaped, CSV, JSON, XML, or Raw.

Website ${domain:Quoted} has ${count:Raw} pageviews.

Templates are specified for rows, for the delimiter
between rows, and for the whole resultset.

Example: parse access logs.
Example: parse complex nested JSON.
Example: generate HTML directly from ClickHouse.
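As a sketch of how such a template might be supplied (the setting names `format_template_row` and `format_template_rows_between_delimiter` are assumptions based on the documentation of that period): the row template lives in a file, and the query references it:

```sql
SELECT domain(URL) AS domain, count() AS count
FROM test.hits
GROUP BY domain
FORMAT Template
SETTINGS format_template_row = 'row_format.tpl',
         format_template_rows_between_delimiter = '\n'
```

where the hypothetical file row_format.tpl contains the single line: Website ${domain:Quoted} has ${count:Raw} pageviews.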

ORDER BY Query Optimization

using table sorting key

SELECT event_time, message
FROM system.text_log
ORDER BY event_date DESC, event_time DESC
LIMIT 10

┌──────────event_time─┬─message───────────────────────────────────────────┐
│ 2019-09-05 04:20:35 │ Renaming temporary part tmp_insert_201909_10_10_0 │
│ 2019-09-05 04:20:35 │ Flushing system log                               │
│ 2019-09-05 04:20:34 │ Flushing system log                               │
│ 2019-09-05 04:20:34 │ Renaming temporary part tmp_insert_201909_10_10_0 │
│ 2019-09-05 04:20:28 │ Renaming temporary part tmp_insert_201909_9_9_0 to│
│ 2019-09-05 04:20:28 │ Flushing system log                               │
│ 2019-09-05 04:20:27 │ Renaming temporary part tmp_insert_201909_9_9_0 to│
...
└─────────────────────┴───────────────────────────────────────────────────┘

10 rows in set. Elapsed: 0.291 sec. Processed 42.21 million rows, 6.10 GB
(145.13 million rows/s., 20.98 GB/s.)

ORDER BY Query Optimization

using table sorting key

:) SET optimize_read_in_order = 1

:) SELECT event_time, message
   FROM system.text_log
   ORDER BY event_date DESC, event_time DESC
   LIMIT 10

┌──────────event_time─┬─message───────────────────────────────────────────┐
│ 2019-09-05 04:28:59 │ Merged 6 parts: from 201909_1_96_19 to 201909_101_│
│ 2019-09-05 04:28:59 │ Renaming temporary part tmp_merge_201909_1_101_20 │
│ 2019-09-05 04:28:59 │ Merge sorted 761 rows, containing 202 columns (202│
│ 2019-09-05 04:28:59 │ Reading 2 marks from part 201909_101_101_0, total │
│ 2019-09-05 04:28:59 │ Reading 2 marks from part 201909_100_100_0, total │
...
└─────────────────────┴───────────────────────────────────────────────────┘

10 rows in set. Elapsed: 0.012 sec. Processed 54.41 thousand rows, 7.42 MB
(4.68 million rows/s., 637.76 MB/s.)

ORDER BY Query Optimization

using table sorting key

— ORDER BY by the table's sorting key;

— ASC / DESC / mixed:

ORDER BY event_date DESC, event_time ASC

— or by its prefix:

ORDER BY event_date DESC

— or by expressions with monotonic functions of it:

ORDER BY event_date DESC, -toUInt32(event_time)

— available since version 19.14;

Developers: Anton Popov, Yandex; Anastasia Rodigina, HSE

Filling Data Gaps

SELECT EventDate, count()
FROM table
GROUP BY EventDate
ORDER BY EventDate

┌──EventDate─┬─count()─┐
│ 2019-09-01 │       5 │
│ 2019-09-02 │       3 │
│ 2019-09-04 │       4 │
│ 2019-09-05 │       1 │
└────────────┴─────────┘

Filling Data Gaps

SELECT EventDate, count()
FROM table
GROUP BY EventDate
ORDER BY EventDate WITH FILL

┌──EventDate─┬─count()─┐
│ 2019-09-01 │       5 │
│ 2019-09-02 │       3 │
│ 2019-09-03 │       0 │
│ 2019-09-04 │       4 │
│ 2019-09-05 │       1 │
└────────────┴─────────┘

Filling Data Gaps

WITH FILL is a modifier for an ORDER BY element;

WITH FILL FROM start

WITH FILL FROM start TO end

WITH FILL FROM start TO end STEP step

WITH FILL can be applied to each ORDER BY element:

ORDER BY EventDate WITH FILL, EventTime WITH FILL STEP 3600
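For instance, to force every date of a fixed range into the result, a hypothetical query in the spirit of the example above:

```sql
SELECT EventDate, count()
FROM table
GROUP BY EventDate
ORDER BY EventDate WITH FILL
    FROM toDate('2019-09-01') TO toDate('2019-09-06') STEP 1
```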

— not even merged yet.

Developers: Anton Popov, Yandex; Dmitry Utkin, HSE

Hiding Data in Logs

SELECT * FROM table WHERE Passport = 7609164285

<query_masking_rules>
    <rule>
        <regexp>Passport = \d+</regexp>
        <replace>Passport = ***</replace>
    </rule>
</query_masking_rules>

— suppresses data in logs and system tables;

Developer: Mikhail Filimonov, Altinity

Logs for Introspection

in system tables:

— system.query_log;

— system.query_thread_log;

— system.part_log;

— system.trace_log;

— system.text_log;

— system.metric_log;

Logs for Introspection

system.text_log

Now we write ClickHouse logs in ClickHouse!

DESCRIBE TABLE system.text_log

┌─name──────────┬─type───────────────────┐
│ event_date    │ Date                   │
│ event_time    │ DateTime               │
│ microseconds  │ UInt32                 │
│ thread_name   │ LowCardinality(String) │
│ thread_number │ UInt32                 │
│ os_thread_id  │ UInt32                 │
│ level         │ Enum8('Fatal' = 1, '...│
│ query_id      │ String                 │
│ logger_name   │ LowCardinality(String) │
│ message       │ String                 │
│ revision      │ UInt32                 │
│ source_file   │ LowCardinality(String) │
│ source_line   │ UInt64                 │
└───────────────┴────────────────────────┘

Logs for Introspection

system.metric_log

— for those who forgot to configure monitoring.

— saves all ClickHouse metrics once per second (by default).

SELECT toStartOfMinute(event_time) AS h,
    sum(ProfileEvent_UserTimeMicroseconds) AS user_time,
    bar(user_time, 0, 60000000, 80) AS bar
FROM system.metric_log
WHERE event_date = today()
GROUP BY h
ORDER BY h

Logs for Introspection

SELECT toStartOfMinute(event_time) AS h,
    sum(ProfileEvent_UserTimeMicroseconds) AS user_time,
    bar(user_time, 0, 60000000, 80) AS bar
FROM system.metric_log
WHERE event_date = today()
GROUP BY h
ORDER BY h

┌───────────────────h─┬─user_time─┬─bar───────────────────────────────────────────────┐
│ 2019-09-05 04:12:00 │         0 │                                                   │
│ 2019-09-05 04:13:00 │         0 │                                                   │
│ 2019-09-05 04:14:00 │    524000 │ ▋                                                 │
│ 2019-09-05 04:15:00 │  15880000 │ █████████████████████▏                            │
│ 2019-09-05 04:19:00 │  36724000 │ ████████████████████████████████████████████████▊ │
│ 2019-09-05 04:20:00 │  17508000 │ ███████████████████████▎                          │
│ 2019-09-05 04:21:00 │         0 │                                                   │
│ 2019-09-05 04:22:00 │         0 │                                                   │
│ 2019-09-05 04:23:00 │         0 │                                                   │
│ 2019-09-05 04:24:00 │         0 │                                                   │
│ 2019-09-05 04:25:00 │         0 │                                                   │
│ 2019-09-05 04:26:00 │         0 │                                                   │
│ 2019-09-05 04:27:00 │         0 │                                                   │
│ 2019-09-05 04:28:00 │         0 │                                                   │
│ 2019-09-05 04:29:00 │     80000 │                                                   │
│ 2019-09-05 04:30:00 │         0 │                                                   │
│ 2019-09-05 04:31:00 │         0 │                                                   │

Sampling Query Profiler

Records the code locations where query execution was, for each
execution thread, at each moment in time, with a specified frequency.

If a query is slow, which part of it is slow?

— Where does a specific query spend time?

— Where do queries of a certain type spend time?

— Where do queries from a specific user spend time?

— Where do all queries on the cluster spend time?

Developers: Nikita Lapkov, HSE; and others

Sampling Query Profiler

1. Set one or both of these settings:

SET query_profiler_cpu_time_period_ns = 1000000;
SET query_profiler_real_time_period_ns = 1000000;

2. Execute queries.
Data is saved to table system.trace_log.

event_date:    2019-09-05
event_time:    2019-09-05 05:47:44
revision:      54425
timer_type:    CPU
thread_number: 149
query_id:      b1d8e7f9-48d8-4cb3-a768-0a6683f6f061
trace:         [140171472847748,61781958,110943821,117594728,117595220,115654933,
    120321783,63251928,111161800,120329436,120331356,120308294,120313436,120319113,
    120143313,115666412,120146905,111013972,118237176,111013972,117990912,111013972,
    110986070,110986938,61896391,61897898,61887509,156206624,140171472807643]

Sampling Query Profiler

trace — array of addresses in machine code (stack trace);

Convert address to function name:
— demangle(addressToSymbol(trace[1]))
Convert address to file name and line number:
— addressToLine(trace[1])

* don't forget to install clickhouse-common-static-dbg

Example: top functions:

SELECT count(),
    demangle(addressToSymbol(trace[1] AS addr)) AS symbol
FROM system.trace_log
WHERE event_date = today()
GROUP BY symbol
ORDER BY count() DESC
LIMIT 10

Sampling Query Profiler

Example: top functions:

┌─count()─┬─symbol─────────────────────────────────────────────────────────────────┐
│     517 │ void LZ4::(anonymous namespace)::decompressImpl<32ul, false>(char const*, char*, unsigned long) │
│     480 │ void DB::deserializeBinarySSE2<4>(DB::PODArray<unsigned char, 4096ul, Allocator<false, false>, 15ul, 16ul>&, DB::PODArray<unsigned long, 4096ul, Allocator<false, false>, 15ul, 16ul>&, DB::ReadBuffer&, unsigned long) │
│     457 │ DB::VolnitskyBase<true, true, DB::StringSearcher<true, true> >::search(unsigned char const*, unsigned long) const │
│     270 │ read │
│     163 │ void LZ4::(anonymous namespace)::decompressImpl<16ul, true>(char const*, char*, unsigned long) │
│     130 │ void LZ4::(anonymous namespace)::decompressImpl<16ul, false>(char const*, char*, unsigned long) │
│      58 │ CityHash_v1_0_2::CityHash128WithSeed(char const*, unsigned long, std::pair<unsigned long, unsigned long>) │
│      44 │ void DB::deserializeBinarySSE2<2>(DB::PODArray<unsigned char, 4096ul, Allocator<false, false>, 15ul, 16ul>&, DB::PODArray<unsigned long, 4096ul, Allocator<false, false>, 15ul, 16ul>&, DB::ReadBuffer&, unsigned long) │
│      37 │ void LZ4::(anonymous namespace)::decompressImpl<8ul, true>(char const*, char*, unsigned long) │
│      32 │ memcpy │
└─────────┴────────────────────────────────────────────────────────────────────────┘

Sampling Query Profiler

Example: top execution contexts for query:

SELECT count(),
    arrayStringConcat(arrayMap(x -> concat(
        demangle(addressToSymbol(x)),
        '\n    ', addressToLine(x)), trace), '\n') AS sym
FROM system.trace_log
WHERE query_id = '1a1272b5-695a-4b17-966d-a1701b61b3eb'
    AND event_date = today()
GROUP BY trace
ORDER BY count() DESC
LIMIT 10
count(): 154
sym: DB::VolnitskyBase<true, true, DB::StringSearcher<true, true> >::search(unsigned char const*, unsigned long) const
    /opt/milovidov/ClickHouse/build_gcc9/dbms/programs/clickhouse
DB::MatchImpl<true, false>::vector_constant(DB::PODArray<unsigned char, 4096ul, Allocator<false, false>, 15ul, 16ul> const&, DB::PODArray<unsigned long, 4096ul, Allocator<false, false>, 15ul, 16ul> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, DB::PODArray<unsigned char, 4096ul, Allocator<false, false>, 15ul, 16ul>&)
    /opt/milovidov/ClickHouse/build_gcc9/dbms/programs/clickhouse
DB::FunctionsStringSearch<DB::MatchImpl<true, false>, DB::NameLike>::executeImpl(DB::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long)
    /opt/milovidov/ClickHouse/build_gcc9/dbms/programs/clickhouse
DB::PreparedFunctionImpl::execute(DB::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long, bool)
    /home/milovidov/ClickHouse/build_gcc9/../dbms/src/Functions/IFunction.cpp:464
DB::ExpressionAction::execute(DB::Block&, bool) const
    /usr/local/include/c++/9.1.0/bits/stl_vector.h:677
DB::ExpressionActions::execute(DB::Block&, bool) const
    /home/milovidov/ClickHouse/build_gcc9/../dbms/src/Interpreters/ExpressionActions.cpp:759
DB::FilterBlockInputStream::readImpl()
    /home/milovidov/ClickHouse/build_gcc9/../dbms/src/DataStreams/FilterBlockInputStream.cpp:84
DB::IBlockInputStream::read()
    /usr/local/include/c++/9.1.0/bits/stl_vector.h:108
DB::ExpressionBlockInputStream::readImpl()
    /home/milovidov/ClickHouse/build_gcc9/../dbms/src/DataStreams/ExpressionBlockInputStream.cpp:34
DB::IBlockInputStream::read()
    /usr/local/include/c++/9.1.0/bits/stl_vector.h:108
DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::thread(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long)
    /usr/local/include/c++/9.1.0/bits/atomic_base.h:419
ThreadFromGlobalPool::ThreadFromGlobalPool<void (DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::*)(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long), DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>*, std::shared_ptr<DB::ThreadGroupStatus>, unsigned long&>(void (DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::*&&)(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long), DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>*&&, std::shared_ptr<DB::ThreadGroupStatus>&&, unsigned long&)::{lambda()#1}::operator()() const
    /usr/local/include/c++/9.1.0/bits/shared_ptr_base.h:729
ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>)
    /usr/local/include/c++/9.1.0/bits/atomic_base.h:551
execute_native_thread_routine
    /home/milovidov/ClickHouse/ci/workspace/gcc/gcc-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:81
start_thread
    /lib/x86_64-linux-gnu/libpthread-2.27.so
clone
    /build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Sampling Query Profiler

http://www.highload.ru/moscow/2019/abstracts/5835

Globs for File/URL/HDFS

SELECT * FROM file(
    'logs*_{000..999}.tsv', TSV,
    's String, x UInt8')

Multiple files are processed in parallel automatically.

— not even merged yet.

Proper RPM Packages

CentOS, RedHat... GosLinux.

Supported versions starting from CentOS 5.

sudo yum install yum-utils
sudo rpm --import \
    https://repo.yandex.ru/clickhouse/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo \
    https://repo.yandex.ru/clickhouse/rpm/stable/x86_64
sudo yum install clickhouse-server clickhouse-client

More Features

Changing table settings on the fly

neighbor function

OS schedule priority

Table function VALUES

COLUMNS matcher

Live View
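Two of these in action, as toy queries (illustrative only, not from the talk):

```sql
-- Table function VALUES: an inline table with an explicit structure.
SELECT * FROM VALUES('x UInt8, s String', (1, 'one'), (2, 'two'));

-- neighbor: read a value at a fixed offset within the result block.
SELECT number, neighbor(number, -1) AS prev FROM numbers(5);
```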

More Features

Cascade Materialized Views

Normal AND/OR logic with NULL

Normal join_use_nulls logic

Proper memory accounting for queries

DNS cache updates

Settings constraints

Row-level security
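Settings constraints, for example, are declared in the user's profile. A hedged sketch of a users.xml fragment (element names as in the documentation; min/max bound a setting's value, readonly forbids changing it):

```xml
<profiles>
    <default>
        <max_memory_usage>10000000000</max_memory_usage>
        <constraints>
            <max_memory_usage>
                <min>5000000000</min>
                <max>20000000000</max>
            </max_memory_usage>
            <force_index_by_date>
                <readonly/>
            </force_index_by_date>
        </constraints>
    </default>
</profiles>
```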

More Features

Mutations don't interfere with merges

Parallel loading of parts

GROUP BY optimization

Math functions speedup by Mikhail Parakhin

JSON functions speedup

Loading of different dictionaries doesn't interfere with each other

Merge tables optimization

CREATE TABLE AS table_function()

Did you know...

— TTL for tables and individual columns;

— MySQL protocol;

— Adaptive granularity;

— Secondary indices;

— Text search functions;

— JSON functions;

— WITH ROLLUP / WITH CUBE;

— ASOF JOIN;
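For example, TTL at both levels looks like this (a toy table, not from the talk):

```sql
CREATE TABLE events
(
    event_date Date,
    message String TTL event_date + INTERVAL 1 MONTH  -- column-level TTL
)
ENGINE = MergeTree
ORDER BY event_date
TTL event_date + INTERVAL 3 MONTH                     -- table-level TTL
```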

September 2019

— Support for tiered storage for fresh and old data;

— Dictionary creation and manipulation with DDL queries;

— Using Z-Order for indexing;

— Import and export data to S3;

— Parallel parsing of data formats;

— Table function input;

— VALUES parsing speedup with expressions;

October 2019

— Initial RBAC implementation;

— Initial Merge JOIN implementation;

Autumn-Winter 2019

— Non-initial RBAC implementation;

— Non-initial Merge JOIN implementation;

— Workload management;

2020

Development of Replicated tables:

— Wait-free ALTER ADD/DROP;
— ALTER RENAME COLUMN;
— polymorphic table parts;
— support for small inserts;
— quorum INSERT without linearizability;
— INSERT without available ZooKeeper;
— stop storing parts info in ZooKeeper;
— alternative to using ZooKeeper;
— VFS and old data storage in S3;
— proper JBOD support;
— elastic cluster;

2020

— ALTER MATERIALIZED VIEW;
— GROUP BY optimization considering table sortedness;
— time series processing functions;
— functions for statistical tests;
— DISTINCT and ORDER BY modifiers for all aggregate functions;
— GPU offloading prototype;
— recompression of old data in background;
— minimal transaction support for multiple inserts/reads;
— replicated databases;

2020, maybe we'll get lucky

UDFs in C++ and Rust.

Window functions.

Support for some types of dependent subqueries.


Web site: https://clickhouse.com/

Mailing list: [email protected]

YouTube: https://www.youtube.com/c/ClickHouseDB

Telegram chat: https://telegram.me/clickhouse_ru, clickhouse_en

GitHub: https://github.com/ClickHouse/ClickHouse/

Twitter: https://twitter.com/ClickHouseDB

Google groups: https://groups.google.com/forum/#!forum/clickhouse