ClickHouse: Release 21.12 Webinar

ClickHouse
Christmas Release 21.12

Release 21.12 Webinar

1. (30 min) What's new in ClickHouse 21.12.

2. (30 min) Q&A.

What's New in 21.12

Monthly stable release for December.

Christmas πŸŽ„ Release.

ClickHouse Keeper Update

1. clickhouse-keeper — 100% compatible protocol with ZooKeeper.

— compressed logs and snapshots;
— no issues with zxid overflow;
— no issues with large packets;
— better memory usage;
— no issues with GC and Java heap;

2. Can run embedded into clickhouse-server.

— no need for a separate process;

You can also use it with your Kafkas and Hadoops.

ClickHouse Keeper Update

21.12 — four letter commands added.

Example:

echo 'stat' | nc localhost 9181

Full list of commands:
conf, cons, crst, envi, ruok, srst, srvr, stat, wchc, wchs, dirs, mntr, isro.

Developer: JackyWoo.

ClickHouse Keeper Update

Development started in Sep 2020.

Now clickhouse-keeper is feature complete!

Passed 1000s hours of continuous testing
with stress tests, Jepsen tests, integration tests...

Tested in production in Yandex Cloud and by our best friends.

It is still considered pre-production.

We need your feedback.

Optimizations With CONSTRAINTs

We already support table CONSTRAINTs:

CREATE TABLE ( URL String, Domain String, CONSTRAINT validate CHECK isValidUTF8(URL) AND length(URL) BETWEEN 10 AND 10000, CONSTRAINT my_constraint CHECK Domain = domainWithoutWWW(URL) ) ...

Now they are automatically optimizing your queries!

Developer: Nikita Vasilev.

Optimizations With CONSTRAINTs

Controlled by settings:

— optimize_using_constraints;
— optimize_substitute_columns;
— optimize_append_index;
— convert_query_to_cnf;

Conditions are simplified: WHERE domainWithoutWWW(URL) = 'ghe.clickhouse.tech'

Rewritten to: WHERE Domain = 'ghe.clickhouse.tech'

Optimizations With CONSTRAINTs

Bonus 🍰: assumptions

CONSTRAINT my_constraint ASSUME Domain = domainWithoutWWW(URL)

Bonus 🧸: hypotheses

INDEX my_index (a < b) TYPE hypothesis GRANULARITY 1

Efficient Reads Of Arrow/Parquet/ORC

SELECT column FROM url( 'https://example.com/data.parquet', Parquet, '...')

Before 21.12:
whole file is read;
whole file is buffered;

21.12 and newer:
only requested columns are read;
it's using O(1) memory;

Works for url, s3, hdfs.
May increase the number of requests.

Developer: Kseniia Sumarokova.

Bool Data Type

CREATE TABLE t (is_good Bool) ...

Motivation:

1. Compatibility with Postgres and other DBMS.

2. Easy import and export in text format with true/false.

See also the settings: bool_true_representation, bool_false_representation.

In next releases:
— allow to recognize on/off, enable/disable, yes/no, 1/0 on data import;

Partitions For File-like Storages

CREATE TABLE hits_files (...) ENGINE = File(TabSeparated) PARTITION BY toYYYYMM(EventDate)

INSERT INTO TABLE FUNCTION file('path/hits_{_partition_id}', 'TSV', 'columns...') PARTITION BY toYYYYMM(EventDate) VALUES ...

Writes multiple files; performs multiple requests.

Works for file, s3, hdfs, url.

Developer: Kseniia Sumarokova, initial work by Vladimir Chebotarev.

Globs For INSERT FROM INFILE

INSERT INTO my_table FROM INFILE '*.csv.gz' FORMAT CSV

Supports *, ?, {1..10}, {001..100} patterns.

Reading and processing is parallelized automatically πŸš€.

Compression is autodetected.

Developer: Arthur Filatenkov.

INTERVAL For WITH FILL

We already have ORDER BY ... WITH FILL.

Example, simple ORDER BY:

:) SELECT EventDate, count() FROM test.hits WHERE CounterID = 2841673 GROUP BY EventDate ORDER BY EventDate β”Œβ”€β”€EventDate─┬─count()─┐ β”‚ 2014-03-17 β”‚ 3 β”‚ β”‚ 2014-03-19 β”‚ 6 β”‚ β”‚ 2014-03-21 β”‚ 7 β”‚ β”‚ 2014-03-22 β”‚ 6 β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

INTERVAL For WITH FILL

We already have ORDER BY ... WITH FILL.

Example, ORDER BY WITH FILL:

:) SELECT EventDate, count() FROM test.hits WHERE CounterID = 2841673 GROUP BY EventDate ORDER BY EventDate WITH FILL STEP 1 β”Œβ”€β”€EventDate─┬─count()─┐ β”‚ 2014-03-17 β”‚ 3 β”‚ β”‚ 2014-03-18 β”‚ 0 β”‚ β”‚ 2014-03-19 β”‚ 6 β”‚ β”‚ 2014-03-20 β”‚ 0 β”‚ β”‚ 2014-03-21 β”‚ 7 β”‚ β”‚ 2014-03-22 β”‚ 6 β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

INTERVAL For WITH FILL

We already have ORDER BY ... WITH FILL.

Example, ORDER BY WITH FILL:

:) SELECT EventDate, count() FROM test.hits WHERE CounterID = 2841673 GROUP BY EventDate ORDER BY EventDate WITH FILL FROM '2014-03-15'::Date TO '2014-03-22'::Date β”Œβ”€β”€EventDate─┬─count()─┐ β”‚ 2014-03-15 β”‚ 0 β”‚ β”‚ 2014-03-16 β”‚ 0 β”‚ β”‚ 2014-03-17 β”‚ 3 β”‚ β”‚ 2014-03-18 β”‚ 0 β”‚ β”‚ 2014-03-19 β”‚ 6 β”‚ β”‚ 2014-03-20 β”‚ 0 β”‚ β”‚ 2014-03-21 β”‚ 7 β”‚ β”‚ 2014-03-22 β”‚ 6 β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

INTERVAL For WITH FILL

We already have ORDER BY ... WITH FILL.

Since version 21.12: INTERVAL operator for STEP

ORDER BY EventDate WITH FILL STEP INTERVAL 1 MONTH

Developer: Anton Popov.

Parameters For Identifiers

We already have parameterized queries:

SELECT uniq(user_id) FROM table WHERE website = {name:String}

For safe parameter substitutions:

curl https://clickhouse-server:8443/?param_name=upyachka -d ' SELECT uniq(user_id) FROM table WHERE website = {name:String}'

Parameters For Identifiers

We already have parameterized queries.

Since version 21.12: substitutions for table and database names.

SELECT * FROM {name:Identifier}

CREATE TABLE {name:Identifier} ...

Works for SELECT, INSERT and every type of DDL queries.

Developer: Nikolai Degterinskiy.

Parameters For Identifiers

Bonus 🎁.

Did you know that you can create
custom HTTP handlers with prepared queries?

See config.xml, near "Uncomment to use custom http handlers".

WINDOW VIEW

This is an experimental feature πŸ›οΈ

Streaming queries over windows.

CREATE WINDOW VIEW wv TO dst WATERMARK = INTERVAL '2' SECOND ALLOWED_LATENESS = INTERVAL '2' SECOND AS SELECT count(a) AS count, tumbleEnd(wid) AS w_end FROM mt GROUP BY tumble(timestamp, INTERVAL '5' SECOND) AS wid;

Ready for testing in 21.12.

Developer: Vxider, Kseniia Sumarokova.

What's Next

Roadmap 2022 is published:

https://github.com/ClickHouse/ClickHouse/issues/32513

— open for discussions.

... all great stuff is discussed there, don't miss.

What's Next

"ded moroz" features πŸŽ„

Special features prepared for New Year celebration:

service discovery in a cluster;

— full continuous integration support for AArch64;

type inference for text formats;

... more to come, keep in touch!
Find these features by "ded moroz" label on GitHub.

Artwork by Vladimir Zarubin (1925-1996)

Q&A

Artwork by Vladimir Zarubin (1925-1996)