1. (30 min) What's new in ClickHouse 21.12.
2. (30 min) Q&A.
Monthly stable release for December.
Christmas π Release.
1. clickhouse-keeper — 100% compatible protocol with ZooKeeper.
— compressed logs and snapshots;
— no issues with zxid overflow;
— no issues with large packets;
— better memory usage;
— no issues with GC and Java heap;
2. Can run embedded into clickhouse-server.
— no need for a separate process;
You can also use it with your Kafkas and Hadoops.
21.12 — four letter commands added.
Example:
echo 'stat' | nc localhost 9181
Full list of commands:
conf, cons, crst, envi, ruok, srst, srvr, stat, wchc, wchs, dirs, mntr, isro.
Developer: JackyWoo.
Development started in Sep 2020.
Now clickhouse-keeper is feature complete!
Passed 1000s hours of continuous testing
with stress tests, Jepsen tests, integration tests...
Tested in production in Yandex Cloud and by our best friends.
It is still considered pre-production.
We need your feedback.
We already support table CONSTRAINTs:
CREATE TABLE
(
URL String,
Domain String,
CONSTRAINT validate
CHECK isValidUTF8(URL)
AND length(URL) BETWEEN 10 AND 10000,
CONSTRAINT my_constraint
CHECK Domain = domainWithoutWWW(URL)
) ...
Now they are automatically optimizing your queries!
Developer: Nikita Vasilev.
Controlled by settings:
— optimize_using_constraints;
— optimize_substitute_columns;
— optimize_append_index;
— convert_query_to_cnf;
Conditions are simplified:
WHERE domainWithoutWWW(URL) = 'ghe.clickhouse.tech'
Rewritten to:
WHERE Domain = 'ghe.clickhouse.tech'
Bonus π°: assumptions
CONSTRAINT my_constraint
ASSUME Domain = domainWithoutWWW(URL)
Bonus π§Έ: hypotheses
INDEX my_index (a < b) TYPE hypothesis GRANULARITY 1
SELECT column FROM url(
'https://example.com/data.parquet', Parquet, '...')
Before 21.12:
— whole file is read;
— whole file is buffered;
21.12 and newer:
— only requested columns are read;
— it's using O(1) memory;
Works for url, s3, hdfs.
May increase the number of requests.
Developer: Kseniia Sumarokova.
CREATE TABLE t (is_good Bool) ...
Motivation:
1. Compatibility with Postgres and other DBMS.
2. Easy import and export in text format with true/false.
See also the settings: bool_true_representation, bool_false_representation.
In next releases:
— allow to recognize on/off, enable/disable, yes/no, 1/0 on data import;
CREATE TABLE hits_files (...)
ENGINE = File(TabSeparated)
PARTITION BY toYYYYMM(EventDate)
INSERT INTO TABLE FUNCTION
file('path/hits_{_partition_id}', 'TSV', 'columns...')
PARTITION BY toYYYYMM(EventDate)
VALUES ...
Writes multiple files; performs multiple requests.
Works for file, s3, hdfs, url.
Developer: Kseniia Sumarokova, initial work by Vladimir Chebotarev.
INSERT INTO my_table FROM INFILE '*.csv.gz' FORMAT CSV
Supports *, ?, {1..10}, {001..100} patterns.
Reading and processing is parallelized automatically π.
Compression is autodetected.
Developer: Arthur Filatenkov.
We already have ORDER BY ... WITH FILL.
Example, simple ORDER BY:
:) SELECT EventDate, count()
FROM test.hits WHERE CounterID = 2841673 GROUP BY EventDate
ORDER BY EventDate
βββEventDateββ¬βcount()ββ
β 2014-03-17 β 3 β
β 2014-03-19 β 6 β
β 2014-03-21 β 7 β
β 2014-03-22 β 6 β
ββββββββββββββ΄ββββββββββ
We already have ORDER BY ... WITH FILL.
Example, ORDER BY WITH FILL:
:) SELECT EventDate, count()
FROM test.hits WHERE CounterID = 2841673 GROUP BY EventDate
ORDER BY EventDate WITH FILL STEP 1
βββEventDateββ¬βcount()ββ
β 2014-03-17 β 3 β
β 2014-03-18 β 0 β
β 2014-03-19 β 6 β
β 2014-03-20 β 0 β
β 2014-03-21 β 7 β
β 2014-03-22 β 6 β
ββββββββββββββ΄ββββββββββ
We already have ORDER BY ... WITH FILL.
Example, ORDER BY WITH FILL:
:) SELECT EventDate, count()
FROM test.hits WHERE CounterID = 2841673 GROUP BY EventDate
ORDER BY EventDate WITH FILL
FROM '2014-03-15'::Date TO '2014-03-22'::Date
βββEventDateββ¬βcount()ββ
β 2014-03-15 β 0 β
β 2014-03-16 β 0 β
β 2014-03-17 β 3 β
β 2014-03-18 β 0 β
β 2014-03-19 β 6 β
β 2014-03-20 β 0 β
β 2014-03-21 β 7 β
β 2014-03-22 β 6 β
ββββββββββββββ΄ββββββββββ
We already have ORDER BY ... WITH FILL.
Since version 21.12: INTERVAL operator for STEP
ORDER BY EventDate WITH FILL STEP INTERVAL 1 MONTH
Developer: Anton Popov.
We already have parameterized queries:
SELECT uniq(user_id) FROM table
WHERE website = {name:String}
For safe parameter substitutions:
curl https://clickhouse-server:8443/?param_name=upyachka -d '
SELECT uniq(user_id) FROM table WHERE website = {name:String}'
We already have parameterized queries.
Since version 21.12: substitutions for table and database names.
SELECT * FROM {name:Identifier}
CREATE TABLE {name:Identifier} ...
Works for SELECT, INSERT and every type of DDL queries.
Developer: Nikolai Degterinskiy.
Bonus π.
Did you know that you can create
custom HTTP handlers with prepared queries?
See config.xml, near "Uncomment to use custom http handlers".
This is an experimental feature ποΈ
Streaming queries over windows.
CREATE WINDOW VIEW wv TO dst
WATERMARK = INTERVAL '2' SECOND
ALLOWED_LATENESS = INTERVAL '2' SECOND
AS SELECT
count(a) AS count,
tumbleEnd(wid) AS w_end
FROM mt
GROUP BY tumble(timestamp, INTERVAL '5' SECOND) AS wid;
Ready for testing in 21.12.
Developer: Vxider, Kseniia Sumarokova.
Roadmap 2022 is published:
https://github.com/ClickHouse/ClickHouse/issues/32513
— open for discussions.
... all great stuff is discussed there, don't miss.
"ded moroz" features π
Special features prepared for New Year celebration:
— service discovery in a cluster;
— full continuous integration support for AArch64;
— type inference for text formats;
... more to come, keep in touch!
Find these features by "ded moroz" label on GitHub.
Artwork by Vladimir Zarubin (1925-1996)
Artwork by Vladimir Zarubin (1925-1996)