1. (50 min) What's new in ClickHouse 22.12.
2. (10 min) Q&A.
ClickHouse Christmas release.
β π 17 new features
β βΈοΈ 8 performance optimizations
β π 39 bug fixes
Examples:
SELECT 1_000_000;
SELECT 12.345_67;
SELECT 0xABCD_EF01;
SELECT 0b1111_0000_1010_0101;
SELECT 1_50_000, 1_00_00_000; -- 1.5 lakh, 1 crore
Developers: jh0x, unbyte.
FROM github_events
SELECT created_at,
'https://github.com/' || repo_name AS url,
title
WHERE title LIKE '%π%'
ORDER BY created_at DESC
LIMIT 1 BY title
LIMIT 100
Why? — Better autocomplete in editors.
Developer: Nikolay Degterinsky.
SELECT
county, town, district, street,
median(price), count()
FROM uk_price_paid
WHERE toYear(date) = 2022
GROUP BY ALL
ORDER BY count() DESC
LIMIT 10
Developer: TaoFengLiu.
SELECT
county, town, district, street,
median(price), count()
FROM uk_price_paid
WHERE toYear(date) = 2022
GROUP BY 1, 2, 3, 4
ORDER BY count() DESC
LIMIT 10
SELECT
(county, town, district, street) AS k,
median(price), count()
FROM uk_price_paid
WHERE toYear(date) = 2022
GROUP BY k
ORDER BY count() DESC
LIMIT 10
SELECT concatWithSeparator(' - ', 'Hello', 'world', 'goodbye.') AS x
ββxβββββββββββββββββββββββββ
β Hello - world - goodbye. β
ββββββββββββββββββββββββββββ
Alias: concat_ws.
What for? — compatibility with Spark.
Developer: TaiYang Li.
Bonus: concatWithSeparatorAssumeInjective.
SELECT
1::Decimal(10, 5) AS a,
3::Decimal(10, 5) AS b,
a / b AS x,
divideDecimal(a, b, 10) AS y
ββaββ¬βbββ¬βββββββxββ¬ββββββββββββyββ
β 1 β 3 β 0.33333 β 0.3333333333 β
βββββ΄ββββ΄ββββββββββ΄βββββββββββββββ
Developer: Andrey Zvonov.
CREATE FUNCTION toBFloat16 AS (x) ->
reinterpretAsFloat32(
bitAnd(reinterpretAsUInt32(x), 0xFFFF0000));
CREATE TABLE feature_store
(
time DateTime DEFAULT now(),
vec Array(Float32) CODEC(ZSTD) TTL time + INTERVAL 1 DAY,
vec16 Array(Float32)
DEFAULT arrayMap(x -> toBFloat16(x), vec) CODEC(ZSTD),
text String
) ENGINE = MergeTree ORDER BY time;
Developer: Antonio Andelic.
:) INSERT INTO feature_store (vec) VALUES ([0.1, 0.2, 0.3])
Ok.
:) SELECT vec, vec16 FROM feature_store
ββvecββββββββββββ¬βvec16ββββββββββββββββββββββββββββββββ
β [0.1,0.2,0.3] β [0.099609375,0.19921875,0.29882812] β
βββββββββββββββββ΄ββββββββββββββββββββββββββββββββββββββ
:) SELECT L2Distance(vec, vec16) FROM feature_store
ββL2Distance(vec, vec16)ββ
β 0.0014615965 β
ββββββββββββββββββββββββββ
To introspect part moves between volumes and disks.
DESCRIBE TABLE system.moves
ββnameββββββββββββββ¬βtypeβββββ
β database β String β
β table β String β
β elapsed β Float64 β
β target_disk_name β String β
β target_disk_path β String β
β part_name β String β
β part_size β UInt64 β
β thread_id β UInt64 β
ββββββββββββββββββββ΄ββββββββββ
Developer: Sergei Trifonov.
Monitor ClickHouse Kepeer with your favorite tools!
Probably the last missing feature in comparison to ZooKeeper.
$ cat /etc/clickhouse-keeper/config.d/prometheus.yaml
prometheus:
port: 9369
endpoint: /metrics
Developer: Antonio Andelic.
Bonus: write performance and stability on a very high request rate.
Prevent or limit users to change some settings:
CREATE TABLE ... ENGINE = MergeTree
ORDER BY ...
SETTINGS storage_policy = 'local', ...
$ cat /etc/clickhouse-server/users.d/table_constraints.yaml
profiles:
default:
constraints:
merge_tree_storage_policy:
const:
merge_tree_parts_to_throw_insert:
max: 1000
Developer: Sergei Trifonov.
$ cat /etc/clickhouse-server/config.d/rules.yaml
password_complexity:
- rule:
pattern: '.{12}'
message: 'be at least 12 characters long'
- rule:
pattern: '\p{N}'
message: contain at least 1 numeric character
- rule:
pattern: '\p{Lu}'
message: contain at least 1 uppercase character
- rule:
pattern: '[^\p{L}\p{N}]'
message: contain at least 1 special character
Developer: Nikolay Degterinsky.
:) CREATE USER vasyan
IDENTIFIED WITH sha256_password BY 'qwerty123'
DB::Exception: Invalid password. The password should:
be at least 12 characters long,
contain at least 1 uppercase character,
contain at least 1 special character.
Developer: Nikolay Degterinsky.
Note: if clickhouse-client is being used,
the password will be checked and hashed on client side.
The server will never receive the plaintext password.
CREATE TABLE test AS mysql(
'monty:3306', maria, table, 'videnius', 'qwerty123');
2022.12.15 07:51:10.997810 [ 2282939 ] {ea24d544-3e40-4f2a-9f0e-2e3a35fc63c8}
<Debug> executeQuery: (from [::ffff:127.0.0.1]:47320)
CREATE TABLE test AS mysql('monty:3306', maria, table,
'videnius', '[HIDDEN]') (stage: Complete)
:) SHOW CREATE TABLE test
... AS mysql('monty:3306', maria, table, 'videnius', '[HIDDEN]');
:) SELECT query FROM system.query_log
WHERE query LIKE 'CREATE TABLE test%'
CREATE TABLE test AS mysql(
'monty:3306', maria, table, 'videnius', '[HIDDEN]')
Developer: Vitaly Baranov.
$ clickhouse-client --time --progress --query "
SELECT * FROM hits_1m INTO OUTFILE 'hits.jsonl'"
0.801
$ clickhouse-client --time --progress --query "
SELECT * FROM hits_1m INTO OUTFILE 'hits.bson'"
$ clickhouse-client --time --progress --query "
SELECT * FROM hits_1m FORMAT BSONEachRow > hits.bson"
0.564
$ wc -c hits.jsonl hits.bson
2358439536 hits.jsonl
778316087 hits.bson
Developer: Pavel Kruglov, Mark Polokhov.
$ time clickhouse-local --progress --query "
SELECT * FROM table FORMAT Null" < hits.jsonl
real 0m4,404s
user 0m9,831s
sys 0m1,057s
$ time clickhouse-local --progress --query "
SELECT * FROM table FORMAT Null" < hits.bson
real 0m1,471s
user 0m4,226s
sys 0m0,697s
File extension detection. Schema inference. — Fully supported!
Developer: Pavel Kruglov, Mark Polokhov.
SET join_algorithm = 'grace_hash';
Developer: Sergei Skvortsov.
A test: self-join for graph traversal
SELECT UserID,
h1.Referer, h1.URL, h2.URL
FROM hits AS h1 INNER JOIN hits AS h2
ON (h1.UserID = h2.UserID) AND (h1.URL = h2.Referer)
WHERE (h1.URL != '') AND (h2.URL != '')
AND (h1.Referer != '') AND (h2.Referer != '')
ORDER BY UserID LIMIT 10
A test: self-join for graph traversal, 100 million log of web traffic:
SET join_algorithm = 'hash';
-- 26 sec, 26.56 GiB RAM
SET join_algorithm = 'parallel_hash';
-- 10 sec, 28.79 GiB RAM
SET max_memory_usage = '10G', join_algorithm = 'partial_merge';
-- 1216.914 sec
SET max_memory_usage = '10G', join_algorithm = 'grace_hash',
grace_hash_join_initial_buckets = 128;
-- 173 sec, 6.85 GiB.
+ penetration testing, bug bounty program, audit reports...
— free 30-day trial up to 1 TB of data;
— affordable clusters for developers with full HA < $100/month
Try it! https://clickhouse.cloud/.