1. (50 min) What's new in ClickHouse 23.10.
2. (10 min) Q&A.
ClickHouse Autumn Release.
— 23 new features
— 26 performance optimizations
— 60 bug fixes
:) SHOW MERGES
Row 1:
──────
table:             checks
database:          default
estimate_complete: 0.01
elapsed:           0.11
progress:          97.11
is_mutation:       0
size_compressed:   2.14 MiB
memory_usage:      8.86 MiB
SHOW PROCESSLIST
SHOW DATABASES
SHOW TABLES
SHOW ENGINES
SHOW FUNCTIONS
SHOW MERGES
Almost the same as
SELECT * FROM system.merges.
Developer: Megao.
SELECT value FROM system.settings WHERE name = 'max_block_size'
SELECT getSetting('max_block_size')
SHOW SETTING max_block_size
Video: https://youtu.be/tNzdrw1L1Xk
Developer: Maksim Kita.
:) SELECT byteSwap(1234567890)
3523384905
:) SELECT hex(byteSwap(0xAABBCCDD))
DDCCBBAA
Developer: Priyansh Agrawal.
largestTriangleThreeBuckets, or lttb for short.
Downsampling time-series-like data while
retaining the overall shape and variability in the data.
lttb(n)(x, y)
Demo
Developer: Sinan.
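A hedged sketch of how the new aggregate might be used; the table and column names are invented for illustration:

```sql
-- Downsample a large series to at most 1000 points while keeping its shape.
-- `sensor_readings`, `event_time`, and `value` are hypothetical names.
SELECT lttb(1000)(event_time, value) AS downsampled
FROM sensor_readings;
```

The result is an array of (x, y) tuples that can be fed straight to a charting tool.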
:) SELECT arrayRandomSample(range(1000), 10)
┌─arrayRandomSample(range(1000), 10)────────┐
│ [639,357,338,999,920,851,696,537,602,235] │
└───────────────────────────────────────────┘
:) SELECT arrayRandomSample(range(1000), 10)
┌─arrayRandomSample(range(1000), 10)──────┐
│ [791,995,45,622,97,205,704,417,749,362] │
└─────────────────────────────────────────┘
:) SELECT arrayRandomSample(range(1000), 10)
┌─arrayRandomSample(range(1000), 10)──────┐
│ [572,805,521,71,539,703,950,434,120,49] │
└─────────────────────────────────────────┘
Developer: Itay Israelov.
:) SELECT
'{"dbms": "ClickHouse"}' AS a,
'{"is_it_good": true}' AS b,
jsonMergePatch(a, b) AS c,
'{"dbms": "MongoDB"}' AS d,
jsonMergePatch(c, d)
Demo
Developer: Memo.
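jsonMergePatch follows JSON Merge Patch (RFC 7386) semantics: keys from later arguments overwrite earlier ones. A minimal sketch of what the demo above computes:

```sql
-- b adds a new key to a; exact output formatting may vary.
SELECT jsonMergePatch('{"dbms": "ClickHouse"}', '{"is_it_good": true}') AS c;
-- expected: {"dbms":"ClickHouse","is_it_good":true}
```

Merging `c` with `'{"dbms": "MongoDB"}'` then replaces the `dbms` value, because the right-hand argument wins on conflicting keys.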
SELECT * FROM (SELECT SUM(COALESCE(SEQ_VONR_MO_CALL_CONN_FAIL_TIMES_C, 0)) AS VONR_MO_CALL_CONN_FAIL_TIMES, MT.`102520001` AS `102520001`, MT.`181361814368` AS `181361814368`, MT.`102520102` AS `102520102`, MT.`102520101` AS `102520101`, MT.`102520104` AS `102520104`, MT.`183111861371` AS `183111861371`, MT.`102530101` AS `102530101`, MT.`102540101` AS `102540101`, MT.`102520103` AS `102520103`, MT.`102510101` AS `102510101` FROM ( SELECT COALESCE(SUM(VONR_MO_CALL_CONN_FAIL_TIMES), 0) AS SEQ_VONR_MO_CALL_CONN_FAIL_TIMES_C, COM_FAIL_CAUSE AS `102520001`, NULL AS `102520102`, COM_FAIL_CAUSE AS `102510101`, NULL AS `102520101`, D183111570684_H101.`183111861371` AS `183111861371`, NULL AS `102520104`, NULL AS `102520103`, H_COMPREHENSIVE_FAILURE_CAUSE.`102540101` AS `102540101`, H_COMPREHENSIVE_FAILURE_CAUSE.`102530101` AS `102530101`, concat('14', '-', '255', '-', '255', '-', SIP_RELEASE_CODE) AS `181361814368` FROM TEST_DATABASE.SDR_TEST LEFT JOIN ( SELECT concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID) AS `102510101`, UNIFIED_CAUSE_ID AS `183111861371`, concat(FAILCAUSE, '(', PD, ')') AS NAME_102510101 FROM TEST_DATABASE.DIM_TEST GROUP BY concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID), UNIFIED_CAUSE_ID, concat(FAILCAUSE, '(', PD, ')') ) AS D183111570684_H101 ON COM_FAIL_CAUSE = D183111570684_H101.`102510101` LEFT JOIN ( SELECT concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID) AS `102520001`, SCENE_ID AS `102540101`, CLASS_ID AS `102530101`, SCENE_NAME AS NAME_102540101 FROM TEST_DATABASE.DIM_TEST GROUP BY concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID), SCENE_ID, CLASS_ID, SCENE_NAME ) AS H_COMPREHENSIVE_FAILURE_CAUSE ON COM_FAIL_CAUSE = H_COMPREHENSIVE_FAILURE_CAUSE.`102520001` LEFT JOIN ( SELECT concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID) AS `181361814368`, CAUSE AS NAME_181361814368 FROM TEST_DATABASE.DIM_TEST GROUP BY concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', 
PD_ID, '-', CAUSE_ID), CAUSE ) AS DIM_FAILCAUSE_ALL_181361814368 ON concat('14', '-', '255', '-', '255', '-', SIP_RELEASE_CODE) = DIM_FAILCAUSE_ALL_181361814368.`181361814368` WHERE (H_COMPREHENSIVE_FAILURE_CAUSE.NAME_102540101 IS NOT NULL) AND (D183111570684_H101.NAME_102510101 IS NOT NULL) AND (DIM_FAILCAUSE_ALL_181361814368.NAME_181361814368 IS NOT NULL) GROUP BY `102520001`, `102520102`, `102510101`, `102520101`, D183111570684_H101.`183111861371`, `102520104`, `102520103`, H_COMPREHENSIVE_FAILURE_CAUSE.`102540101`, H_COMPREHENSIVE_FAILURE_CAUSE.`102530101`, `181361814368` ) AS MT GROUP BY MT.`102520001`, MT.`181361814368`, MT.`102520102`, MT.`102520101`, MT.`102520104`, MT.`183111861371`, MT.`102530101`, MT.`102540101`, MT.`102520103`, MT.`102510101` ) AS ST WHERE ST.VONR_MO_CALL_CONN_FAIL_TIMES > 0 ORDER BY VONR_MO_CALL_CONN_FAIL_TIMES DESC LIMIT 0, 5000
Developer: Salvatore Mesoraca.
Apply an aggregate function to the set of values
where another value is the maximum in a group.
Demo.
Developer: Amos Bird.
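A hedged sketch of the new combinator; the table and column names are invented for illustration:

```sql
-- For each user, sum `spend` over the rows where `ts` is at its maximum
-- within the group, i.e. aggregate the values tied to the latest timestamp.
SELECT user_id, sumArgMax(spend, ts) AS latest_spend
FROM events
GROUP BY user_id;
```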
ALTER TABLE {name:Identifier}
ATTACH PARTITION {x:String}
Developer: Nikolai Degterinskiy.
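Query parameters can be supplied per session with `SET param_<name> = …` (or with `--param_<name>` in clickhouse-client); the values below are examples only:

```sql
-- Hypothetical values: attach partition '2023-10' to table 'events'.
SET param_name = 'events', param_x = '2023-10';
ALTER TABLE {name:Identifier} ATTACH PARTITION {x:String};
```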
:) SELECT untuple((1, 'Hello'))
┌─tupleElement((1, 'Hello'), '1')─┬─tupleElement((1, 'Hello'), '2')─┐
│                               1 │ Hello                           │
└─────────────────────────────────┴─────────────────────────────────┘
:) SELECT untuple((1, 'Hello')) AS t
┌─t.1─┬─t.2───┐
│   1 │ Hello │
└─────┴───────┘
Developer: garcher22.
:) SET force_optimize_projection_name = 'my_projection';
Developer: Yarik Briukhovetskyi.
:) SET create_table_empty_primary_key_by_default = 1,
default_table_engine = 'MergeTree';
:) CREATE TABLE test (x UInt8, y String);
Developer: Srikanth Chekuri.
(clickhouse-benchmark run on a query log from a production service)
23.9:
QPS: 201.059
50.000% 0.997 sec.
99.990% 75.998 sec.
23.10:
QPS: 961.997
50.000% 0.033 sec.
99.990% 4.153 sec.
Developers: Maksim Kita.
More: https://www.tinybird.co/blog-posts/clickhouse-lock-contention
When a large number of temporary files are created on disk.
SELECT hash, count() FROM test_table
GROUP BY (id % 150000) AS hash
SETTINGS max_bytes_before_external_group_by = '50M'
23.9:
Elapsed: 38.492 sec. (7.76 million rows/s., 108.42 MB/s.)
23.10:
Elapsed: 4.643 sec. (64.09 million rows/s., 894.11 MB/s.)
Developers: Maksim Kita.
When a table is narrow:
CREATE TABLE test_table (id UInt64, value UInt64)
ENGINE = MergeTree ORDER BY id;
INSERT INTO test_table
SELECT number % 15, number FROM numbers_mt(500000000);
OPTIMIZE TABLE test_table
23.9: Elapsed: 19.328 sec.
23.10: Elapsed: 12.578 sec.
Developer: Maksim Kita.
By predicting that the subsequent maps
in a block have a similar structure.
SELECT map['hello'] FROM test SETTINGS max_threads = 1;
23.9:
Elapsed: 0.460 sec. (43.44 million rows/s., 3.52 GB/s.)
23.10:
Elapsed: 0.317 sec. (63.19 million rows/s., 5.12 GB/s.)
Developer: lgbo-ustc.
By carefully selecting the number of streams for processing.
Developer: Jiebin Sun.
Using data skipping indices in ORC, similarly to Parquet.
Demo.
Developer: Taiyang Li.
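A sketch of reading an ORC file with the pushed-down filter; the file name and column are examples, and `input_format_orc_filter_push_down` is assumed to be the controlling setting:

```sql
-- Only row groups whose min/max statistics can match the WHERE
-- condition are read from the ORC file.
SELECT count()
FROM file('data.orc')
WHERE id > 1000000
SETTINGS input_format_orc_filter_push_down = 1;
```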
SET query_plan_preserve_num_streams_after_window_functions = 1;
SELECT id, AVG(value)
OVER (ORDER BY id ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS frame,
sipHash64(frame) FROM window_test;
23.9: Elapsed: 3.07 sec.
23.10: Elapsed: 2.38 sec.
Developer: Frinkr.
By reusing the index data structures in memory for different granules.
SELECT count() FROM index_test
WHERE x >= 20000 AND x <= 20100 AND y >= 10000 AND y <= 10100
23.9: Elapsed: 0.083 sec.
23.10: Elapsed: 0.058 sec.
Developer: Amos Bird.
SELECT * FROM 'data.npy'
Demo
Developer: Yarik Briukhovetskyi.
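The shorthand above goes through the table-function machinery; an equivalent, more explicit form (assuming the format name is `Npy` and the file name is an example):

```sql
SELECT * FROM file('data.npy', 'Npy');
```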
Fold an array with a lambda function:
SELECT arrayFold((acc, value...) -> new_acc, arr..., initial_acc)
... but we already have array aggregation functions:
arraySum, arrayProduct, arrayMin/Max/All/Any, arrayStringConcat...
... and we already can apply any aggregate function to an array:
arrayReduce('stddev', arr), arrayReduce('corr', arr1, arr2)
But you can do much more with arrayFold!
Developer: Lirikl.
:) SELECT arrayFold((acc, value) -> (acc.2, acc.1 + value),
range(number),
(1::UInt64, 1::UInt64)).2 AS x
FROM numbers(10)
┌──x─┐
│  1 │
│  1 │
│  2 │
│  3 │
│  5 │
│  7 │
│ 10 │
│ 13 │
│ 17 │
│ 21 │
└────┘
Developer: Lirikl.
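One more hedged example of arrayFold: reversing an array by pushing each element to the front of the accumulator.

```sql
SELECT arrayFold((acc, x) -> arrayPushFront(acc, x),
                 [1, 2, 3, 4],
                 emptyArrayUInt8()) AS reversed;
-- expected: [4,3,2,1]
```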
MySQL compatibility
for Tableau Online and QuickSight support:
— TO_DAYS;
— String arguments for add/subtract date/time and toDayOfWeek;
— information_schema: columns.extra, tables.table_rows, tables.data_length;
Java Driver:
— using Apache HTTP Client to improve performance
— Added proxy support
— Support for secure connections using a Trust Store
Tableau Online via MySQL protocol: it works!
Updates for Python, Node.js, and Go clients.
— Supercharging your large data loads;
— Forecasting Using ML Functions;
— Saving millions of dollars in Kubernetes;
— GenAI for ClickHouse SQL queries;
Video Recordings: https://www.youtube.com/c/ClickHouseDB