ClickHouse: Release 23.10 Webinar

ClickHouse
Release 23.10

Release 23.10 Webinar

1. (50 min) What's new in ClickHouse 23.10.

2. (10 min) Q&A.

Release 23.10

ClickHouse Autumn Release.

β€” 23 new features 🌳

β€” 26 performance optimizations 🚴

β€” 60 bug fixes β˜‚οΈ

Small And Nice Features

SHOW MERGES

:) SHOW MERGES

Row 1:
──────
table:             checks
database:          default
estimate_complete: 0.01
elapsed:           0.11
progress:          97.11
is_mutation:       0
size_compressed:   2.14 MiB
memory_usage:      8.86 MiB

SHOW PROCESSLIST
SHOW DATABASES
SHOW TABLES
SHOW ENGINES
SHOW FUNCTIONS

SHOW MERGES

Almost the same as
SELECT * FROM system.merges.

Developer: Megao.

SHOW SETTING

SELECT value FROM system.settings WHERE name = 'max_block_size'

SELECT getSetting('max_block_size')

SHOW SETTING max_block_size

Video: https://youtu.be/tNzdrw1L1Xk

Developer: Maksim Kita.

byteSwap

:) SELECT byteSwap(1234567890)

3523384905

:) SELECT hex(byteSwap(0xAABBCCDD))

DDCCBBAA

Developer: Priyansh Agrawal.

largestTriangleThreeBuckets

Or lttb.

Downsamples time-series-like data while
retaining its overall shape and variability.

lttb(n)(x, y)
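A minimal sketch with synthetic data (the query shape is the point, not the numbers): downsample 1,000 points of a curve into 4 representative points; the result is an array of (x, y) points that keeps the first point, the last point, and the visually significant points in between.

SELECT lttb(4)(number, sin(number / 100.)) FROM numbers(1000)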

Demo

Developer: Sinan.

arrayRandomSample

:) SELECT arrayRandomSample(range(1000), 10)

┌─arrayRandomSample(range(1000), 10)────────┐
│ [639,357,338,999,920,851,696,537,602,235] │
└───────────────────────────────────────────┘

:) SELECT arrayRandomSample(range(1000), 10)

┌─arrayRandomSample(range(1000), 10)──────┐
│ [791,995,45,622,97,205,704,417,749,362] │
└─────────────────────────────────────────┘

:) SELECT arrayRandomSample(range(1000), 10)

┌─arrayRandomSample(range(1000), 10)──────┐
│ [572,805,521,71,539,703,950,434,120,49] │
└─────────────────────────────────────────┘

Developer: Itay Israelov.

jsonMergePatch

:) SELECT '{"dbms": "ClickHouse"}' AS a,
          '{"is_it_good": true}' AS b,
          jsonMergePatch(a, b) AS c,
          '{"dbms": "MongoDB"}' AS d,
          jsonMergePatch(c, d)

Demo

Developer: Memo.

formatQuery, formatQuerySingleLine

SELECT * FROM (SELECT SUM(COALESCE(SEQ_VONR_MO_CALL_CONN_FAIL_TIMES_C, 0)) AS VONR_MO_CALL_CONN_FAIL_TIMES, MT.`102520001` AS `102520001`, MT.`181361814368` AS `181361814368`, MT.`102520102` AS `102520102`, MT.`102520101` AS `102520101`, MT.`102520104` AS `102520104`, MT.`183111861371` AS `183111861371`, MT.`102530101` AS `102530101`, MT.`102540101` AS `102540101`, MT.`102520103` AS `102520103`, MT.`102510101` AS `102510101` FROM ( SELECT COALESCE(SUM(VONR_MO_CALL_CONN_FAIL_TIMES), 0) AS SEQ_VONR_MO_CALL_CONN_FAIL_TIMES_C, COM_FAIL_CAUSE AS `102520001`, NULL AS `102520102`, COM_FAIL_CAUSE AS `102510101`, NULL AS `102520101`, D183111570684_H101.`183111861371` AS `183111861371`, NULL AS `102520104`, NULL AS `102520103`, H_COMPREHENSIVE_FAILURE_CAUSE.`102540101` AS `102540101`, H_COMPREHENSIVE_FAILURE_CAUSE.`102530101` AS `102530101`, concat('14', '-', '255', '-', '255', '-', SIP_RELEASE_CODE) AS `181361814368` FROM TEST_DATABASE.SDR_TEST LEFT JOIN ( SELECT concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID) AS `102510101`, UNIFIED_CAUSE_ID AS `183111861371`, concat(FAILCAUSE, '(', PD, ')') AS NAME_102510101 FROM TEST_DATABASE.DIM_TEST GROUP BY concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID), UNIFIED_CAUSE_ID, concat(FAILCAUSE, '(', PD, ')') ) AS D183111570684_H101 ON COM_FAIL_CAUSE = D183111570684_H101.`102510101` LEFT JOIN ( SELECT concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID) AS `102520001`, SCENE_ID AS `102540101`, CLASS_ID AS `102530101`, SCENE_NAME AS NAME_102540101 FROM TEST_DATABASE.DIM_TEST GROUP BY concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID), SCENE_ID, CLASS_ID, SCENE_NAME ) AS H_COMPREHENSIVE_FAILURE_CAUSE ON COM_FAIL_CAUSE = H_COMPREHENSIVE_FAILURE_CAUSE.`102520001` LEFT JOIN ( SELECT concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID) AS `181361814368`, CAUSE AS NAME_181361814368 FROM TEST_DATABASE.DIM_TEST GROUP BY concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', 
PD_ID, '-', CAUSE_ID), CAUSE ) AS DIM_FAILCAUSE_ALL_181361814368 ON concat('14', '-', '255', '-', '255', '-', SIP_RELEASE_CODE) = DIM_FAILCAUSE_ALL_181361814368.`181361814368` WHERE (H_COMPREHENSIVE_FAILURE_CAUSE.NAME_102540101 IS NOT NULL) AND (D183111570684_H101.NAME_102510101 IS NOT NULL) AND (DIM_FAILCAUSE_ALL_181361814368.NAME_181361814368 IS NOT NULL) GROUP BY `102520001`, `102520102`, `102510101`, `102520101`, D183111570684_H101.`183111861371`, `102520104`, `102520103`, H_COMPREHENSIVE_FAILURE_CAUSE.`102540101`, H_COMPREHENSIVE_FAILURE_CAUSE.`102530101`, `181361814368` ) AS MT GROUP BY MT.`102520001`, MT.`181361814368`, MT.`102520102`, MT.`102520101`, MT.`102520104`, MT.`183111861371`, MT.`102530101`, MT.`102540101`, MT.`102520103`, MT.`102510101` ) AS ST WHERE ST.VONR_MO_CALL_CONN_FAIL_TIMES > 0 ORDER BY VONR_MO_CALL_CONN_FAIL_TIMES DESC LIMIT 0, 5000
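The new functions pretty-print queries like the one above. A small sketch with hypothetical table and column names:

SELECT formatQuery('select a,  b frOM tab WHERE x=1')
-- the query, canonically formatted across multiple lines

SELECT formatQuerySingleLine('select a,  b frOM tab WHERE x=1')
-- the same canonical form on a single line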

Developer: Salvatore Mesoraca.

ArgMin, ArgMax as Combinators

Applies an aggregate function to the set of values
for which another value is the minimum or maximum within the group.
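For example, a sketch assuming a hypothetical table employees(department, salary, hire_date): the average salary among the most recently hired employees of each department:

SELECT department, avgArgMax(salary, hire_date)
FROM employees
GROUP BY department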

Demo.

Developer: Amos Bird.

Parameterized ALTER With Partitions

ALTER TABLE {name:Identifier} ATTACH PARTITION {x:String}

Developer: Nikolai Degterinskiy.

Untuple With Better Names

:) SELECT untuple((1, 'Hello'))

┌─tupleElement((1, 'Hello'), '1')─┬─tupleElement((1, 'Hello'), '2')─┐
│                               1 │ Hello                           │
└─────────────────────────────────┴─────────────────────────────────┘

:) SELECT untuple((1, 'Hello')) AS t

┌─t.1─┬─t.2───┐
│   1 │ Hello │
└─────┴───────┘

Developer: garcher22.

Enforcing Projections

:) SET force_optimize_projection_name = 'my_projection';
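With the setting in place, a query that does not use the named projection fails. A sketch with hypothetical names:

SET force_optimize_projection_name = 'my_projection';

SELECT count() FROM test_table WHERE x = 42;
-- throws an exception unless 'my_projection' was used to answer the query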

Developer: Yarik Briukhovetskyi.

Allowing No Primary Key

:) SET create_table_empty_primary_key_by_default = 1,
       default_table_engine = 'MergeTree';

:) CREATE TABLE test (x UInt8, y String);
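The resulting table behaves as if it had been created with an explicitly empty key; a sketch of the equivalent explicit form:

CREATE TABLE test (x UInt8, y String) ENGINE = MergeTree ORDER BY tuple();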

Developer: Srikanth Chekuri.

Performance Improvements

Optimization For Many Concurrent Queries

(clickhouse-benchmark run on a query log from a production service)

23.9:

QPS: 201.059
50.000%   0.997 sec.
99.990%  75.998 sec.

23.10:

QPS: 961.997
50.000%   0.033 sec.
99.990%   4.153 sec.

Developer: Maksim Kita.
More: https://www.tinybird.co/blog-posts/clickhouse-lock-contention

Optimization For External Aggregation

When a large number of temporary files are created on disk.

SELECT hash, count() FROM test_table
GROUP BY (id % 150000) AS hash
SETTINGS max_bytes_before_external_group_by = '50M'

23.9:

Elapsed: 38.492 sec. (7.76 million rows/s., 108.42 MB/s.)

23.10:

Elapsed: 4.643 sec. (64.09 million rows/s., 894.11 MB/s.)

Developer: Maksim Kita.

Optimization For Merges

When a table is narrow:

CREATE TABLE test_table (id UInt64, value UInt64)
ENGINE = MergeTree ORDER BY id;

INSERT INTO test_table
SELECT number % 15, number FROM numbers_mt(500000000);

OPTIMIZE TABLE test_table

23.9: Elapsed: 19.328 sec.

23.10: Elapsed: 12.578 sec.

Developer: Maksim Kita.

Optimization For map['element']

By predicting that the subsequent maps
in a block have a similar structure.

SELECT map['hello'] FROM test SETTINGS max_threads = 1;

23.9:

Elapsed: 0.460 sec. (43.44 million rows/s., 3.52 GB/s.)

23.10:

Elapsed: 0.317 sec. (63.19 million rows/s., 5.12 GB/s.)

Developer: lgbo-ustc.

Better Pipeline For Short Queries

By carefully selecting the number of streams for processing.

Developer: Jiebin Sun.

Condition Pushdown For ORC

Using data skipping indices in ORC, similarly to Parquet.
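A sketch with a hypothetical file: conditions from the WHERE clause are pushed down, so ORC stripes and row groups whose min/max statistics rule out any matches are skipped entirely.

SELECT count() FROM file('data.orc') WHERE x > 1000000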

Demo.

Developer: Taiyang Li.

Faster Window Functions

SET query_plan_preserve_num_streams_after_window_functions = 1;

SELECT id,
    AVG(value) OVER (ORDER BY id ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS frame,
    sipHash64(frame)
FROM window_test;

23.9: Elapsed: 3.07 sec.

23.10: Elapsed: 2.38 sec.

Developer: Frinkr.

Faster Skip Indices

By reusing the index data structures in memory for different granules.

SELECT count() FROM index_test WHERE x >= 20000 AND x <= 20100 AND y >= 10000 AND y <= 10100

23.9: Elapsed: 0.083 sec.

23.10: Elapsed: 0.058 sec.

Developer: Amos Bird.

Something Interesting

NumPy as Input Format

SELECT * FROM 'data.npy'

Demo

Developer: Yarik Briukhovetskyi.

arrayFold

Fold an array with a lambda function:

SELECT arrayFold((acc, value...) -> new_acc, arr..., initial_acc)

... but we already have array aggregation functions:

arraySum, arrayProduct, arrayMin/Max/All/Any, arrayStringConcat...

... and we already can apply any aggregate function to an array:

arrayReduce('stddev', arr), arrayReduce('corr', arr1, arr2)

But you can do much more with arrayFold!

Developer: Lirikl.

arrayFold

:) SELECT arrayFold((acc, value) -> (acc.2, acc.1 + value),
          range(number), (1::UInt64, 1::UInt64)).2 AS x
   FROM numbers(10)

┌──x─┐
│  1 │
│  1 │
│  2 │
│  3 │
│  5 │
│  7 │
│ 10 │
│ 13 │
│ 17 │
│ 21 │
└────┘

Developer: Lirikl.

Bonus

Integrations

MySQL compatibility
for Tableau Online and QuickSight support:
— TO_DAYS;
— String arguments for add/subtract date/time and toDayOfWeek;
— information_schema: columns.extra, tables.table_rows, tables.data_length;

Java Driver:
— uses Apache HTTP Client to improve performance;
— added proxy support;
— supports secure connections using a trust store.

Tableau Online via MySQL protocol: it works!

Updates for Python, Node.js, and Go clients.

Reading Corner

https://clickhouse.com/blog/

— Supercharging your large data loads;
— Forecasting Using ML Functions;
— Saving millions of dollars in Kubernetes;
— GenAI for ClickHouse SQL queries.

Video Recordings: https://www.youtube.com/c/ClickHouseDB

Q&A