ClickHouse: Release 24.3 Webinar

ClickHouse Release
24.3 LTS

Release 24.3 Webinar

1. (55 min) What's new in ClickHouse 24.3.

2. (5 min) Q&A.

Release 24.3 LTS

ClickHouse Spring Release.

β€” 12 new features 🌸

β€” 18 performance optimizations 🐦

β€” 60 bug fixes πŸ¦‹

Small And Nice Features

system.keywords

:) SELECT * FROM system.keywords

    ┌─keyword──────────────────┐
 1. │ ADD COLUMN               │
 2. │ ADD CONSTRAINT           │
 3. │ ADD INDEX                │
 4. │ ADD PROJECTION           │
 5. │ ADD STATISTIC            │

How many keywords are there in total?
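The question can be answered with the new table itself (the exact count depends on the server version):

:) SELECT count() FROM system.keywords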

Developer: Nikita Mikhailov.

ATTACH PARTITION ALL

Detach all partitions. The table becomes empty:

:) ALTER TABLE hits DETACH PARTITION ALL

Attach all partitions from the detached directory:

:) ALTER TABLE hits ATTACH PARTITION ALL

No demo. The feature is trivial. It works as expected :)

Developer: Kirill Nikiforov.

ATTACH PARTITION from a different disk

Allows ATTACH (or MOVE, REPLACE) of a PARTITION (or PART)
from another disk, including external filesystems (such as S3).

Similar to MOVE PARTITION TO DISK/VOLUME, but works between tables.
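A sketch of the syntax, assuming two tables src and dst with identical structure (table and partition names here are made up):

:) ALTER TABLE dst ATTACH PARTITION 202403 FROM src;
:) ALTER TABLE dst REPLACE PARTITION 202403 FROM src;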

Demo.

Developer: Unalian.

Size-Capped Memory Tables

CREATE TABLE test ... ENGINE = Memory
SETTINGS max_bytes_to_keep = '1G';
-- also: max_rows_to_keep, min_bytes_to_keep, min_rows_to_keep

max_bytes_to_keep, max_rows_to_keep
— thresholds that trigger the cleanup of old data;

min_bytes_to_keep, min_rows_to_keep
— if specified, the cleanup doesn't delete blocks below this threshold;
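For example, a sketch that keeps the table between roughly 100 MB and 1 GB of data (the table and column names are made up):

:) CREATE TABLE events (ts DateTime, message String)
   ENGINE = Memory
   SETTINGS min_bytes_to_keep = '100M', max_bytes_to_keep = '1G';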

Demo

Developer: Jake Bamrah.

BACKUP in tar format

:) BACKUP TABLE test TO File('backup1.tar');

-- File, Disk, S3, and AzureBlobStorage work:

:) BACKUP TABLE test TO S3(
       'https://mybucket.s3.amazonaws.com/backups/1.tar', ...);

The other supported formats are .zip and a directory.

Note: when doing a backup on S3, using a .zip/.tar container
makes little sense compared to a bunch of files.
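Such a backup is restored with the usual RESTORE statement (a sketch, assuming the backup created above):

:) RESTORE TABLE test FROM File('backup1.tar');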

Developer: Josh Hildred.

S3 Express One Zone Support

+ low latency;
+ higher RPS;
+ half the price for GET/PUT requests;
− higher storage cost (7⨯);
− lower availability;
− restricted API;
− requires new client libraries;

Use cases:
— fast shared storage inside a single AZ;
— a cache in front of S3 standard tier;
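For example, reading from an S3 Express directory bucket with the s3 table function; note that directory bucket names carry an availability-zone suffix (the bucket, zone, and file here are hypothetical):

:) SELECT count()
   FROM s3('https://mybucket--use1-az4--x-s3.s3express-use1-az4.us-east-1.amazonaws.com/data.parquet');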

AWS blog: ClickHouse Cloud & Amazon S3 Express One Zone:
Making a blazing fast analytical database even faster

Developer: Nikita Taranov.

Usability Improvements

Enum And Strings

Before 24.3:

Code: 386. DB::Exception: There is no supertype for types Enum8('Hello' = 1, 'World' = 2), String because some of them are String/FixedString and some of them are not: While processing [str, enum]. (NO_COMMON_TYPE)

Now Enums and Strings are cast to a common type if needed.

:) SELECT [str, enum],       -- Array(String)
          cond ? str : enum; -- String

Developer: Alexey Milovidov.

Row Numbers in Pretty Formats

β”Œβ”€β”€β”€β”€β”€price─┬─addr1───────────┬─addr2────────────────────────────┬─street────────────────────────────┬─town────────────────┐ 1. β”‚ 594300000 β”‚ 55 β”‚ UNIT 53 β”‚ BAKER STREET β”‚ LONDON β”‚ 2. β”‚ 569200000 β”‚ 2 β”‚ β”‚ STANHOPE ROW β”‚ LONDON β”‚ 3. β”‚ 523000000 β”‚ ONE THAMES β”‚ PHASE 1 β”‚ NINE ELMS LANE β”‚ LONDON β”‚ 4. β”‚ 448500000 β”‚ 22 β”‚ β”‚ SUMNER STREET β”‚ LONDON β”‚ 5. β”‚ 448300979 β”‚ UNIT 8 β”‚ β”‚ HAWICK CRESCENT INDUSTRIAL ESTATE β”‚ NEWCASTLE UPON TYNE β”‚ 6. β”‚ 421364142 β”‚ BEAGLE HOUSE, 1 β”‚ β”‚ BRAHAM STREET β”‚ LONDON β”‚ 7. β”‚ 415000000 β”‚ 37 β”‚ β”‚ DIGBETH β”‚ WALSALL β”‚ 8. β”‚ 415000000 β”‚ 37 - 39 β”‚ β”‚ DIGBETH β”‚ WALSALL β”‚ 9. β”‚ 411500000 β”‚ 30 β”‚ FIRST, SEVENTH AND EIGHTH FLOORS β”‚ GRESHAM STREET β”‚ LONDON β”‚ 10. β”‚ 400000000 β”‚ THE O2, 1 β”‚ INTERCONTINENTAL LONDON β”‚ WATERVIEW DRIVE β”‚ LONDON β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ 10 rows in set. Elapsed: 0.052 sec. Processed 27.91 million rows, 849.14 MB (537.12 million rows/s., 16.34 GB/s.)

Controlled by the output_format_pretty_row_numbers setting.
This is an old feature. What's new?

Developer: Alexey Milovidov.

Numbers Become Nicer

β”Œβ”€β”€β”€β”€β”€price─┬─addr1───────────┬─addr2────────────────────────────┬─street────────────────────────────┬─town────────────────┐ 1. β”‚ 594300000 β”‚ 55 β”‚ UNIT 53 β”‚ BAKER STREET β”‚ LONDON β”‚ 2. β”‚ 569200000 β”‚ 2 β”‚ β”‚ STANHOPE ROW β”‚ LONDON β”‚ 3. β”‚ 523000000 β”‚ ONE THAMES β”‚ PHASE 1 β”‚ NINE ELMS LANE β”‚ LONDON β”‚ 4. β”‚ 448500000 β”‚ 22 β”‚ β”‚ SUMNER STREET β”‚ LONDON β”‚ 5. β”‚ 448300979 β”‚ UNIT 8 β”‚ β”‚ HAWICK CRESCENT INDUSTRIAL ESTATE β”‚ NEWCASTLE UPON TYNE β”‚ 6. β”‚ 421364142 β”‚ BEAGLE HOUSE, 1 β”‚ β”‚ BRAHAM STREET β”‚ LONDON β”‚ 7. β”‚ 415000000 β”‚ 37 β”‚ β”‚ DIGBETH β”‚ WALSALL β”‚ 8. β”‚ 415000000 β”‚ 37 - 39 β”‚ β”‚ DIGBETH β”‚ WALSALL β”‚ 9. β”‚ 411500000 β”‚ 30 β”‚ FIRST, SEVENTH AND EIGHTH FLOORS β”‚ GRESHAM STREET β”‚ LONDON β”‚ 10. β”‚ 400000000 β”‚ THE O2, 1 β”‚ INTERCONTINENTAL LONDON β”‚ WATERVIEW DRIVE β”‚ LONDON β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ 10 rows in set. Elapsed: 0.052 sec. Processed 27.91 million rows, 849.14 MB (537.12 million rows/s., 16.34 GB/s.)

Controlled by the output_format_pretty_highlight_digit_groups setting.
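Both Pretty-format settings can be toggled per session, e.g. to disable them (a sketch):

:) SET output_format_pretty_row_numbers = 0,
       output_format_pretty_highlight_digit_groups = 0;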

Developer: Alexey Milovidov.

Default Formats In the CLI

Demo

Developer: Alexey Milovidov.

ClickHouse Local: All File Formats

Demo

Developer: Pavel Kruglov.

MergeTree is the default

CREATE TABLE test ( x UInt64, s String, PRIMARY KEY(x) );

Wow, it works.

It is controlled by the default_table_engine setting,
which is MergeTree by default.
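The default can be changed per session or in the user profile, e.g. (a sketch; the table name is made up):

:) SET default_table_engine = 'Log';
:) CREATE TABLE t (x UInt64); -- now uses the Log engine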

Performance Improvements

Memory Usage For Primary Key

The primary key often contains more columns than are needed for indexing.

For example, PRIMARY KEY (user_id, timestamp, url)

— here, timestamp is a high-cardinality column
  that typically changes in every granule, so there is no use for url.

Since 24.3, there is an option not to load such unneeded columns of the primary key into memory.

This is controlled by a new setting of MergeTree tables:
primary_key_ratio_of_unique_prefix_values_to_skip_suffix_columns
(0.9 by default)
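The setting can be changed per table like any other MergeTree setting, e.g. (a sketch, using the hits table from the example above):

:) ALTER TABLE hits MODIFY SETTING
   primary_key_ratio_of_unique_prefix_values_to_skip_suffix_columns = 0.9; -- the default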

Developer: Alexey Milovidov.

Faster Serialization of UInt256

How much faster? Check with this query:

SELECT count() FROM system.numbers_mt WHERE NOT ignore(toString(toUInt256(rand64())))

Developer: Raul Marin.

Something
Interesting

Analyzer is enabled by default!

Analyzer is a new infrastructure for query analysis
and optimization in ClickHouse.

It provides better compatibility and feature completeness
and enables complex query optimizations.

24.2: allow_experimental_analyzer = 0, the feature is experimental.

24.3: allow_experimental_analyzer = 1, the feature is beta.

Developers: Maksim Kita, Nikolai Kochetov, Dmitriy Novik, Vladimir Cherkasov,
Igor Nikonov, Yakov Olkhovskiy, ...

Analyzer

WITH example AS (
    SELECT '2021-01-01' AS date, 1 AS node, 1 AS user)
SELECT extra_data FROM (
    SELECT join1.*
    FROM example
    LEFT JOIN (
        SELECT '2021-01-01' AS date, 1 AS extra_data) AS join1
        ON example.date = join1.date
    LEFT JOIN (
        SELECT '2021-01-01' AS date) AS join2
        ON example.date = join2.date)

Was: Missing columns: 'extra_data' while processing query...

Now: it works perfectly.

Analyzer

SELECT * FROM (SELECT SUM(COALESCE(SEQ_VONR_MO_CALL_CONN_FAIL_TIMES_C, 0)) AS VONR_MO_CALL_CONN_FAIL_TIMES, MT.`102520001` AS `102520001`, MT.`181361814368` AS `181361814368`, MT.`102520102` AS `102520102`, MT.`102520101` AS `102520101`, MT.`102520104` AS `102520104`, MT.`183111861371` AS `183111861371`, MT.`102530101` AS `102530101`, MT.`102540101` AS `102540101`, MT.`102520103` AS `102520103`, MT.`102510101` AS `102510101` FROM ( SELECT COALESCE(SUM(VONR_MO_CALL_CONN_FAIL_TIMES), 0) AS SEQ_VONR_MO_CALL_CONN_FAIL_TIMES_C, COM_FAIL_CAUSE AS `102520001`, NULL AS `102520102`, COM_FAIL_CAUSE AS `102510101`, NULL AS `102520101`, D183111570684_H101.`183111861371` AS `183111861371`, NULL AS `102520104`, NULL AS `102520103`, H_COMPREHENSIVE_FAILURE_CAUSE.`102540101` AS `102540101`, H_COMPREHENSIVE_FAILURE_CAUSE.`102530101` AS `102530101`, concat('14', '-', '255', '-', '255', '-', SIP_RELEASE_CODE) AS `181361814368` FROM TEST_DATABASE.SDR_TEST LEFT JOIN ( SELECT concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID) AS `102510101`, UNIFIED_CAUSE_ID AS `183111861371`, concat(FAILCAUSE, '(', PD, ')') AS NAME_102510101 FROM TEST_DATABASE.DIM_TEST GROUP BY concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID), UNIFIED_CAUSE_ID, concat(FAILCAUSE, '(', PD, ')') ) AS D183111570684_H101 ON COM_FAIL_CAUSE = D183111570684_H101.`102510101` LEFT JOIN ( SELECT concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID) AS `102520001`, SCENE_ID AS `102540101`, CLASS_ID AS `102530101`, SCENE_NAME AS NAME_102540101 FROM TEST_DATABASE.DIM_TEST GROUP BY concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID), SCENE_ID, CLASS_ID, SCENE_NAME ) AS H_COMPREHENSIVE_FAILURE_CAUSE ON COM_FAIL_CAUSE = H_COMPREHENSIVE_FAILURE_CAUSE.`102520001` LEFT JOIN ( SELECT concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID) AS `181361814368`, CAUSE AS NAME_181361814368 FROM TEST_DATABASE.DIM_TEST GROUP BY concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', 
PD_ID, '-', CAUSE_ID), CAUSE ) AS DIM_FAILCAUSE_ALL_181361814368 ON concat('14', '-', '255', '-', '255', '-', SIP_RELEASE_CODE) = DIM_FAILCAUSE_ALL_181361814368.`181361814368` WHERE (H_COMPREHENSIVE_FAILURE_CAUSE.NAME_102540101 IS NOT NULL) AND (D183111570684_H101.NAME_102510101 IS NOT NULL) AND (DIM_FAILCAUSE_ALL_181361814368.NAME_181361814368 IS NOT NULL) GROUP BY `102520001`, `102520102`, `102510101`, `102520101`, D183111570684_H101.`183111861371`, `102520104`, `102520103`, H_COMPREHENSIVE_FAILURE_CAUSE.`102540101`, H_COMPREHENSIVE_FAILURE_CAUSE.`102530101`, `181361814368` ) AS MT GROUP BY MT.`102520001`, MT.`181361814368`, MT.`102520102`, MT.`102520101`, MT.`102520104`, MT.`183111861371`, MT.`102530101`, MT.`102540101`, MT.`102520103`, MT.`102510101` ) AS ST WHERE ST.VONR_MO_CALL_CONN_FAIL_TIMES > 0 ORDER BY VONR_MO_CALL_CONN_FAIL_TIMES DESC LIMIT 0, 5000

Analyzer

Why? — Because ClickHouse's SQL support is too good!

Aliases in any place of a query:
SELECT (a + 1 AS b) * 2 AS c FROM table ORDER BY b + a

Nested and complex data structures, ARRAY JOIN:
SELECT a.b.c[1].d.2 FROM table ARRAY JOIN arr AS a

Arrays and higher order functions:
SELECT arrayMap((x, y) -> x + arrayFilter(z -> z > 0, y), arr1, arr2)

Column matchers and transformers:
SELECT t.* APPLY toString, COLUMNS('^metric_'),
  * EXCEPT(hello) REPLACE(x + 1 AS x)

CTE, alias columns, parameterized views, distributed joins...

Analyzer

It brings consistency and feature completeness.

Multiple ARRAY JOIN in a single query:

SELECT number, nested_array, nested_array_element
FROM numbers(1)
ARRAY JOIN [[1, 2, 3], [4, 5]] AS nested_array
ARRAY JOIN nested_array AS nested_array_element;

Aliases for lambda functions:

WITH x -> x * 2 AS f SELECT f(3)

Tuple elements like columns:

WITH (1, 'Hello') AS t SELECT format('{}: {}', t.*)

Analyzer

It brings consistency and feature completeness.

SAMPLE can be specified for any table expression in JOIN:

SELECT t1.id, t2.id
FROM test_table_join_1 AS t1 SAMPLE 1/2
INNER JOIN test_table_join_2 AS t2 SAMPLE 1/2
ON t1.id = t2.id;

FINAL can now be specified for any table expression in JOIN:

SELECT t1.id, t2.id, t1.value, t2.value
FROM test_table_join_1 AS t1 FINAL
INNER JOIN test_table_join_2 AS t2 FINAL
ON t1.id = t2.id;

Analyzer

It is also faster!

For big queries with many JOINs, there are
performance improvements of up to 5 times.

Analyzer

Since version 24.3: beta, enabled by default.

It passes all required functional and integration tests.

It can be disabled by the setting allow_experimental_analyzer = 0.

24.4 .. 24.5: we plan to promote it to production
and remove the old query analysis implementation.

As of version 24.3, the Analyzer does not support the following experimental features:
window views, Annoy and USearch indices, and hypothesis constraints.

Integrations

Integrations

ClickPipes in ClickHouse Cloud now supports streaming import from S3!
— available for testing under a feature flag; ask support to enable it.

ClickPipes also now supports the Avro format for Kafka.

Improvements for dbt, Metabase, Tableau, Grafana.

Reading Corner

https://clickhouse.com/blog/

— One Trillion Rows Challenge;
— chDB Joins The ClickHouse Family;
— How to build an LLM chatbot with ClickHouse;
— Schema as a code with Atlas;

Video Recordings: https://www.youtube.com/c/ClickHouseDB

Q&A