1. (55 min) What's new in ClickHouse 24.3.
2. (5 min) Q&A.
ClickHouse Spring Release.
— 12 new features
— 18 performance optimizations
— 60 bug fixes
:) SELECT * FROM system.keywords
   ┌─keyword────────┐
1. │ ADD COLUMN     │
2. │ ADD CONSTRAINT │
3. │ ADD INDEX      │
4. │ ADD PROJECTION │
5. │ ADD STATISTIC  │
How many keywords are there in total?
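The total can be checked directly against the new system table (a sketch; the exact count depends on the server version):

```sql
SELECT count() FROM system.keywords;
```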
Developer: Nikita Mikhailov.
Detach all partitions. The table becomes empty:
:) ALTER TABLE hits DETACH PARTITION ALL
Attach all partitions from the detached directory:
:) ALTER TABLE hits ATTACH PARTITION ALL
No demo. The feature is trivial. It works as expected :)
Developer: Kirill Nikiforov.
Allows you to ATTACH (or MOVE, REPLACE) a PARTITION (or PART)
from another disk, including external filesystems (such as S3).
Similar to MOVE PARTITION TO DISK/VOLUME, but works between tables.
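A minimal sketch of the syntax, assuming hypothetical tables `hits` and `hits_s3` (the latter stored on an S3 disk) with a partition `202403`:

```sql
-- Copy a partition from a table whose data lives on another disk:
ALTER TABLE hits ATTACH PARTITION 202403 FROM hits_s3;

-- The same works for MOVE and REPLACE:
ALTER TABLE hits MOVE PARTITION 202403 TO TABLE hits_s3;
ALTER TABLE hits REPLACE PARTITION 202403 FROM hits_s3;
```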
Demo.
Developer: Unalian.
CREATE TABLE test ... ENGINE = Memory
SETTINGS max_bytes_to_keep = '1G';
-- max_rows_to_keep, min_bytes_to_keep, min_rows_to_keep
max_bytes_to_keep, max_rows_to_keep
— thresholds to trigger the cleanup of old data;
min_bytes_to_keep, min_rows_to_keep
— if specified, the cleanup doesn't delete blocks below this threshold;
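A sketch combining both thresholds (table and column names are illustrative):

```sql
CREATE TABLE events
(
    ts DateTime,
    message String
)
ENGINE = Memory
SETTINGS min_bytes_to_keep = '100M', max_bytes_to_keep = '1G';
-- Once the table grows past 1G, the oldest blocks are dropped,
-- but the cleanup never shrinks it below the 100M floor.
```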
Demo
Developer: Jake Bamrah.
:) BACKUP TABLE test TO File('backup1.tar');
-- File, Disk, S3, and AzureBlobStorage work:
:) BACKUP TABLE test TO S3(
'https://mybucket.s3.amazonaws.com/backups/1.tar', ...);
The other supported formats are .zip and a directory.
Note: when backing up to S3, a .zip/.tar container
makes little sense compared to a set of plain files.
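Restoring works symmetrically; a sketch using the backup from above:

```sql
RESTORE TABLE test FROM File('backup1.tar');

-- Or restore under a different table name:
RESTORE TABLE test AS test_restored FROM File('backup1.tar');
```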
Developer: Josh Hildred.
+ low latency;
+ higher RPS;
+ two times lower GET/PUT price;
− higher storage cost (7×);
− lower availability;
− restricted API;
− requires new libraries;
Use cases:
— fast shared storage inside a single AZ;
— a cache in front of S3 standard tier;
AWS blog: ClickHouse Cloud & Amazon S3 Express One Zone:
Making a blazing fast analytical database even faster
Developer: Nikita Taranov.
Before 24.3:
Code: 386. DB::Exception: There is no supertype for types
Enum8('Hello' = 1, 'World' = 2), String because some of them are
String/FixedString and some of them are not:
While processing [str, enum]. (NO_COMMON_TYPE)
Now Enums and Strings are cast to a common type if needed.
:) SELECT [str, enum], -- Array(String)
cond ? str : enum; -- String
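A self-contained sketch of the new behavior (the values are illustrative):

```sql
WITH 'abc' AS str,
     CAST('Hello', 'Enum8(\'Hello\' = 1, \'World\' = 2)') AS enum
SELECT [str, enum],    -- Array(String)
       1 ? str : enum; -- String
```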
Developer: Alexey Milovidov.
    ┌─────price─┬─addr1───────────┬─addr2────────────────────────────┬─street────────────────────────────┬─town────────────────┐
 1. │ 594300000 │ 55              │ UNIT 53                          │ BAKER STREET                      │ LONDON              │
 2. │ 569200000 │ 2               │                                  │ STANHOPE ROW                      │ LONDON              │
 3. │ 523000000 │ ONE THAMES      │ PHASE 1                          │ NINE ELMS LANE                    │ LONDON              │
 4. │ 448500000 │ 22              │                                  │ SUMNER STREET                     │ LONDON              │
 5. │ 448300979 │ UNIT 8          │                                  │ HAWICK CRESCENT INDUSTRIAL ESTATE │ NEWCASTLE UPON TYNE │
 6. │ 421364142 │ BEAGLE HOUSE, 1 │                                  │ BRAHAM STREET                     │ LONDON              │
 7. │ 415000000 │ 37              │                                  │ DIGBETH                           │ WALSALL             │
 8. │ 415000000 │ 37 - 39         │                                  │ DIGBETH                           │ WALSALL             │
 9. │ 411500000 │ 30              │ FIRST, SEVENTH AND EIGHTH FLOORS │ GRESHAM STREET                    │ LONDON              │
10. │ 400000000 │ THE O2, 1       │ INTERCONTINENTAL LONDON          │ WATERVIEW DRIVE                   │ LONDON              │
    └───────────┴─────────────────┴──────────────────────────────────┴───────────────────────────────────┴─────────────────────┘
10 rows in set. Elapsed: 0.052 sec. Processed 27.91 million rows, 849.14 MB (537.12 million rows/s., 16.34 GB/s.)
Controlled by the output_format_pretty_row_numbers setting.
This is an old feature. What's new?
Developers: Alexey Milovidov.
    ┌─────price─┬─addr1───────────┬─addr2────────────────────────────┬─street────────────────────────────┬─town────────────────┐
 1. │ 594300000 │ 55              │ UNIT 53                          │ BAKER STREET                      │ LONDON              │
 2. │ 569200000 │ 2               │                                  │ STANHOPE ROW                      │ LONDON              │
 3. │ 523000000 │ ONE THAMES      │ PHASE 1                          │ NINE ELMS LANE                    │ LONDON              │
 4. │ 448500000 │ 22              │                                  │ SUMNER STREET                     │ LONDON              │
 5. │ 448300979 │ UNIT 8          │                                  │ HAWICK CRESCENT INDUSTRIAL ESTATE │ NEWCASTLE UPON TYNE │
 6. │ 421364142 │ BEAGLE HOUSE, 1 │                                  │ BRAHAM STREET                     │ LONDON              │
 7. │ 415000000 │ 37              │                                  │ DIGBETH                           │ WALSALL             │
 8. │ 415000000 │ 37 - 39         │                                  │ DIGBETH                           │ WALSALL             │
 9. │ 411500000 │ 30              │ FIRST, SEVENTH AND EIGHTH FLOORS │ GRESHAM STREET                    │ LONDON              │
10. │ 400000000 │ THE O2, 1       │ INTERCONTINENTAL LONDON          │ WATERVIEW DRIVE                   │ LONDON              │
    └───────────┴─────────────────┴──────────────────────────────────┴───────────────────────────────────┴─────────────────────┘
10 rows in set. Elapsed: 0.052 sec. Processed 27.91 million rows, 849.14 MB (537.12 million rows/s., 16.34 GB/s.)
Controlled by the output_format_pretty_highlight_digit_groups setting.
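Both settings are session-level; a sketch of toggling them off:

```sql
SET output_format_pretty_row_numbers = 0;            -- hide row numbers
SET output_format_pretty_highlight_digit_groups = 0; -- plain digit rendering
```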
Developers: Alexey Milovidov.
Demo
Developer: Alexey Milovidov.
Demo
Developer: Pavel Kruglov.
CREATE TABLE test
(
x UInt64,
s String,
PRIMARY KEY(x)
);
Wow, it works.
It is controlled by the default_table_engine setting,
which is MergeTree by default.
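The engine can also be chosen per session; a sketch (table name is illustrative):

```sql
SET default_table_engine = 'ReplacingMergeTree';

CREATE TABLE test2
(
    x UInt64,
    s String,
    PRIMARY KEY (x)
); -- created with ReplacingMergeTree, no ENGINE clause needed
```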
The primary key often contains more columns than are needed for indexing.
For example, with PRIMARY KEY (user_id, timestamp, url)
— timestamp is a high-cardinality column
that typically changes every granule, so url adds almost nothing to the index.
Since 24.3, there is an option not to load these useless suffix columns of the primary key into memory.
This is controlled by a new setting of MergeTree tables:
primary_key_ratio_of_unique_prefix_values_to_skip_suffix_columns
(0.9 by default)
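The setting can be applied to an existing table as well; a sketch (the comment reflects my reading of the setting's semantics):

```sql
ALTER TABLE hits MODIFY SETTING
    primary_key_ratio_of_unique_prefix_values_to_skip_suffix_columns = 1.0;
-- Higher values make skipping the suffix columns less likely;
-- the default 0.9 skips them when the prefix is already almost unique.
```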
Developer: Alexey Milovidov.
How much faster?
SELECT count() FROM system.numbers_mt
WHERE NOT ignore(toString(toUInt256(rand64())))
Developer: Raul Marin.
The Analyzer is a new infrastructure for query analysis
and optimization in ClickHouse.
It provides better compatibility and feature completeness
and enables complex query optimizations.
24.2: allow_experimental_analyzer = 0, the feature is experimental.
24.3: allow_experimental_analyzer = 1, the feature is beta.
Developers: Maksim Kita, Nikolai Kochetov, Dmitriy Novik, Vladimir Cherkasov,
Igor Nikonov, Yakov Olkhovskiy, ...
WITH example AS (
SELECT '2021-01-01' AS date,
1 AS node, 1 AS user)
SELECT extra_data FROM (
SELECT join1.*
FROM example
LEFT JOIN (
SELECT '2021-01-01' AS date,
1 AS extra_data) AS join1
ON example.date = join1.date
LEFT JOIN (
SELECT '2021-01-01' AS date) AS join2
ON example.date = join2.date)
Was: Missing columns: 'extra_data' while processing query...
Now: — works perfectly.
SELECT * FROM (SELECT SUM(COALESCE(SEQ_VONR_MO_CALL_CONN_FAIL_TIMES_C, 0)) AS VONR_MO_CALL_CONN_FAIL_TIMES, MT.`102520001` AS `102520001`, MT.`181361814368` AS `181361814368`, MT.`102520102` AS `102520102`, MT.`102520101` AS `102520101`, MT.`102520104` AS `102520104`, MT.`183111861371` AS `183111861371`, MT.`102530101` AS `102530101`, MT.`102540101` AS `102540101`, MT.`102520103` AS `102520103`, MT.`102510101` AS `102510101` FROM ( SELECT COALESCE(SUM(VONR_MO_CALL_CONN_FAIL_TIMES), 0) AS SEQ_VONR_MO_CALL_CONN_FAIL_TIMES_C, COM_FAIL_CAUSE AS `102520001`, NULL AS `102520102`, COM_FAIL_CAUSE AS `102510101`, NULL AS `102520101`, D183111570684_H101.`183111861371` AS `183111861371`, NULL AS `102520104`, NULL AS `102520103`, H_COMPREHENSIVE_FAILURE_CAUSE.`102540101` AS `102540101`, H_COMPREHENSIVE_FAILURE_CAUSE.`102530101` AS `102530101`, concat('14', '-', '255', '-', '255', '-', SIP_RELEASE_CODE) AS `181361814368` FROM TEST_DATABASE.SDR_TEST LEFT JOIN ( SELECT concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID) AS `102510101`, UNIFIED_CAUSE_ID AS `183111861371`, concat(FAILCAUSE, '(', PD, ')') AS NAME_102510101 FROM TEST_DATABASE.DIM_TEST GROUP BY concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID), UNIFIED_CAUSE_ID, concat(FAILCAUSE, '(', PD, ')') ) AS D183111570684_H101 ON COM_FAIL_CAUSE = D183111570684_H101.`102510101` LEFT JOIN ( SELECT concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID) AS `102520001`, SCENE_ID AS `102540101`, CLASS_ID AS `102530101`, SCENE_NAME AS NAME_102540101 FROM TEST_DATABASE.DIM_TEST GROUP BY concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID), SCENE_ID, CLASS_ID, SCENE_NAME ) AS H_COMPREHENSIVE_FAILURE_CAUSE ON COM_FAIL_CAUSE = H_COMPREHENSIVE_FAILURE_CAUSE.`102520001` LEFT JOIN ( SELECT concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID) AS `181361814368`, CAUSE AS NAME_181361814368 FROM TEST_DATABASE.DIM_TEST GROUP BY concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', 
PD_ID, '-', CAUSE_ID), CAUSE ) AS DIM_FAILCAUSE_ALL_181361814368 ON concat('14', '-', '255', '-', '255', '-', SIP_RELEASE_CODE) = DIM_FAILCAUSE_ALL_181361814368.`181361814368` WHERE (H_COMPREHENSIVE_FAILURE_CAUSE.NAME_102540101 IS NOT NULL) AND (D183111570684_H101.NAME_102510101 IS NOT NULL) AND (DIM_FAILCAUSE_ALL_181361814368.NAME_181361814368 IS NOT NULL) GROUP BY `102520001`, `102520102`, `102510101`, `102520101`, D183111570684_H101.`183111861371`, `102520104`, `102520103`, H_COMPREHENSIVE_FAILURE_CAUSE.`102540101`, H_COMPREHENSIVE_FAILURE_CAUSE.`102530101`, `181361814368` ) AS MT GROUP BY MT.`102520001`, MT.`181361814368`, MT.`102520102`, MT.`102520101`, MT.`102520104`, MT.`183111861371`, MT.`102530101`, MT.`102540101`, MT.`102520103`, MT.`102510101` ) AS ST WHERE ST.VONR_MO_CALL_CONN_FAIL_TIMES > 0 ORDER BY VONR_MO_CALL_CONN_FAIL_TIMES DESC LIMIT 0, 5000
Why? — Because ClickHouse's SQL support is too good!
Aliases in any place of a query:
SELECT (a + 1 AS b) * 2 AS c FROM table ORDER BY b + a
Nested and complex data structures, ARRAY JOIN:
SELECT a.b.c[1].d.2 FROM table ARRAY JOIN arr AS a
Arrays and higher order functions:
SELECT arrayMap((x, y) -> x + arrayFilter(z -> z > 0, y), arr1, arr2)
Column matchers and transformers:
SELECT t.* APPLY toString, COLUMNS('^metric_'),
* EXCEPT(hello) REPLACE(x + 1 AS x)
CTE, alias columns, parameterized views, distributed joins...
It brings consistency and completeness to these features.
Multiple ARRAY JOIN in a single query:
SELECT number, nested_array, nested_array_element FROM numbers(1)
ARRAY JOIN [[1,2,3], [4, 5]] AS nested_array
ARRAY JOIN nested_array AS nested_array_element;
Aliases for lambda functions:
WITH x -> x * 2 AS f SELECT f(3)
Tuple elements like columns:
WITH (1, 'Hello') AS t SELECT format('{}: {}', t.*)
It brings consistency and completeness to these features.
SAMPLE can be specified for any table expression in JOIN:
SELECT t1.id, t2.id
FROM test_table_join_1 AS t1 SAMPLE 1/2
INNER JOIN test_table_join_2 AS t2 SAMPLE 1/2 ON t1.id = t2.id;
FINAL can now be specified for any table expression in JOIN:
SELECT t1.id, t2.id, t1.value, t2.value
FROM test_table_join_1 AS t1 FINAL
INNER JOIN test_table_join_2 AS t2 FINAL ON t1.id = t2.id;
It is also faster!
For big queries with many JOINs, there are
performance improvements of up to 5 times:
Since version 24.3: beta, enabled by default.
It passes all required functional and integration tests.
It can be disabled by the setting allow_experimental_analyzer = 0.
24.4 .. 24.5: plan to promote it to production
and remove the old query analysis implementation.
As of version 24.3, the Analyzer does not support the following experimental features:
window views, annoy and usearch indices, hypothesis constraints.
ClickPipes in ClickHouse Cloud now supports streaming import from S3!
— available for testing under a feature flag; ask support.
The same ClickPipes now supports Avro format for Kafka.
Improvements for dbt, Metabase, Tableau, Grafana.
— One Trillion Rows Challenge;
— chDB Joins The ClickHouse Family;
— How to build an LLM chatbot with ClickHouse;
— Schema as a code with Atlas;
Video Recordings: https://www.youtube.com/c/ClickHouseDB