ClickHouse: Release 24.12 Webinar

ClickHouse Release 24.12

Release 24.12 Webinar

1. (40 min) What's new in ClickHouse 24.12.

2. (15 min) Guest Talk.

2. (5 min) Q&A.

Release 24.12

ClickHouse Christmas Release.

โ€” 16 new features ๐ŸŽ

โ€” 16 performance optimizations โ›ธ๏ธ

โ€” 36 bug fixes โ›„

Small Features ๐ŸŽ

Extended Table Aliases ๐Ÿฌ

Table aliases can redefine column names:

SELECT a, b FROM ( SELECT number, number * 2 FROM numbers(10) ) AS x(a, b); CREATE VIEW my_view(a, b) AS SELECT 1, 2; WITH t(a, b) AS (SELECT 1, 2) SELECT a, b FROM t;

Developer: Yarik Briukhovetskyi.

Function `translate` Can Delete Characters

:) SELECT translate('ClickHouse', 'HCc', 'm') -- change H to m and delete characters C and c: โ”Œโ”€resโ”€โ”€โ”€โ”€โ”€โ”€โ” 1. โ”‚ likhouse โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

And the translateUTF8 function is also supported.

Developer: Shuai Xu.

Setting Custom HTTP Headers ๐Ÿญ

:) CREATE OR REPLACE VIEW pictures AS SELECT '...' AS img; :) CREATE USER viewer SETTINGS http_response_headers = $${'Content-Type': 'image/svg+xml'}$$; :) GRANT SELECT ON pictures TO viewer;

Use-cases:
— control over client-side caching;
— content-disposition for downloadable files;
— controlling the content-type;

Demo

Developer: Alexey Milovidov.

Cache For Primary Keys ๐Ÿฏ

2024 — a whole lot of optimizations for primary keys:

24.2: Lazy loading of the primary key
24.2: Optimizing primary key's memory usage
24.3: Skip useless columns in the primary key
24.4: SYSTEM UNLOAD PRIMARY KEY

24.12: primary key for tables' parts can be managed by an LRU cache:

ALTER TABLE t MODIFY SETTING primary_key_lazy_load = 1, use_primary_key_cache = 1, prewarm_primary_key_cache = 1;

Developer: Anton Popov.

Cache For Primary Keys ๐Ÿฏ

What for?

— Tables with over quadrillion records on a cluster with shared storage.

Client Receives Settings From Server

Client-related settings, such as formatting and parsing options, can be specified on the server-side.

clickhouse-client will receive and apply these settings.

Demo

Automatic Spilling To Disk ๐Ÿฎ

ClickHouse supports using disk as a scratch storage when there is not enough memory to perform GROUP BY or ORDER BY.

This is available since 2015, controlled by the settings max_bytes_before_external_group_by, max_bytes_before_external_sort.

It is also configured by default in ClickHouse Cloud with static thresholds.

Now it can be automatic!

Demo

Developer: Azat Khuzhin.

Usability For Enums ๐Ÿซ

LIKE operator is supported for Enums:

:) CREATE TABLE test (x Enum('Hello', 'World')) ENGINE = Memory :) INSERT INTO test VALUES ('World') :) SELECT * FROM test WHERE x LIKE '%or%' โ”Œโ”€xโ”€โ”€โ”€โ”€โ”€โ” 1. โ”‚ World โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ -- 24.11: -- Code: 43. DB::Exception: Illegal type Enum8('Hello' = 1, 'World' = 2) of argument of function like: In scope SELECT * FROM test WHERE x LIKE '%or%'. (ILLEGAL_TYPE_OF_ARGUMENT)

Developers: ZhangLiStar.

Usability For Enums ๐Ÿซ

Equality and IN operators accept unknown values:

:) CREATE TABLE test (x Enum('Hello', 'World')) ENGINE = Memory :) INSERT INTO test VALUES ('World') :) SELECT * FROM test WHERE x IN ('Goodbye', 'World') โ”Œโ”€xโ”€โ”€โ”€โ”€โ”€โ” 1. โ”‚ World โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ -- 24.11: -- Code: 691. DB::Exception: Unknown element 'Goodbye' for enum: while converting 'Goodbye' to Enum8('Hello' = 1, 'World' = 2). (UNKNOWN_ELEMENT_OF_ENUM)

Developers: ZhangLiStar.

Experimental Features

Reverse Table Ordering โœจ

CREATE TABLE ... ENGINE = MergeTree ORDER BY (EventDate DESC, CounterID, UserID) SETTINGS allow_experimental_reverse_key = 1;

When most of your queries will use the reverse order.

Demo

Developer: Amos Bird.

Performance Improvements

Parallel Hash Join By Default ๐Ÿ›ท

Both sides of JOIN are parallelized, with no to minimal memory overhead.
It does not harm even for short queries, and we made it as the default.

TPC-H, SF-100, Q3: 53 sec -> 2.6 sec.

TPC-H, SF-100, Q7: 94 sec -> 21 sec.

Demo

Developer: Nikita Taranov.

Automatic JOIN Reordering ๐ŸŽฟ

Using estimations of table sizes.

Controlled by the setting query_plan_join_swap_table
and enabled by default.

Demo

Developer: Vladimir Cherkasov.

Optimization Of JOIN Expressions ๐ŸŽ†

SET optimize_extract_common_expressions = 1;

JOIN ... ON (a=b AND x) OR (a=b AND y) OR (a=b AND z) JOIN ... ON a=b AND (x OR y OR z)

Allows using less number of hash tables and better conditions push-down.

Demo

Developer: Jรกnos Benjamin Antal.

Non-Equi JOINs Supported By Default

-- Equi join SELECT t1.*, t2.* FROM t1 JOIN t2 ON t1.key = t2.key; -- Non-equi joins SELECT t1.*, t2.* FROM t1 JOIN t2 ON t1.key != t2.key; SELECT t1.*, t2.* FROM t1 JOIN t2 ON t1.key > t2.key;

Developer: Vladimir Cherkasov.

Faster Inserts Of LowCardinality Strings

Developer: Alexander Gololobov.

Faster argMin/argMax โ›ท๏ธ

argMax(c_arg, c_max)

— an aggregate function, that
  returns the value of c_arg when the value of c_max is maximal.

Up to 10% performance increase for primitive types, such as integers.

Developer: Alexander Sapin.

Something Interesting

Guest Talk

From the creators of Iceberg ๐ŸงŠ

Iceberg REST Catalog ๐ŸงŠ

CREATE DATABASE unity_demo ENGINE = Iceberg('https://dbc-55555555-5555.cloud.databricks.com/api/2.1/unity-catalog/iceberg') SETTINGS catalog_type = 'rest', catalog_credential = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:...', warehouse = 'unity', oauth_server_uri = 'https://dbc-55555555-5555.cloud.databricks.com/oidc/v1/token', auth_scope = 'all-apis,sql'; SHOW TABLES FROM unity_demo; SELECT * unity_demo."webinar.test";

Compatible with Unity, Polaris.

Developer: Kseniia Sumarokova.

Iceberg REST Catalog ๐ŸงŠ

Compatible with Unity, Polaris.

Demo.

Developer: Kseniia Sumarokova.

Schema Evolution For Iceberg

The iceberg table function supports tables with:

— columns added or removed over time;

— renamed columns;

— data types changed between primitive types;

Developer: Dan Ivanik.

Bonus

Variant, Dynamic, And JSON Types ๐Ÿฐ

Are promoted from experimental to the beta stage.

We also backported all fixes to the previous release, 24.11.

Read the blog post about the architecture of the JSON data type:
https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse.

Now we support ALTER from the deprecated Object type to JSON
to allow easy migrations.

Developer: Pavel Kruglov.

Variant, Dynamic, And JSON Types ๐Ÿฐ

Now we support subcolumns for table's primary key and indices:

CREATE TABLE log ( data JSON ) ORDER BY data.time;

Developer: Pavel Kruglov.

Integrations

Postgres CDC in ClickPipes, private preview:
https://clickhouse.com/cloud/clickpipes/postgres-cdc-connector

Improvements for dbt, Firetran, Kafka integrations
as well as for Python and JavaScript connectors.

Meetups

— ๐Ÿ‡ฏ๐Ÿ‡ต Tokyo, Jan 23rd
— ๐Ÿ‡ง๐Ÿ‡ช FOSDEM, Brussels, Feb 1, 2
— ๐Ÿ‡ฌ๐Ÿ‡ง London, Feb 5th
— ๐Ÿ‡ฆ๐Ÿ‡ช Dubai, Feb 10th

I will be in person on some of these meetups :)

Reading Corner ๐Ÿงถ

https://clickhouse.com/blog/

— Strategic partnership with AWS
— Medallion architecture with ClickHouse
— How we built product analytics on ClickHouse
— Query optimizations in ClickHouse
— From BigQuery to ClickHouse
— Our re:Invent announcements

Q&A