ClickHouse: Release 25.3 Call

ClickHouse Release 25.3 LTS

Release 25.3 Call

1. (45 min) What's new in ClickHouse 25.3 LTS.

2. (10 min) Guest Talk.

3. (5 min) Q&A.

Release 25.3

ClickHouse LTS Release.

โ€” 18 new features ๐ŸŒท

โ€” 13 performance optimizations โ˜€๏ธ

โ€” 48 bug fixes โ˜˜๏ธ

A Few Small Functions

estimateCompressionRatio

An aggregate function that compresses a stream of records, counts bytes,
and ignores the result, giving the compression ratio.

-- default codec: :) SELECT estimateCompressionRatio(URL) FROM hits -- another codec: :) SELECT estimateCompressionRatio('ZSTD(3)')(URL) FROM hits; :) SELECT estimateCompressionRatio('Delta, ZSTD(3)')(EventTime) FROM hits; -- a codec and block size: :) SELECT estimateCompressionRatio('ZSTD(3)', 65536)(URL) FROM hits;

Demo

Developer: Tariq Almawash.

arraySymmetricDifference

:) WITH [1, 2, 3] AS a, [2, 3, 4] AS b SELECT arrayUnion(a, b) AS u, arrayIntersect(a, b) AS i, arraySymmetricDifference(a, b) AS diff, arrayJaccardIndex(a, b) AS similarity, arrayFilter(x -> (NOT has(i, x)), u) AS another_diff โ”Œโ”€uโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€iโ”€โ”€โ”€โ”€โ”€โ”ฌโ”€diffโ”€โ”€โ”ฌโ”€similarityโ”€โ”ฌโ”€another_diffโ”€โ” โ”‚ [3,2,1,4] โ”‚ [2,3] โ”‚ [1,4] โ”‚ 0.5 โ”‚ [1,4] โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Developer: Filipp Abapolov.

keccak256

A modern and secure cryptographic hash function.

:) SELECT hex(keccak256('ClickHouse')) โ”Œโ”€hex(keccak256('ClickHouse'))โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ E50BCD489117263EFF31D66D4AC3862BC98C2A34795834D56ECB237DAF88DC12 โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

This is the Etherium variant, and it is different than SHA-3,
while using the same algorithm family.

See also: BLAKE3, which is about 5..10 times faster and similarly safe.

Developer: Arnaud Briche.

Performance Improvements

Query Condition Cache

With the Query Condition Cache, ClickHouse will remember,
which ranges of granules in data parts satisfy the condition in WHERE,
and reuse this information as an ephemeral index for subsequent queries.

SET use_query_condition_cache = 1;

Demo.

Developer: ZhongYuanKai.

Userspace Page Cache

A different way of caching on top of remote filesystems:

Filesystem Cache (disk with type = cache) Userspace Page Cache (new!)
— Writes data on the local filesystem.
— Takes disk space (but can be configured on tmpfs)
— Survives server restart.
— Both on-disk and in-memory (OS page cache).
— Does not show up in the server's memory usage.
— Present only in memory.
— Does not depend on the filesystem.
— Does not survive server restart.
— Shows up in the server's memory usage.
— Good for disk-less servers!

Developer: Michael Kolupaev.

Userspace Page Cache

Configure it on the server:

$ cat config.d/page_cache.yaml page_cache_max_size: 100G

It will use up to the specified amount of memory,
but will evict memory whenever required for other server needs.

Enable its usage on the query-level:

SET use_page_cache_for_disks_without_file_cache = 1;

Demo

Developer: Michael Kolupaev.

Active Eviction Of Mark Cache

In previous versions, the mark cache will grow up to the maximum size
over time, collecting data that is no longer needed.

In 25.3, entries for old data parts are removed proactively.

Developer: Alexey Milovidov.

Something Interesting

Automatic Parallelization For External Data

Processing a bunch of external files on the current machine:

:) SELECT uniq(text) FROM s3('s3://my_bucket/data/*');

Processing a bunch of external files on a distributed cluster:

:) SELECT uniq(text) FROM s3Cluster(default, 's3://my_bucket/data/*');

Now the distributed processing is set up automatically:

SET parallel_replicas_for_cluster_engines = 1, -- enabled by default cluster_for_parallel_replicas = 'default', enable_parallel_replicas = 1, max_parallel_replicas = 100;

Developer: Konstantin Bogdanov.

SSH Interface

ClickHouse server supports many interfaces:
— native;
— HTTP;
— GRPC;
— MySQL compatibility;
— PostgreSQL compatibility;
— ZooKeeper;

Now it also implements the SSH protocol,
so any SSH client can connect to it directly.

Demo

Developer: George Gamezardashvili, Nikita Mikhailov.

JSON Is Production Ready

The new data types, Variant, Dynamic, and JSON
are production-ready in 25.3!

Developer: Pavel Kruglov.

JSON Data Type

CREATE TABLE test (time DateTime, data JSON) ORDER BY time;

How it works:

— Analyzes the JSON and infers data types for every path.

— Stores every path and every distinct type as a subcolumn.

— Uses efficient representations for sparse subcolumns.

It enables fast column-oriented storage and queries
on arbitrary semistructured data!

Developer: Pavel Kruglov.

AWS Glue Catalog

A database engine to represent Iceberg tables:

CREATE DATABASE demo_catalog ENGINE = DataLakeCatalog SETTINGS catalog_type = 'glue', region = 'us-west-2', aws_access_key_id = 'AKIA...', aws_secret_access_key = '...'; SHOW TABLES FROM demo_catalog; SELECT * FROM "demo_catalog"."db.table";

Demo

Developer: Alexander Sapin.

Unity Catalog

It can represent Delta Lake tables and Iceberg tables in the "uniform" mode:

CREATE DATABASE unity_demo ENGINE = DataLakeCatalog( 'https://endpoint.cloud.databricks.com/api/2.1/unity-catalog') SETTINGS catalog_type = 'unity', warehouse = 'workspace', catalog_credential = '...' SHOW TABLES FROM unity_demo; SELECT * FROM "unity_demo"."db.table";

Demo

Developer: Alexander Sapin.

Guest Talk

Stanley Chukwuemeke, SA at AWS

Every Data Lake Catalog

In ClickHouse:

— REST (24.12)

— Glue (25.3)

— Unity (25.3)

— Hive metastore;

Bonus

Integrations

Support for server-side parameters in the Rust driver,
support for the Variant data type, and for reading arbitrary formats.

Email notifications for Postgres connector in ClickPipes.

Metabase: fixes for WITH inside a CTE.

ODBC driver: resolved tech debt.

Updates for Node.JS and Go drivers.

Meetups

— ๐Ÿ‡ฎ๐Ÿ‡ณ Delhi, March 22
— ๐Ÿ‡ญ๐Ÿ‡บ Budapest, March 25
— ๐Ÿ‡บ๐Ÿ‡ธ Boston, March 25
— ๐Ÿ‡ง๐Ÿ‡ท Sรฃo Paulo, March 25
— ๐Ÿ‡ฎ๐Ÿ‡ฑ Tel Aviv, March 26
— ๐Ÿ‡บ๐Ÿ‡ธ New York, March 26
— ๐Ÿ‡บ๐Ÿ‡ธ Washington DC, March 27
— ๐Ÿ‡ฆ๐Ÿ‡บ Sydney, March 31
— ๐Ÿ‡ณ๐Ÿ‡ด Oslo, April 8
— ๐Ÿ‡ฌ๐Ÿ‡ง London, May 14

I will be in person on some of these meetups :)

Reading Corner ๐Ÿ“–

https://clickhouse.com/blog/

— A Linux Kernel debugging story ๐Ÿ‘ป
— Optimizing Queries For Bluesky ๐Ÿฆ‹
— How we built BYOC
— Solarwinds on BYOC
— How Poolside uses ClickHouse
— Astronomer also uses ClickHouse
— Postgres data modeling tips
— Input Format Matchup
— Agent-Facing Analytics

Q&A