ClickHouse "Spring" Release 25.4

1. (45 min) What's new in ClickHouse 25.4.

2. (10 min) Guest Talk.

3. (5 min) Q&A.

Release 25.4

— 25 new features 🌸

— 23 performance optimizations 🦋

— 58 bug fixes 🐝

A Few Small Features

Default Compression Codec

For MergeTree tables.

CREATE TABLE t
(
    time DateTime CODEC(ZSTD(3)), -- codec on a column level
    user_id UInt64,               -- this column will use the default codec
    ...
)
ORDER BY time
SETTINGS default_compression_codec = 'ZSTD(1)' -- codec on a table level

By the way, the default codec can also be specified globally in the server's configuration:

$ cat config.d/compression.yaml
compression:
    case:
        min_part_size: 1000000000 # Optional condition
        method: 'zstd'

Developer: Gvoelfin.

Persistent Databases In clickhouse-local

# Use the specified paths for databases
# instead of a temporary location:
$ ch --path .

# The database will remain available
# for subsequent invocations.

Demo.

Developer: Alexey Milovidov.

Random Early Drop on Overload

Two new query settings: min_os_cpu_wait_time_ratio_to_throw
and max_os_cpu_wait_time_ratio_to_throw.

When the ratio between the number of threads waiting for CPU and the number of threads running on CPU exceeds the min threshold, the server starts randomly rejecting a fraction of queries. As the ratio approaches the max threshold, that fraction grows, until the server refuses every query.

This implements graceful degradation to prevent a full denial of service.
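
A sketch of tuning these settings per session (the threshold values here are illustrative, not recommendations):

SET min_os_cpu_wait_time_ratio_to_throw = 2, -- start rejecting a fraction of queries
    max_os_cpu_wait_time_ratio_to_throw = 6; -- reject every new query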

Demo.

Developers: Alexey Katsman, Alexey Milovidov.

Inline Credentials For Kafka

CREATE TABLE kafka_sasl (key UInt32, value String)
ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'kafka_sasl:19092',
    kafka_security_protocol = 'sasl_plaintext',
    kafka_sasl_mechanism = 'SCRAM-SHA-256',
    kafka_sasl_username = 'Vasya',
    kafka_sasl_password = 'qwerty',
    kafka_topic_list = 'topic',
    kafka_group_name = 'group',
    kafka_format = 'JSONEachRow';

No longer necessary to edit configuration files for every new Kafka endpoint.

Developer: Christoph Wurm.

Bind Host In the Cluster Configuration

$ cat config.d/cluster.yaml
remote_servers:
    default:
        shard:
            replica:
                host: '10.0.0.2'
                port: 9000
                bind_host: '10.0.0.1'

Useful when a server has multiple network interfaces and distributed connections should use a specific one.

Developer: Todd Yocum.

Performance Improvements

Lazy Columns

Instead of reading the data for a column, track the information about what data should be read, and read it only when it is actually needed.

The column values can be carried around and filtered, but not used in calculations, until the latest stages of the query pipeline.
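
For example (a hypothetical table; the heavy String columns are materialized only for the rows that survive the LIMIT):

SELECT event_time, message, stack_trace
FROM logs
ORDER BY event_time DESC
LIMIT 10;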

Demo.

Developer: Xiaozhe Yu.

Merging Filters For Optimizing JOINs

I couldn't believe it didn't work before...
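
A sketch of the kind of query this helps (hypothetical tables; the exact plan transformation is up to the optimizer): a filter on the join key of one side can now be merged and applied to the other side as well, pruning both tables before the JOIN.

SELECT o.id, c.name
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.id
WHERE o.customer_id = 42; -- effectively also filters customers by c.id = 42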

Demo.

Developer: Dmitry Novik.

Dynamic Sharding For JOINs

If you JOIN two MergeTree tables by their primary keys (or prefixes of them), ClickHouse creates independent lanes in the query pipeline, each dedicated to a range of the primary keys, so the work is parallelized better.
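
For example (hypothetical tables, both ordered by the join key, so the join can be split into per-range lanes):

CREATE TABLE lhs (id UInt64, x String) ENGINE = MergeTree ORDER BY id;
CREATE TABLE rhs (id UInt64, y String) ENGINE = MergeTree ORDER BY id;

SELECT count()
FROM lhs JOIN rhs ON lhs.id = rhs.id;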

Demo.

Developer: Nikolai Kochetov.

Something Interesting

Correlated Subqueries

— are subqueries that depend on columns from the outer scope.

SELECT *
FROM users AS u1
WHERE EXISTS
(
    SELECT *
    FROM users2 AS u2
    WHERE u1.age = u2.age
)

They can appear in many different contexts: EXISTS, IN, scalar...

Now we support correlated subqueries
inside the WHERE clause with the EXISTS operator!

Demo.

Developer: Dmitry Novik.

CPU Workload Scheduler

CREATE RESOURCE cpu (MASTER THREAD, WORKER THREAD);

CREATE WORKLOAD all;
CREATE WORKLOAD admin IN all SETTINGS max_concurrent_threads = 10;
CREATE WORKLOAD production IN all SETTINGS max_concurrent_threads = 100;
CREATE WORKLOAD analytics IN production
    SETTINGS max_concurrent_threads = 60, weight = 9;
CREATE WORKLOAD ingestion IN production;

SET workload = 'analytics';

Master and worker threads can also be scheduled as two separate resources:

CREATE RESOURCE worker_cpu (WORKER THREAD);
CREATE RESOURCE master_cpu (MASTER THREAD);
CREATE WORKLOAD all SETTINGS
    max_concurrent_threads = 100 FOR worker_cpu,
    max_concurrent_threads = 1000 FOR master_cpu;

Developer: Sergei Trifonov.

Time Travel For Iceberg Tables

Allows querying tables as of a certain time in the past.
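
A sketch, assuming the iceberg_timestamp_ms setting (a Unix timestamp in milliseconds) selects the snapshot to read:

SELECT count()
FROM iceberg('s3://mybucket/iceberg_table/', '...', '...')
SETTINGS iceberg_timestamp_ms = 1735689600000; -- as of 2025-01-01 00:00:00 UTC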

Demo.

Developers: Brett Hoerner, Dan Ivanik.

Metadata Cache For Iceberg

Avoids repeatedly reading the same metadata and manifest files.

Demo.

Developer: Han Fei.

Partition Pruning For Delta Lake

To enable partition pruning, switch to the new implementation:

SELECT EventDate, count()
FROM deltaLake('s3://unitycatalogdemobucket/delta_hits_partitioned/', '...', '...',
    SETTINGS allow_experimental_delta_kernel_rs = 1)
WHERE CounterID = 62
  AND EventDate >= '2013-07-14'
  AND EventDate <= '2013-07-15'
GROUP BY EventDate
ORDER BY EventDate

Before: 18.486 sec.
After: 4.389 sec.

Developer: Ksenia Sumarokova.

Something Else...

Support for Delta Lake on Azure Blob Storage.

Developer: Smita Kulkarni.

Data Lakes From MergeTree Tables

MergeTree tables on read-only disks can refresh their state and load new data parts as they appear in the background.

This lets you run an unlimited number of readers on top of externally hosted, continuously updated datasets.

Good for data sharing and publishing.

Developer: Alexey Milovidov.

Data Lakes From MergeTree Tables

-- There should be a single writer (at most):
CREATE TABLE writer (...) ORDER BY ()
SETTINGS table_disk = true, disk = disk(
    type = object_storage,
    object_storage_type = s3,
    endpoint = 'https://mybucket.s3.us-east-1.amazonaws.com/data/',
    metadata_type = plain_rewritable);

-- There can be an unlimited number of readers in any location:
CREATE TABLE reader (...) ORDER BY ()
SETTINGS table_disk = true, refresh_parts_interval = 1, disk = disk(
    readonly = true,
    type = object_storage,
    object_storage_type = s3,
    endpoint = 'https://mybucket.s3.us-east-1.amazonaws.com/data/',
    metadata_type = plain_rewritable);

Demo.

Guest Talk

Sai Srirampur and Amogh Bharadwaj, engineers at ClickHouse.

Integrations

A ton of new features for ClickPipes:

— AWS PrivateLink support;
— management with the ClickHouse Terraform provider;
— Prometheus metrics in the Cloud endpoint;
— notifications via email, Slack, and the UI;
— mutual TLS authentication for Kafka.

Metabase: ClickHouse integration is now shipped by default.

Updates to the ODBC and JDBC drivers and the Java client.

Bonus

Try this:

ssh [email protected]

Thanks to the embedded SSH server, the interactive ClickHouse experience is now available from any platform, including Windows, Android, iPad...

Meetups

— 🇺🇸 Denver, April 23
— 🇺🇸 Austin, May 13
— 🇬🇧 London, May 14
— 🇹🇷 Istanbul, May 15
— 🇨🇳 Shenzhen, May 17

I will be at some of these meetups in person :)

Reading Corner 📖

https://clickhouse.com/blog/

— A year of Rust in ClickHouse 🦀
— MySQL CDC 🐬
— Query Condition Cache 🤑
— Lazy Materialization 🏖️
— Make Before Break: how we manage ClickHouse
— Observability: Dash0, Poizon, Last9 💹
— Security: Harvey 👮

Q&A