1. (45 min) What's new in ClickHouse 25.3 LTS.
2. (10 min) Guest Talk.
3. (5 min) Q&A.
ClickHouse LTS Release.
โ 18 new features ๐ท
โ 13 performance optimizations โ๏ธ
โ 48 bug fixes โ๏ธ
An aggregate function that compresses a stream of records, counts bytes,
and ignores the result, giving the compression ratio.
-- default codec:
:) SELECT estimateCompressionRatio(URL) FROM hits
-- another codec:
:) SELECT estimateCompressionRatio('ZSTD(3)')(URL) FROM hits;
:) SELECT estimateCompressionRatio('Delta, ZSTD(3)')(EventTime) FROM hits;
-- a codec and block size:
:) SELECT estimateCompressionRatio('ZSTD(3)', 65536)(URL) FROM hits;
Demo
Developer: Tariq Almawash.
:) WITH
[1, 2, 3] AS a,
[2, 3, 4] AS b
SELECT
arrayUnion(a, b) AS u,
arrayIntersect(a, b) AS i,
arraySymmetricDifference(a, b) AS diff,
arrayJaccardIndex(a, b) AS similarity,
arrayFilter(x -> (NOT has(i, x)), u) AS another_diff
โโuโโโโโโโโโโฌโiโโโโโโฌโdiffโโโฌโsimilarityโโฌโanother_diffโโ
โ [3,2,1,4] โ [2,3] โ [1,4] โ 0.5 โ [1,4] โ
โโโโโโโโโโโโโดโโโโโโโโดโโโโโโโโดโโโโโโโโโโโโโดโโโโโโโโโโโโโโโ
Developer: Filipp Abapolov.
A modern and secure cryptographic hash function.
:) SELECT hex(keccak256('ClickHouse'))
โโhex(keccak256('ClickHouse'))โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ E50BCD489117263EFF31D66D4AC3862BC98C2A34795834D56ECB237DAF88DC12 โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
This is the Etherium variant, and it is different than SHA-3,
while using the same algorithm family.
See also: BLAKE3, which is about 5..10 times faster and similarly safe.
Developer: Arnaud Briche.
With the Query Condition Cache, ClickHouse will remember,
which ranges of granules in data parts satisfy the condition in WHERE,
and reuse this information as an ephemeral index for subsequent queries.
SET use_query_condition_cache = 1;
Demo.
Developer: ZhongYuanKai.
A different way of caching on top of remote filesystems:
Filesystem Cache (disk with type = cache) | Userspace Page Cache (new!) |
---|---|
— Writes data on the local filesystem.
— Takes disk space (but can be configured on tmpfs) — Survives server restart. — Both on-disk and in-memory (OS page cache). — Does not show up in the server's memory usage. |
— Present only in memory.
— Does not depend on the filesystem. — Does not survive server restart. — Shows up in the server's memory usage. — Good for disk-less servers! |
Developer: Michael Kolupaev.
Configure it on the server:
$ cat config.d/page_cache.yaml
page_cache_max_size: 100G
It will use up to the specified amount of memory,
but will evict memory whenever required for other server needs.
Enable its usage on the query-level:
SET use_page_cache_for_disks_without_file_cache = 1;
Demo
Developer: Michael Kolupaev.
In previous versions, the mark cache will grow up to the maximum size
over time, collecting data that is no longer needed.
In 25.3, entries for old data parts are removed proactively.
Developer: Alexey Milovidov.
Processing a bunch of external files on the current machine:
:) SELECT uniq(text) FROM s3('s3://my_bucket/data/*');
Processing a bunch of external files on a distributed cluster:
:) SELECT uniq(text) FROM s3Cluster(default, 's3://my_bucket/data/*');
Now the distributed processing is set up automatically:
SET parallel_replicas_for_cluster_engines = 1, -- enabled by default
cluster_for_parallel_replicas = 'default',
enable_parallel_replicas = 1,
max_parallel_replicas = 100;
Developer: Konstantin Bogdanov.
ClickHouse server supports many interfaces:
— native;
— HTTP;
— GRPC;
— MySQL compatibility;
— PostgreSQL compatibility;
— ZooKeeper;
Now it also implements the SSH protocol,
so any SSH client can connect to it directly.
Demo
Developer: George Gamezardashvili, Nikita Mikhailov.
The new data types, Variant, Dynamic, and JSON
are production-ready in 25.3!
Developer: Pavel Kruglov.
CREATE TABLE test (time DateTime, data JSON) ORDER BY time;
How it works:
— Analyzes the JSON and infers data types for every path.
— Stores every path and every distinct type as a subcolumn.
— Uses efficient representations for sparse subcolumns.
It enables fast column-oriented storage and queries
on arbitrary semistructured data!
Developer: Pavel Kruglov.
A database engine to represent Iceberg tables:
CREATE DATABASE demo_catalog ENGINE = DataLakeCatalog
SETTINGS catalog_type = 'glue', region = 'us-west-2',
aws_access_key_id = 'AKIA...', aws_secret_access_key = '...';
SHOW TABLES FROM demo_catalog;
SELECT * FROM "demo_catalog"."db.table";
Demo
Developer: Alexander Sapin.
It can represent Delta Lake tables and Iceberg tables in the "uniform" mode:
CREATE DATABASE unity_demo
ENGINE = DataLakeCatalog(
'https://endpoint.cloud.databricks.com/api/2.1/unity-catalog')
SETTINGS catalog_type = 'unity',
warehouse = 'workspace', catalog_credential = '...'
SHOW TABLES FROM unity_demo;
SELECT * FROM "unity_demo"."db.table";
Demo
Developer: Alexander Sapin.
Stanley Chukwuemeke, SA at AWS
In ClickHouse:
— REST (24.12)
— Glue (25.3)
— Unity (25.3)
— Hive metastore;
Support for server-side parameters in the Rust driver,
support for the Variant data type, and for reading arbitrary formats.
Email notifications for Postgres connector in ClickPipes.
Metabase: fixes for WITH inside a CTE.
ODBC driver: resolved tech debt.
Updates for Node.JS and Go drivers.
— ๐ฎ๐ณ Delhi, March 22
— ๐ญ๐บ Budapest, March 25
— ๐บ๐ธ Boston, March 25
— ๐ง๐ท Sรฃo Paulo, March 25
— ๐ฎ๐ฑ Tel Aviv, March 26
— ๐บ๐ธ New York, March 26
— ๐บ๐ธ Washington DC, March 27
— ๐ฆ๐บ Sydney, March 31
— ๐ณ๐ด Oslo, April 8
— ๐ฌ๐ง London, May 14
I will be in person on some of these meetups :)
— A Linux Kernel debugging story ๐ป
— Optimizing Queries For Bluesky ๐ฆ
— How we built BYOC
— Solarwinds on BYOC
— How Poolside uses ClickHouse
— Astronomer also uses ClickHouse
— Postgres data modeling tips
— Input Format Matchup
— Agent-Facing Analytics