ClickHouse: Release 26.3 LTS Call

ClickHouse release 26.3 LTS

1. (40 min) What's new in ClickHouse 26.3 LTS.

2. (15 min) Guest Talk.

3. (5 min) Q&A.

Release 26.3 LTS

ClickHouse LTS Release.

— 27 new features 🌷

— 40 performance optimizations 🐇

— 202 bug fixes 🐝

Small And Nice Features

Natural Sorting

A new function naturalSortKey for human-friendly sorting:

SELECT name FROM files ORDER BY name file1.txt file10.txt file100.txt file2.txt file20.txt

SELECT name FROM files ORDER BY naturalSortKey(name) file1.txt file2.txt file10.txt file20.txt file100.txt

Can be used as a function in any part of the query, e.g., in arraySort. Demo.

Developer: Nazarii Piontko.

JSONExtract works with JSON type

-- native JSON data type: WITH '{"ClickHouse":{"version":"26.3"}}'::JSON AS s SELECT s.ClickHouse.version; -- String: WITH '{"ClickHouse":{"version":"26.3"}}' AS s SELECT JSONExtractString(s, 'ClickHouse', 'version') -- Since 26.3: WITH '{"ClickHouse":{"version":"26.3"}}'::JSON AS s SELECT JSONExtractString(s, 'ClickHouse', 'version')

Developer: Fisnik Kastrati.

has function for JSON

Check if a path exists in a JSON column, similar to Map:

SELECT * FROM events WHERE has(data, 'metrics.latency')

— Works similarly to has for the Map type.

Developer: DQ.

Read-only MergeTree tables

A new MergeTree setting table_readonly:

CREATE TABLE my_table (...) SETTINGS table_readonly = 1; -- ALTER TABLE my_table MODIFY SETTING table_readonly = 1; INSERT INTO my_table VALUES (1, 2, 3); -- Exception: Table is in readonly mode.

Use case: archived, "sealed" tables.

Readonly tables don't run background threads.

Developer: Mathuranath Metivier.

Pretty EXPLAIN

New options for EXPLAIN: pretty=1 and compact=1:

play-eu :) EXPLAIN pretty=1, compact=1 SELECT URL, count() FROM hits WHERE URL LIKE '%google%' GROUP BY URL ORDER BY count() DESC LIMIT 10; ┌─explain───────────────────────────────────────────────────────────────┐ │ Limit (preliminary LIMIT) │ │ └──Sorting (Sorting for ORDER BY) │ │ └──Aggregating │ │ └──Filter ((WHERE + Change column names to column identifiers)) │ │ └──ReadFromMergeTree (default.hits) │ └───────────────────────────────────────────────────────────────────────┘

— pretty=1 — tree-style indented output.
— compact=1 — collapses Expression steps.

Developer: Kirill Kopnev.

More functions for Unicode

We already have normalizeUTF8NFC, normalizeUTF8NFD, normalizeUTF8NFKC, normalizeUTF8NFKD.

caseFoldUTF8 — Unicode case folding
removeDiacriticsUTF8 — remove diacritical marks
normalizeUTF8NFKCCasefold — NFKC normalization + case folding

SELECT caseFoldUTF8('Straße') = caseFoldUTF8('STRASSE') -- 1 SELECT removeDiacriticsUTF8('crème brûlée') -- creme brulee SELECT normalizeUTF8NFKCCasefold('ﬁle') -- file

Developer: George Larionov.

Limit skipping unavailable shards

When allowing skipping unavailable shards, control over their number:

SET skip_unavailable_shards = 1; SET max_skip_unavailable_shards_num = 5; SET max_skip_unavailable_shards_ratio = 0.1;

Useful on very large clusters, on observability datasets,
as well as when aggregating over regionally federated parts of the cluster.

Developer: Alexey Milovidov.

SQL Compatibility

Some functions can be used without parentheses:

SELECT NOW, CURRENT_DATE, CURRENT_TIMESTAMP;

NOT operator precedence now matches the SQL standard:

NOT x IS NULL -- now parsed as NOT (x IS NULL)

SOME keyword (alias for ANY):

SELECT * FROM t WHERE x = SOME (SELECT y FROM u)

Parenthesized table join expressions:

SELECT * FROM (t1 CROSS JOIN t2) JOIN t3 ON ...

Developers: Aly Kafoury, Alexey Milovidov, Artem Kytkin.

More small features

A new setting, output_format_trim_fixed_string
— Strip trailing null bytes from FixedString in text output.

:) SELECT toFixedString('abc', 10) FORMAT TSV abc\0\0\0\0\0\0\0 :) SELECT toFixedString('abc', 10) FORMAT TSV SETTINGS output_format_trim_fixed_string = 1; abc

Developers: NeedmeFordev.

:) SELECT toDaysInMonth(now()) 31

Developers: Vitaly Baranov.

Performance Improvements

Vertical merge for TTL DELETE

Background merges can use one of the algorithms:
— Horizontal: read blocks of data with all columns and merge them;
— Vertical: merge columns with sorting key first; then read and merge each of the other columns;

Vertical merge takes less memory and processed faster for wide tables,
and is usually preferred.

But Vertical is not supported for some type of merges and some operations.

Since 26.3, vertical merges work for TTL DELETE operations!

Developer: murphy-4o.

JOIN reordering for ANTI, SEMI, FULL

The optimizer can now swap sides of ANTI, SEMI, and FULL joins
based on statistics:

SELECT * FROM large_table LEFT ANTI JOIN small_table USING (id)

— Previously only INNER and LEFT/RIGHT joins were reordered.
— Now the optimizer picks the best side automatically for all join types.

In 26.3, this requires enabling statistics for tables.

Developer: Hechem Selmi.

Efficient S3Queue updates

26.2: S3Queue lists all entries in the prefix to find new files

26.3: In the ordered mode, S3Queue uses the StartAfter parameter

— Avoids re-listing full prefix history.
— Reduces ListObjects API calls.
— Significant cost and latency improvement for queues with long history.

Developer: Venkata Vineel.

Improved subcolumns reading

More accurate size calculation allowed to lower memory usage
up to 8 times for heavy JSON objects when reading a few subcolumns.

Developer: Pavel Kruglov.

Something Interesting

Materialized CTE 🧪

CTEs (subqueries in the WITH clause) can now be evaluated only once
and stored in temporary tables:

SET enable_materialized_cte = 1; WITH top_users AS MATERIALIZED ( SELECT user_id, count() AS cnt FROM events GROUP BY user_id ORDER BY cnt DESC LIMIT 1000) SELECT * FROM top_users INNER JOIN (SELECT user_id FROM top_users WHERE ...) ...

Demo

Developer: Dmitry Novik.

Sharded Map

Bucketed serialization for Map columns:

CREATE TABLE tab (id UInt64, m Map(String, UInt64)) ENGINE = MergeTree ORDER BY id SETTINGS map_serialization_version = 'with_buckets', max_buckets_in_map = 32;

— 2–49x faster single-key lookups depending on map size.
— Uses map_serialization_version_for_zero_level_parts = 'basic'
to keep insert speed close to baseline.

Demo

Developer: Pavel Kruglov.

WebAssembly UDFs 🧪

Create user-defined functions in WebAssembly:

INSERT INTO system.webassembly_modules (name, code) SELECT 'collatz', '...'; CREATE FUNCTION collatz_steps LANGUAGE WASM ARGUMENTS (n UInt32) RETURNS UInt32 FROM 'collatz' :: 'steps'; SELECT groupArray(collatz_steps(number::UInt32)) FROM numbers(1, 100);

— Write UDFs in any language that compiles to WASM:
Rust, C, C++, Go, Zig, ...
— Sandboxed execution with Wasmtime.
— Experimental.

Developers: Vladimir Cherkasov, Alexey Smirnov, Vasily Chekalkin.

WebAssembly UDFs 🧪

Rust crate: clickhouse-wasm-udf:

use clickhouse_wasm_udf_bindgen::clickhouse_udf; #[clickhouse_udf] pub fn some_udf(data: String) -> HashMap<String, String> { // Your implementation here }

Control over runtime complexity:
— webassembly_udf_max_fuel
— webassembly_udf_max_memory
— webassembly_udf_max_input_block_size
— webassembly_udf_max_instances

Developers: Vladimir Cherkasov, Alexey Smirnov, Vasily Chekalkin.

WebAssembly UDFs 🧪

A showcase:

chgeos: PostGIS-compatible spatial functions for ClickHouse,
delivered as a WebAssembly UDF module powered by GEOS 3.12+.

https://github.com/bacek/chgeos

Developer: Vasily Chekalkin.

Polyglot SQL 🧪

Support for 31 external SQL dialects using the polyglot library:

:) SET dialect = 'polyglot', polyglot_dialect = 'snowflake'; :) SELECT IFF(1 > 0, 'yes', 'no');

Athena, BigQuery, CockroachDB, Databricks, Doris, Dremio, Drill, Druid, DuckDB, Dune, Exasol, Fabric, Hive, Materialize, MySQL, Oracle, PostgreSQL, Presto, Redshift, RisingWave, SingleStore, Snowflake, Solr, Spark, SQLite, StarRocks, Tableau, Teradata, TiDB, Trino, TSQL

https://github.com/tobilg/polyglot

Full support for the ClickHouse dialect is also contributed to polyglot.

Developer: Alexey Milovidov.

Async Insert by default

Starting from 26.3 LTS, async inserts are enabled by default.

— ClickHouse will batch all small inserts automatically.
— No configuration changes needed for most users.
— Reduces the number of parts created by frequent small inserts.

New: consistent mechanism of deduplication of both regular and asynchronous inserts, including inserts with materialized views.

Developer: Sema Checherinda.

ALP Codec 🧪

New ALP compression codec for floating-point data:

CREATE TABLE metrics ( timestamp DateTime, value Float64 CODEC(ALP, ZSTD) ) ENGINE = MergeTree ORDER BY timestamp;

ALP (Adaptive Lossless floating-Point compression):
— Exploits the structure of real-world floating-point data.
— Better compression ratio than Gorilla for many workloads.

Developer: Nazarii Piontko.

Lazy type hints for JSON 🧪

Add or modify JSON type hints instantly,
without rewriting data:

SET allow_experimental_json_lazy_type_hints = 1; ALTER TABLE events MODIFY COLUMN data JSON(metrics.count UInt64);

— Metadata-only operation — completes instantly.
— Type hints are applied at query time for old parts.
— Materialized during INSERTs and background merges.

Developer: tanner-bruce.

Data Lakes Performance

Reading from data lakes is now tens of times faster on multi-core machines, when reading from a small number of files.

Applies to Iceberg, Delta Lake, Hudi, and all object storage reads.

Developer: Alexey Milovidov.

Parquet Metadata Cache

A new SLRU cache for Parquet metadata (footer):

— Up to 2x reduction of reads on regular queries.
— Especially beneficial for repeated queries over the same Parquet files.

SET use_parquet_metadata_cache;

Enabled by default.

Consistency is guaranteed by tracking file modifications by etag.

Developer: Grant Holly.

Metadata prefetching for Iceberg

Iceberg tables now support asynchronous metadata prefetching:

CREATE TABLE my_iceberg (...) ENGINE = IcebergS3(...) SETTINGS iceberg_metadata_async_prefetch_period_ms = 60000; SELECT ... FROM my_iceberg SETTINGS iceberg_metadata_staleness_ms = 30000;

— Periodically pre-populates the metadata cache.
— SELECT queries use cached metadata if fresh enough,
eliminating calls to the Iceberg catalog.

Developer: Arsen Muk.

Meetups

— 🇺🇸 Seattle: Observability Meetup, Mar 26
— 🇮🇹 Milan: ClickHouse x Apache Kafka x AWS, Mar 26
— 🇨🇳 Shenzhen Meetup, Mar 28
— 🇺🇸 San Francisco: Fireside Chat, Mar 31
— 🇳🇱 Amsterdam: Lunch & Learn, Mar 31
— 🇺🇸 Seattle: Startup Summit, Apr 1
— 🇺🇸 San Francisco: Iceberg Summit, Apr 8
— 🇳🇱 Amsterdam: Training, Apr 8
— 🇺🇸 San Francisco: AI Demo Night, Apr 9
— 🇹🇼 Taipei: Open Source Meetup, Apr 16
— 🇺🇸 Las Vegas: Google Cloud Next, Apr 22

Reading Corner 📖

https://clickhouse.com/blog/

— AI is redrawing the database market
— Geospatial in ClickHouse
— The Agentic Data ClickStack
— ClickHouse is data lake ready
— Building high-performant full-text search in ClickHouse
— AI observability notebooks
— chDB 4.0

ClickHouse Release26.3 LTS