ClickHouse: Release 26.4 Call

Author: Alexey Milovidov, 2026-04-30.


1. (50 min) What's new in ClickHouse 26.4.

2. (10 min) Q&A.

Release 26.4

ClickHouse "Spring" Release.

— 39 new features 🌷

— 45 performance optimizations 🐇

— 238 bug fixes 🐞

Small And Nice Features

The stem function

The stem function is now non-experimental
and works on entire columns of all string-like types:

SELECT stem('searching', 'en') -- 'search'
SELECT stem(['running', 'jumps', 'flying'], 'en') -- ['run', 'jump', 'fli']

Works for String, FixedString, Array([Fixed]String),
Nullable, LowCardinality, and Const arguments.

Useful for normalizing tokens before full-text search.

Developer: Jimmy Aguilar Mena.

Smarter INSERT squashing

New behavior of min/max_insert_block_size_rows/bytes
under the compatibility setting use_strict_insert_block_limits:

— Inserted blocks are squashed into chunks closer to max_insert_block_size_*;
— Reduces the number of small parts generated by chatty inserts;
— Less write amplification, fewer background merges.

Old behavior is preserved by default for compatibility.
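The squashing idea can be sketched in Python. This is purely illustrative of the accumulation strategy, not ClickHouse internals; `squash` and `max_rows` are hypothetical names:

```python
# Illustrative sketch of strict block squashing: accumulate incoming
# blocks and emit chunks close to max_rows, instead of flushing each
# small insert as its own part.
def squash(blocks, max_rows):
    """Yield chunks whose sizes approach max_rows."""
    buf = []
    size = 0
    for block in blocks:
        buf.extend(block)
        size += len(block)
        while size >= max_rows:
            yield buf[:max_rows]    # emit a full-size chunk
            buf = buf[max_rows:]
            size -= max_rows
    if buf:
        yield buf                   # final partial chunk

chunks = list(squash([[1, 2], [3, 4, 5], [6, 7, 8, 9]], max_rows=4))
# → [[1, 2, 3, 4], [5, 6, 7, 8], [9]]
```

Every emitted chunk except possibly the last has exactly `max_rows` rows, which is why chatty small inserts stop producing many small parts.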

Developer: Kirill Kopnev.

Even prettier EXPLAIN

EXPLAIN PLAN pretty=1 now prints expressions in a human-readable form,
shows top-level output columns and per-step output columns,
and labels JOINs with estimated row counts and locality.

EXPLAIN PLAN pretty=1
SELECT URL, count() FROM hits
WHERE URL LIKE '%google%'
GROUP BY URL ORDER BY count() DESC LIMIT 10;

— Friendlier output for understanding query plans.
— Especially useful for plans with multiple JOINs and aggregations.

Developer: Kirill Kopnev.

Case-insensitive column matching

For input formats:

A new setting, input_format_column_name_matching_mode:

SET input_format_column_name_matching_mode = 'ignore_case';

INSERT INTO t (UserID, Country) FROM INFILE 'data.csv' FORMAT CSVWithNames;
-- Matches userid, USERID, UserID, ... in the CSV header.

Modes: match_case (default), ignore_case, auto.

Helpful when ingesting CSV/TSV/Parquet files from heterogeneous sources.

Developer: manerone.

arrayAutocorrelation

Compute the normalized autocorrelation of a numeric array per lag:

SELECT arrayAutocorrelation([1, 2, 3, 4, 5, 4, 3, 2, 1]) -- [1, 0.625, 0.166..., -0.25, ...]
SELECT arrayAutocorrelation(values, 10) FROM metrics

— Supports integer, float, and decimal array types.
— Optional max_lag argument controls the result length.
— Useful for time series analysis and seasonality detection.
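One common definition of normalized autocorrelation can be sketched in Python: center on the mean, then normalize by the lag-0 variance. The actual ClickHouse implementation may use a different normalization, so the exact values in the slide example may differ:

```python
# Minimal sketch of normalized autocorrelation per lag, assuming
# mean-centered values normalized by the lag-0 sum of squares.
def autocorrelation(xs, max_lag=None):
    n = len(xs)
    if max_lag is None:
        max_lag = n - 1
    mean = sum(xs) / n
    d = [x - mean for x in xs]          # deviations from the mean
    denom = sum(v * v for v in d)       # lag-0 energy
    return [
        sum(d[t] * d[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]

r = autocorrelation([1, 2, 3, 4, 5, 4, 3, 2, 1])
# r[0] is always 1 for a non-constant series.
```

The optional `max_lag` argument mirrors the second parameter of the SQL function: it truncates the result to `max_lag + 1` entries.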

Developer: Wenyu Chen.

arrayTranspose

Transpose a 2D array (a matrix):

SELECT arrayTranspose([[1, 2, 3], [4, 5, 6]]) -- [[1, 4], [2, 5], [3, 6]]

Convenient for switching between row-major and column-major arrays,
working with groupArray results, and preparing data for ML pipelines.
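The transpose semantics can be modeled with `zip`, assuming a rectangular (non-ragged) input, like the SQL function:

```python
# Transpose a 2D list: row-major to column-major.
def array_transpose(matrix):
    return [list(col) for col in zip(*matrix)]

array_transpose([[1, 2, 3], [4, 5, 6]])  # → [[1, 4], [2, 5], [3, 6]]
```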

Developer: Vitaly Baranov.

obfuscateQuery

A new SQL function that returns the obfuscated form of a query —
identifiers and literals are replaced, structure preserved:

SELECT obfuscateQuery(
    'SELECT name, count() FROM users WHERE age > 18 GROUP BY name')

— Same logic as the clickhouse-format --obfuscate tool,
  now available as a function.
— Useful for sharing queries publicly without leaking schema details.
— Pairs well with normalizedQueryHash for query analytics.

Developer: Xuewei Wang.

highlight

Wrap occurrences of search terms with HTML tags:

SELECT highlight('ClickHouse is fast', ['fast', 'click'])
-- <em>Click</em>House is <em>fast</em>
SELECT highlight(text, ['error', 'warn'], '<mark>', '</mark>')

— ASCII case-insensitive matching.
— Automatic merging of overlapping matches.
— Custom open/close tags.

Pairs nicely with text indices for building search UIs.
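The bullet points above, including the merging of overlapping matches, can be sketched in Python. This is an illustrative model, not the server implementation:

```python
# Sketch of highlight: find ASCII case-insensitive matches, merge
# overlapping spans, then wrap the merged spans in open/close tags.
def highlight(text, terms, open_tag='<em>', close_tag='</em>'):
    low = text.lower()
    spans = []
    for term in terms:
        t = term.lower()
        start = low.find(t)
        while start != -1:
            spans.append((start, start + len(t)))
            start = low.find(t, start + 1)
    spans.sort()
    merged = []
    for s, e in spans:
        if merged and s <= merged[-1][1]:     # overlapping or adjacent
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    out, pos = [], 0
    for s, e in merged:
        out.append(text[pos:s] + open_tag + text[s:e] + close_tag)
        pos = e
    out.append(text[pos:])
    return ''.join(out)

highlight('ClickHouse is fast', ['fast', 'click'])
# → '<em>Click</em>House is <em>fast</em>'
```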

Developer: Peng.

Quotas by normalized query hash

Protect public endpoints from runaway query patterns:

CREATE QUOTA q1 KEYED BY normalized_query_hash
    FOR INTERVAL 1 minute MAX queries = 100, read_bytes = '100G';

CREATE QUOTA q2
    FOR INTERVAL 1 minute MAX queries_per_normalized_hash = 100;

— NORMALIZED_QUERY_HASH: a separate quota per unique query shape.
— QUERIES_PER_NORMALIZED_HASH: caps how often any single query
  pattern can run within an interval.

Developer: Alexey Milovidov.

hasPhrase / matchPhrase

Phrase search — match a continuous sequence of tokens:

SELECT * FROM logs WHERE hasPhrase(message, 'connection refused')

— Tokens must appear consecutively, in order.
— Brute-force matching by default.
— A text index on the column accelerates phrase search:

CREATE TABLE logs (..., message String, INDEX msg_idx message TYPE text(tokenizer = 'ngrams') GRANULARITY 1)
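The phrase-matching semantics can be sketched as a toy model in Python: tokenize both sides, then look for the phrase tokens as a contiguous subsequence. The server's tokenizer and index integration are more involved:

```python
import re

# Toy model of hasPhrase: tokens must appear consecutively, in order.
def has_phrase(text, phrase):
    tokens = re.findall(r'\w+', text.lower())
    needle = re.findall(r'\w+', phrase.lower())
    return any(
        tokens[i:i + len(needle)] == needle
        for i in range(len(tokens) - len(needle) + 1)
    )

has_phrase('TCP connection refused by peer', 'connection refused')  # True
has_phrase('connection was refused', 'connection refused')          # False
```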

Developer: Elmi Ahmadov.

Web UI improvements

The built-in play.html got a lot of polish in 26.4:

— Parameterized queries: {name:Type} placeholders show input fields.
— Multi-query mode: run several queries at once, in parallel.
— TOTALS row rendered as a table footer.
— Engine-specific icons in the table sidebar.
— Cell heights capped, with a cell viewer on click.

Developer: Alexey Milovidov.

SQL Compatibility

NATURAL JOIN

Join on all columns sharing the same name and deduplicate them in the result:

SELECT * FROM customers NATURAL JOIN orders -- joins on every column with the same name in both tables

— Automatically matches columns by name.
— Shared columns appear once in the output.
— Standard SQL syntax, now supported in ClickHouse.

Developer: Peter Nguyen.

OVERLAY syntax

SQL standard OVERLAY with PLACING / FROM / FOR keywords:

SELECT OVERLAY('Hello world' PLACING 'SQL' FROM 7 FOR 5) -- 'Hello SQL'

— The overlay function already existed.
— This adds the keyword-based syntactic sugar from SQL:2003.
— Drop-in compatibility with PostgreSQL and other engines.

Developer: Desel72.

SET TIME ZONE

SQL standard alias for SET session_timezone:

SET TIME ZONE 'Europe/Amsterdam';
SELECT now(); -- now reported in Amsterdam time

Drop-in compatible with PostgreSQL session settings.

Developer: phulv94.

VALUES as a table expression

26.3:

SELECT * FROM VALUES((1, 'a'), (2, 'b'), (3, 'c'))

26.4: "standard" syntax, compatible with PostgreSQL:

SELECT * FROM (VALUES (1, 'a'), (2, 'b'), (3, 'c'))

— Convenient for ad-hoc data without a CSV/external source.
— Works as a regular subquery, can be JOINed, filtered, etc.

Developer: Desel72.

EXTRACT — PostgreSQL units

The EXTRACT operator now supports PostgreSQL-style units:

EPOCH, DOW, DOY, ISODOW, ISOYEAR, WEEK, CENTURY, DECADE, MILLENNIUM

SELECT EXTRACT(EPOCH FROM now()) AS epoch,
       EXTRACT(DOW FROM today()) AS day_of_week,
       EXTRACT(WEEK FROM today()) AS iso_week,
       EXTRACT(CENTURY FROM today()) AS century;

Bonus: EXTRACT(WEEK FROM date) no longer throws —
it now returns the ISO week number as expected.

Developer: Alexey Milovidov.

Compound INTERVAL literals

SQL-standard intervals with TO range qualifiers:

SELECT now() + INTERVAL '1:30' HOUR TO MINUTE;
SELECT now() + INTERVAL '5 12:30:00' DAY TO SECOND;
SELECT INTERVAL '2-6' YEAR TO MONTH;

— Internally decomposed into a sum of single-unit intervals.
— Familiar for users coming from PostgreSQL, Oracle, DB2.
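The decomposition step can be sketched in Python for the DAY TO SECOND case. The function name and the tuple representation are illustrative, not ClickHouse's internal form:

```python
import re

# Parse an SQL compound interval literal of the form 'D HH:MM:SS'
# (DAY TO SECOND) into its single-unit components.
def parse_day_to_second(literal):
    """Return (days, hours, minutes, seconds) for 'D HH:MM:SS'."""
    m = re.fullmatch(r"(\d+) (\d+):(\d+):(\d+)", literal)
    if not m:
        raise ValueError(f"bad DAY TO SECOND literal: {literal!r}")
    return tuple(int(g) for g in m.groups())

parse_day_to_second('5 12:30:00')  # → (5, 12, 30, 0)
```

Each component then maps onto one single-unit interval, so `INTERVAL '5 12:30:00' DAY TO SECOND` becomes the sum of a 5-day, a 12-hour, and a 30-minute interval.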

Developer: Desel72.

Performance Improvements

Indices on JSON columns

MergeTree skip indices now work on the set of JSON paths
using JSONAllPaths:

ALTER TABLE events ADD INDEX paths_idx JSONAllPaths(data) TYPE bloom_filter GRANULARITY 1;

— Supported types: bloom_filter, tokenbf_v1, ngrambf_v1, and text (inverted).
— Skip granules where the JSON paths a query is filtering by are absent.

Developer: Pavel Kruglov.

JSONAllValues + text index

A new function JSONAllValues returns every leaf value
of a JSON column as Array(String):

SELECT JSONAllValues(data) FROM events -- ['42', 'click', '2026-04-30', ...]

Build a text index on it — and it kicks in automatically
for filters on JSON subcolumns:

ALTER TABLE events ADD INDEX vals JSONAllValues(data)
    TYPE text(tokenizer='ngrams') GRANULARITY 1;

SELECT * FROM events WHERE data.event_type = 'click';
-- The text index is used to skip granules.

Developer: Anton Popov.

Statistics by default

From 26.4, basic statistics are on by default for new tables:

auto_statistics_types = 'minmax, uniq'
materialize_statistics_on_insert = false

— minmax and uniq stats are created for every suitable column.
— Built during background merges — no insert overhead.
— Used by the optimizer for predicate selectivity, JOIN reordering,
 and the new part-pruning min/max optimization.

Developer: Han Fei.

LIKE can use text index

Queries with LIKE can now be accelerated by text indices:

ALTER TABLE logs ADD INDEX msg_idx message
    TYPE text(tokenizer='ngrams') GRANULARITY 1;

SELECT * FROM logs WHERE message LIKE '%timeout%error%';
-- Granules without both terms are skipped.

— Works for substring patterns extractable from a LIKE expression.
— Bonus: text index is now GA, no compatibility flag required.

Developer: Elmi Ahmadov.

Distributed index analysis

For huge clusters with many shards, the initiator node can do
primary index analysis across the cluster before reading data.

26.4 brings two improvements:

— Index analysis is applied after partition pruning,
  so heavy clusters with many partitions skip irrelevant shards faster.
— Fixed the quadratic blow-up when predicates contained IN subqueries.

Result: lower latency for selective queries on very wide deployments.

Developers: Azat Khuzhin, Anton Popov.

Regexps with alternatives use indices

Primary key pruning now understands regex alternations
that share a common literal prefix:

SELECT * FROM events WHERE match(id, '^(abc-1|abc-2|abc-3)');
-- The common prefix 'abc-' is used for primary key pruning.

— Detects shared literal prefixes inside (...|...|...).
— Falls back to no pruning when no shared prefix exists.
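The prefix extraction can be sketched in Python. This simplified model assumes the pattern is exactly `^(a|b|c)` with literal-only branches; the real analyzer handles more regex structure:

```python
import os

# Extract the literal prefix shared by all branches of a top-level
# alternation, the kind of prefix usable for primary-key pruning.
def common_alternation_prefix(pattern):
    if not (pattern.startswith('^(') and pattern.endswith(')')):
        return ''
    branches = pattern[2:-1].split('|')
    return os.path.commonprefix(branches)

common_alternation_prefix('^(abc-1|abc-2|abc-3)')  # → 'abc-'
```

An empty result models the fallback case: with no shared prefix, no pruning is possible.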

Developer: Yash.

Faster Float ↔ String

The Float-to-String hot path was rewritten in 26.4:

— Replaced dragonbox with zmij: 1.5–3x faster Float → String.
— Extended the itoa fast path with dragonbox-compatible rounding
  for large integral floats.
— Int128/UInt128 → String: software division replaced
  by Barrett reduction with an unrolled inner loop.

Visible in big aggregations and exports to text formats.
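Barrett reduction, mentioned above for Int128 division, can be sketched in Python: replace division by a constant with a multiply and a shift using a precomputed reciprocal. The parameters here are illustrative:

```python
# Barrett reduction for dividing by a constant d: precompute an
# approximate reciprocal m ~ 2^k / d, then divide with a multiply
# and a shift. Valid for 0 <= x < 2^k.
def barrett_divmod(x, d, k=128):
    m = (1 << k) // d + 1          # precomputed reciprocal (once per d)
    q = (x * m) >> k               # approximate quotient
    r = x - q * d
    if r < 0:                      # correct a possible overshoot by one
        q -= 1
        r += d
    return q, r

barrett_divmod(12345678901234567890, 10)
```

In the real hot path, `m` is computed once at compile time for `d = 10` (or a power of ten), so the per-digit cost is just a wide multiply and a shift instead of a 128-bit software division.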

Developer: Raúl Marín.

Faster COUNT DISTINCT

The parallel merge of uniqExact (used by COUNT(DISTINCT))
was rewritten:

— Batch parallel merge instead of per-thread sequential merge.
— Avoids spawning redundant threads for small intermediate states.
— Significant speedups for high-cardinality COUNT(DISTINCT)
  on machines with many cores.

Developer: Jiebin Sun.

Something Interesting

JOIN spilling to disk 🧪

Hash and parallel hash joins now spill to disk automatically
when the memory limit is reached:

SET max_bytes_before_external_join = '4G';

SELECT * FROM orders LEFT JOIN customers USING (customer_id)
-- Once the hash side exceeds 4 GB, join converts to grace hash join.

— Transparent fallback — same query, same result.
— Converts to grace hash join on the fly.
— Big JOINs no longer fail with Memory limit exceeded.
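The grace hash join idea can be sketched in Python: partition both sides by hash so each partition's hash table fits in memory, then join partition by partition. Real spilling writes partitions to disk; this sketch keeps them in lists and shows inner-join semantics only:

```python
# Conceptual sketch of grace hash join with in-memory "partitions".
def grace_hash_join(left, right, key, n_partitions=4):
    parts_l = [[] for _ in range(n_partitions)]
    parts_r = [[] for _ in range(n_partitions)]
    for row in left:                         # partition the probe side
        parts_l[hash(row[key]) % n_partitions].append(row)
    for row in right:                        # partition the build side
        parts_r[hash(row[key]) % n_partitions].append(row)
    out = []
    for pl, pr in zip(parts_l, parts_r):     # join matching partitions
        table = {}
        for row in pr:                       # build a small hash table
            table.setdefault(row[key], []).append(row)
        for row in pl:                       # probe it
            for match in table.get(row[key], []):
                out.append({**row, **match})
    return out

orders = [{'customer_id': 1, 'amount': 10}, {'customer_id': 2, 'amount': 20}]
customers = [{'customer_id': 1, 'name': 'a'}]
grace_hash_join(orders, customers, 'customer_id')
```

Because rows with equal keys always land in the same partition pair, the per-partition joins together produce exactly the full join result.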

Developer: János Benjamin Antal.

Arrow Flight SQL

ClickHouse can now serve queries over Arrow Flight SQL
— a high-throughput, columnar gRPC protocol from Apache Arrow.

Configured as a separate listener:

$ cat config.d/arrow.yaml
arrowflight_port: 9005

import pyarrow.flight as flight

client = flight.FlightClient("grpc://localhost:9005")
ticket = flight.Ticket(b"SELECT * FROM system.numbers LIMIT 10")
reader = client.do_get(ticket)
print(reader.read_all())

Developer: Yakov Olkhovskiy.

AI functions 🧪

Call LLM endpoints (OpenAI, Anthropic) directly from SQL:

SELECT aiGenerate('Summarize this in one line: ' || review)
FROM product_reviews LIMIT 10;

Higher-level helpers built on top:

SELECT aiClassify(text, ['bug', 'feature', 'question']) FROM tickets;
SELECT aiExtract(article, 'people, organizations, dates') FROM news;
SELECT aiTranslate(message, 'English') FROM logs;

— Provider configured server-side; secrets never leak to clients.

Developer: George Larionov.

Data Lakes

Iceberg: remove orphan files 🧪

Reclaim space from files that no live snapshot references:

ALTER TABLE my_iceberg EXECUTE remove_orphan_files;

— Walks the snapshot graph to determine the set of live data files.
— Removes everything else from object storage.
— Companion to EXECUTE expire_snapshots(...) shipped earlier.

Together they let ClickHouse fully maintain Iceberg tables in place.

Developer: murphy-4o.

Delta Lake virtual column pruning

Delta Lake now prunes data files using predicates over virtual columns:

SELECT * FROM my_delta WHERE _partition = 'country=DE/year=2026';
-- Only files in matching partitions are scanned.

— Fewer files opened, fewer requests to object storage.
— Same idea as partition pruning, applied via virtual columns.

Developer: Konstantin Vedernikov.

Iceberg: Unity Catalog & better paths

A round of fixes and improvements for Iceberg writes:

— Better support for Unity Catalog writes.
— Cleaner storage paths vs. metadata paths abstraction.
— Tables now record their location as a URL or absolute path.
— version-hint.txt handled in a Spark-compatible way.
— Position deletes no longer rely on path-inference heuristics.
— Cross-engine interop tests with Azure and Local catalogs.

And a redesigned interface for catalog inserts — cleaner settings,
old storage_* options deprecated.

Developers: Daniil Ivanik, Konstantin Vedernikov.

Bonus

clickhousectl

A new clickhousectl CLI —
one tool for local ClickHouse and ClickHouse Cloud:

— Install and manage local ClickHouse instances.
— Run queries against local servers and Cloud services.
— Manage Cloud organizations, services, API keys.
— Integrates with AI coding agents via installable skills.

Used in the benchmarking blog post:
Comparing ClickHouse versions with clickhousectl

https://github.com/ClickHouse/clickhousectl

OTel.FYI

A new docs site for the OpenTelemetry Collector ecosystem:

https://otel.fyi/

— Auto-generated from opentelemetry-collector-contrib.
— Browse Receivers, Exporters, Processors, Connectors, Extensions.
— MCP server: query OTel component docs from
  Claude / Cursor / Windsurf inside your IDE.

A small, focused tool that makes the OTel collector easier to discover and use.

PostgresBench

An open benchmark for Postgres-compatible DBMS using pgbench:

https://github.com/ClickHouse/postgresbench

— Industry-standard TPC-B-like workload.
— Throughput, latency, stability across services.
— Results for AWS RDS, Aurora, Neon, and others.

Built to run reproducible, transparent comparisons of transactional Postgres-compatible engines —
same spirit as ClickBench for analytical workloads.

Meetups


— 🇺🇸 Boston Meetup, May 5
— 🇸🇬 AWS Summit Singapore, May 6
— 🇮🇳 Delhi/Gurgaon Meetup, May 9
— 🇧🇷 São Paulo: Real-time Analytics Training, May 12
— 🇦🇺 AWS Summit Sydney, May 13
— 🇮🇳 Bangalore: Data Engineering Summit, May 14
— 🇬🇧 London: Real-time Analytics Training, May 19
— 🇰🇷 AWS Summit Seoul, May 20
— 🇯🇵 Tokyo: Findy VPoE Summit, May 22
— 🇺🇸 San Francisco: Open House, May 26
— 🇯🇵 Tokyo: AI Engineering Summit, Jun 8
— 🇸🇬 SuperAI Singapore, Jun 10

Reading Corner 📖

QR: clickhouse.com/blog/agentic-coding

https://clickhouse.com/blog/

— Agentic coding at ClickHouse
— ClickHouse at FOSDEM 2026
— ClickHouse at Google Cloud Next '26
— Do you still need Elasticsearch for log analytics?
— What's New in pg_clickhouse
— ClickHouse Cloud on Google Axion processors
— Comparing ClickHouse versions with clickhousectl

Q&A