Author: Alexey Milovidov, 2026-04-30.
1. (50 min) What's new in ClickHouse 26.4.
2. (10 min) Q&A.
ClickHouse "Spring" Release.
— 39 new features
— 45 performance optimizations
— 238 bug fixes
The stem function is now non-experimental
and works on entire columns of all string-like types:
SELECT stem('searching', 'en')
-- 'search'
SELECT stem(['running', 'jumps', 'flying'], 'en')
-- ['run', 'jump', 'fli']
Works for String, FixedString, Array([Fixed]String),
Nullable, LowCardinality, and Const arguments.
Useful for normalizing tokens before full-text search.
Developer: Jimmy Aguilar Mena.
New behavior of min/max_insert_block_size_rows/bytes
under the compatibility setting use_strict_insert_block_limits:
— Inserted blocks are squashed into chunks closer to max_insert_block_size_*;
— Reduces the number of small parts generated by chatty inserts;
— Less write amplification, fewer background merges.
Old behavior is preserved by default for compatibility.
Developer: Kirill Kopnev.
EXPLAIN PLAN pretty=1 now prints expressions in a human-readable form,
shows top-level output columns and per-step output columns,
and labels JOINs with estimated row counts and locality.
EXPLAIN PLAN pretty=1
SELECT URL, count() FROM hits
WHERE URL LIKE '%google%'
GROUP BY URL ORDER BY count() DESC LIMIT 10;
— Friendlier output for understanding query plans.
— Especially useful for plans with multiple JOINs and aggregations.
Developer: Kirill Kopnev.
For input formats:
A new setting, input_format_column_name_matching_mode:
SET input_format_column_name_matching_mode = 'ignore_case';
INSERT INTO t (UserID, Country)
FROM INFILE 'data.csv' FORMAT CSVWithNames;
-- Matches userid, USERID, UserID, ... in the CSV header.
Modes: match_case (default), ignore_case, auto.
Helpful when ingesting CSV/TSV/Parquet files from heterogeneous sources.
Developer: manerone.
Compute the normalized autocorrelation of a numeric array per lag:
SELECT arrayAutocorrelation([1, 2, 3, 4, 5, 4, 3, 2, 1])
-- [1, 0.625, 0.166..., -0.25, ...]
SELECT arrayAutocorrelation(values, 10) FROM metrics
— Supports integer, float, and decimal array types.
— Optional max_lag argument controls the result length.
— Useful for time series analysis and seasonality detection.
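The semantics can be sketched in a few lines of Python — a minimal version assuming the common normalization r(k) = Σ(x[i]−m)(x[i+k]−m) / Σ(x[i]−m)²; the exact normalization ClickHouse applies may differ slightly:

```python
# Normalized autocorrelation per lag (textbook definition; the exact
# normalization used by arrayAutocorrelation may differ).
def autocorrelation(xs, max_lag=None):
    n = len(xs)
    m = sum(xs) / n
    dev = [x - m for x in xs]
    denom = sum(d * d for d in dev)
    lags = range(n if max_lag is None else min(max_lag + 1, n))
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / denom
            for k in lags]

acf = autocorrelation([1, 2, 3, 4, 5, 4, 3, 2, 1])
# acf[0] is always 1.0; later lags decay and can go negative
```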
Developer: Wenyu Chen.
Transpose a 2D array (a matrix):
SELECT arrayTranspose([[1, 2, 3], [4, 5, 6]])
-- [[1, 4], [2, 5], [3, 6]]
Convenient for switching between row-major and column-major arrays,
working with groupArray results, and preparing data for ML pipelines.
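In Python terms, the operation is the classic zip-transpose:

```python
# 2D array transposition, as in arrayTranspose:
# zip(*rows) pairs up the i-th element of every row.
def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

print(transpose([[1, 2, 3], [4, 5, 6]]))
# [[1, 4], [2, 5], [3, 6]]
```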
Developer: Vitaly Baranov.
A new SQL function that returns the obfuscated form of a query —
identifiers and literals are replaced, structure preserved:
SELECT obfuscateQuery(
'SELECT name, count() FROM users WHERE age > 18 GROUP BY name')
— Same logic as the clickhouse-format --obfuscate tool,
now available as a function.
— Useful for sharing queries publicly without leaking schema details.
— Pairs well with normalizedQueryHash for query analytics.
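The idea can be illustrated with a toy sketch — consistent pseudonyms for identifiers, placeholders for literals, keywords untouched. The real obfuscateQuery is far more sophisticated (it understands SQL grammar and generates pronounceable identifiers); this only shows the structure-preserving principle:

```python
import re

# Toy obfuscation sketch: keep keywords/function names, rename
# identifiers consistently, replace numeric literals.
KEYWORDS = {'SELECT', 'FROM', 'WHERE', 'GROUP', 'BY', 'COUNT'}

def obfuscate(query):
    names = {}
    def repl(m):
        tok = m.group(0)
        if tok.upper() in KEYWORDS:
            return tok
        if tok[0].isdigit():
            return '0'                       # literal -> placeholder
        return names.setdefault(tok, f'id{len(names)}')
    return re.sub(r'\w+', repl, query)

print(obfuscate('SELECT name, count() FROM users WHERE age > 18 GROUP BY name'))
# SELECT id0, count() FROM id1 WHERE id2 > 0 GROUP BY id0
```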
Developer: Xuewei Wang.
Wrap occurrences of search terms with HTML tags:
SELECT highlight('ClickHouse is fast', ['fast', 'click'])
-- <em>Click</em>House is <em>fast</em>
SELECT highlight(text, ['error', 'warn'], '<mark>', '</mark>')
— ASCII case-insensitive matching.
— Automatic merging of overlapping matches.
— Custom open/close tags.
Pairs nicely with text indices for building search UIs.
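A minimal Python sketch of the same behavior — case-insensitive matching, overlapping matches merged before tagging:

```python
import re

# Sketch of highlight(): find all term occurrences case-insensitively,
# merge overlapping intervals, then wrap each merged span in tags.
def highlight(text, terms, open_tag='<em>', close_tag='</em>'):
    spans = []
    for t in terms:
        for m in re.finditer(re.escape(t), text, re.IGNORECASE):
            spans.append((m.start(), m.end()))
    spans.sort()
    merged = []
    for s, e in spans:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    out, pos = [], 0
    for s, e in merged:
        out.append(text[pos:s] + open_tag + text[s:e] + close_tag)
        pos = e
    out.append(text[pos:])
    return ''.join(out)

print(highlight('ClickHouse is fast', ['fast', 'click']))
# <em>Click</em>House is <em>fast</em>
```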
Developer: Peng.
Protect public endpoints from runaway query patterns:
CREATE QUOTA q1
KEYED BY normalized_query_hash
FOR INTERVAL 1 minute MAX queries = 100, read_bytes = '100G';
CREATE QUOTA q2
FOR INTERVAL 1 minute MAX queries_per_normalized_hash = 100;
— NORMALIZED_QUERY_HASH: separate quota per unique query shape.
— QUERIES_PER_NORMALIZED_HASH: cap how often any single query
pattern can run within an interval.
Developer: Alexey Milovidov.
Phrase search — match a continuous sequence of tokens:
SELECT * FROM logs
WHERE hasPhrase(message, 'connection refused')
— Tokens must appear consecutively, in order.
— Brute-force matching by default.
— A text index on the column accelerates phrase search:
CREATE TABLE logs (..., message String,
INDEX msg_idx message TYPE text(tokenizer = 'ngrams') GRANULARITY 1)
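The brute-force path amounts to this: tokenize, then look for the phrase tokens as a consecutive run, in order (a simplified sketch; ClickHouse's tokenization is configurable):

```python
# Toy phrase matching: the phrase tokens must appear consecutively
# and in order among the text's tokens.
def has_phrase(text, phrase):
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

print(has_phrase('ERROR: connection refused by peer', 'connection refused'))
# True
```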
Developer: Elmi Ahmadov.
The built-in play.html got a lot of polish in 26.4:
— Parameterized queries: {name:Type} placeholders show input fields.
— Multi-query mode: run several queries at once, in parallel.
— TOTALS row rendered as a table footer.
— Engine-specific icons in the table sidebar.
— Cell heights capped, with a cell viewer on click.
Developer: Alexey Milovidov.
Join on all columns sharing the same name and deduplicate them in the result:
SELECT * FROM customers
NATURAL JOIN orders
-- joins on every column with the same name in both tables
— Automatically matches columns by name.
— Shared columns appear once in the output.
— Standard SQL syntax, now supported in ClickHouse.
Developer: Peter Nguyen.
SQL standard OVERLAY with PLACING / FROM / FOR keywords:
SELECT OVERLAY('Hello world' PLACING 'SQL' FROM 7 FOR 5)
-- 'Hello SQL'
— The overlay function already existed.
— This adds the keyword-based syntactic sugar from SQL:2003.
— Drop-in compatibility with PostgreSQL and other engines.
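The semantics: replace `FOR` characters of the string starting at 1-based position `FROM` with the `PLACING` string. As a Python sketch:

```python
# SQL OVERLAY semantics: replace `count` characters of `s`,
# starting at 1-based position `pos`, with `placing`.
def overlay(s, placing, pos, count=None):
    if count is None:
        count = len(placing)      # FOR defaults to the placed string's length
    return s[:pos - 1] + placing + s[pos - 1 + count:]

print(overlay('Hello world', 'SQL', 7, 5))
# Hello SQL
```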
Developer: Desel72.
SQL standard alias for SET session_timezone:
SET TIME ZONE 'Europe/Amsterdam';
SELECT now();
-- now reported in Amsterdam time
Drop-in compatible with PostgreSQL session settings.
Developer: phulv94.
26.3:
SELECT * FROM VALUES((1, 'a'), (2, 'b'), (3, 'c'))
26.4: "standard" syntax, compatible with PostgreSQL:
SELECT * FROM (VALUES (1, 'a'), (2, 'b'), (3, 'c'))
— Convenient for ad-hoc data without a CSV/external source.
— Works as a regular subquery, can be JOINed, filtered, etc.
Developer: Desel72.
The EXTRACT operator now supports PostgreSQL-style units:
EPOCH, DOW, DOY, ISODOW, ISOYEAR, WEEK, CENTURY, DECADE, MILLENNIUM
SELECT
EXTRACT(EPOCH FROM now()) AS epoch,
EXTRACT(DOW FROM today()) AS day_of_week,
EXTRACT(WEEK FROM today()) AS iso_week,
EXTRACT(CENTURY FROM today()) AS century;
Bonus: EXTRACT(WEEK FROM date) no longer throws —
it now returns the ISO week number as expected.
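A few of these units can be sketched with Python's datetime to show the conventions involved (DOW counts Sunday as 0, ISODOW counts Monday as 1, WEEK is the ISO week):

```python
from datetime import date

# Sketch of a few PostgreSQL-style EXTRACT units; ClickHouse
# computes these server-side. EPOCH would need a datetime.
def extract(unit, d):
    if unit == 'DOW':
        return (d.weekday() + 1) % 7           # Sunday = 0, as in PostgreSQL
    if unit == 'ISODOW':
        return d.isoweekday()                  # Monday = 1
    if unit == 'WEEK':
        return d.isocalendar()[1]              # ISO week number
    if unit == 'CENTURY':
        return (d.year - 1) // 100 + 1         # 2026 -> 21st century
    raise ValueError(unit)

print(extract('CENTURY', date(2026, 4, 30)))
# 21
```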
Developer: Alexey Milovidov.
SQL-standard intervals with TO range qualifiers:
SELECT now() + INTERVAL '1:30' HOUR TO MINUTE;
SELECT now() + INTERVAL '5 12:30:00' DAY TO SECOND;
SELECT INTERVAL '2-6' YEAR TO MONTH;
— Internally decomposed into a sum of single-unit intervals.
— Familiar for users coming from PostgreSQL, Oracle, DB2.
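The decomposition the slide describes can be mimicked for the DAY TO SECOND case — split the literal into its units and sum single-unit intervals (a sketch, not ClickHouse's parser):

```python
from datetime import timedelta

# Decompose a DAY TO SECOND literal like '5 12:30:00' into a sum
# of single-unit intervals.
def day_to_second(literal):
    days_part, time_part = literal.split(' ')
    h, m, s = (int(x) for x in time_part.split(':'))
    return (timedelta(days=int(days_part)) + timedelta(hours=h)
            + timedelta(minutes=m) + timedelta(seconds=s))

print(day_to_second('5 12:30:00'))
# 5 days, 12:30:00
```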
Developer: Desel72.
MergeTree skip indices now work on the set of JSON paths
using JSONAllPaths:
ALTER TABLE events
ADD INDEX paths_idx JSONAllPaths(data)
TYPE bloom_filter GRANULARITY 1;
— Supported types: bloom_filter, tokenbf_v1, ngrambf_v1, and text (inverted).
— Skip granules where the JSON paths a query is filtering by are absent.
Developer: Pavel Kruglov.
A new function JSONAllValues returns every leaf value
of a JSON column as Array(String):
SELECT JSONAllValues(data) FROM events
-- ['42', 'click', '2026-04-30', ...]
Build a text index on it — and it kicks in automatically
for filters on JSON subcolumns:
ALTER TABLE events ADD INDEX vals
JSONAllValues(data) TYPE text(tokenizer='ngrams') GRANULARITY 1;
SELECT * FROM events WHERE data.event_type = 'click';
-- The text index is used to skip granules.
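Conceptually, collecting every leaf value as a string looks like this recursive walk (a sketch of the idea, not ClickHouse's implementation):

```python
import json

# Collect every leaf value of a JSON document as strings,
# in the spirit of JSONAllValues.
def all_leaf_values(obj):
    if isinstance(obj, dict):
        return [v for x in obj.values() for v in all_leaf_values(x)]
    if isinstance(obj, list):
        return [v for x in obj for v in all_leaf_values(x)]
    return [str(obj)]

doc = json.loads('{"event_type": "click", "user": {"id": 42}}')
print(all_leaf_values(doc))
# ['click', '42']
```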
Developer: Anton Popov.
From 26.4, basic statistics are on by default for new tables:
auto_statistics_types = 'minmax, uniq'
materialize_statistics_on_insert = false
— minmax and uniq stats are created for every suitable column.
— Built during background merges — no insert overhead.
— Used by the optimizer for predicate selectivity, JOIN reordering,
and the new part-pruning min/max optimization.
Developer: Han Fei.
Queries with LIKE can now be accelerated by text indices:
ALTER TABLE logs
ADD INDEX msg_idx message TYPE text(tokenizer='ngrams') GRANULARITY 1;
SELECT * FROM logs
WHERE message LIKE '%timeout%error%';
-- Granules without both terms are skipped.
— Works for substring patterns extractable from a LIKE expression.
— Bonus: text index is now GA, no compatibility flag required.
Developer: Elmi Ahmadov.
For huge clusters with many shards, the initiator node can do
primary index analysis across the cluster before reading data.
26.4 brings two improvements:
— Index analysis is applied after partition pruning,
so heavy clusters with many partitions skip irrelevant shards faster.
— Fixed the quadratic blow-up when predicates contained IN subqueries.
Result: lower latency for selective queries on very wide deployments.
Developers: Azat Khuzhin, Anton Popov.
Primary key pruning now understands regex alternations
that share a common literal prefix:
SELECT * FROM events
WHERE match(id, '^(abc-1|abc-2|abc-3)');
-- The common prefix 'abc-' is used for primary key pruning.
— Detects shared literal prefixes inside (...|...|...).
— Falls back to no pruning when no shared prefix exists.
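The core idea — pull the shared literal prefix out of an anchored alternation so it can drive a range condition on the primary key — fits in a few lines. A toy sketch for the simple `^(a|b|c)` shape only; ClickHouse's regex analysis is more general:

```python
import os.path

# Extract the shared literal prefix from an anchored alternation.
def shared_prefix(pattern):
    if not (pattern.startswith('^(') and pattern.endswith(')')):
        return ''
    alternatives = pattern[2:-1].split('|')
    return os.path.commonprefix(alternatives)

print(shared_prefix('^(abc-1|abc-2|abc-3)'))
# abc-
```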
Developer: Yash.
The Float-to-String hot path was rewritten in 26.4:
— Replaced dragonbox with zmij — 1.5–3x faster Float → String.
— Extended the itoa fast path with dragonbox-compatible rounding
for large integral floats.
— Int128/UInt128 → String: software division replaced
by Barrett reduction with an unrolled inner loop.
Visible in big aggregations and exports to text formats.
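Barrett reduction in one picture: replace division by a constant m with a multiply by a precomputed reciprocal plus a shift and a small correction — the trick behind fast integer-to-string (repeated division by powers of 10). A sketch of the scheme, not ClickHouse's tuned implementation:

```python
# Barrett division: q = (x * floor(2^k / m)) >> k underestimates
# the true quotient by at most 1 (for x well below 2^k), so one
# conditional correction suffices.
def barrett_divmod(x, m, k=128):
    r = (1 << k) // m              # precomputed once per constant m
    q = (x * r) >> k               # approximate quotient
    if x - q * m >= m:
        q += 1
    return q, x - q * m

print(barrett_divmod(12345678901234567890, 10**9))
# (12345678901, 234567890)
```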
Developer: Raúl Marín.
The parallel merge of uniqExact (used by COUNT(DISTINCT))
was rewritten:
— Batch parallel merge instead of per-thread sequential merge.
— Avoids spawning redundant threads for small intermediate states.
— Significant speedups for high-cardinality COUNT(DISTINCT)
on machines with many cores.
Developer: Jiebin Sun.
Hash and parallel hash joins now spill to disk automatically
when the memory limit is reached:
SET max_bytes_before_external_join = '4G';
SELECT *
FROM orders LEFT JOIN customers USING (customer_id)
-- Once the hash side exceeds 4 GB, join converts to grace hash join.
— Transparent fallback — same query, same result.
— Converts to grace hash join on the fly.
— Big JOINs no longer fail with Memory limit exceeded.
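The grace hash join idea in miniature: partition both sides by a hash of the key, then join each partition pair independently, since each is small enough to fit in memory. A toy in-memory inner-join sketch — the real thing spills partitions to disk and preserves the query's join type:

```python
from collections import defaultdict

def grace_hash_join(left, right, key, partitions=4):
    lparts, rparts = defaultdict(list), defaultdict(list)
    for row in left:
        lparts[hash(row[key]) % partitions].append(row)
    for row in right:
        rparts[hash(row[key]) % partitions].append(row)
    out = []
    for p in range(partitions):
        # classic in-memory hash join per partition pair
        table = defaultdict(list)
        for row in rparts[p]:
            table[row[key]].append(row)
        for row in lparts[p]:
            for match in table[row[key]]:
                out.append({**row, **match})
    return out

orders = [{'customer_id': 1, 'amount': 10}, {'customer_id': 2, 'amount': 20}]
customers = [{'customer_id': 1, 'name': 'Alice'}]
print(grace_hash_join(orders, customers, 'customer_id'))
# [{'customer_id': 1, 'amount': 10, 'name': 'Alice'}]
```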
Developer: János Benjamin Antal.
ClickHouse can now serve queries over Arrow Flight SQL
— a high-throughput, columnar gRPC protocol from Apache Arrow.
Configured as a separate listener:
$ cat config.d/arrow.yaml
arrowflight_port: 9005
import pyarrow.flight as flight
client = flight.FlightClient("grpc://localhost:9005")
ticket = flight.Ticket(b"SELECT * FROM system.numbers LIMIT 10")
reader = client.do_get(ticket)
print(reader.read_all())
Developer: Yakov Olkhovskiy.
Call LLM endpoints (OpenAI, Anthropic) directly from SQL:
SELECT aiGenerate('Summarize this in one line: ' || review)
FROM product_reviews LIMIT 10;
Higher-level helpers built on top:
SELECT aiClassify(text, ['bug', 'feature', 'question']) FROM tickets;
SELECT aiExtract(article, 'people, organizations, dates') FROM news;
SELECT aiTranslate(message, 'English') FROM logs;
— Provider configured server-side; secrets never leak to clients.
Developer: George Larionov.
Reclaim space from files that no live snapshot references:
ALTER TABLE my_iceberg
EXECUTE remove_orphan_files;
— Walks the snapshot graph to determine the set of live data files.
— Removes everything else from object storage.
— Companion to EXECUTE expire_snapshots(...) shipped earlier.
Together they let ClickHouse fully maintain Iceberg tables in place.
Developer: murphy-4o.
Delta Lake now prunes data files using predicates over virtual columns:
SELECT * FROM my_delta
WHERE _partition = 'country=DE/year=2026';
-- Only files in matching partitions are scanned.
— Fewer files opened, fewer requests to object storage.
— Same idea as partition pruning, applied via virtual columns.
Developer: Konstantin Vedernikov.
A round of fixes and improvements for Iceberg writes:
— Better support for Unity Catalog writes.
— Cleaner storage paths vs. metadata paths abstraction.
— Tables now record their location as a URL or absolute path.
— version-hint.txt handled in a Spark-compatible way.
— Position deletes no longer rely on path-inference heuristics.
— Cross-engine interop tests with Azure and Local catalogs.
And a redesigned interface for catalog inserts — cleaner settings,
old storage_* options deprecated.
Developers: Daniil Ivanik, Konstantin Vedernikov.
A new clickhousectl CLI —
one tool for local ClickHouse and ClickHouse Cloud:
— Install and manage local ClickHouse instances.
— Run queries against local servers and Cloud services.
— Manage Cloud organizations, services, API keys.
— Integrates with AI coding agents via installable skills.
Used in the benchmarking blog post:
"Comparing ClickHouse versions with clickhousectl".
A new docs site for the OpenTelemetry Collector ecosystem:
— Auto-generated from opentelemetry-collector-contrib.
— Browse Receivers, Exporters, Processors, Connectors, Extensions.
— MCP server — query OTel component docs from
Claude / Cursor / Windsurf inside your IDE.
A small, focused tool that makes the OTel collector easier to discover and use.
An open benchmark for Postgres-compatible DBMS using pgbench:
https://github.com/ClickHouse/postgresbench
— Industry-standard TPC-B-like workload.
— Throughput, latency, stability across services.
— Results for AWS RDS, Aurora, Neon, and others.
Built to run reproducible, transparent comparisons of transactional Postgres-compatible engines —
same spirit as ClickBench for analytical workloads.
— 🇺🇸 Boston Meetup, May 5
— 🇸🇬 AWS Summit Singapore, May 6
— 🇮🇳 Delhi/Gurgaon Meetup, May 9
— 🇧🇷 São Paulo: Real-time Analytics Training, May 12
— 🇦🇺 AWS Summit Sydney, May 13
— 🇮🇳 Bangalore: Data Engineering Summit, May 14
— 🇬🇧 London: Real-time Analytics Training, May 19
— 🇰🇷 AWS Summit Seoul, May 20
— 🇯🇵 Tokyo: Findy VPoE Summit, May 22
— 🇺🇸 San Francisco: Open House, May 26
— 🇯🇵 Tokyo: AI Engineering Summit, Jun 8
— 🇸🇬 SuperAI Singapore, Jun 10
— Agentic coding at ClickHouse
— ClickHouse at FOSDEM 2026
— ClickHouse at Google Cloud Next '26
— Do you still need Elasticsearch for log analytics?
— What's New in pg_clickhouse
— ClickHouse Cloud on Google Axion processors
— Comparing ClickHouse versions with clickhousectl