1. (55 min) What's new in ClickHouse 25.11.
2. (5 min) Q&A.
ClickHouse Thanksgiving Release.
โ 24 new features ๐ฆ
โ 27 performance optimizations โ
โ 97 bug fixes ๐
A new system table for introspection and searching for code points:
:) SELECT block, general_category, groupConcat(code_point)
FROM system.unicode
WHERE name LIKE '%LEAF%'
GROUP BY ALL
Demo ๐๐๐๐
Developer: wxybear.
A function for Point, LineString, Ring, Polygon, MultiPolygon...
Similar to PostGIS's ST_FlipCoordinates.
:) SELECT arrayMap(x -> arrayMap(y -> reverse(y), x), polygon); -- 25.10
:) SELECT flipCoordinates(polygon); -- 25.11
For easy integration of different GIS software:
— most databases use the (lon, lat) order;
— most web maps use the (lat, lon) order.
Developer: Sachin Kumar Singh.
A function for compatibility with Postgres:
-- 25.10, ClickHouse's higher-order function:
:) SELECT arrayFilter(x -> x != 1, [1, 2, 1]);
-- 25.11, the new function:
:) SELECT arrayRemove([1, 2, 1], 1);
-- 25.11, compatibility alias for Postgres:
:) SELECT array_remove([1, 2, 1], 1);
Developer: Tiwary Saurav.
A window function, compatible with Spark, Databricks, Snowflake, Postgres.
Calculates the value of a CDF (cumulative distribution function)
— the ratio of the number of rows <= current row in the specified order.
Similar to percent_rank with a slightly different formula.
Developer: Manuel.
A non-aggregate function, similar to least/greatest (min2/max2),
which returns the value of the same type as its arguments,
closest to the average of the arguments.
:) SELECT least(1, 9), greatest(1, 9), midpoint(1, 9)
โโleast(1, 9)โโฌโgreatest(1, 9)โโฌโmidpoint(1, 9)โโ
โ 1 โ 9 โ 5 โ
โโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโ
:) SELECT midpoint('2025-11-25'::Date, '2026-01-01'::Date)
โโmidpoint(CASโฏ', 'Date'))โโ
โ 2025-12-13 โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Developer: Simon Michal.
New aggregate functions, similar to argMin, argMax.
-- get the maximum price:
:) SELECT max(price) FROM uk_price_paid WHERE toYear(date) = 2025;
-- get the town corresponding to the maximum price:
:) SELECT argMax(town, price) FROM uk_price_paid WHERE toYear(date) = 2025;
-- get the (town, price) corresponding to the maximum price (25.10):
:) SELECT argMax((town, price), price) FROM uk_price_paid WHERE toYear(date) = 2025;
-- get the (town, price) corresponding to the maximum price (25.11):
:) SELECT argAndMax(town, price) FROM uk_price_paid WHERE toYear(date) = 2025;
Developer: AbdAlRahman Gad.
For queries from S3-compatible object storages.
When you read files from S3, in addition to the data, you can request the following virtual columns: _path, _file, _size, _time, _etag, _tags.
Tags are a set of key-value pairs a user can attach to an object in S3.
They can be changed without rewriting the object.
ClickHouse represents tags as Map(String, String).
Developer: Zicong Qu.
You can already use projections to have different sort orders of a table:
CREATE TABLE pageviews (
CounterID UInt32,
UserID UInt64,
EventTime DateTime,
...
PROJECTION by_time (
SELECT * ORDER BY EventTime -- projection's order
)
) ORDER BY (CounterID, UserID, EventTime) -- table's main order
And the projection with accelerate range queries or queries with ORDER BY.
However, it requires duplicating the storage for the columns (or a subset).
Can we make it work without duplication?
Since 25.11 you can create a projection with a single column, _part_offset:
CREATE TABLE pageviews (
CounterID UInt32,
UserID UInt64,
EventTime DateTime,
...
PROJECTION by_time (
-- works as an index for the main table representation
SELECT _part_offset ORDER BY EventTime
)
) ORDER BY (CounterID, UserID, EventTime)
This projection will accelerate queries using less extra storage
with the cost of worse locality of reads: useful for point queries. Demo.
Developer: Amos Bird.
A projection with GROUP BY:
CREATE TABLE pageviews (
CounterID UInt32,
UserID UInt64,
EventTime DateTime,
MobilePhoneModel String,
...
PROJECTION phones (
SELECT count() GROUP BY CounterID, MobilePhoneModel
)
) ORDER BY (CounterID, UserID, EventTime)
will be used for queries with DISTINCT by any subset of the fields:
SELECT DISTINCT MobilePhoneModel FROM pageviews
Developer: Nihal Z. Miaji.
The case when GROUP BY has a small number of distinct keys
as they are small integers (8/16 bit), but the aggregation states are large:
SELECT
number % 100 AS k, -- small integer as a key
uniq(number) AS u -- large aggregation state
FROM numbers_mt(1e8)
GROUP BY k ORDER BY u DESC LIMIT 10
Demo
Developer: Jianfei Hu.
By storing identical column descriptions for different data parts only once.
Example: a "CI Logs" cluster with:
— 2173 tables, each with 300 columns on average,
— 27,517 data parts,
— 21 trillion rows,
— 430 TB of compressed data,
— 1 shard (ClickHouse Cloud).
The optimisation save 12 GB out of 73 GB of memory!
Especially useful for tables with a large number of columns.
Developer: Azat.
ClickHouse runs distributed queries:
— with Distributed tables;
— with Replicated tables, using "parallel replicas";
— on top of a bunch of external files, e.g., Parquet files on S3,
using the s3Cluster table function.
But what if there is only a single (or a few) large files?
Since 25.11, ClickHouse splits the work using ranges
(row groups) inside single files!
Many servers of the cluster will "eat" different portions of a file.
Developer: Konstantin Vedernikov.
Take a fraction of the resultset:
:) SELECT county, avg(price) AS price
FROM uk_price_paid GROUP BY county ORDER BY price DESC
LIMIT 0.25; -- take first 1/4 of the result
:) SELECT county, avg(price) AS price
FROM uk_price_paid GROUP BY county ORDER BY price DESC
LIMIT 100 OFFSET 0.5; -- take 100 records from the middle
Use case: sampling a portion of the result without using window functions.
Developers: Ahmed Gouda.
Allow one user to run queries on behalf of another user:
GRANT IMPERSONATE ON user1 TO user2;
GRANT IMPERSONATE ON * TO user3;
Run queries on behalf of another user:
EXECUTE AS target_user SELECT * FROM table;
EXECUTE AS target_user; -- sets it for the session
SELECT * FROM table;
Use-case: an app authenticates as one user, and does the work under other configured users for access rights, limits, settings, quotas, and audit.
Developer: Shankar.
We already had all the data types for GIS:
Point, LineString, MultiLineString, Ring, Polygon, MultiPolygon
Since 25.11, we also have one unified data type: Geometry
— it can contain any geometry!
New functions, readWkt, readWkb,
which read any type of geometry as a Geometry value.
ClickHouse also supports H3 and S2 indexes, Geohashes, optimized spherical and geo distances,
Polygonal dictionaries for reverse geocoding, SVG rendering...
Developer: Konstantin Vedernikov.
Automatically provision TLS certificates on a distributed cluster:
$ cat /etc/clickhouse-server/config.d/ssl.yaml
http_port: 80
acme:
email: [email protected]
terms_of_service_agreed: true
# zookeeper_path: '/clickhouse/acme' # by default
domains:
- domain: play.clickhouse.com
Supports the HTTP-01 challenge.
The certificates are shared across the cluster with ClickHouse Keeper.
Developer: Konstantin Bogdanov, Sergey Lokhmatikov.
Classifies texts based on the frequences of n-grams of bytes, code points,
or words (tokens), according to configured models with reference frequences.
nb_models:
model:
name: language
path: '/etc/clickhouse-server/config.d/language.bin'
n: 2 # n-gram size
mode: codepoint # byte, codepoint, token
alpha: 1.0 # optional, smoothing factor
priors: # optional
- prior:
class: 0
value: 0.6
- prior:
class: 1
value: 0.4
Developer: Nihal Z. Miaji.
A simple and fast model that can detect charset, language, topic, etc.,
easily created (trained) on your corpus.
Remains practical in the LLM era for routine tasks on massive volumes of data,
but can't compete on precision.
:) SELECT naiveBayesClassifier('language', text);
-- takes the model name and return the class number, according to the model.
Demo
Developer: Nihal Z. Miaji.
OneLake - Microsoft Fabric's unified data lake,
powered by the OneLake Tables APIs and Apache Iceberg.
Compatible with the Iceberg REST Catalog.
CREATE DATABASE onelake
ENGINE = DataLakeCatalog('https://onelake.table.fabric.microsoft.com/iceberg')
SETTINGS
catalog_type = 'onelake',
warehouse = 'warehouse_id/data_item_id',
onelake_tenant_id = '<tenant_id>',
oauth_server_uri = 'https://login.microsoftonline.com/<tenant_id>/oauth2/v2.0/token',
auth_scope = 'https://storage.azure.com/.default',
onelake_client_id = '<client_id>',
onelake_client_secret = '<client_secret>'
Developers: Konstantin Vedernikov.
ClickStack now supports alerts.
Python driver supports SQLAlchemy 2.x, with Pandas 1 and 2,
r/w support for Arrow, and Polars integration.
Rust driver: type safety for reads and writes, asynchronous inserts.
ODBC driver: proper handling of network errors.
JavaScript driver: handling exceptions in the middle of result stream.
Updates for dbt, Fivetran, Flink, Spark, Kafka connectors.
ClickPipes support Azure Blob Storage as GA.
chDB - a ClickHouse engine embedded in Python
and other languages
Version 3.7.1 is 1.9x faster on ClickBench.
Faster operations with ArrowTable.
— ๐บ๐ธ House Party, The SQL, Dec 2
— ๐ต๐ฑ Warsaw, Nov 26
— ๐บ๐ธ New York, Dec 8
— ๐บ๐ธ San Francisco, Dec 8
— ๐ฎ๐ฉ Jakarta, Dec 9
— ๐ฏ๐ต Tokyo, Dec 15
— ๐ฎ๐ฑ Tel Aviv, Dec 29
— chDB Kernel Upgrade Journey
— ClickHouse integrates with OneLake
— Alerting arrives in ClickStack
— ClickHouse welcomes LibreChat
— Global weather data from flying airplanes
— Merkle Science fights crypto crime at scale
— How we made our internal data warehouse AI-first
— Building StockHouse: Real-time market analytics
— ClickHouse Partners with Japan Cloud
— Team's productivity metrics