1. (35 min) What's new in ClickHouse 25.9.
2. (20 min) Guest Talks.
3. (5 min) Q&A.
ClickHouse Autumn Release.
β 25 new features π
β 22 performance optimizations π
β 83 bug fixes πΏ
-- 25.8:
SET use_query_cache = true;
-- 25.9:
SET use_query_cache;
Developer: Thraeka.
:) SELECT isValidASCII('Hello')
ββisValidASCII('Hello')ββ
1. β 1 β
βββββββββββββββββββββββββ
:) SELECT isValidASCII('ΠΡΠΈΠ²Π΅Ρ')
ββisValidASCII('ΠΡΠΈΠ²Π΅Ρ')ββ
1. β 0 β
ββββββββββββββββββββββββββ
:) SELECT match('hello', '^[[:ascii:]]*$');
:) SELECT match('hello', '^[\x00-\x7F]*$');
:) SELECT arrayAll(c -> ascii(c) < 128, extractAll('hello', '.'));
Developer: Rajat Mohan.
-- 25.8:
:) SELECT arrayFilter(x -> x NOT IN [1, 3, 5], [1, 2, 3, 4]) AS res
-- 25.9
:) SELECT arrayExcept([1, 2, 3, 4], [1, 3, 5]) AS res
ββresββββ
1. β [2,4] β
βββββββββ
Developer: Joanna Hulboj.
CREATE TABLE test (s String)
ENGINE = S3('s3://mybucket/test.parquet',
storage_class_name = 'INTELLIGENT_TIERING');
INSERT INTO test VALUES ('Hello');
-- works with a table function as well:
INSERT INTO FUNCTION s3('s3://mybucket/test.parquet',
storage_class_name = 'INTELLIGENT_TIERING');
Developer: Alexander Sapin.
A new system table to introspect Replicated databases,
similar to system.replicas for Replicated tables.
:) DESCRIBE system.database_replicas
ββnameβββββββββββββββββ¬βtypeββββ
1. β database β String β
2. β is_readonly β UInt8 β
3. β max_log_ptr β Int32 β
4. β replica_name β String β
5. β replica_path β String β
6. β zookeeper_path β String β
7. β shard_name β String β
8. β log_ptr β Int32 β
9. β total_replicas β UInt32 β
10. β zookeeper_exception β String β
11. β is_session_expired β UInt8 β
βββββββββββββββββββββββ΄βββββββββ
Developer: Konstantin Morozov.
Since 25.8, ClickHouse can query other Arrow Flight data sources
with the arrowFlight table function.
And ClickHouse can work as an Arrow Flight server by itself.
New in 25.9: TLS and authentication.
-- without authentication:
SELECT * FROM arrowFlight('host:port', 'dataset_name');
-- with authentication:
SELECT * FROM arrowFlight(
'host:port', 'dataset_name', 'username', 'password');
Developer: Vitaly Baranov.
ClickHouse already has sampling and watermark memory profilers.
Now it uses its memory allocator (jemalloc) for consistent sampling of allocations and deallocations.
It is built in ClickHouse by default and can be used in runtime:
:) SET jemalloc_enable_profiler;
:) SET jemalloc_collect_profile_samples_in_trace_log;
-- after collecting some data
:) SELECT arrayStringConcat(symbols, '\n') AS s, count() AS c,
sum(size) AS size, round(size / c) AS avg_size
FROM system.trace_log WHERE trace_type = 'JemallocSample'
GROUP BY s ORDER BY abs(size) DESC LIMIT 20 FORMAT Vertical
Developer: Antonio Andelic.
:) SET jemalloc_enable_profiler;
:) SET jemalloc_collect_profile_samples_in_trace_log;
-- analyze the samples:
:) SELECT arrayStringConcat(symbols, '\n') AS s, count() AS c,
sum(size) AS size, round(size / c) AS avg_size
FROM system.trace_log WHERE trace_type = 'JemallocSample'
GROUP BY s ORDER BY abs(size) DESC LIMIT 20 FORMAT Vertical;
-- create a heap dump:
:) SYSTEM JEMALLOC FLUSH PROFILE;
Demo
Developer: Antonio Andelic.
In 25.8: reading and analyzing the index happens before the query starts reading the data, which introduces a noticable delay when using large indices.
In 25.9: reading the indices happens together when scanning the data. Queries start faster and can finish earlier, e.g., when LIMIT is reached.
:) SET use_skip_indexes_on_data_read;
Demo
Developer: Amos Bird.
Reordering of the JOIN graph, based on the amount of data to read
and on the column-level statistics.
:) SET query_plan_optimize_join_order_limit = 10;
:) SET allow_statistics_optimize = 1;
Demo
Developer: Vladimir Cherkasov.
Reordering of the JOIN graph, based on the amount of data to read
and on the column-level statistics.
:) SET query_plan_optimize_join_order_limit = 10;
:) SET allow_statistics_optimize = 1;
Demo (TPC-H with statistics)
Developer: Vladimir Cherkasov.
Developer: Vladimir Cherkasov.
A full-text search index in ClickHouse!
— In development since 2022
— first prototype in 2023 (by Harry Lee and Larry Luo)
— now rewritten to make it to production
— remains experimental in 25.9
Developers: Anton Popov, Elmi Ahmadov, Jimmy Aguilar Mena.
CREATE TABLE text_log
(
message String,
...
INDEX inv_idx(message)
TYPE text(tokenizer = 'default')
GRANULARITY 128
)
ENGINE = SharedMergeTree ORDER BY id;
SELECT ... WHERE hasToken(message, 'DDLWorker');
SELECT ... WHERE searchAll(message, ['peak', 'memory']);
SELECT ... WHERE searchAny(message, tokens('01442_merge_detach_attach'));
Developers: Anton Popov, Elmi Ahmadov, Jimmy Aguilar Mena.
Demo
Developers: Anton Popov, Elmi Ahmadov, Jimmy Aguilar Mena.
In development since 2021. Production-ready since 25.8.
ALTER TABLE dbpedia
ADD INDEX vector_index(vector)
TYPE vector_similarity(
'hnsw', 'cosineDistance', 1536, 'bf16', 64, 512);
ALTER TABLE dbpedia MATERIALIZE INDEX vector_index;
WITH ... AS reference_vector
SELECT ... FROM table WHERE ...
ORDER BY cosineDistance(vector, reference_vector)
LIMIT 10;
Developer: Shankar Iyer.
22.9 — experimental version
Introduced by Arthur Filatenkov, Vladimir Makarov, Danila Mishin, Nikita Vasilenko, Alexander Piachonkin, Nikita Evsiukov, Hakob Sagatelyan.
23.8 — integration with the USearch library
Introduced by Davit Vardanyan.
25.1 — faster vector indices
By Robert Schulze, Michael Kolupaev.
25.5 — beta (prefiltering, postfiltering, rescoring)
Developer: Shankar Iyer, Robert Schulze.
25.8 — GA (index-only reading, fetch multiplier, binary quantization)
Developer: Shankar Iyer.
HNSW algorithm with quantization options (bf16, i8, b1).
Supports different filtering modes:
post-filtering (run ANN search, then apply other filters)
or pre-filtering (apply regular filters, then run ANN across filtered results).
Supports filtering multiplier
(e.g., find 100 nearest candidates to filter and return 10).
Avoids reading the data column if not necessarily.
Developer: Shankar Iyer.
ALTER UPDATE for Iceberg
An option for DROP TABLE for Iceberg to remove the data
system.iceberg_metadata_log
ORC and Avro for Iceberg data files
Unity catalog on Azure
Distributed INSERT SELECT for data lakes
Developers: Konstantin Vedernikov, Smita Kulkarni.
ClickHouse already supports subscribing to NATS!
JetStream — persistent queues in NATS (similar to Kafka, but simpler).
CREATE TABLE test.nats (key UInt64, value UInt64)
ENGINE = NATS
SETTINGS nats_url = 'nats1:4444',
nats_stream = 'test_stream',
nats_consumer_name = 'test_consumer',
nats_subjects = 'test_subject',
nats_format = 'TSV',
nats_row_delimiter = '\n';
Developer: Dmitry Novikov.
ClickPipes
— Advanced Settings: full API, UI, and Terraform support
— JSON arrays & multiple JSON Objects
— S3 Unordered Mode: performance for high-volume S3 ingestions
— Postgres CDC: added failover replication slot support and PlanetScale Metal compatibility.
Ecosystem
— Official Flink Connector
— Python 0.9.0 release with SQLAlchemy 2.x support
— Polars integration, and Pandas 1.x/2.x compatibility
— Java client 0.9.2 targeting BI tools like Tableau and DBeaver
— C++ client 2.6.0: better handling of numeric data types, external data.
— πΈπ¬ Singapore, Sept 25
— π¦πΊ Melbourne, Sept 30
— π―π΅ Tokyo, Sept 30
— πͺπΈ Madrid, Sept 30
— πͺπΈ Barcelona, Oct 1
— πΊπΈ San Francisco (Iceberg), Oct 1
— πΊπΈ Seattle AI Night, Oct 2
— π¨π Zurich, Oct 9
— πΊπΈ Chicago, Oct 9
— π¬π§ London, Oct 15
— Optimizing GROUP BY with parallel replicas
— Optimizing ClickHouse for high-core Intel CPUs
— Cost-efficient Observability
— ClickStack for Next.JS applications
— Migrating data from TimescaleDB using ClickPipes
— Lyft, Exabeam, Ford, SewerAI
— Can LLMs replace on call SREs today?