1. (55 min) What's new in ClickHouse 25.1.
2. (5 min) Q&A.
ClickHouse New Year Release.
— 15 new features
— 36 performance optimizations
— 77 bug fixes
24.12:
CREATE TABLE test
(
t Tuple(a UInt8, s String)
) ORDER BY tupleElement(t, 'a')
25.1:
CREATE TABLE test
(
t Tuple(a UInt8, s String)
) ORDER BY t.a -- much better!
Developer: Pavel Kruglov.
Especially useful with the new JSON data type!
24.12:
CREATE TABLE test
(
data JSON
) ORDER BY getSubcolumn(data, 'headers.referer')
25.1:
CREATE TABLE test
(
data JSON
) ORDER BY data.headers.referer -- much better!
Works with the primary key and secondary indices.
clickhouse-client --query "
-- First query:
SELECT 1;
-- Second query:
SELECT 2;
"
When running a script, each query's number and the line number where it begins are logged in the query log and the text log.
Developer: Alexey Milovidov.
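The idea can be sketched outside ClickHouse. A toy Python splitter (not ClickHouse's actual parser; simplified in that it ignores semicolons inside string literals and comments) that assigns each query its number and the 1-based line where it begins:

```python
def number_queries(script: str):
    """Split a script on semicolons and record (query_number, start_line, text)
    for each non-empty query. Leading comment lines attached to a query count
    as part of it in this simplification."""
    queries = []
    start = 0
    for i, ch in enumerate(script + ";"):
        if ch == ";":
            chunk = script[start:i]
            stripped = chunk.strip()
            if stripped:
                # offset of the first non-whitespace character of this query
                begin = start + (len(chunk) - len(chunk.lstrip()))
                line = script.count("\n", 0, begin) + 1
                queries.append((len(queries) + 1, line, stripped))
            start = i + 1
    return queries
```

Running it on the two-query script above yields query 1 starting at line 1 and query 2 starting at line 3.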
CREATE TABLE t1 (...)
PARALLEL WITH
CREATE TABLE t2 (...)
...
Works with queries that don't return any output
(CREATE, DROP, RENAME, ALTER, OPTIMIZE, etc).
Implemented to speed up the RESTORE command.
Can speed up creating complex database schemas.
Developer: Vitaly Baranov.
24.12:
CREATE TABLE t
(
a UInt64, b UInt64, c UInt64,
INDEX idx_a(a) TYPE minmax,
INDEX idx_b(b) TYPE minmax,
INDEX idx_c(c) TYPE minmax
) ORDER BY ();
25.1:
CREATE TABLE t
(
a UInt64, b UInt64, c UInt64
) ORDER BY () SETTINGS add_minmax_index_for_numeric_columns = 1;
Developer: Smita Kulkarni.
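Conceptually, a minmax skip index stores the minimum and maximum value per granule of rows, so a query range that cannot intersect a granule's range lets the granule be skipped. A toy Python sketch of the idea (the granule size here is tiny for illustration; the real default is index_granularity = 8192):

```python
GRANULE = 4  # rows per granule (toy value; real default is 8192)

def build_minmax(column):
    """One (min, max) pair per granule of the column."""
    return [(min(column[i:i + GRANULE]), max(column[i:i + GRANULE]))
            for i in range(0, len(column), GRANULE)]

def granules_to_read(index, lo, hi):
    """Granules whose [min, max] range intersects the query range [lo, hi];
    all other granules are skipped without reading."""
    return [g for g, (mn, mx) in enumerate(index) if mx >= lo and mn <= hi]
```

With add_minmax_index_for_numeric_columns = 1, such an index is built for every numeric column automatically instead of being declared by hand.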
24.12:
:) CREATE TABLE t
(
a UInt64, b UInt64, c UInt64
) /*ENGINE=MergeTree*/ ORDER BY () SETTINGS index_granularity = 8192;
Code: 115. DB::Exception: Unknown setting 'index_granularity'.
:) CREATE TABLE t2 AS t SETTINGS index_granularity = 8192;
Code: 115. DB::Exception: Unknown setting 'index_granularity'.
25.1: Everything works!
You can even mix query-level and table-level settings.
Developer: Raul Marin.
:) SELECT '{"a":{"b":{"c":[123]}}}'::JSON AS x FORMAT JSON
24.12 produced single-line JSON output; 25.1 pretty-prints it.
It's controlled by the output_format_json_pretty_print setting. This applies to "presentational" formats, such as JSON and JSONCompact, but not to line-oriented formats, such as JSONEachRow and JSONCompactEachRow.
Developer: Pavel Kruglov.
# This is a text format:
$ clickhouse-client -q "SELECT * FROM table FORMAT JSON"
# This is a binary format, and we are writing into a file:
$ clickhouse-client -q "SELECT * FROM table FORMAT Parquet" > file.parquet
# This is a binary format, and we are writing to the terminal:
$ clickhouse-client -q "SELECT * FROM table FORMAT Parquet"
The requested output format `Parquet` is binary and could produce side-effects when output directly into the terminal.
If you want to output it into a file, use the "INTO OUTFILE" modifier in the query or redirect the output of the shell command.
Do you want to output it anyway? [y/N]
Developer: Alexey Milovidov.
:) SELECT 'Hello, ' AS a, 'World\n ' AS b;
How will this look in the CLI?
Demo
Controlled by the output_format_pretty_highlight_trailing_spaces setting.
Developer: Alexey Milovidov.
:) SELECT 'Hello' AS a, 'Beautiful\nWorld' AS b
UNION ALL SELECT 'World', 'Goodbye' FORMAT Pretty;
How will this look in the CLI?
Demo
Controlled by the output_format_pretty_multiline_fields setting.
Developer: Alexey Milovidov.
In Pretty formats:
:) SELECT time::Date,
uniq(data.headers.`cf-connecting-ip`),
uniq(data.url)
FROM website_traffic
WHERE time >= yesterday() GROUP BY 1 ORDER BY 1
   ┌─CAST(time, 'Date')─┬─uniq(data.he⋯necting-ip)─┬─uniq(data.url)─┐
1. │         2025-01-27 │                   897995 │          65974 │
2. │         2025-01-28 │                   246953 │          21596 │
   └────────────────────┴──────────────────────────┴────────────────┘
Developer: Alexey Milovidov.
# The easiest way to convert data between formats.
# The compression is automatically detected from file names:
$ ch --copy < data.csv.gz > data.jsonl.zst
# Process the data and convert it on the fly:
$ ch --query "
SELECT * FROM table WHERE upyachka != 'fail'
" < data.avro.gz > result.parquet
Developer: Alexey Milovidov.
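Detecting the codec from the file name is easy to sketch. A toy Python version (stdlib codecs only; the zstd used in the slide would need the third-party zstandard package, so it's omitted here):

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Map file suffix to an opener; unknown suffixes fall back to plain open().
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}

def open_auto(path, mode="rt"):
    """Open a file, picking the (de)compression codec from its extension."""
    return OPENERS.get(Path(path).suffix, open)(path, mode)
```

Writing through `open_auto("data.csv.gz", "wt")` produces a gzip file that reads back transparently with the same helper.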
ClickHouse 24.12:
:) SELECT * FROM system.numbers WHERE number % 30000 = 1 LIMIT 10;
   ┌─number─┐
1. │      1 │
2. │  30001 │
3. │  60001 │
   └────────┘
   ┌─number─┐
4. │  90001 │
5. │ 120001 │
   └────────┘
Does it have to be this way? Not anymore! Demo.
Developer: Alexey Milovidov.
Demo
Developer: Alexey Milovidov.
Parallel Hash Join is the default algorithm since 24.12.
Now it is faster. Demo.
Developer: Nikita Taranov.
Using the new secondary index cache, controlled by the server settings
skipping_index_cache_size and skipping_index_cache_max_entries.
Example: a table with 33,114,778 embeddings of 768 dimensions:
Full scan: 216 sec.
Indexed search (without the cache): 33 sec.
Indexed search (with the cache): 0.064 sec!
Developers: Robert Schulze, Michael Kolupaev.
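The cache can be pictured as an LRU map keyed by data part and index name. A toy Python sketch (not ClickHouse internals: the real cache is bounded by bytes via skipping_index_cache_size as well as by entry count; this sketch bounds entries only):

```python
from collections import OrderedDict

class SkippingIndexCache:
    """Toy LRU cache of deserialized skip indexes, keyed by (part, index)."""

    def __init__(self, max_entries=1024):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get_or_load(self, part, index, load):
        key = (part, index)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        value = load()  # on a miss, deserialize the index (expensive)
        self._entries[key] = value
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return value
```

Repeated vector searches hit the cached index instead of re-reading it from disk, which is where the 33 sec to 0.064 sec difference comes from.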
ClickHouse 25.1 — 40% improvement on ClickBench!
Parallel Replicas (enabled with the enable_parallel_replicas setting) let a single query use the resources of all replicas, speeding up heavy queries, especially when operating on shared storage.
Developers: Igor Nikonov, Nikita Taranov.
The system.trace_log table contains traces from the Sampling Profiler embedded in ClickHouse.
The Sampling Profiler collects information about where a query spends time, as well as where it allocates memory, and allows building flame graphs down to the level of individual queries, or aggregated over time across the cluster.
In ClickHouse 24.12, it contains addresses in the machine code, which can be transformed to code locations on the fly using the addressToLine and addressToSymbol functions. This information is invalidated on every release.
In ClickHouse 25.1, there are persisted symbols and lines columns, so the analysis can be done across different builds.
Developer: Alexey Milovidov.
Aka "on the fly mutations".
SET apply_mutations_on_fly = 1;
Now all ALTER UPDATE/DELETE operations are applied on the fly during SELECT queries, even before they have been fully materialized.
Developer: Anton Popov.
This feature has already been enabled for ClickHouse Cloud customers for more than a year. Now it is implemented for the regular MergeTree and ReplicatedMergeTree engines.
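The mechanism can be sketched as a patch layer: pending UPDATE/DELETE expressions are applied on top of stored rows at read time, until a background merge materializes them. A toy Python illustration (not ClickHouse internals):

```python
def select_with_pending_mutations(rows, mutations):
    """Apply pending mutations, in order, to each row at read time.

    rows: list of dicts.
    mutations: ordered list of ('delete', predicate) or
               ('update', predicate, updater) entries.
    """
    out = []
    for row in rows:
        row = dict(row)  # don't modify the "stored" data
        dropped = False
        for m in mutations:
            if m[0] == "delete" and m[1](row):
                dropped = True
                break
            if m[0] == "update" and m[1](row):
                m[2](row)
        if not dropped:
            out.append(row)
    return out
```

The stored parts remain untouched; only the query result reflects the mutations, which is why a SELECT sees them immediately.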
CREATE TABLE test
(
id UInt64 DEFAULT generateSerialID('test'),
data String
)
ORDER BY id;
INSERT INTO test (data) VALUES ('Hello'), ('World');
The new generateSerialID function implements named distributed counters
(stored in Keeper), which can be used for auto-increments in tables.
It is fast (thanks to batching) and safe for parallel and distributed operation.
Developer: Alexey Milovidov.
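The batching trick can be sketched in a few lines: each client reserves a whole range from the shared counter (one round trip to Keeper in the real implementation) and then hands out IDs locally. A toy Python sketch, with an in-process counter standing in for Keeper:

```python
import itertools
import threading

class SerialIDGenerator:
    """Toy batched ID generator in the spirit of generateSerialID."""

    def __init__(self, batch_size=1000):
        self.batch_size = batch_size
        self._shared = itertools.count(0, batch_size)  # stands in for Keeper
        self._lock = threading.Lock()
        self._next = 0
        self._end = 0

    def next_id(self):
        with self._lock:
            if self._next == self._end:
                start = next(self._shared)  # one "round trip" per batch
                self._next, self._end = start, start + self.batch_size
            value = self._next
            self._next += 1
            return value
```

Because ranges never overlap, concurrent generators produce unique IDs, and only one coordination call is paid per batch rather than per row.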
The Merge table engine allows unifying multiple tables into one.
Also available as a merge table function.
CREATE TABLE t1 (a UInt8, b String) ENGINE = Memory;
CREATE TABLE t2 (a Int32, c Array(String)) ENGINE = Memory;
CREATE TABLE merge_table ENGINE = Merge(currentDatabase(), '^t(1|2)$');
SELECT * FROM merge_table;
SELECT * FROM merge('^t(1|2)$');
In version 24.12, Merge tables took the structure of the first table
(unless the structure was explicitly specified).
In version 25.1, Merge tables make a union of columns of all tables.
It even unifies data types to a common type or a Variant data type.
Developer: Alexey Milovidov.
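The union-of-columns behavior can be sketched as a schema merge. A toy Python version (simplified: real ClickHouse first tries a common supertype, e.g. unifying UInt8 and Int32 into a wider integer, and only then falls back to Variant; this sketch goes straight to Variant):

```python
def unify_schemas(schemas):
    """Union the columns of all tables; a column with several distinct
    types becomes a Variant of those types (simplification, see above)."""
    unified = {}
    for schema in schemas:
        for name, typ in schema.items():
            unified.setdefault(name, set()).add(typ)
    return {name: (types.pop() if len(types) == 1
                   else "Variant(" + ", ".join(sorted(types)) + ")")
            for name, types in unified.items()}
```

Applied to the t1/t2 example above, the merged table exposes all of a, b, and c, with a getting a unified type.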
$ curl http://localhost:8123/ -d "SELECT number % 3 AS k, count()
FROM numbers(1e9) GROUP BY k WITH TOTALS FORMAT JSONEachRowWithProgress"
{"progress":{"read_rows":"74631669","read_bytes":"597053352","total_rows_to_read":"1000000000","elapsed_ns":"100040117"}}
{"progress":{"read_rows":"151160199","read_bytes":"1209281592","total_rows_to_read":"1000000000","elapsed_ns":"200065327"}}
{"progress":{"read_rows":"227754138","read_bytes":"1822033104","total_rows_to_read":"1000000000","elapsed_ns":"300115954"}}
{"progress":{"read_rows":"304348077","read_bytes":"2434784616","total_rows_to_read":"1000000000","elapsed_ns":"400193152"}}
{"progress":{"read_rows":"380680380","read_bytes":"3045443040","total_rows_to_read":"1000000000","elapsed_ns":"500227569"}}
...
{"progress":{"read_rows":"762472713","read_bytes":"6099781704","total_rows_to_read":"1000000000","elapsed_ns":"1000340080"}}
{"progress":{"read_rows":"838870425","read_bytes":"6710963400","total_rows_to_read":"1000000000","elapsed_ns":"1100394164"}}
{"progress":{"read_rows":"914548638","read_bytes":"7316389104","total_rows_to_read":"1000000000","elapsed_ns":"1200466502"}}
{"progress":{"read_rows":"990880941","read_bytes":"7927047528","total_rows_to_read":"1000000000","elapsed_ns":"1300489758"}}
{"progress":{"read_rows":"1000000000","read_bytes":"8000000000","total_rows_to_read":"1000000000","elapsed_ns":"1312531422"}}
{"meta":[{"name":"k","type":"UInt8"},{"name":"count()","type":"UInt64"}]}
{"row":{"k":0,"count()":"333333334"}}
{"row":{"k":1,"count()":"333333333"}}
{"row":{"k":2,"count()":"333333333"}}
{"totals":{"k":0,"count()":"1000000000"}}
Developer: Alexey Milovidov.
The formats JSONEachRowWithProgress, JSONStringsEachRowWithProgress
stream newline-delimited JSON events, where each event is one of:
progress, meta, row, totals, extremes, exception, as soon as they appear.
Works properly with Transfer-Encoding: chunked, even with HTTP compression (Content-Encoding: gzip/deflate/brotli/zstd).
Demo.
Developer: Alexey Milovidov.
Note: these formats already existed, but the implementation didn't work as I wanted.
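A consumer of this stream only needs to dispatch on the single top-level key of each line. A minimal Python sketch of such a client-side parser:

```python
import json

EVENT_KINDS = ("progress", "meta", "row", "totals", "extremes", "exception")

def parse_events(lines):
    """Parse JSONEachRowWithProgress output: each non-empty line is a JSON
    object whose single top-level key names the event kind."""
    events = {kind: [] for kind in EVENT_KINDS}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        kind = next(iter(obj))  # the one top-level key
        events[kind].append(obj[kind])
    return events
```

Feeding it the curl output above yields the progress events as they arrived, followed by meta, the result rows, and totals.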
We sped up reading from Iceberg!
No comments. Install 25.1 and try it yourself.
Developer: Dan Ivanik.
ClickPipes supports column renaming.
JDBC driver improves standard compliance and reliability.
dbt now supports Refreshable Materialized Views.
Grafana has initial support for Variant, Dynamic, and JSON data types.
Go client fully supports Variant, Dynamic, and JSON data types.
A template for Dataflow for transferring from BigQuery to ClickHouse.
— 🇭🇷 Zagreb, Jan 30th
— 🇧🇪 FOSDEM, Brussels, Feb 1, 2
— 🇮🇳 Mumbai, Feb 1st
— 🇬🇧 London, Feb 5th
— 🇮🇳 Bangalore, Feb 8th
— 🇦🇪 Dubai, Feb 10th
I will be there in person at some of these meetups :)
— Evolution of the Cloud
— Compute-compute separation
— The journey for ARM support
— BuzzHouse — Fuzzing ClickHouse
— Medallion architecture for BlueSky
— Evidence.dev with ClickHouse
— Solving problems with monitoring dashboards
— IBM, M3ter, AMP, Increff...