1. (50 min) What's new in ClickHouse 25.5.
3. (10 min) Q&A.
ClickHouse Spring/Summer Release.
โ 15 new features ๐ธ
โ 23 performance optimizations ๐ฆ
โ 64 bug fixes ๐
Two new functions:
— stringBytesUniq — the number of distinct bytes in a string;
— stringBytesEntropy — Shannon's entropy of the distribution of bytes;
Determine how random the string is.
SELECT round(stringBytesEntropy(randomString(10000)), 2) AS res
โโโresโโ
1. โ 7.98 โ -- close to 8 bit per character
โโโโโโโโ
Demo.
Developer: Sachin Kumar Singh.
base32Encode, base32Decode — use the RFC-4648 alphabet
:) WITH 'ClickHouse is a good database' AS s
SELECT base32Encode(s),
base58Encode(s),
base64Encode(s)
Row 1:
โโโโโโ
base32Encode(s): INWGSY3LJBXXK43FEBUXGIDBEBTW633EEBSGC5DBMJQXGZI=
base58Encode(s): 44Xce2JCNL96kJGtfCgHPpV4axarmE3843VSYYAQ
base64Encode(s): Q2xpY2tIb3VzZSBpcyBhIGdvb2QgZGF0YWJhc2U=
Developer: Joanna Hulboj.
An extension to ngrams that deterministically adds a small amount of longer infrequent substrings to the set of ngrams.
New functions: sparseGrams, sparseGramsUTF8,
sparseGramsHashes, sparseGramsHashesUTF8.
Demo.
It is an elegant algorithm to accelerate text and code search.
Developers: Konstantin Vedernikov.
Parquet files can contain a WKB-encoded geometry data.
In previous ClickHouse versions, this data was recognized as a binary string.
Since version 25.5, it is inferred as Point, LineString, MultiLineString, Polygon or MultiPolygon types in ClickHouse.
Demo.
Developer: Konstantin Vedernikov.
When processing files from stdin, the FROM clause can be omitted:
ch -q "name, price + tax ORDER BY 2 DESC" < data.csv
ch -q "arrayJoin(members.login), org_name" < data.json
ch -q "arrayJoin(json.members.login), json.org_name
" --input-format JSONAsObject < data.json
SELECT can be omitted as well.
This makes clickhouse-local a (faster and richer) replacement of jq.
Demo.
Developer: Alexey Milovidov.
Now the query TRUNCATE ALL TABLES FROM db
can be clarified: TRUNCATE ALL TABLES FROM db LIKE '%pattern%'
Developer: Yarik Briukhovetskyi.
For partition pruning of Iceberg tables.
Example: an Iceberg table in AWS Athena:
CREATE TABLE iceberg_benchmark.bucket_partitioned
(
watchid bigint,
javaenable int,
title string,
...
) PARTITIONED BY (bucket(50, watchid))
TBLPROPERTIES ('table_type'='iceberg');
is split into 50 buckets by the hash of watchid.
Developer: Dan Ivanik.
SELECT watchid
FROM "iceberg_benchmark.bucket_partitioned"
WHERE watchid = 8005433341155183383
LIMIT 10
ClickHouse 25.4: 51.2 sec.
ClickHouse 25.5: 1.3 sec.
Plus two new functions, icebergHash and icebergBucketTransform.
Developer: Dan Ivanik.
Now ClickHouse has support for every data catalog:
— 24.12: REST
— 25.3: Unity
— 25.3: Glue
— 25.5: Hive Metastore
Developer: Konstantin Vedernikov.
Iceberg tables may contain a version-hint.text pointing to the actual snapshot. So it can be used instead of finding the latest metadata file.
A table-level setting, iceberg_use_version_hint to complete the existing iceberg_metadata_file_path, iceberg_metadata_table_uuid, iceberg_recent_metadata_file_by_last_updated_ms_field.
Helps using tables with concurrent updates, without the help of a data catalog*
* yes, everything around Iceberg ecosystem is complicated and brittle. Sorry about that.
Developer: Arnaud Briche.
ClickHouse allows tables to perform data transformations during background merges.
E.g., ReplacingMergeTree does deduplication,
AggregatingMergeTree does aggregation.
If a SELECT query has the FINAL modifier after a table,
it performs merge on SELECT.
The problem: secondary indices couldn't skip ranges of data on SELECT,
because this data can affect the query result due to the merge.
The problem: secondary indices couldn't skip ranges of data on SELECT,
because this data can affect the query result due to the merge.
Filtering with secondary indices can be enabled with the setting:
use_skip_indexes_if_final, which allows to filter the data,
but also may skip some of the updates.
A new option in ClickHouse 25.5, use_skip_indexes_if_final_exact_mode
— takes care about all the updates by reading extra ranges after filtering.
Demo.
Developer: Shankar Iyer.
— are subqueries that depend on the columns from the outer scope.
SELECT * FROM users AS u1 WHERE EXISTS (
SELECT * FROM users2 AS u2 WHERE u1.age = u2.age)
They can appear in many different contexts: EXISTS, IN, scalar...
In 25.4 we supported correlated subqueries inside the WHERE clause with the EXISTS operator.
In 25.5 we support scalar correlated subqueries inside the WHERE clause
and correlated subqueries inside SELECT!
Now it is enough to cover the TPC-H test suite without modifying queries.
Developer: Dmitry Novik.
22.9 — experimental version
Introduced by Arthur Filatenkov, Vladimir Makarov, Danila Mishin, Nikita Vasilenko, Alexander Piachonkin, Nikita Evsiukov, Hakob Sagatelyan.
23.8 — integration with the USearch library
Introduced by Davit Vardanyan.
25.1 — faster vector indices
By Robert Schulze, Michael Kolupaev.
25.5 — beta
Developer: Shankar Iyer, Robert Schulze.
Support for pre-filtering and post-filtering modes.
SET vector_search_filter_strategy = auto|postfilter|prefilter
Determines how to filter data for hybrid search.
Developer: Shankar Iyer.
ClickPipes supports Azure Blob Storage in private preview.
dbt v1.9.0 added a microbatch strategy.
and support for column-level TTL.
Java client speeds up insertion with the new RowBinaryWriter.
Grafana integration v4.9.0 added suggestions in the SQL editor,
as well as SQL formatter.
Better support for traces: added kind, status, instrumentation library, links, events and state data.
A chat agent connected to ClickHouse demo datasets
using the ClickHouse MCP server.
The ClickHouse MCP server also works with your databases!
Example with the Foursquare dataset:
— ๐ฏ๐ต Tokyo, June 5
— ๐จ๐พ Cyprus, June 10
— ๐บ๐ธ Washington DC, June 12
— ๐ฎ๐ฑ Tel Aviv, June 17
— ๐บ๐ธ Atlanta, July 8
— ๐บ๐ธ New York, July 15
I will be in person on some of these meetups :)
— Dashboard In ClickHouse Cloud
— ClickHouse and Parquet
— How we made MySQL CDC
— Chartmetric — music artists analytics
— BoundaryML — how to make AI structured
— Wallarm, Flock Safety, Real Estate Analytics