1. (50 min) What's new in ClickHouse 24.10.
2. (10 min) Q&A.
ClickHouse Halloween Release.
β 25 new features π°
β 15 performance optimizations πΏοΈ
β 60 bug fixes π
:) SET param_name = 'Hello'
:) SELECT 1 AS {name:Identifier}
ββHelloββ
1. β 1 β
βββββββββ
:) SET param_value = 'World'
:) SELECT {value:String} AS {name:Identifier}
ββHelloββ
1. β World β
βββββββββ
Developer: Anton Kozlov.
We already have system tables, query_log and metric_log.
So we added another one, named query_metric_log!
Demo
Developer: Pablo Marcos.
clickhouse-local — a tool for data processing in a command-line;
— full ClickHouse engine with all formats, integrations, and performance!
A new feature: --copy mode, which is a shortcut for SELECT * FROM table:
ch --copy < trace_log.tsv > trace_log.jsonl
Demo
Developer: Denis Hananein.
A calculator mode for ClickHouse:
ch --implicit-select -q "today() - 7"
The intention is to enable it by default for clickhouse-local.
However, you can use it with clickhouse-server under the implicit_select setting.
Developer: Alexey Milovidov.
Before:
clickhouse-client --output_format_json_escape_forward_slashes=1 --query="SELECT 1"
After:
clickhouse-client --output_format_json_escape_forward_slashes --query="SELECT 1"
Developer: Davidtsuk.
For query and format settings.
Useful for development environments and SQL UIs.
Example:
A query to format the docs in Markdown.
https://pastila.nl/?00006ec0/2c96c0bc9df36c5f67bb54da4459e6ed.md
Developer: Alexey Milovidov.
:) SELECT arrayUnion([1, 2], [1, 3], [2])
ββxββββββββ
1. β [3,2,1] β
βββββββββββ
:) SELECT arrayDistinct(arrayConcat([1, 2], [1, 3], [2]))
ββxββββββββ
1. β [3,2,1] β
βββββββββββ
Developer: Peter Nguyen.
It is an old cryptographic hash function that you shouldn't use.
:) SELECT hex(RIPEMD160('Hello, world!'))
ββhex(RIPEMD160('Hello, world!'))βββββββββββ
1. β 58262D1FBDBE4530D8865D3518C6D6E41002610F β
ββββββββββββββββββββββββββββββββββββββββββββ
But it is used in Bitcoin.
And a whole lot of companies are processing blockchain data in ClickHouse.
So now ClickHouse has it!
Developer: Dergousov Maxim.
Did you know that ClickHouse can query MongoDB directly?
SELECT trash, garbage FROM mongodb(
'127.0.0.1:27017', 'test', 'my_collection', 'user', 'password',
'trash String, garbage String');
New in 24.10:
— connection strings with the mongodb:// schema;
— push down of WHERE conditions and ORDER BY;
— support for all MongoDB data types;
SELECT trash, garbage FROM mongodb(
'mongodb://root:clickhouse@localhost:27017/database', 'my_collection',
'trash String, garbage String');
Developer: Kirill Nikiforov.
Did you know about the problem of "xid overflow" in ZooKeeper?
Did you know about the problem of "zxid overflow" at least?
Ok, never mind.
ClickHouse Keeper does not have these problems, and it works perfectly!
Developer: Antonio Andelic.
We have to optimize merges on a cluster with a very high ingestion rate (400 million records per second) and a medium number of machines (30) where all of them are replicas, concurrently assigning merges on a shared storage...
How to do it?
— First we have to visualize it!
The visualization is available at the builtin /merges handler
of the HTTP interface, similarly to /play, /dashboard, and /binary.
Developer: Alexey Milovidov.
ClickHouse 24.10 supports filtering by bloom filters in parquet files.
Caveats: most tools cannot write bloom filters,
and even in Spark it takes efforts to enable them.
Developer: Arthur Passos.
Geospatial queries will be speed up if you have a minmax index on lat, lon,
or an index by mortonEncode(lat, lon) or hilbertEncode(lat, lon).
Developer: Jacky Woo.
10..30% improvement across a wide range of queries.
This was already possible:
GRANT SELECT ON db.* TO user1;
GRANT SELECT ON *.* TO user2;
New in 24.10:
GRANT SELECT ON db.table_pefix_* TO user1;
GRANT SELECT ON db*.table TO user2;
Wildcards work for databases and tables.
They can be put at the end of designate a prefix of database or table names.
You can revoke with a different wildcard than what was granted.
Developer: Pufit.
What happens inside your query right now?
Demo
Developer: Maria Khristenko, Julia Kartseva.CREATE TABLE hits2 ENGINE = MergeTree CLONE AS hits
Quick forking (branching) a table without copying its data.
Identical to creating an empty table and attaching all partitions
from the source.
Demo
Developer: Tuan Pach.
If you have a MergeTree table on an S3-backed disk,
we already provide a feature of local filesystem cache.
The new feature in 24.10 is caching of directly accessed files
and data lake tables on S3 and Azure.
The cache entries are identified by path + ETag.
SELECT * FROM s3('s3://datasets/test.tsv')
SETTINGS filesystem_cache_name = 'cache_for_s3', enable_filesystem_cache = 1;
Developer: Kseniia Sumarokova.
23.12 — the first version with experimental feature.
24.9 — support for the APPEND clause.
24.10 — support for the Replicated database engine.
In version 24.10, Refreshable Materialized Views are production ready!
Demo
Developer: Michael Kolupaev.
Introduced in version 21.12.
In version 24.10, parallel replicas are in beta!
Demo
Developer: Igor Nikonov, Nikita Mikhailov, ...
Introduced in version 24.8. Curently still in the experimental stage.
CREATE TABLE website_traffic
(
time DateTime DEFAULT now(),
data JSON
)
ORDER BY time
Developer: Pavel Kruglov
SELECT data.cf.country, count() AS c,
uniq(data.headers.`cf-connecting-ip`) AS u
FROM website_traffic
WHERE data.headers.host = 'packages.clickhouse.com'
AND match(data.url::String, 'clickhouse.+\.(deb|rpm|apk|tgz)$')
GROUP BY 1 ORDER BY u DESC LIMIT 100
Automatic application of functions for Variant and Dynamic types in 24.10:
SELECT data.cf.country, count() AS c,
uniq(data.headers.`cf-connecting-ip`) AS u
FROM website_traffic
WHERE data.headers.host = 'packages.clickhouse.com'
AND match(data.url, 'clickhouse.+\.(deb|rpm|apk|tgz)$')
GROUP BY 1 ORDER BY u DESC LIMIT 100
Developer: Pavel Kruglov
ClickHouse ODBC driver is built as a self-contained library
(no dependencies, no symbol conflicts).
Fixed an error with conflicting session_id.
Improvements for Java, Python, Go, JavaScript, and Rust drivers
with performance improvements and support for new data types.
Updates for Metabase integration and Apache Beam.
Thanks for many updates to our contributors:
Am-phi, dizider, javiercj93, (Java), earwin, EpicStep (Go), ardabeyazoglu (JS)
— JSON data type under the hood;
— ClickHouse + Supabase;
— Single-page applications with ClickHouse;
— Cleaning the MTA data for NYC;
— ClickHouse for Ad-Tech;
— ClickHouse for Security;
— ClickHouse for Blockchain;
— Migration from Rockset;