ClickHouse: Release 24.10 Webinar

ClickHouse Release 24.10

Release 24.10 Webinar

1. (50 min) What's new in ClickHouse 24.10.

2. (10 min) Q&A.

Release 24.10

ClickHouse Halloween Release.

β€” 25 new features 🌰

β€” 15 performance optimizations 🐿️

β€” 60 bug fixes πŸŽƒ

Small And Nice Features

Parameterized Aliases

:) SET param_name = 'Hello' :) SELECT 1 AS {name:Identifier} β”Œβ”€Hello─┐ 1. β”‚ 1 β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”˜ :) SET param_value = 'World' :) SELECT {value:String} AS {name:Identifier} β”Œβ”€Hello─┐ 1. β”‚ World β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”˜

Developer: Anton Kozlov.

Query Metric Log

We already have system tables, query_log and metric_log.

So we added another one, named query_metric_log!

Demo

Developer: Pablo Marcos.

clickhouse-local --copy

clickhouse-local — a tool for data processing in a command-line;

— full ClickHouse engine with all formats, integrations, and performance!

A new feature: --copy mode, which is a shortcut for SELECT * FROM table:

ch --copy < trace_log.tsv > trace_log.jsonl

Demo

Developer: Denis Hananein.

Implicit Select

A calculator mode for ClickHouse:

ch --implicit-select -q "today() - 7"

The intention is to enable it by default for clickhouse-local.

However, you can use it with clickhouse-server under the implicit_select setting.

Developer: Alexey Milovidov.

Boolean Settings as Toggles in the CLI

Before:

clickhouse-client --output_format_json_escape_forward_slashes=1 --query="SELECT 1"

After:

clickhouse-client --output_format_json_escape_forward_slashes --query="SELECT 1"

Developer: Davidtsuk.

Detailed Builtin Documentation

For query and format settings.

Useful for development environments and SQL UIs.

Example:

A query to format the docs in Markdown.

https://pastila.nl/?00006ec0/2c96c0bc9df36c5f67bb54da4459e6ed.md

Developer: Alexey Milovidov.

arrayUnion

:) SELECT arrayUnion([1, 2], [1, 3], [2]) β”Œβ”€x───────┐ 1. β”‚ [3,2,1] β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ :) SELECT arrayDistinct(arrayConcat([1, 2], [1, 3], [2])) β”Œβ”€x───────┐ 1. β”‚ [3,2,1] β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Developer: Peter Nguyen.

RIPEMD160

It is an old cryptographic hash function that you shouldn't use.

:) SELECT hex(RIPEMD160('Hello, world!')) β”Œβ”€hex(RIPEMD160('Hello, world!'))──────────┐ 1. β”‚ 58262D1FBDBE4530D8865D3518C6D6E41002610F β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

But it is used in Bitcoin.
And a whole lot of companies are processing blockchain data in ClickHouse.

So now ClickHouse has it!

Developer: Dergousov Maxim.

Improvements For Querying MongoDB

Did you know that ClickHouse can query MongoDB directly?

SELECT trash, garbage FROM mongodb( '127.0.0.1:27017', 'test', 'my_collection', 'user', 'password', 'trash String, garbage String');

New in 24.10:
— connection strings with the mongodb:// schema;
— push down of WHERE conditions and ORDER BY;
— support for all MongoDB data types;

SELECT trash, garbage FROM mongodb( 'mongodb://root:clickhouse@localhost:27017/database', 'my_collection', 'trash String, garbage String');

Developer: Kirill Nikiforov.

64-bit XID in Keeper

Did you know about the problem of "xid overflow" in ZooKeeper?

Did you know about the problem of "zxid overflow" at least?

Ok, never mind.

ClickHouse Keeper does not have these problems, and it works perfectly!

Developer: Antonio Andelic.

Performance Improvements

Optimization For Background Merges

We have to optimize merges on a cluster with a very high ingestion rate (400 million records per second) and a medium number of machines (30) where all of them are replicas, concurrently assigning merges on a shared storage...

How to do it?

— First we have to visualize it!

Optimization For Background Merges

The visualization is available at the builtin /merges handler
of the HTTP interface, similarly to /play, /dashboard, and /binary.

Developer: Alexey Milovidov.

Optimization For Parquet

ClickHouse 24.10 supports filtering by bloom filters in parquet files.

Caveats: most tools cannot write bloom filters,
and even in Spark it takes efforts to enable them.

Developer: Arthur Passos.

pointInPolygon Uses Indices

Geospatial queries will be speed up if you have a minmax index on lat, lon,
or an index by mortonEncode(lat, lon) or hilbertEncode(lat, lon).

Developer: Jacky Woo.

10..30% improvement across a wide range of queries.

Something Interesting

Wildcard Grants

This was already possible:

GRANT SELECT ON db.* TO user1; GRANT SELECT ON *.* TO user2;

New in 24.10:

GRANT SELECT ON db.table_pefix_* TO user1; GRANT SELECT ON db*.table TO user2;

Wildcards work for databases and tables.
They can be put at the end of designate a prefix of database or table names.

You can revoke with a different wildcard than what was granted.

Developer: Pufit.

Real Time Metrics In The Client

What happens inside your query right now?

Demo

Developer: Maria Khristenko, Julia Kartseva.

Table Cloning

CREATE TABLE hits2 ENGINE = MergeTree CLONE AS hits

Quick forking (branching) a table without copying its data.

Identical to creating an empty table and attaching all partitions
from the source.

Demo

Developer: Tuan Pach.

Caching Of Remote Files

If you have a MergeTree table on an S3-backed disk,
we already provide a feature of local filesystem cache.

The new feature in 24.10 is caching of directly accessed files
and data lake tables on S3 and Azure.

The cache entries are identified by path + ETag.

SELECT * FROM s3('s3://datasets/test.tsv') SETTINGS filesystem_cache_name = 'cache_for_s3', enable_filesystem_cache = 1;

Developer: Kseniia Sumarokova.

Refreshable Materialized Views

23.12 — the first version with experimental feature.

24.9 — support for the APPEND clause.

24.10 — support for the Replicated database engine.

In version 24.10, Refreshable Materialized Views are production ready!

Demo

Developer: Michael Kolupaev.

Parallel Replicas

Introduced in version 21.12.

In version 24.10, parallel replicas are in beta!

Demo

Developer: Igor Nikonov, Nikita Mikhailov, ...

JSON Data Type πŸ§ͺ

Introduced in version 24.8. Curently still in the experimental stage.

CREATE TABLE website_traffic ( time DateTime DEFAULT now(), data JSON ) ORDER BY time

Developer: Pavel Kruglov

JSON Data Type πŸ§ͺ

SELECT data.cf.country, count() AS c, uniq(data.headers.`cf-connecting-ip`) AS u FROM website_traffic WHERE data.headers.host = 'packages.clickhouse.com' AND match(data.url::String, 'clickhouse.+\.(deb|rpm|apk|tgz)$') GROUP BY 1 ORDER BY u DESC LIMIT 100

Automatic application of functions for Variant and Dynamic types in 24.10:

SELECT data.cf.country, count() AS c, uniq(data.headers.`cf-connecting-ip`) AS u FROM website_traffic WHERE data.headers.host = 'packages.clickhouse.com' AND match(data.url, 'clickhouse.+\.(deb|rpm|apk|tgz)$') GROUP BY 1 ORDER BY u DESC LIMIT 100

Developer: Pavel Kruglov

Guest Talk

Bonus

Integrations

ClickHouse ODBC driver is built as a self-contained library
(no dependencies, no symbol conflicts).
Fixed an error with conflicting session_id.

Improvements for Java, Python, Go, JavaScript, and Rust drivers
with performance improvements and support for new data types.

Updates for Metabase integration and Apache Beam.

Thanks for many updates to our contributors:
Am-phi, dizider, javiercj93, (Java), earwin, EpicStep (Go), ardabeyazoglu (JS)

Reading Corner

https://clickhouse.com/blog/

— JSON data type under the hood;
— ClickHouse + Supabase;
— Single-page applications with ClickHouse;
— Cleaning the MTA data for NYC;
— ClickHouse for Ad-Tech;
— ClickHouse for Security;
— ClickHouse for Blockchain;
— Migration from Rockset;

Q&A