ClickHouse: Release 24.6 Webinar

ClickHouse Release 24.6

Photo by Eduard Gordeev, https://www.shutterstock.com/image-photo/amsterdam-town-autumn-november-rain-1571566054

Release 24.6 Webinar

1. (50 min) What's new in ClickHouse 24.6.

2. (5 min) Guest Talks.

3. (5 min) Q&A.

Release 24.6

ClickHouse Summer Release.

β€” 23 new features πŸ–οΈ

β€” 24 performance optimizations πŸ‰

β€” 59 bug fixes 🍦

Photo by Eduard Gordeev, https://www.shutterstock.com/image-photo/amsterdam-night-city-autumn-october-1571585038

Small And Nice Features

Empty Tuples

Almost useless. But needed for completeness.

SELECT ();

CREATE TABLE test (x String) ORDER BY ();

ALTER TABLE test DETACH PARTITION ();

Demo

Developer: Amos Bird.

generateSnowflakeID

:) SELECT generateSnowflakeID() FROM numbers(3)

   β”Œβ”€generateSnowflakeID()─┐
1. β”‚   7201822390320365568 β”‚
2. β”‚   7201822390320365569 β”‚
3. β”‚   7201822390320365570 β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Generates a monotonic 64-bit unique identifier.

Similar to UUID, but smaller and monotonic.

Has nothing to do with Snowflake Inc.
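To illustrate why such IDs are monotonic, here is a sketch of the Snowflake-style decoding in Python. The layout (milliseconds since the Unix epoch in the top bits, followed by a machine ID and a sequence counter, assumed here to be 10 and 12 bits) is an assumption based on the classic Snowflake scheme and the behavior of snowflakeIDToDateTime:

```python
from datetime import datetime, timezone

MACHINE_BITS = 10   # assumption: 10-bit machine ID
SEQ_BITS = 12       # assumption: 12-bit per-millisecond sequence counter

def snowflake_to_datetime(snowflake: int) -> datetime:
    """Extract the millisecond timestamp packed into the top bits."""
    ms = snowflake >> (MACHINE_BITS + SEQ_BITS)
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

# The ID from the slide below decodes to a 2024-07-02 timestamp
# (the slide shows it in the server's local time zone):
print(snowflake_to_datetime(7213775618830237696))
```

Because the timestamp occupies the most significant bits, IDs generated later compare greater, which is what makes them index-friendly.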

Developer: Danila Puzov.

snowflakeID Functions

:) SELECT generateSnowflakeID() AS x,
          snowflakeIDToDateTime(x),
          snowflakeIDToDateTime64(x)

Row 1:
──────
x:                          7213775618830237696
snowflakeIDToDateTime(x):   2024-07-02 07:28:52
snowflakeIDToDateTime64(x): 2024-07-02 07:28:52.999

Developer: Robert Schulze and Jasine.

Base64 For URL

:) SELECT 'πŸŒžπŸ–οΈπŸ‰πŸŠπŸ¦β›±οΈπŸοΈπŸŒ΄πŸŒŠ' AS x, base64Encode(x) AS a, base64URLEncode(x) AS b, base64Decode(a) AS x1, base64URLDecode(b) AS x2 -- also, tryBase64URLDecode Row 1: ────── x: πŸŒžπŸ–οΈπŸ‰πŸŠπŸ¦β›±οΈπŸοΈπŸŒ΄πŸŒŠ a: 8J+MnvCfj5bvuI/wn42J8J+PivCfjabim7HvuI/wn4+d77iP8J+MtPCfjIo= b: 8J-MnvCfj5bvuI_wn42J8J-PivCfjabim7HvuI_wn4-d77iP8J-MtPCfjIo x1: πŸŒžπŸ–οΈπŸ‰πŸŠπŸ¦β›±οΈπŸοΈπŸŒ΄πŸŒŠ x2: πŸŒžπŸ–οΈπŸ‰πŸŠπŸ¦β›±οΈπŸοΈπŸŒ΄πŸŒŠ

Developer: Mikhail Gorshkov.

parseReadableSize

:) SELECT parseReadableSize('1GB'), parseReadableSize('1 GiB')

   β”Œβ”€parseReadableSize('1GB')─┬─parseReadableSize('1 GiB')─┐
1. β”‚               1000000000 β”‚                 1073741824 β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

:) SELECT parseReadableSizeOrNull(arrayJoin(['1.1 MiB', 'Hello'])) AS x

   β”Œβ”€β”€β”€β”€β”€β”€β”€x─┐
1. β”‚ 1153434 β”‚
2. β”‚    ᴺᡁᴸᴸ β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The opposite of formatReadableSize.
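The key distinction is between decimal units (GB = powers of 1000) and IEC binary units (GiB = powers of 1024). A minimal, hypothetical re-implementation in Python (not ClickHouse's actual parser; unit coverage and rounding are assumptions):

```python
import re

UNITS = {'B': 1}
for i, p in enumerate('KMGTPE', start=1):
    UNITS[p + 'B'] = 1000 ** i    # KB, MB, GB, ... (decimal)
    UNITS[p + 'iB'] = 1024 ** i   # KiB, MiB, GiB, ... (binary)

def parse_readable_size(s: str) -> int:
    m = re.fullmatch(r'\s*([\d.]+)\s*([KMGTPE]?i?B)\s*', s)
    if not m:
        raise ValueError(f'cannot parse {s!r}')
    return round(float(m.group(1)) * UNITS[m.group(2)])

print(parse_readable_size('1GB'))      # 1000000000
print(parse_readable_size('1 GiB'))    # 1073741824
print(parse_readable_size('1.1 MiB'))  # 1153434
```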

Developer: Francisco J. Jurado Moreno.

editDistanceUTF8

:) WITH 'HΓ΄tel GenΓ¨ve' AS a, 'Hotel Geneve' AS b
   SELECT editDistance(a, b), editDistanceUTF8(a, b)

   β”Œβ”€editDistance(a, b)─┬─editDistanceUTF8(a, b)─┐
1. β”‚                  4 β”‚                      2 β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Developer: Liu Neng.

Virtual Column _time

:) SELECT _path, _file, _time FROM file('*', One)

   β”Œβ”€_path──────────────────────┬─_file──────┬───────────────_time─┐
1. β”‚ ...release_24.6/LICENSE.md β”‚ LICENSE.md β”‚ 2024-05-30 15:53:13 β”‚
2. β”‚ ...release_24.6/index.html β”‚ index.html β”‚ 2024-07-02 08:07:36 β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Works for all file-like table functions: file, url, s3, hdfs, azureBlobStorage...

Developer: Ilya Golshtein.

Comparison of IPv4 and IPv6

:) SELECT toIPv4('127.0.0.1') == toIPv6('::ffff:127.0.0.1')

1

:) SELECT toIPv4('127.0.0.1') == toIPv6('::1')

0

:) SELECT toIPv4('127.0.0.1') == toIPv6('::127.0.0.1')

0

IPv4 is considered equal to the corresponding IPv6-mapped address.
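The same distinction exists in Python's ipaddress module, which can serve as a quick sanity check: only the ::ffff:x.x.x.x form (RFC 4291's IPv4-mapped addresses) corresponds to an IPv4 address, while ::x.x.x.x is the deprecated "IPv4-compatible" form:

```python
import ipaddress

v4 = ipaddress.IPv4Address('127.0.0.1')

# ::ffff:127.0.0.1 is the IPv4-mapped form of 127.0.0.1:
mapped = ipaddress.IPv6Address('::ffff:127.0.0.1')
print(mapped.ipv4_mapped == v4)  # True

# ::127.0.0.1 is not IPv4-mapped, so no equivalence:
compat = ipaddress.IPv6Address('::127.0.0.1')
print(compat.ipv4_mapped)        # None
```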

Developer: Francisco J. Jurado Moreno.

Boring Security Features

Logging privileges in the query_log

:) SELECT query, used_privileges, missing_privileges
   FROM system.query_log
   WHERE type IN ('QueryFinish', 'ExceptionBeforeStart')
     AND event_time >= now() - 300

Row 1:
──────
query:              DESCRIBE system.query_log
used_privileges:    ['SHOW COLUMNS ON system.query_log']
missing_privileges: []

Row 2:
──────
query:              SELECT query, used_privileges, missing_privileges FROM system.query_log WHERE type IN ('QueryFinish', 'ExceptionBeforeStart') AND event_time >= now() - 300
used_privileges:    ['SELECT(query, used_privileges, missing_privileges, type, event_time) ON system.query_log']
missing_privileges: []

You can paste these into a GRANT query.

Developer: Alexey Katsman.

Limits On The Number Of Tables

New server settings:

max_database_num_to_warn, max_table_num_to_warn
— a user will receive a warning in the client.

max_database_num_to_throw, max_table_num_to_throw
— a user will receive an exception when attempting to create
  a database or table over the limit.

A question: how many tables can a reasonable person have in ClickHouse?

Developer: Xu Jia.

Usability Improvements

Inline Documentation For System Tables

:) SELECT table, comment FROM system.tables WHERE database = 'system'

Row 1:
──────
table:   aggregate_function_combinators
comment: Contains a list of all available aggregate function combinators, which could be applied to aggregate functions and change the way they work.

Row 2:
──────
table:   asynchronous_insert_log
comment: Contains a history for all asynchronous inserts executed on the current server.

Developer: Nikita Mikhailov.

Pretty Formats Are Even Prettier

Demo

Developer: Shaun Struwig.

Performance Improvements

Optimal Table Sorting

The physical storage order of MergeTree tables
is defined by their ORDER BY key.

CREATE TABLE hits (CounterID UInt32, UserID UInt64, EventTime DateTime, URL String)
ORDER BY (CounterID, UserID);

It has three purposes:
— an index for range requests (can be specified as PRIMARY KEY);
— a key for merging modes, such as Aggregating- or ReplacingMergeTree;
— a way to improve compression by co-locating data;

Can the order be automatically selected for the best compression?

Developer: Igor Markelov.

Optimal Table Sorting

CREATE TABLE hits (CounterID UInt32, UserID UInt64, EventTime DateTime, URL String)
ENGINE = MergeTree ORDER BY (CounterID, UserID)
SETTINGS optimize_row_order = true;

— After sorting by the ORDER BY key, it will additionally reorder rows
  for a best-effort optimal compression rate!
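The intuition: generic compressors love long runs of equal values, and reordering rows so that duplicates sit next to each other creates exactly such runs. A toy illustration (not ClickHouse's actual algorithm) with zlib over a small synthetic table:

```python
import random
import zlib

random.seed(42)

# A toy "table": 10,000 rows of low-cardinality columns in random order.
rows = [(random.choice(['GET', 'POST', 'PUT']),
         random.choice(['200', '404', '500']),
         random.choice(['eu', 'us', 'ap']))
        for _ in range(10000)]

def compressed_size(rows):
    """Serialize rows as CSV-ish text and measure the zlib-compressed size."""
    return len(zlib.compress('\n'.join(','.join(r) for r in rows).encode()))

print(compressed_size(rows))          # random order
print(compressed_size(sorted(rows)))  # sorted: runs of equal rows compress far better
```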

Demo

Developer: Igor Markelov.

An Option To Skip Secondary Indices On Insert

Get faster INSERTs and postpone the calculation
of secondary indices until background merges:

SET materialize_skip_indexes_on_insert = 0;
SET materialize_statistics_on_insert = 0;

Useful in the presence of a large number of indices.

Does not significantly affect the performance of SELECT.

Developer: Anton Popov.

Something Interesting

Named Collections In Keeper

Shared storage of named collections on the cluster:

$ cat /etc/clickhouse-server/config.d/collections.yaml

named_collections_storage:
    type: keeper
    path: '/named_collections'
    update_timeout_ms: 1000 # Optional

CREATE NAMED COLLECTION mydb AS
    host = 'example.test', port = 3306,
    user = 'business', password = 'xyz', db = 'test';

GRANT NAMED COLLECTION ON mydb TO viewer;

SELECT * FROM system.named_collections;

SELECT * FROM mysql(mydb, table = 'mytable');

Developer: Kseniia Sumarokova.

Hilbert Curves

New functions: hilbertEncode, hilbertDecode.

Similar to mortonEncode, mortonDecode, but based on a different space-filling curve.

Use them when you have two keys of a similar nature and want to interleave their order:

CREATE TABLE t (x UInt32, y UInt32, ...) ORDER BY hilbertEncode(x, y);
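For intuition, here is a sketch of the classic iterative Hilbert-index algorithm in Python. This is an illustration of the general technique, not ClickHouse's implementation:

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Map (x, y) in an n×n grid (n a power of two) to its
    position along the Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate the quadrant so that the curve segments connect.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

# On a 2×2 grid the curve visits (0,0), (0,1), (1,1), (1,0) in order:
print([hilbert_index(2, x, y) for x, y in [(0, 0), (0, 1), (1, 1), (1, 0)]])  # [0, 1, 2, 3]
```

Unlike Morton (Z-order) interleaving, the Hilbert curve has no long "jumps", so points close on the curve are close in the plane, which tends to keep similar rows together in storage.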

Demo

Developer: Artem Mustafin.

Bonus

https://chdb.io/

chdb 2.0

pip install chdb==2.0.0b0

Query Insights

OSS Trust Center

https://trust.clickhouse.com/

Early notifications of vulnerabilities in the open-source ClickHouse distribution.

Bonus: ClickHouse, Inc. is now a CVE Numbering Authority (CNA).

50th International Conference on Very Large Databases, Guangzhou, China - August 26-30, 2024.

Integrations

Integrations

Support for Metabase 0.50 and DBT 1.8.

The Spark connector is now officially supported:
https://github.com/ClickHouse/spark-clickhouse-connector.

Updates for the Node.js and Go clients,
as well as the Kafka connector, Grafana, and OTEL.

+ Thanks to our contributors for many fixes:
the4thamigo-uk, gogingersnap777, andy1xx8, sylph-eu, Holi0317, dmitryax.

Reading Corner

https://clickhouse.com/blog/

— How we launched ClickHouse Cloud on Azure;
— How Trip.com processes 50 PB of data in ClickHouse;
— Java Client… the SEQUEL?!;
— Building a UI for Query Insights;
— Shopee, CommonRoom, and others;
— ClickHouse OSS Trust Center;

Q&A