ClickHouse: Release 23.12 Webinar

Release 23.12

ClickHouse New Year Release.

— 21 new feature 🎁

— 18 performance optimizations 🛷

— 37 bug fixes ⛄

ORDER BY ALL

Sort by all selected columns:

SELECT ... FROM table ORDER BY ALL

SELECT ... FROM table ORDER BY ALL DESC

SELECT ... FROM table ORDER BY ALL DESC COLLATE 'tr'

Developer: ZhongYuanKai 🎁.

PASTE JOIN

SELECT * FROM ( SELECT toYear(date), medianExact(price) AS m, bar(m, 0, max(m) OVER (), 20) AS b FROM uk_price_paid GROUP BY ALL ORDER BY ALL ASC ) AS t PASTE JOIN ( SELECT * FROM numbers(100) ) AS u

Demo

Developer: Yarik Briukhovetskyi 🥳.

SQID

Short Unique Identifiers From Numbers

:) SELECT sqid(123); UKk :) SELECT sqid(123456789); rDOOFW :) SELECT sqid(123, 456, 789); eVH6til6J

Developers: AwakeLJW, Robert Schulze 🎄.

seriesPeriodDetectFFT

Find the main frequency of a signal.

Demo.

Developer: Bhavna Jindal 🎇.

SHA-512/256

SELECT hex(SHA512('ClickHouse')) AS a, hex(substring(SHA512('ClickHouse'), 1, 32)) AS b, hex(SHA256('ClickHouse')) AS c, hex(SHA512_256('ClickHouse')) AS d FORMAT Vertical Row 1: ────── a: 668F21E61C95DE56E6D1A3424FD127B68F460A882DE39622084FB3D2E062DC59FE092BA261227BF1675C0F50824A665445FC2C5685FB63148F95F0D0A03781F1 b: 668F21E61C95DE56E6D1A3424FD127B68F460A882DE39622084FB3D2E062DC59 c: DEE91DDB90597936302CFD4341E409C87779ED9E0E8C8FDE580E0AF6CDACE179 d: F02AF5534B170E405BDA96CBB667103E7E022A92C7BF2EF07AA962C719E09350

Developer: Bharat Nallan 🪅.

Indices On ALIAS Columns

This was already supported: indices on materialized columns:

CREATE TABLE gis ( lat Float64, lon Float64, mercator_x Float64 MATERIALIZED (lon + 180) / 360, mercator_y Float64 MATERIALIZED 1/2 - log(tan((lat + 90) / 360 * pi())) / 2 / pi(), INDEX idx_x (mercator_x) TYPE minmax, INDEX idx_y (mercator_y) TYPE minmax, PRIMARY KEY mortonEncode(0xFFFFFFFF * mercator_x, 0xFFFFFFFF * mercator_y) ) ENGINE = MergeTree

Indices On ALIAS Columns

This was already supported: indices on functional expressions:

CREATE TABLE gis ( lat Float64, lon Float64, INDEX idx_x ((lon + 180) / 360) TYPE minmax, INDEX idx_y (1/2 - log(tan((lat + 90) / 360 * pi())) / 2 / pi()) TYPE minmax, PRIMARY KEY mortonEncode(0xFFFFFFFF * mercator_x, 0xFFFFFFFF * mercator_y) ) ENGINE = MergeTree

Indices On ALIAS Columns

This is supported since 23.12: indices on ALIAS columns:

CREATE TABLE gis ( lat Float64, lon Float64, mercator_x Float64 ALIAS (lon + 180) / 360, mercator_y Float64 ALIAS 1/2 - log(tan((lat + 90) / 360 * pi())) / 2 / pi(), INDEX idx_x (mercator_x) TYPE minmax, INDEX idx_y (mercator_y) TYPE minmax, PRIMARY KEY mortonEncode(0xFFFFFFFF * mercator_x, 0xFFFFFFFF * mercator_y) ) ENGINE = MergeTree

Developer: Vladimir Cherkasov 🎅.

Indices On ALIAS Columns

If you need indexing of location data,
you can also use H3, S2, or Geohash:

:) SELECT geoToH3(52.3676, 4.9041, 6) 605641106796838911 :) SELECT geoToS2(52.3676, 4.9041) 4402065658753719925 :) SELECT geohashEncode(52.3676, 4.9041) t0v5zsubmw2x

What else should we implement for GIS data?

APPLY DELETED MASK

Batch DELETE:

ALTER TABLE t DELETE WHERE user = 'Petya'

— a full rewrite of data parts by copying all the records
except the deleted ones.

Lightweight DELETE:

DELETE FROM t WHERE user = 'Petya'

— a lazy deletion by creating a mask of records
to filter out on SELECT and during merges.

ALTER TABLE t APPLY DELETED MASK

— cleans the deleted records after a lightweight delete operation,
without the need for OPTIMIZE query.

Developer: Anton Popov ⛄.

APPLY DELETED MASK

ALTER TABLE t APPLY DELETED MASK

— cleans the deleted records after a lightweight delete operation,
without the need for OPTIMIZE query.

Essentially transforms lightweight DELETEs to a batch operation.

Bonus:

ALTER TABLE t APPLY DELETED MASK IN PARTITION p

Developer: Anton Popov ⛄.

_part_offset Column

Pick up to 100 first records from every data part:

:) SELECT count(), uniq(_part) FROM github_events WHERE _part_offset < 100 ┌─count()─┬─uniq(_part)─┐ │ 1700 │ 17 │ └─────────┴─────────────┘ 1 row in set. Elapsed: 0.007 sec. Processed 139.09 thousand rows

Developer: Amos Bird 🐉.

Performance Improvements

count() For Merge Tables

Demo

Developer: Skyoct 🎊.

Memory Usage In Hash JOIN

23.11:

:) SELECT count() FROM hits AS t1 INNER JOIN hits AS t2 ON t1.URL = t2.Referer WHERE t1.URL != '' AND t2.Referer != '' Received exception from server (version 23.11.3): Memory limit (for query) exceeded: would use 98.02 GiB: While executing JoiningTransform.

Memory Usage In Hash JOIN

23.12:

:) SELECT count() FROM hits AS t1 INNER JOIN hits AS t2 ON t1.URL = t2.Referer WHERE t1.URL != '' AND t2.Referer != '' ┌─────count()─┐ │ 51624536761 │ └─────────────┘ 1 row in set. Elapsed: 131.179 sec. Processed 199.99 million rows. Peak memory usage: 8.36 GiB.

51,624,536,761 joined records, and everything is fine 🥂

Developer: Vladimir Cherkasov, Liu Neng.

Faster Aggregation

Optimization for the case of identical keys spanning a block:

SELECT number DIV 100000 AS k, avg(number) AS c FROM numbers_mt(1000000000) GROUP BY k ORDER BY k LIMIT 10

Was: 101 ms, 9602551188 rows/sec.

Now: 81 ms, 11695102172 rows/sec.

Developer: Anton Popov ⛄.

Faster MIN/MAX

This is a low-level optimization.
It adds batching and CPU dispatching for various instruction sets.

SELECT max(number) FROM system.numbers

Was: 1.48 billion rows/s.
Now: 3.40 billion rows/s.

SELECT max(number) FROM system.numbers_mt

Was: 61.81 billion rows/s.
Now: 69.95 billion rows/s.

Developer: Raúl Marín 🍾.

Optimizations For FINAL

FINAL - apply the merge logic on the fly on SELECT query.

Applicable for Replacing-, Collapsing-, Aggregating-, Summing-,
and other MergeTree tables.

Allows to get the behavior of unique primary key, as well as UPSERT, REPLACE, INSERT IGNORE as in OLTP databases.

Can be specified explicitly in a query:
SELECT ... FROM table FINAL
or with a setting: SET final = 1

A downside: SELECT becomes slower.

Optimizations For FINAL

Do not merge the data across different partitions
if the table's partition key consists of columns from the primary key.

Developer: Maksim Kita 🎈.

Optimizations For FINAL

Find data parts with non-intersecting ranges of the primary key
to avoid merging data from that parts.

Developer: Maksim Kita 🎈.

A few more optimizations are on the way!

Something Interesting

Refreshable Materialized Views

CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db.]table_name REFRESH EVERY|AFTER interval [OFFSET interval] RANDOMIZE FOR interval DEPENDS ON [db.]name [, [db.]name [, ...]] [TO [db.]name] [(columns)] [ENGINE = engine] AS SELECT ...

??? 🤔

Developer: Michael Kolupaev, Michael Guzov.

Refreshable Materialized Views

Run the SELECT query in background
and atomically replace the table with its result.

Run the refresh process using a flexible configured schedule.

Support a dependency graph with multiple materialized views.

Motivation:
— periodic export from external databases
similar to dictionaries;
— periodic non-incremental calcualtions
in contrast to the (incremental) materialized views.

Developer: Michael Kolupaev, Michael Guzov 🎁.

Refreshable Materialized Views

Demo.

Developer: Michael Kolupaev, Michael Guzov 🎁.

Binary Map

Why ClickHouse binary is large?

Open http://localhost:8123/binary to check it.

Developer: Alexey Milovidov 🧑‍🎄.

Map Of the Internet

If you liked it, check another project: map of the Internet

https://reversedns.space/

Integrations

Grafana plugin v4-beta:
— improved UX and lowered the learning curve;

Please share your feedback at
https://github.com/grafana/clickhouse-datasource/releases/tag/v4.0.0-beta

Integrations

MySQL compatibility:
— substring supports the Enum data type;
— date_trunc supports case-insensitive unit names;
— format supports all data types;

JavaScript client:
— expose query statistics;
— support CloudFlare workers;
— recognize retriable error codes

Updates for PowerBI, Metabase, dbt, Apache Beam, Kafka Connector.

Reading Corner

https://clickhouse.com/blog/

— ClickHouse + RedPanda, by Goldsky
— A Year of ClickHouse Cloud;
— Druid Deprecation and ClickHouse Adoption at Lyft;
— The Geek Narrator: Clickhouse Internals with Tom and Tyler.

Video Recordings: https://www.youtube.com/c/ClickHouseDB

ClickHouse: Release 23.12 Webinar

ClickHouse
Release 23.12

Release 23.12 Webinar

Release 23.12

Small Treats

ORDER BY ALL

PASTE JOIN

SQID

seriesPeriodDetectFFT

SHA-512/256

Indices On ALIAS Columns

Indices On ALIAS Columns

Indices On ALIAS Columns

Indices On ALIAS Columns

APPLY DELETED MASK

APPLY DELETED MASK

_part_offset Column

Performance Improvements

count() For Merge Tables

Memory Usage In Hash JOIN

Memory Usage In Hash JOIN

Faster Aggregation

Faster MIN/MAX

Optimizations For FINAL

Optimizations For FINAL

Optimizations For FINAL

Something Interesting

Refreshable Materialized Views

Refreshable Materialized Views

Refreshable Materialized Views

Bonus

Binary Map

Map Of the Internet

Integrations

Integrations

Reading Corner

Q&A

ClickHouseRelease 23.12

Release 23.12 Webinar

Release 23.12

Small Treats

ORDER BY ALL

PASTE JOIN

SQID

seriesPeriodDetectFFT

SHA-512/256

Indices On ALIAS Columns

Indices On ALIAS Columns

Indices On ALIAS Columns

Indices On ALIAS Columns

APPLY DELETED MASK

APPLY DELETED MASK

_part_offset Column

Performance Improvements

count() For Merge Tables

Memory Usage In Hash JOIN

Memory Usage In Hash JOIN

Faster Aggregation

Faster MIN/MAX

Optimizations For FINAL

Optimizations For FINAL

Optimizations For FINAL

Something Interesting

Refreshable Materialized Views

Refreshable Materialized Views

Refreshable Materialized Views

Bonus

Binary Map

Map Of the Internet

Integrations

Integrations

Reading Corner

Q&A

ClickHouse
Release 23.12