ClickHouse: Release 24.2 Webinar

ClickHouse Release 24.2

Release 24.2 Webinar

1. (40 min) What's new in ClickHouse 24.2.

2. (10 min) Lightning Talks.

3. (10 min) Q&A.

Release 24.2

ClickHouse Winter Release.

— 18 new features ☃️

— 18 performance optimizations ⛷️

— 49 bug fixes 🧊

Small And Nice Features

groupArrayIntersect

:) SELECT groupArrayIntersect(arr) FROM ( SELECT [1, 2, 3] AS arr UNION ALL SELECT [2, 3, 4] ) ┌─groupArrayIntersect(arr)─┐ │ [3,2] │ └──────────────────────────┘

Developer: Yarik Briukhovetskyi.

groupArrayIntersect

Let's build our own search index with ClickHouse!

Demo...

Developer: Yarik Briukhovetskyi.

Negative Positional Arguments

:) SELECT SearchEngineID, SearchPhrase, count() FROM hits WHERE SearchPhrase != '' GROUP BY ALL ORDER BY -1 DESC LIMIT 5 ┌─SearchEngineID─┬─SearchPhrase──────────────┬─count()─┐ │ 2 │ карелки │ 46258 │ │ 2 │ мангу в зарабей грама │ 18871 │ │ 2 │ смотреть онлайн │ 16905 │ │ 3 │ албатрутдин │ 16748 │ │ 2 │ смотреть онлайн бесплатно │ 14909 │ └────────────────┴───────────────────────────┴─────────┘

Developer: Flynn (ucasFL).

Negative Positional Arguments

:) SELECT SearchEngineID, SearchPhrase, count() FROM hits WHERE SearchPhrase != '' GROUP BY ALL ORDER BY count() DESC LIMIT 5 ┌─SearchEngineID─┬─SearchPhrase──────────────┬─count()─┐ │ 2 │ карелки │ 46258 │ │ 2 │ мангу в зарабей грама │ 18871 │ │ 2 │ смотреть онлайн │ 16905 │ │ 3 │ албатрутдин │ 16748 │ │ 2 │ смотреть онлайн бесплатно │ 14909 │ └────────────────┴───────────────────────────┴─────────┘

Negative Positional Arguments

:) SELECT SearchEngineID, SearchPhrase, count() AS c FROM hits WHERE SearchPhrase != '' GROUP BY ALL ORDER BY c DESC LIMIT 5 ┌─SearchEngineID─┬─SearchPhrase──────────────┬─────c─┐ │ 2 │ карелки │ 46258 │ │ 2 │ мангу в зарабей грама │ 18871 │ │ 2 │ смотреть онлайн │ 16905 │ │ 3 │ албатрутдин │ 16748 │ │ 2 │ смотреть онлайн бесплатно │ 14909 │ └────────────────┴───────────────────────────┴───────┘

Pretty Format Is Even Prettier

24.1:

:) SELECT count() FROM table ┌────count()─┐ │ 2106592092 │ └────────────┘

24.2:

:) SELECT count() FROM table ┌────count()─┐ │ 2106592092 │ -- 2.11 billion └────────────┘

Developer: RogerYK.

Table function merge with a single argument

Already works:

:) SELECT * FROM merge(database, '^events.+_log$') :) SELECT * FROM merge(currentDatabase(), '^events.+_log$')

Since 24.2:

:) SELECT * FROM merge('^events.+_log$')

Developers: HowePa.

Backups To Azure

If you use ClickHouse in Microsoft Azure...
or if you use it anywhere, but want to have backups on Azure.

BACKUP TABLE data TO AzureBlobStorage( 'AccountName=devstoreaccount1;AccountKey=...;BlobEndpoint=https://...;', 'test_container', 'data_backup'); RESTORE TABLE data AS data_restored FROM AzureBlobStorage( 'AccountName=devstoreaccount1;AccountKey=...;BlobEndpoint=https://...;', 'test_container', 'data_backup');

Developers: Smita Kulkarni.

Private preview of ClickHouse Cloud on Azure: https://clickhouse.com/cloud/azure-waitlist

Inline Template Format

:) SELECT 'Hello' AS a, 'World' AS b FORMAT Template SETTINGS format_template_resultset_format = '<html><body>${data}</body></html>', format_template_row_format = '<p>${a:Raw}: ${b:Quoted}</p>'; <html><body><p>Hello: 'World'</p> </body></html>

Developer: Shaun Struwig.

system.dns_cache

A new system table.

Demo.

Developer: Kirill Nikiforov.

New Compression Level For LZ4

CREATE TABLE test ( x String CODEC(LZ4HC(2)), ... ) ENGINE = MergeTree ORDER BY ...;

Demo.

Developer: Yann Collet.

Performance Improvements

Memory Usage For Primary Key

Vectorized Distances And Dot Product:

Optimized L2Distance, cosineDistance, and dotProduct:
— with FMA instructions in AVX-512;
— and with auto-vectorization for ARM;

Example: cosineDistance over 600-dimension vectors of Float32:
56% performance improvement;

On servers with DDR-4 memory it is memory-bound

Even higher speed-up on servers with DDR-5.

Even higher speed-up on Mac.

Developer: Robert Schulze.

Parallel Flush For Buffer Tables

CREATE TABLE ... ENGINE = Buffer(database, table, num_layers, min_time, max_time, min_rows, max_rows, min_bytes, max_bytes)

The num_layes parameter defines the number of buffers
and parallelism for INSERTs.

But the moving of data from buffers to the target table was sequential.

Now it is parallel! Result: successful continuous insert of >20 GB/sec of compressed data on a cluster.

Developer: Alexander Sapin.

Optimization For Int8 Data Type

By forcing C++ compiler to use strict aliasing.

SELECT sum(number::Int8) FROM numbers(1_000_000_000);

24.1: 0.448 sec. (1.99 billion rows/s., 15.95 GB/s.)

24.2: 0.230 sec. (3.79 billion rows/s., 30.31 GB/s.)

Developer: Raul Marin.

Bonus

— Memory usage optimization for ClickHouse Keeper,
  another ~30% improvement!

Developer: Antonio Andelic.

— Speed up for ASOF JOIN.

Developer: Maksim Kita.

— Optimization For If With Maps.

WITH rand32() % 2 as x SELECT if(x, map(1,2,3,4), map(3,4,5,6)) FROM numbers(10000000);

24.1: 4.293 sec.
24.2: 0.411 sec.

Developer: TaiYang-Li.

Something Interesting

Encapsulation Of Access Control In Views

CREATE TABLE table1 (...); CREATE VIEW view1 AS SELECT c1, c2 FROM table1 WHERE user = 'Vasya'; SELECT * FROM view1;

What access rights are required to read from view1?

By default, it is required to have access for both table1 and view1.

But if I want to give access to a view,
I also have to give access to the whole table.

Encapsulation Of Access Control In Views

CREATE VIEW view1 AS SELECT c1, c2 FROM table1 WHERE user = 'Vasya';

We want to give access to the view without giving access to the table.

To let the view control access to the subset of data.

Solution: the new SQL SECURITY and DEFINER specifications
for CREATE VIEW queries.

Applicable for ordinary VIEW and MATERIALIZED views

Developer: Artem Brustovetskii.

Encapsulation Of Access Control In Views

CREATE VIEW view1 [DEFINER = { user | CURRENT_USER }] [SQL SECURITY { DEFINER | INVOKER | NONE }] AS SELECT c1, c2 FROM table1 WHERE user = 'Vasya';

SQL SECURITY:

INVOKER: the access rights for operations inside the view
  are checked against the user running the query;
— this is equivalent to the default behavior for ordinary views;

DEFINER: the access rights for operations inside the view
  are checked against the defined user;
— this is the new behavior, allowing the view to remember the access rights;

Developer: Artem Brustovetskii.

Encapsulation Of Access Control In Views

-- Alice has access to table1 and creates view1: CREATE VIEW view1 DEFINER = alice SQL SECURITY DEFINER AS SELECT c1, c2 FROM table1 WHERE user = 'Bob'; -- She gives access to view1 to Bob: GRANT SELECT ON default.view1 TO bob; -- Bob does not have access to table1 but can use view1, -- so view1 will SELECT from table1 on behalf of Alice.

Developer: Artem Brustovetskii.

Automatic Detection Of Formats

$ echo '{"Hello": "world"}' > data1 $ echo 'Hello,world' > data2 $ echo -e 'Hello\tworld' > data3 :) SELECT * FROM file('data1') -- we didn't say it is JSON, but it works! ┌─Hello─┐ │ world │ └───────┘ :) SELECT * FROM file('data2') -- we didn't say it is CSV, but it works! ┌─c1────┬─c2────┐ │ Hello │ world │ └───────┴───────┘ :) SELECT * FROM file('data3') -- we didn't say it is TSV, but it works! ┌─c1────┬─c2────┐ │ Hello │ world │ └───────┴───────┘

Developer: Pavel Kruglov.

Adaptive Asynchronous Inserts

SET async_insert = true;

— allows to combine data for insertion from many concurrent clients;
  to support highly concurrent, frequent inserts.

24.1: if there are not many inserts,
they will wait up to the timeout (= 200 ms by default);

— In 24.1, infrequent inserts have a higher latency.

Since 24.2: automatically adjusts to the frequency of inserts;

— In 24.2, you can enable async_insert and stop worrying.

Developer: Julia Kartseva.

Integrations

Lightning Talks

Integrations

Grafana plugin v4 released

The release of version 4, sees a new philosophy to user interaction where
logs and traces become first class citizens in a new query builder experience.

Kafka Connector
— support for complex types (Array, Map) and FixedString.

Updates for PowerBI and Metabase.

Enhancements for dbt support and for Node.js driver.

Reading Corner

https://clickhouse.com/blog/

— Hybrid Query Execution;
— A CDC solution for ClickHouse;
— Analyzing Global Internet Speeds;
— Grafana plugin 4.0 - Leveling up SQL Observability;

Video Recordings: https://www.youtube.com/c/ClickHouseDB

Q&A