1. (40 min) What's new in ClickHouse 24.2.
2. (10 min) Lightning Talks.
3. (10 min) Q&A.
ClickHouse Winter Release.
— 18 new features ☃️
— 18 performance optimizations ⛷️
— 49 bug fixes 🧊
:) SELECT groupArrayIntersect(arr)
FROM
(
SELECT [1, 2, 3] AS arr
UNION ALL
SELECT [2, 3, 4]
)
┌─groupArrayIntersect(arr)─┐
│ [3,2] │
└──────────────────────────┘
Developer: Yarik Briukhovetskyi.
Let's build our own search index with ClickHouse!
Demo...
Developer: Yarik Briukhovetskyi.
:) SELECT SearchEngineID, SearchPhrase, count()
FROM hits
WHERE SearchPhrase != ''
GROUP BY ALL
ORDER BY -1 DESC
LIMIT 5
┌─SearchEngineID─┬─SearchPhrase──────────────┬─count()─┐
│ 2 │ карелки │ 46258 │
│ 2 │ мангу в зарабей грама │ 18871 │
│ 2 │ смотреть онлайн │ 16905 │
│ 3 │ албатрутдин │ 16748 │
│ 2 │ смотреть онлайн бесплатно │ 14909 │
└────────────────┴───────────────────────────┴─────────┘
Developer: Flynn (ucasFL).
:) SELECT SearchEngineID, SearchPhrase, count()
FROM hits
WHERE SearchPhrase != ''
GROUP BY ALL
ORDER BY count() DESC
LIMIT 5
┌─SearchEngineID─┬─SearchPhrase──────────────┬─count()─┐
│ 2 │ карелки │ 46258 │
│ 2 │ мангу в зарабей грама │ 18871 │
│ 2 │ смотреть онлайн │ 16905 │
│ 3 │ албатрутдин │ 16748 │
│ 2 │ смотреть онлайн бесплатно │ 14909 │
└────────────────┴───────────────────────────┴─────────┘
:) SELECT SearchEngineID, SearchPhrase, count() AS c
FROM hits
WHERE SearchPhrase != ''
GROUP BY ALL
ORDER BY c DESC
LIMIT 5
┌─SearchEngineID─┬─SearchPhrase──────────────┬─────c─┐
│ 2 │ карелки │ 46258 │
│ 2 │ мангу в зарабей грама │ 18871 │
│ 2 │ смотреть онлайн │ 16905 │
│ 3 │ албатрутдин │ 16748 │
│ 2 │ смотреть онлайн бесплатно │ 14909 │
└────────────────┴───────────────────────────┴───────┘
24.1:
:) SELECT count() FROM table
┌────count()─┐
│ 2106592092 │
└────────────┘
24.2:
:) SELECT count() FROM table
┌────count()─┐
│ 2106592092 │ -- 2.11 billion
└────────────┘
Developer: RogerYK.
Already works:
:) SELECT * FROM merge(database, '^events.+_log$')
:) SELECT * FROM merge(currentDatabase(), '^events.+_log$')
Since 24.2:
:) SELECT * FROM merge('^events.+_log$')
Developers: HowePa.
If you use ClickHouse in Microsoft Azure...
or if you use it anywhere, but want to have backups on Azure.
BACKUP TABLE data TO AzureBlobStorage(
'AccountName=devstoreaccount1;AccountKey=...;BlobEndpoint=https://...;',
'test_container', 'data_backup');
RESTORE TABLE data AS data_restored FROM AzureBlobStorage(
'AccountName=devstoreaccount1;AccountKey=...;BlobEndpoint=https://...;',
'test_container', 'data_backup');
Developers: Smita Kulkarni.
Private preview of ClickHouse Cloud on Azure: https://clickhouse.com/cloud/azure-waitlist
:) SELECT 'Hello' AS a, 'World' AS b
FORMAT Template
SETTINGS
format_template_resultset_format = '<html><body>${data}</body></html>',
format_template_row_format = '<p>${a:Raw}: ${b:Quoted}</p>';
<html><body><p>Hello: 'World'</p>
</body></html>
Developer: Shaun Struwig.
A new system table.
Demo.
Developer: Kirill Nikiforov.
CREATE TABLE test
(
x String CODEC(LZ4HC(2)),
...
)
ENGINE = MergeTree ORDER BY ...;
Demo.
Developer: Yann Collet.
Optimized L2Distance, cosineDistance, and dotProduct:
— with FMA instructions in AVX-512;
— and with auto-vectorization for ARM;
Example: cosineDistance over 600-dimension vectors of Float32:
— 56% performance improvement;
On servers with DDR-4 memory it is memory-bound
Even higher speed-up on servers with DDR-5.
Even higher speed-up on Mac.
Developer: Robert Schulze.
CREATE TABLE ...
ENGINE = Buffer(database, table,
num_layers,
min_time, max_time, min_rows, max_rows, min_bytes, max_bytes)
The num_layes parameter defines the number of buffers
and parallelism for INSERTs.
But the moving of data from buffers to the target table was sequential.
Now it is parallel! Result: successful continuous insert of >20 GB/sec of compressed data on a cluster.
Developer: Alexander Sapin.
By forcing C++ compiler to use strict aliasing.
SELECT sum(number::Int8) FROM numbers(1_000_000_000);
24.1: 0.448 sec. (1.99 billion rows/s., 15.95 GB/s.)
24.2: 0.230 sec. (3.79 billion rows/s., 30.31 GB/s.)
Developer: Raul Marin.
— Memory usage optimization for ClickHouse Keeper,
another ~30% improvement!
Developer: Antonio Andelic.
— Speed up for ASOF JOIN.
Developer: Maksim Kita.
— Optimization For If With Maps.
WITH rand32() % 2 as x SELECT if(x, map(1,2,3,4), map(3,4,5,6)) FROM numbers(10000000);
24.1: 4.293 sec.
24.2: 0.411 sec.
Developer: TaiYang-Li.
CREATE TABLE table1 (...);
CREATE VIEW view1 AS SELECT c1, c2 FROM table1 WHERE user = 'Vasya';
SELECT * FROM view1;
What access rights are required to read from view1?
By default, it is required to have access for both table1 and view1.
But if I want to give access to a view,
I also have to give access to the whole table.
CREATE VIEW view1 AS SELECT c1, c2 FROM table1 WHERE user = 'Vasya';
We want to give access to the view without giving access to the table.
To let the view control access to the subset of data.
Solution: the new SQL SECURITY and DEFINER specifications
for CREATE VIEW queries.
Applicable for ordinary VIEW and MATERIALIZED views
Developer: Artem Brustovetskii.
CREATE VIEW view1
[DEFINER = { user | CURRENT_USER }]
[SQL SECURITY { DEFINER | INVOKER | NONE }]
AS SELECT c1, c2 FROM table1 WHERE user = 'Vasya';
SQL SECURITY:
— INVOKER: the access rights for operations inside the view
are checked against the user running the query;
— this is equivalent to the default behavior for ordinary views;
— DEFINER: the access rights for operations inside the view
are checked against the defined user;
— this is the new behavior, allowing the view to remember the access rights;
Developer: Artem Brustovetskii.
-- Alice has access to table1 and creates view1:
CREATE VIEW view1
DEFINER = alice SQL SECURITY DEFINER
AS SELECT c1, c2 FROM table1 WHERE user = 'Bob';
-- She gives access to view1 to Bob:
GRANT SELECT ON default.view1 TO bob;
-- Bob does not have access to table1 but can use view1,
-- so view1 will SELECT from table1 on behalf of Alice.
Developer: Artem Brustovetskii.
$ echo '{"Hello": "world"}' > data1
$ echo 'Hello,world' > data2
$ echo -e 'Hello\tworld' > data3
:) SELECT * FROM file('data1') -- we didn't say it is JSON, but it works!
┌─Hello─┐
│ world │
└───────┘
:) SELECT * FROM file('data2') -- we didn't say it is CSV, but it works!
┌─c1────┬─c2────┐
│ Hello │ world │
└───────┴───────┘
:) SELECT * FROM file('data3') -- we didn't say it is TSV, but it works!
┌─c1────┬─c2────┐
│ Hello │ world │
└───────┴───────┘
Developer: Pavel Kruglov.
SET async_insert = true;
— allows to combine data for insertion from many concurrent clients;
to support highly concurrent, frequent inserts.
24.1: if there are not many inserts,
they will wait up to the timeout (= 200 ms by default);
— In 24.1, infrequent inserts have a higher latency.
Since 24.2: automatically adjusts to the frequency of inserts;
— In 24.2, you can enable async_insert and stop worrying.
Developer: Julia Kartseva.
Grafana plugin v4 released
The release of version 4, sees a new philosophy to user interaction where
logs and traces become first class citizens in a new query builder experience.
Kafka Connector
— support for complex types (Array, Map) and FixedString.
Updates for PowerBI and Metabase.
Enhancements for dbt support and for Node.js driver.
— Hybrid Query Execution;
— A CDC solution for ClickHouse;
— Analyzing Global Internet Speeds;
— Grafana plugin 4.0 - Leveling up SQL Observability;
Video Recordings: https://www.youtube.com/c/ClickHouseDB