1. (45 min) What's new in ClickHouse 24.7.
2. (10 min) Guest Talks.
2. (5 min) Q&A.
ClickHouse Summer Release.
— 18 new features 🏖️
— 12 performance optimizations 🍉
— 76 bug fixes 🍦
changeYear, changeMonth, changeDay,
changeHour, changeMinute, changeSecond
:) SELECT changeMonth('2024-06-30'::Date, 7)
2024-07-30
:) SELECT changeHour('2024-06-30 01:02:03'::DateTime, 7)
2024-06-30 07:02:03
:) SELECT changeMonth('2024-07-31'::Date, 6)
2024-07-01
Developer: Maksim Galkin.
:) SELECT groupConcat(number) FROM numbers(10)
0123456789
:) SELECT groupConcat(', ')(number) FROM numbers(10)
0, 1, 2, 3, 4, 5, 6, 7, 8, 9
:) SELECT arrayStringConcat(groupArray(number), ', ') FROM numbers(10)
0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Developer: Yarik Briukhovetskyi.
These tables already existed in previous versions:
Total number of any error messages since server startup:
SELECT value FROM system.events WHERE name = 'LogError'
Total number of errors by their type since server startup:
SELECT * FROM system.errors
The number of any error messages by time, stored persistently:
SELECT event_time, ProfileEvent_LogError
FROM system.metric_log ORDER BY event_time
All error messages with full details, persistent:
SELECT * FROM system.text_log WHERE level = 'Error'
AND event_date = today() ORDER BY event_time_microseconds
The number of error messages by time, for each type of error, persistent:
SELECT * FROM system.error_log ORDER BY event_time
Developer: Pablo Marcos.
If you did DETACH TABLE, you will find the information there
and you can ATTACH the table back.
Developer: Konstantin Morozov.
A table engine for streaming consumption from Azure.
It is the same as S3Queue but for Azure:
CREATE TABLE queue ENGINE = AzureQueue('AccountName=devstoreaccount1;AccountKey=...;BlobEndpoint=https://account.blob.core.windows.net/endpoint;',
'test_container') SETTINGS mode = 'unordered';
CREATE TABLE data ENGINE = MergeTree ORDER BY () EMPTY AS SELECT * FROM queue;
CREATE MATERIALIZED VIEW consumer TO data AS SELECT * FROM queue;
— Consumes data from a bunch of files on Azure.
— Consumes new files as soon as they appear.
— Supports parallel and distributed consumption.
Developer: Kseniia Sumarokova.
If you specify aliases (AS) for elements of a tuple, it will be a named tuple.
SELECT ('Hello' AS a, 123 AS b) AS x, toTypeName(x)
24.6: Tuple(String, UInt8)
24.7: Tuple(a String, b UInt8)
SELECT ('Hello' AS a, 123 AS b) AS x FORMAT JSONEachRow
24.6: {"x":["Hello",123]}
24.7: {"x":{"a":"Hello","b":123}}
Controlled by the setting enable_named_columns_in_function_tuple.
Developer: Amos Bird.
Since version 24.7,
ClickHouse supports reading of partitioned Delta Lake tables,
where partition columns are not stored inside the data files.
Developer: Kseniia Sumarokova.
Happens when:
— you output data in a text format (TSV, JSON, etc);
— you do type conversion such as toString;
10..20% faster!
Demo
Developer: Raul Marin.
By using runtime statistics of hash table sizes.
SET join_algorithm = 'parallel_hash';
Up to 80% faster!
Demo
Developer: Nikita Taranov.
ASOF JOIN — a JOIN method to find the closest matching record,
e.g. the closest by time, without strict equality*
* ClickHouse is the first SQL DBMS to introduce the ASOF JOIN in Apr 2019.
SELECT count() FROM hits h
ASOF JOIN sessions s
ON h.UserID = s.UserID AND h.EventTime > s.StartTime;
SET join_algorithm = 'full_sorting_merge';
Less memory usage and often faster. Demo.
Developer: Vladimir Cherkasov.
If a query contains ORDER BY corresponding to the table's ORDER BY key,
the "read in order" algorithm is activated
(controlled by the optimize_read_in_order setting).
CREATE TABLE hits (...) ORDER BY CounterID, UserID, EventTime;
SELECT UserID FROM hits WHERE SearchEngineID != 0 ORDER BY CounterID;
In ClickHouse 24.7 this algorithm is up to 10x faster,
if the query uses a high-selectivity filter.
Demo.
Developer: Anton Popov.
Keeper can tell about its availability zone:
$ cat /etc/clickhouse-keeper/config.d/az.yaml
placement:
use_imds: 0 # Or 1 for auto-detection
availability_zone: az2 # Arbitrary string
Server will switch to a closest Keeper node after some time:
$ cat /etc/clickhouse-server/config.d/keeper.yaml
prefer_local_availability_zone: 1
fallback_session_lifetime:
min: 0
max: 1
placement: # Similar configuration for the server
use_imds: 1
Developer: Alexander Tokmakov.
Run SQL commands at server startup.
The server accepts connections only after processing the script.
Good for infrastructure automatization.
$ cat /etc/clickhouse-server/config.d/startup.yaml
startup_scripts:
scripts:
# Run a query:
query: "CREATE ROLE OR REPLACE test_role"
scripts:
# Run a query if another query returned 1:
query: "CREATE TABLE test (id UInt64) ENGINE = TinyLog"
condition: "SELECT 1"
Developer: Artem Brustovetskii.
ClickHouse Keeper is a drop-in replacement for ZooKeeper,
that is faster, uses less memory, more stable, and easier to use.
Traditionally, distributed consensus systems,
such as ZooKeeper, ClickHouse Keeper, Etcd, keep the entire
dataset in RAM, and require enough RAM to accommodate it.
What if you have to keep billions of values in ClickHouse Keeper?
$ cat /etc/clickhouse-keeper/config.d/on-disk.yaml
keeper_server:
experimental_use_rocksdb: 1
With the cost of ~2..3x less performance,
it no longer requires as much RAM as data size.
Developer: Han Fei.
ClickHouse supports idempotent inserts for exactly-once semantics.
For each INSERT it calculates the deduplication token as a hash of the data,
or a user can provide a custom deduplication token (insert_deduplication_token).
How the deduplication works when there is a chain of materialized views, attached to a table?
24.6 and older: it didn't guarantee exactly once insertion into the views.
24.7: it calculates deterministic Merkle Tree of the source token and its transformations... and everything works as expected!
Developer: Sema Checherinda.
A table in ClickHouse can contain "projections" — they represent the same data in a different physical order or an aggregation, to automatically optimize queries by using this data.
CREATE TABLE hits (CounterID UInt32, URL String, ...
PROJECTION totals (SELECT CounterID, count(), uniq(URL) GROUP BY CounterID)
) ENGINE = MergeTree ORDER BY (CounterID, EventTime);
What happens to projections when you apply a mutation, such as DELETE?
24.6 and older: projections remains in an inconsistent state,
and deleted rows are still visible when a query uses projections.
24.7: SETTINGS lightweight_mutation_projection_mode = 'throw'/'drop';
Developer: ShiChao Jin.
Continuous insertion from S3 in ClickPipes.
Support for _path, _size, _timestamp virtual columns.
Kafka connector now supports schema ids inside messages.
Updates for Fivetran and dbt integrations,
as well as the Java, Python, and node.js drivers,
Grafana, and OTEL.
+ Thanks for many fixes to our contributors:
the4thamigo-uk, rjoelnorgren, steffen030, dmitryax, pimeys, pan3793, vrischmann, jmaicher.
— How to replace ZooKeeper with ClickHouse Keeper;
— ClickHouse improves AI at Braintrust;
— ClickHouse Cloud Monitoring with Prometheus;
— User-Facing Dashboard With ClickHouse and Luzmo;
— Rill replaces DuckDB with ClickHouse;