ClickHouse: Release 23.3 Webinar

ClickHouse
Release 23.3

Release 23.3 Webinar

1. (50 min) What's new in ClickHouse 23.3.

2. (10 min) Q&A.

Release 23.3 LTS

ClickHouse Spring Release.

— 22 new features 🌳

— 14 performance optimizations 🛼

— 61 bug fixes 🦋

UNDROP TABLE

Did you know?

When you DROP a table in ClickHouse, the table is not instantly removed

— because ClickHouse uses MVCC for atomic DDL operations;
— it allows running queries to keep reading from a dropped table,
  and allows creating another table in place of the dropped one;

Just in case, tables are physically removed only after 8 minutes.

So why not allow restoring dropped tables?

UNDROP TABLE

SELECT * FROM system.dropped_tables
— info about the recently dropped tables;

UNDROP TABLE my_table
— restore the recently dropped table;
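A minimal end-to-end sketch with a scratch table (in 23.3 the feature may still sit behind an experimental setting):

-- create and drop a scratch table
CREATE TABLE my_table (x UInt64) ENGINE = MergeTree ORDER BY x;
DROP TABLE my_table;

-- the dropped table stays listed for about 8 minutes
SELECT * FROM system.dropped_tables;

-- bring it back together with its data
UNDROP TABLE my_table;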

Developer: chen (xiedeyantu).

Server Settings Introspection

The settings from the global server config:

SELECT * FROM system.server_settings
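For example, to check whether particular server settings were changed from their defaults (a sketch, assuming the name, value, and changed columns):

SELECT name, value, changed
FROM system.server_settings
WHERE name LIKE '%pool%';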

Demo

Developer: Artem Brustovetskii.

Nested Dynamic Disks

ATTACH TABLE uk_price_paid UUID 'cf712b4f-2ca8-435c-ac23-c4393efe52f7'
(
    price UInt32,
    date Date,
    postcode1 LowCardinality(String),
    postcode2 LowCardinality(String),
    type Enum8('other' = 0, 'terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4),
    is_new UInt8,
    duration Enum8('unknown' = 0, 'freehold' = 1, 'leasehold' = 2),
    addr1 String,
    addr2 String,
    street LowCardinality(String),
    locality LowCardinality(String),
    town LowCardinality(String),
    district LowCardinality(String),
    county LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (postcode1, postcode2, addr1, addr2)
SETTINGS disk = disk(
    type = cache,
    path = '/home/milovidov/.cache/clickhouse-local/',
    max_size = '10G',
    disk = disk(
        type = web,
        endpoint = 'https://raw.githubusercontent.com/ClickHouse/web-tables-demo/main/web/'));

Developer: Kseniia Sumarokova.

MySQL Compatibility

ClickHouse has support for MySQL wire protocol.

mysql --port 9004 --host 127.0.0.1 --user default

And for PostgreSQL wire protocol as well.

ClickHouse's SQL dialect is different.

The set of SQL functions is also different.

MySQL Compatibility

SELECT INSTR('ClickHouse', 'lick');
SELECT 'ClickHouse' REGEXP 'M?ouse$';
SELECT STR_TO_DATE('03/03/2023', '%m/%d/%Y');

This is compatible with MySQL, but ugly.

Developers: Robert Schulze, Flynn.

MySQL Compatibility

SELECT positionCaseInsensitive('ClickHouse', 'lick');
SELECT match('ClickHouse', 'M?ouse$');
SELECT toDateTime('2023-03-30 01:02:03');
SELECT parseDateTime('03/03/2023', '%m/%d/%Y');

This is better.

parseDateTime

parseDateTime
parseDateTimeOrZero
parseDateTimeOrNull

SELECT parseDateTime('2023/03/30 01:02', '%Y/%m/%d %H:%M')

parseDateTimeInJodaSyntax
parseDateTimeInJodaSyntaxOrZero
parseDateTimeInJodaSyntaxOrNull

SELECT parseDateTimeInJodaSyntax('2023/03/30 01:02', 'yyyy/MM/dd HH:mm')
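The OrZero and OrNull variants follow the usual ClickHouse convention for unparsable input; a small illustration:

SELECT
    parseDateTimeOrNull('garbage', '%Y/%m/%d %H:%M') AS as_null,  -- NULL
    parseDateTimeOrZero('garbage', '%Y/%m/%d %H:%M') AS as_zero;  -- 1970-01-01 00:00:00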

Developer: Robert Schulze.

Parquet/ORC/Arrow Improvements

Control over compression method:

SET output_format_parquet_compression_method = 'snappy|lz4|brotli|zstd|gzip|none';

Control over format version:

SET output_format_parquet_version = '2.6';

Supported versions are: 1.0, 2.4, 2.6, 2.latest.
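Putting both settings together, a sketch of exporting a query result to a Parquet file from clickhouse-client (the file name is illustrative):

SET output_format_parquet_compression_method = 'zstd';
SET output_format_parquet_version = '2.6';

SELECT number, toString(number) AS s
FROM numbers(1000)
INTO OUTFILE 'example.parquet'
FORMAT Parquet;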

Developer: Pavel Kruglov.

SSL Certificate Authentication In The Native Protocol

config.yaml for clickhouse-client:

openSSL:
  client:
    certificateFile: 'path.cert'
    privateKeyFile: 'path.key'
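On the server side, a user can then be tied to the certificate's common name; a sketch, assuming the ssl_certificate authentication type and a hypothetical CN:

CREATE USER cert_user IDENTIFIED WITH ssl_certificate CN 'cert_user';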

Developer: Nikolai Degterinskiy.

Next steps: authentication with SSH keys?

Lightweight Deletes

Lightweight DELETEs are production ready
and enabled by default in 23.3 LTS.

DELETE FROM t WHERE ...
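A sketch with a hypothetical table name:

-- 'visits' is a hypothetical table; rows vanish from query results immediately,
-- while the data on disk is cleaned up by subsequent merges
DELETE FROM visits WHERE CounterID = 111;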

Developer: Alexander Gololobov.

Next steps: lightweight updates?

Performance Optimizations

Compressed Marks In Memory

Marks take 3..6 times less space in RAM!

Enabled by default.
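One way to see how much RAM the mark cache uses, assuming the MarkCacheBytes asynchronous metric is exposed on your build:

SELECT value AS mark_cache_bytes
FROM system.asynchronous_metrics
WHERE metric = 'MarkCacheBytes';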

Demo.

Developer: Michael Kolupaev.

Optimizations For Backups

A full backup of 100 TB and 1 000 000 files in one hour.

Incremental 100 TB backups in 20 minutes.

— server-side copies for S3;
— dedicated thread pool;
— increased parallelism;
— faster metadata writing;
— retries of Keeper operations;
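For reference, a sketch of a full and an incremental backup, assuming a disk named 'backups' is configured:

BACKUP TABLE uk_price_paid TO Disk('backups', 'full.zip');

-- later: only the difference against the base backup is stored
BACKUP TABLE uk_price_paid TO Disk('backups', 'incremental.zip')
    SETTINGS base_backup = Disk('backups', 'full.zip');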

Developers: Vitaly Baranov, Nikita Mikhailov, Alexey Milovidov.

Parallel Replicas

Without parallel replicas:

SELECT count() FROM github_events WHERE body ILIKE '%ClickHouse%';

Elapsed: 458.332 sec. 1.18 TB (2.58 GB/s.)

With 10 parallel replicas:

SET allow_experimental_parallel_reading_from_replicas = 1, max_parallel_replicas = 10;

SELECT count() FROM github_events WHERE body ILIKE '%ClickHouse%';

Elapsed: 40.284 sec. 1.18 TB (29.34 GB/s.)

With 100 parallel replicas:

Elapsed: 8.790 sec. 1.18 TB (134.44 GB/s.)

Developer: Nikita Mikhailov.

Parallel Replicas With Dynamic Shards

SET max_parallel_replicas = 100,
    parallel_replicas_custom_key = 'UserID',
    parallel_replicas_custom_key_filter_type = 'default';
    -- or: parallel_replicas_custom_key_filter_type = 'range'

Represents every replica as a shard for distributed queries
by pre-filtering the data, either by the remainder of division or by a range.

Useful for large JOIN or GROUP BY with
the distributed_group_by_no_merge setting.
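A sketch of combining this with distributed_group_by_no_merge (the table name is hypothetical): since the custom key sends all rows for a given UserID to the same replica, each replica can finalize its own groups.

SET max_parallel_replicas = 100,
    parallel_replicas_custom_key = 'UserID',
    parallel_replicas_custom_key_filter_type = 'default',
    distributed_group_by_no_merge = 1;

-- each replica returns complete groups for its slice of UserID values
SELECT UserID, count() AS hits
FROM distributed_table
GROUP BY UserID;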

Developer: Antonio Andelic.

Something Unusual

Partial Result On Query Cancellation

SET partial_result_on_first_cancel = 1;

After query cancellation: the query stops reading new data,
but you still get a result, computed over the data processed so far.
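A sketch of the interactive flow in clickhouse-client, reusing the github_events query from the parallel-replicas slides:

SET partial_result_on_first_cancel = 1;

-- press Ctrl+C while the query is running:
-- it stops reading further data and returns count() over
-- the part of the table processed before the cancellation
SELECT count() FROM github_events WHERE body ILIKE '%ClickHouse%';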

Demo

Developer: Alexey Perevyshin.

Temporary Tables With an Engine

CREATE TEMPORARY TABLE tmp_uk_price_paid ( ... )
ENGINE = MergeTree
ORDER BY (postcode1, postcode2, addr1, addr2)

What for?

Developer: Roman Vasin.

Integrations

Superset:

— native support for ClickHouse;
— ships ClickHouse support out-of-the-box!

Grafana:

— support for trace visualization;

Updates for Python, Node.js, Go, and Java drivers.

Integrations

And if Grafana is not enough for you, you can use QRYN:
— polyglot monitoring and observability (qryn.dev).

ClickHouse Cloud

— free 30-day trial with $300 of credits and up to 10 TB of data;

— affordable clusters for developers with full HA, < $100/month.

Try it! https://clickhouse.cloud/.

Q&A