ClickHouse: Release 23.1 Webinar

ClickHouse
Release 23.1

Release 23.1 Webinar

1. (50 min) What's new in ClickHouse 23.1.

2. (10 min) Q&A.

Release 23.1

ClickHouse New Year release.

— 17 new features

— 17 performance optimizations

— 78 bug fixes

SQL Language Features

ARRAY JOIN with Map

:) SELECT * FROM VALUES('map Map(String, UInt8)', ('{\'Hello\':1,\'World\':2}'))

┌─map───────────────────┐
│ {'Hello':1,'World':2} │
└───────────────────────┘

:) SELECT map FROM VALUES('map Map(String, UInt8)', ('{\'Hello\':1,\'World\':2}'))
   ARRAY JOIN map

┌─map─────────┐
│ ('Hello',1) │
│ ('World',2) │
└─────────────┘

:) SELECT untuple(arrayJoin(map)) FROM VALUES('map Map(String, UInt8)', ('{\'Hello\':1,\'World\':2}'))

┌─tupleElement(arrayJoin(map), 1)─┬─tupleElement(arrayJoin(map), 2)─┐
│ Hello                           │                               1 │
│ World                           │                               2 │
└─────────────────────────────────┴─────────────────────────────────┘

Developer: TaiYang Li.

`age` function

:) SELECT dateDiff('month', '2022-11-30'::Date, '2023-01-25'::Date) AS diff

┌─diff─┐
│    2 │
└──────┘

:) SELECT age('month', '2022-11-30'::Date, '2023-01-25'::Date) AS diff

┌─diff─┐
│    1 │
└──────┘

Developer: Roman Vasin.

SQL Standard Hex And Binary Literals

:) SELECT 0xBAADC0DE, 0b00101010

┌─3131949278─┬─42─┐
│ 3131949278 │ 42 │
└────────────┴────┘

:) SELECT x'436C69636B486F757365', b'00101010'

┌─'ClickHouse'─┬─'*'─┐
│ ClickHouse   │ *   │
└──────────────┴─────┘

:) SELECT unhex('436C69636B486F757365') AS x, unbin('00101010') AS y

┌─x──────────┬─y─┐
│ ClickHouse │ * │
└────────────┴───┘

Developer: Mo Xuan.

Extended Formats For Streaming

Kafka, RabbitMQ, NATS, and FileLog now support all data formats
for INSERTs, including block-based formats such as Native and Parquet.

The block size for inserts into Kafka and other queues can now be controlled
(see the sketch after this list):

— for block-based formats, it produces one block per message;

— for row-based formats, it produces multiple formatted rows per message.
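
For illustration, here is a minimal sketch of a Kafka table that produces
the block-based Native format on INSERT. The broker, topic, and group names
are hypothetical, and the kafka_max_rows_per_message setting mentioned in
the comments is taken from the release notes, so treat it as an assumption.

-- Hypothetical broker/topic/group names; INSERTs into this table
-- produce Kafka messages in the Native format (one block per message).
-- For row-based formats such as JSONEachRow, the number of rows packed
-- into each message can be tuned with kafka_max_rows_per_message
-- (setting name per the release notes; treat it as an assumption).
CREATE TABLE kafka_out
(
    id UInt64,
    message String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'localhost:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'producer',
         kafka_format = 'Native';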

Developer: Pavel Kruglov.

Autodetect Headers Of TSV/CSV

No need to choose between CSV, CSVWithNames, and CSVWithNamesAndTypes
for data import.

Just write CSV, and ClickHouse will detect the header if there is one.

Everything is detected automatically whenever possible!
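
For example, a CSV file with a header row can be queried directly with the
plain CSV format; the file and column names here are hypothetical.

-- data.csv (hypothetical contents):
--   id,name
--   1,Alice
--   2,Bob
SELECT * FROM file('data.csv', CSV);
-- The header row is detected automatically, so the columns are named
-- id and name instead of the default c1 and c2.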

Developer: Pavel Kruglov.

Performance Improvements

Reading Structs From Parquet/ORC

SELECT sum(event.time) FROM demo

┌─────sum(event.time)─┐
│ 8868419195034535628 │
└─────────────────────┘

Before: Elapsed: 7.810 sec. Processed 5.31 million rows, 206.93 MB (679.37 thousand rows/s., 26.50 MB/s.)

After: Elapsed: 0.872 sec. Processed 5.31 million rows, 206.93 MB (6.09 million rows/s., 237.34 MB/s.)
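
As a sketch of how such data is read (the file name and struct layout below
are hypothetical): a Parquet struct column maps to a Tuple in ClickHouse,
and its members are accessed with dot notation.

-- Hypothetical Parquet file with a struct column `event`.
SELECT sum(event.time)
FROM file('events.parquet', Parquet, 'event Tuple(time UInt64, name String)');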

Developer: Lgbo-ustc.

Reading From S3

How long does it take to read 7000 files?

SELECT count(), uniq(_file)
FROM s3('https://s3.eu-west-1.amazonaws.com/.../*', 'CSV')

┌─count()─┬─uniq(_file)─┐
│    7343 │        7343 │
└─────────┴─────────────┘

Before: Median timing: 12.682 sec.

After: Median timing: 3.291 sec.

Developer: Anton Popov.

Data Parts Loading

What if the server starts up very slowly
due to a large number of data parts?

Sometimes there is a large number of inactive data parts
that will be deleted immediately after loading.

We don't load them anymore!

Developer: Anton Popov.

What's next? Lazy loading of data parts? Lazy loading of tables?

MMap For File Engine

Demo
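
The live demo is not reproduced here; below is a minimal sketch, assuming
the read method is selected with the storage_file_read_method setting.

-- Hypothetical table backed by the File engine.
CREATE TABLE file_demo (s String) ENGINE = File(TSV);

-- Read it through mmap instead of regular read calls
-- (storage_file_read_method is an assumed setting name).
SELECT count() FROM file_demo
SETTINGS storage_file_read_method = 'mmap';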

Developer: Artem Brustovetskii.

Something Interesting

Parameterized Views

CREATE VIEW wiki AS
SELECT
    toStartOfMonth(time),
    sum(hits) AS h,
    bar(h, 0, max(h) OVER (), 100)
FROM wikistat
WHERE path = {page:String}
GROUP BY 1
ORDER BY 1;

SELECT * FROM wiki(page = 'ClickHouse');

Demo

Developer: Smita Kulkarni.

Regexp Tree Dictionaries

CREATE DICTIONARY user_agent
(
    regexp String,
    name String,
    version UInt16
)
PRIMARY KEY(regexp)
SOURCE(YAMLRegExpTree(PATH '/.../regexp_tree.yaml'))
LAYOUT(regexp_tree)

Developers: Vage Ogannisian, Han Fei.

Regexp Tree Dictionaries

- regexp: 'Linux/(\d+[\.\d]*).+tlinux'
  name: 'TencentOS'
  version: '\1'

- regexp: '\d+/tclwebkit(?:\d+[\.\d]*)'
  name: 'Andriod'
  versions:
    - regexp: '33/tclwebkit'
      version: 13
    - regexp: '3[12]/tclwebkit'
      version: 12
    - regexp: '30/tclwebkit'
      version: 11
    - regexp: '29/tclwebkit'
      version: 10

Regexp Tree Dictionaries

SELECT dictGet('user_agent', ('name', 'version'), UserAgent);

— Traverses the tree and determines the values of the attributes.

— The tree can be arbitrarily deep,
  and each nested level can override the values.

An example User-Agent string to classify:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36

— All regular expressions are checked in a single pass for performance!

— Can be loaded from a YAML file or from a table in any source.
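
As a sketch, classifying the User-Agent string shown above uses the same
dictGet call as before; the result depends on the rules loaded into the
dictionary, so no output is shown.

SELECT dictGet('user_agent', ('name', 'version'),
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36');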

Query Result Cache

$ cat /etc/clickhouse-server/config.d/query_cache.yaml

query_result_cache:
    size: 1073741824
    max_entries: 1024
    max_entry_size: 104857600
    max_entry_records: 30000000

SET enable_experimental_query_result_cache = 1;

Developers: Mikhail Stetsyuk, Robert Schulze.

Query Result Cache

Allows control on a per-query basis (example below):

— minimum number of query runs before the result is cached;
— minimum query runtime to cache the result;
— maximum result size to put into the cache;
— maximum staleness to use a cached entry;
— passive use of existing cache entries;
— caching of queries with non-deterministic functions;
— sharing the cache between users.
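
A hedged sketch of per-query control via the SETTINGS clause; the table name
and the individual setting names below are assumptions patterned after the
experimental feature and may differ.

SELECT count()
FROM hits
SETTINGS
    enable_experimental_query_result_cache = 1,
    query_result_cache_min_query_runs = 3,  -- assumed name: cache only after 3 runs
    query_result_cache_ttl = 300;           -- assumed name: entry is stale after 300 seconds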

Developers: Mikhail Stetsyuk, Robert Schulze.

Next steps: compressed cache; on-disk cache; cache of intermediate data.

Inverted Full Text Indices

ALTER TABLE hackernews_indexed ADD INDEX inv_idx(text) TYPE inverted;
ALTER TABLE hackernews_indexed MATERIALIZE INDEX inv_idx;

Supports tokens and n-grams.

Optimizes hasToken, multiSearchAny, equality comparison...
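
For reference, the index can also be declared when the table is created.
This is a sketch with a hypothetical table, assuming inverted(N) selects
n-grams of size N while plain inverted tokenizes by words.

CREATE TABLE hackernews_indexed
(
    id UInt64,
    text String,
    INDEX inv_idx(text) TYPE inverted(3) GRANULARITY 1  -- 3-grams (assumption)
)
ENGINE = MergeTree
ORDER BY id;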

Developers: Larry Luo, Harry Lee.

Inverted Full Text Indices

:) SELECT count() FROM hackernews WHERE hasToken(text, 'ClickHouse')

┌─count()─┐
│     948 │
└─────────┘

1 row in set. Elapsed: 0.579 sec. Processed 33.95 million rows, 11.61 GB (58.63 million rows/s., 20.05 GB/s.)

:) SELECT count() FROM hackernews_indexed WHERE hasToken(text, 'ClickHouse')

┌─count()─┐
│     948 │
└─────────┘

1 row in set. Elapsed: 0.206 sec. Processed 3.71 million rows, 1.32 GB (18.00 million rows/s., 6.43 GB/s.)

Developer: Larry Luo.

Integrations

Integrations

Metabase:

— now official in the ClickHouse organization;
— changed the license from AGPL to Apache 2.0;

Java:

— added R2DBC support in addition to JDBC;
— support for file upload/download;

Python:

— server-side parameter binding;
— support for file inserts;
— performance improvements with streaming;

DBT: multithreading, CTE, incremental strategy;

Apache Beam: updated integration.

ClickHouse Cloud

— free 30-day trial up to 10 TB of data;

— affordable clusters for developers, with full HA, under $100/month.

Try it! https://clickhouse.cloud/.

Q&A