ClickHouse: Release 22.7 Webinar

ClickHouse
Release 22.7

Release 22.7 Webinar

1. (45 min) What's new in ClickHouse 22.7.

2. (15 min) Q&A.

🌀

Release 22.7

ClickHouse "mid-summer" release.

β€” 24 new features

β€” 19 performance optimizations

Window Functions Inside Expressions

SELECT y::String || '.' || (y < toYear(today()) - 2000 - 1 ? '*' : m::String) AS Version, (n <= 3 OR (is_lts AND lts_n <= 2)) ? 'βœ”οΈ' : '❌' AS Supported FROM ( SELECT y, m, count() OVER (ORDER BY y DESC, m DESC) AS n, m IN (3, 8) AS is_lts, countIf(is_lts) OVER (ORDER BY y DESC, m DESC) AS lts_n FROM ( WITH extractGroups(version, 'v(\d+)\.(\d+)') AS v, v[1]::INT AS y, v[2]::INT AS m SELECT y, m FROM file('version_date.tsv', TSV, 'version String, date String') ORDER BY y DESC, m DESC LIMIT 1 BY y, m) ) LIMIT 1 BY Version FORMAT Markdown

Window Functions Inside Expressions

WITH count() OVER (ORDER BY y DESC, m DESC) AS n, m IN (3, 8) AS is_lts, countIf(is_lts) OVER (ORDER BY y DESC, m DESC) AS lts_n SELECT y::String || '.' || (y < toYear(today()) - 2000 - 1 ? '*' : m::String) AS Version, (n <= 3 OR (is_lts AND lts_n <= 2)) ? 'βœ”οΈ' : '❌' AS Supported FROM ( WITH extractGroups(version, 'v(\d+)\.(\d+)') AS v, v[1]::INT AS y, v[2]::INT AS m SELECT y, m FROM file('version_date.tsv', TSV, 'version String, date String') ORDER BY y DESC, m DESC LIMIT 1 BY y, m) LIMIT 1 BY Version FORMAT Markdown

Developer: Dmitriy Novik.

SQLInsert Format

SELECT * FROM numbers(10) FORMAT SQLInsert INSERT INTO table (`number`) VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9) ;

Controlled by the settings:
— output_format_sql_insert_max_batch_size;
— output_format_sql_insert_table_name;
— output_format_sql_insert_include_column_names;
— output_format_sql_insert_use_replace;
— output_format_sql_insert_quote_names;

Developer: Pavel Kruglov.

MongoDB table function

db.hits.aggregate( [ { $group: { _id: '$SearchPhrase' } }, { $count: 'value' }, { $project: { key: null, value: 1, } } ], {allowDiskUse: true});

SELECT COUNT(DISTINCT SearchPhrase) FROM mongodb('localhost:27017', 'test', 'hits', 'root', '', structure = 'SearchPhrase String');

Developer: Anastasia Petrenko, Ksenia Sumarokova.

MongoDB table function

MongoDB query — 7 min 47 sec

ClickHouse over MongoDB query — 11 min 56 sec :(

MongoDB table function

MongoDB query — 7 min 47 sec

ClickHouse over MongoDB query — 11 min 56 sec :(

ClickHouse query — 10 ms

MongoDB table function

Bonus:

INSERT INTO FUNCTION mongodb(...) CREATE TABLE ... ENGINE = MongoDB(...)

Developer: Anastasia Petrenko, Ksenia Sumarokova.

MeiliSearch integration

Create a view on top of MeiliSearch index:

CREATE TABLE t (id UInt64, data String) ENGINE = MeiliSearch('http://meili1:7700', 't', '')

Insert data for indexing:

INSERT INTO t ...

Query the data with special search conditions:

SELECT * FROM t WHERE meiliMatch($$ "q" = "abaca" $$)

Developer: Mikhail Artemenko, Ksenia Sumarokova.

MeiliSearch integration

You Know, for Search!

integration with MeiliSearch;

— simple integration with Quickwit: https://clickhouse.com/docs/en/guides/developer/full-text-search/

ElasticSearch? Sphinx? ManticoreSearch?
Full-text indices inside ClickHouse?

Streaming import from NATS

Streaming table engines in ClickHouse:

— Kafka
  (works with RedPanda as well);
— RabbitMQ;
— FileLog;
...
NATS;
S3Queue;

Developer: Pavel Chebarykov, Ksenia Sumarokova.

What else should we add?
SQS? Kinesis? Pulsar? MySQL binlog?

Or maybe add persistent queues inside ClickHouse?

New JOIN algorithms

— "direct" algorithm:
  to join with key-value tables by direct lookups a.k.a. nested loops.
  Good if the left table is small, but the right table is like a large dictionary.
  Good to use in MATERIALIZED VIEW queries.

— "parallel_hash" algorithm:
  speed-up if the right hand table is large.

— "full_sorting_merge" algorithm:
  when right hand side is large
  and does not fit in memory and does not allow lookups.

— "grace_hash" algorithm:
  expected in 22.8.

Developer: Vladimir Cherkasov, lgbo-ustc.

Out Of Band Filters

SET additional_result_filter = $$ name = 'Vasya' $$; SELECT * FROM values('name String', ('Vasya'), ('Petya')) β”Œβ”€name──┐ β”‚ Vasya β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”˜

SET additional_table_filters = {'t': $$ name != 'Petya' $$}; CREATE TEMPORARY TABLE t (name String); INSERT INTO t VALUES ('Vasya'), ('Petya'); SELECT * FROM t; β”Œβ”€name──┐ β”‚ Vasya β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”˜

Developer: Nikolai Kochetov.

Performance Optimizations

Improvement of ORDER BY, insert and merge in MergeTree,
and window functions.

SELECT WatchID FROM hits_100m_obfuscated ORDER BY Age

Before:
Elapsed: 4.154 sec. (24.07 million rows/s., 216.64 MB/s.)

After:
Elapsed: 0.482 sec. (207.47 million rows/s., 1.87 GB/s.)

Developer: Maksim Kita.

Performance Optimizations

Speed-up of SELECT with FINAL modifier.

It "simply" improves performance up to 4 times.

Especially for complex transforms like Collapsing and Replacing.

Developer: Nikita Taranov.

Small & Neat Things

CREATE TABLE ... EMPTY AS SELECT ...

SELECT base58Encode('Hello, world'), base58Decode('2NEpo7TZsLFA2wMeK')

SELECT translate('Hello, world', ', ', ';\n'), translateUTF8('test', 'tes', 'ВСс')

SELECT parseTimeDelta('1h 35min 23sec')

SET group_by_use_nulls = 1

multiMatch, multiSearch, ... — support for non-constant needles.

Compatibility Setting

SET compatibility = '21.8'

Allows you to upgrade to the latest version
but keep all the default behavior unchanged, as in the specified version.

... it is not going to cover 100% of possible changes.
But should give something.

Please don't use this feature!

Developer: Pavel Kruglov.

Drivers and Integrations

Go driver has HTTP support in addition to Native protocol.
Developer: Artem Iurin.

Go driver now uses high-performant low-level "ch-go" implementation.
Developer: Aleksandr Razumov, Dale McDiarmid.

Grafana plugin now has support for JSON data type.
Developer: Dale McDiarmid.

Official ClickHouse support in Superset.
Developer: Dale McDiarmid.

clickhouse-diagnostics tool now packaged with ClickHouse.
Developer: Dale McDiarmid, Mikhail Shiryaev.

Updated Benchmark

New ClickHouse Website

TLDR

This is just a fraction of what we've made in a month.

More interesting stuff is being prepared!

Q&A