ClickHouse: Release 22.6 Webinar

ClickHouse
Release 22.6

Release 22.6 Webinar

1. (45 min) What's new in ClickHouse 22.6.

2. (15 min) Q&A.

🌤

Release 22.6

ClickHouse "summer" release.

— 19 new features

— 27 performance optimizations

GROUPING Function

Used with ROLLUP, CUBE or GROUPING SETS.

To distinguish different sets.

SELECT k, GROUPING(k) FROM table GROUP BY k WITH ROLLUP

(demo)

Developer: Dmitriy Novik.

New JSON Formats

— JSONColumns
— JSONCompactColumns
— JSONColumnsWithMetadata

(demo)

Let's write some JavaScript... https://github.com/leeoniya/uPlot/

async function chart() { const opts = { width: 800, height: 400, series: [ { label: "time" }, { label: "pageviews", stroke: "#8DF" } ] }; const response = await fetch( "https://play.clickhouse.com/?user=play", { method: "POST", body: ` SELECT toStartOfDay(time, 'Europe/Amsterdam')::INT AS t, sum(hits)::INT AS v FROM wikistat WHERE path = 'ClickHouse' GROUP BY t ORDER BY t FORMAT JSONCompactColumns` }); const data = await response.json(); let uplot = new uPlot(opts, data, document.getElementById('chart')); }

New JSON Formats

Query Tracing With OpenTelemetry

Did you know?

— ClickHouse has support for Open Telemetry

— data is collected in the system.opentelemetry_span_log table

— easy to visualize with Jaeger or Zipkin

— detailed instrumentation since 22.6!

The problem: it is too detailed...

A query can generate ~ 1 mln spans.

Query Tracing With OpenTelemetry

utils/trace_visualizer, developer: Sergei Trifonov.

Query Tracing With OpenTelemetry

utils/trace_visualizer, developer: Sergei Trifonov.

FPC Codec

FPC — a codec for floating point compression.

"High Throughput Compression of Double-Precision Floating-Point Data"
— Martin Burtscher and Paruj Ratanaworabhan,
School of Electrical and Computer Engineering,
Cornell University, Ithaca, NY

Developer: Mikhail Guzov.

(demo)

Non-Constant LIKE and match

SELECT DISTINCT repo_name, title
FROM github_events
WHERE title ILIKE (
  repo_name LIKE '%ClickHouse%' ? '%fast%' : '%slow%')
AND repo_name IN ('ClickHouse/ClickHouse', 'elastic/elasticsearch')

Now I can put LIKE inside LIKE and looks like you're going to like it.

Developer: Robert Schulze.

INSERT INTO ZooKeeper

INSERT INTO system.zookeeper (name, path, value)
VALUES ...

Creates new persistent nodes in ZooKeeper.

Batch INSERTs will create multiple nodes in transaction.

Please don't mess with ZooKeeper and don't use this feature.

Developer: Han Fei.

Parametric Executable UDF

SELECT my_function('hello')(x)

Parameters will be substituted to the command template:

./myprogram --xyz {param:String}
./myprogram --xyz 'hello'

Note: only allowed for direct "exec" without a shell interpreter.
Because the interpretation of escape sequences inside string literals
in shell interpeter is different than in SQL.

Developer: Maksim Kita.

clickhouse-disks Tool

clickhouse-disks \
  --config-file ... \
  --disk ... \
  list|read|write|remove|link|copy path

A tool for introspection and operations on ClickHouse VFS

Can manipulate files on object storage (S3, HDFS, Azure, Web)

Developer: Artyom Yurkov.

Performance Improvements

A ton of optimizations for AArch64 with ARM NEON!

(Apple M1, AWS Graviton...)

Almost all low-level vector instructions, LZ4 and ZSTD optimizations,
UTF-8 handling, string functions and sorting, etc, etc...

Now, AArch64 is a first-class citizen of ClickHouse!

Developer: Daniel Kutenin (Google), Alexandra Pilipyuk.

Performance Improvements

By the way... AArch64 builds got proper stack traces.

(exceptions, system.stack_trace, query profiler)

Developer: Maksim Kita.

Performance Improvements

Optimize COALESCE.

Optimize multiIf with single branch.

Optimize functions L1/L2/Lp/Linf Distances and Norms.

Optimize number comparison.

Optimize hasAll with AVX-512.

Optimize greatCircleAngle, greatCircleDistance, geoDistance.

Developer: Maksim Kita, Anton Popov, Alexander Gololobov.

Performance Performance

Optimize ORDER BY with LIMIT.

Optimize ORDER BY with single column.

Optimize INSERT into MergeTree with composite ORDER key.

Optimize dictGetChildren, dictGetDescendants.

Optimize cleanup stage of queries with large GROUP BY.

Optimize background CPU usage of large number of tables.

Developer: Maksim Kita, Nikita Taranov.

Improvements Improvements

Cold queries from MergeTree will start faster.
— developer: Anton Kozlov, CloudFlare.

Avoid VFS cache thrashing.
— developer: Han Shukai, ICT Beijing.

Move cases for condition pushdown into JOIN.
— developer: Amos Bird, ICT Beijing.

Filtering reads from s3/hdfs/etc with _file and _path.
— developer: Amos Bird, ICT Beijing.

Less copying in compression loop.
— developer: Jasper Zhu, Intel.

Faster INSERT

CREATE TABLE hits_insert_test (UserID UInt64, EvenTime DateTime, WatchID UInt64) ENGINE = MergeTree ORDER BY (UserID, EventTime); INSERT INTO hits_insert_test SELECT UserID, EventTime, WatchID FROM hits_100m_obfuscated;

Before:
0 rows in set. Elapsed: 10.945 sec. Processed 100.00 million rows, 2.00 GB (9.14 million rows/s., 182.73 MB/s.)

After:
0 rows in set. Elapsed: 7.562 sec. Processed 100.00 million rows, 2.00 GB (13.22 million rows/s., 264.48 MB/s.)

Integrations

ClickHouse Go Driver now supported by ClickHouse, Inc.

— added support for JSON data type;

— added support for insertion of structs and maps;

On the way: merging clickhouse-go and go-faster/ch
will add a low-level API with performance boost.

Developer: Dale McDiarmid, Anton Popov.

Integrations

Official integration with dbt, passed the test suite, better perf...
— developer: Dmitriy Sokolov + Guy Kohen, Dale McDiarmid.

Official integration with Airbyte, improved for dbt :)
— developer: Mark Zitnik.

What's Next

Extended Federated Queries:

— querying Redis, MongoDB, Cassandra
and external ZooKeeper clusters with table functions;

What's Next

Roadmap 2022 is published:

https://github.com/ClickHouse/ClickHouse/issues/32513

— open for discussions.

... all great stuff is discussed there, don't miss.

Q&A