ClickHouse: Release 23.8 Webinar

ClickHouse
Release 23.8 LTS

Release 23.8 Webinar

1. (55 min) What's new in ClickHouse 23.8.

2. (5 min) Q&A.

Release 23.8 LTS

ClickHouse August Release.

— 29 new features 🌞

— 19 performance optimizations 🕶️

— 63 bug fixes ⛱️

Uninteresting Features

Arithmetic On Vectors

ClickHouse 23.7: 🤓

:) SELECT arrayMap((x, y) -> x + y, [1, 2, 3], [4, 5, 6])

[5, 7, 9]

ClickHouse 23.8: 😎

:) SELECT [1, 2, 3] + [4, 5, 6]

[5, 7, 9]

Works for + and -.
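
Subtraction works the same way, element by element; a quick sketch (the result shown is the expected output, not captured from the webinar):

:) SELECT [1, 2, 3] - [4, 5, 6]

[-3, -3, -3]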

Developer: Yarik Briukhovetskyi.

Concatenation Of Tuples

:) SELECT (1, 2) || ('Hello', [3, 4]);

(1, 2, 'Hello', [3, 4])

:) SELECT concat((1, 2), ('Hello', [3, 4]));

(1, 2, 'Hello', [3, 4])

Developer: Nikolay Degterinsky.

Default Arguments For cluster/clusterAllReplicas

ClickHouse 23.7: 🥱

SELECT hostname(), version(), uptime()
FROM clusterAllReplicas(default, system.one);

ClickHouse 23.8: 😎

SELECT hostname(), version(), uptime()
FROM clusterAllReplicas(default);

SELECT hostname(), version(), uptime()
FROM clusterAllReplicas();

Developer: Yangkuan Liu.

Hints For Database And Table Names

:) SELECT * FROM hecks

Received exception from server (version 23.8.1):
Code: 60. DB::Exception: Received from localhost:9000.
DB::Exception: Table default.hecks does not exist. Maybe you meant checks?. (UNKNOWN_TABLE)

:) SELECT * FROM defail.checks

Received exception from server (version 23.8.1):
Code: 81. DB::Exception: Received from localhost:9000.
DB::Exception: Database defail does not exist. Maybe you meant default?. (UNKNOWN_DATABASE)

Developer: Yarik Briukhovetskyi.

TRUNCATE DATABASE

Drop all tables in a database, but keep the database.
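
A minimal usage sketch (the database name is hypothetical):

TRUNCATE DATABASE my_database;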

Developer: Bharat Nallan.

azureBlobStorageCluster

:) SELECT * FROM azureBlobStorageCluster(my_cluster, ...);

Process or import data from Azure using the resources of all servers in a ClickHouse cluster.
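
A fuller call might look like the sketch below; the connection string, container, path, and argument order are assumptions, mirroring the non-cluster azureBlobStorage table function:

SELECT count()
FROM azureBlobStorageCluster(
    'my_cluster',
    'DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=<key>',
    'my-container',
    'data/*.parquet');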

Developer: Smita Kulkarni.

Performance Optimizations

Counting From Metadata

:) SELECT count() FROM file('hits_*.parquet')

23.7: 0.118 sec.

23.8: 0.022 sec.

5 times faster!

Works for most data formats: Parquet, ORC, TSV, CSV, JSON...

Developer: Pavel Kruglov.

Skipping Reading Files

SELECT count(), uniq(_file)
FROM url('https://clickhouse-public-datasets.s3.amazonaws.com/hits_compatible/athena_partitioned/hits_{0..99}.parquet')
WHERE _file LIKE '%9%'

23.8: Elapsed: 0.068 sec. Processed 19.00 million rows
(279.93 million rows/s.)

23.7: It didn't work at all.

Developer: Pavel Kruglov.

Data Skipping In Files

SELECT URL, count()
FROM file('hits_*.parquet')
WHERE CounterID = 155568 -- this utilizes min/max index for granules
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5

23.7: Elapsed: 0.722 sec. Processed 99.89 million rows, 14.71 GB
(138.37 million rows/s., 20.37 GB/s.)

23.8: Elapsed: 0.111 sec. Processed 3.52 million rows, 386.20 MB
(31.71 million rows/s., 3.48 GB/s.)

Developer: Michael Kolupaev.

JIT Compilation For ARM

Developer: Maksim Kita.

Something Interesting

Direct Import From Archives

ClickHouse already supports reading and writing compressed files:
— zstd, lz4, snappy, gz, xz, bz2...

Now it can directly read multiple files inside archives:
zip, 7z, and all kinds of tarballs.

SELECT * FROM file('path/to/archive.zip :: path/inside/archive.csv')
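
Globs should work on both sides of the '::' separator; a small sketch under that assumption (archive and file names are hypothetical):

SELECT count()
FROM file('backups/logs-*.tar.gz :: *.csv')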

Demo

Developers: Nikita Keba, Antonio Andelic, Pavel Kruglov.

Streaming Consumption From S3

With the S3Queue table engine:

CREATE TABLE queue
ENGINE = S3Queue('https://clickhouse-public-datasets.s3.amazonaws.com/hits_compatible/athena_partitioned/hits_*.parquet')
SETTINGS mode = 'unordered';

CREATE TABLE data
ENGINE = MergeTree ORDER BY ()
EMPTY AS SELECT * FROM queue;

CREATE MATERIALIZED VIEW consumer TO data
AS SELECT * FROM queue;

— Consumes data from a bunch of files on S3.
— Consumes new files as soon as they appear.
— Supports parallel and distributed consumption.
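
Once the materialized view is attached, rows land in the target table as new files are consumed; a quick way to watch progress, using the table names from the example above:

SELECT count() FROM data;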

Developers: Sergei Katkovskiy, Kseniia Sumarokova.

Bonus

ClickHouse On Big Iron

ClickHouse builds are available for:

— x86_64,
— AArch64,
— RISC-V 64,
— PowerPC 64LE,
— and now s390x.

Developer: Yakov Olkhovskiy.

Serverless ClickHouse On AWS Lambda

Makes your S3 bucket queryable with ClickHouse.

# your S3 bucket
https://bucket.s3.eu-central-1.amazonaws.com/data.csv

# queryable datasets
curl -d "SELECT * FROM table" \
    --aws-sigv4 "aws:amz:eu-central-1:lambda" \
    --user 'AWS_ACCESS_KEY:AWS_SECRET_KEY' \
    https://abcdef.lambda-url.eu-central-1.on.aws/bucket/data.csv

Similar to S3 Select, but with the full power of ClickHouse!

https://github.com/aws-samples/aws-lambda-clickhouse

Developer: Roman Boiko (AWS).


Advanced Dashboard Just Got Better

Example: https://play.clickhouse.com/dashboard

User: play.

ClickHouse Cloud

SharedMergeTree Table Engine

A new table engine to be used by default in ClickHouse Cloud,
a replacement for ReplicatedMergeTree.

Advantages:

— Better scalability;
— Better performance of INSERTs;
— Cost savings;
— Faster server startup;
— Less operational complexity.

It is only available in ClickHouse Cloud and partner cloud providers.
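
In ClickHouse Cloud the engine is substituted automatically, so ordinary MergeTree DDL keeps working; a minimal sketch (table and columns are hypothetical):

CREATE TABLE events
(
    ts DateTime,
    user_id UInt64,
    message String
)
ENGINE = MergeTree -- in ClickHouse Cloud this is backed by SharedMergeTree
ORDER BY (user_id, ts);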

https://clickhouse.com/blog/clickhouse-cloud-boosts-performance-with-sharedmergetree-and-lightweight-updates

MySQL Protocol

ClickHouse is a polyglot database :)

It has support for HTTP (RESTful), Native TCP,
gRPC, MySQL, and PostgreSQL protocols,
as well as ODBC and JDBC.

Recently we added support for the MySQL protocol in ClickHouse Cloud!

It is enabled by request.

What is next? Arrow Flight? ADBC? MongoDB protocol? Redis interface?

PowerBI Connector

https://github.com/ClickHouse/power-bi-clickhouse

Reading Corner

https://clickhouse.com/blog/

— HuggingFace hugs ClickHouse;
— Klaviyo: how we learned to stop worrying
  and love OLAP databases;
— Data Engineering and Infrastructure
  at Instacart;
— Asynchronous Inserts;
— MessageBird and Ongage's experience
  with ClickHouse.

Video Recordings: https://www.youtube.com/c/ClickHouseDB

Q&A