1. (55 min) What's new in ClickHouse 23.8.
2. (5 min) Q&A.
ClickHouse August Release.
— 29 new features 🌞
— 19 performance optimizations 🕶️
— 63 bug fixes ⛱️
ClickHouse 23.7: 🤓
:) SELECT arrayMap((x, y) -> x + y, [1, 2, 3], [4, 5, 6])
[5, 7, 9]
ClickHouse 23.8: 😎
:) SELECT [1, 2, 3] + [4, 5, 6]
[5, 7, 9]
Works for + and −.
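For example, element-wise subtraction should work the same way (an illustrative sketch, not taken from the release notes):
:) SELECT [10, 20, 30] - [1, 2, 3]
[9, 18, 27]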
Developer: Yarik Briukhovetskyi.
:) SELECT (1, 2) || ('Hello', [3, 4]);
(1, 2, 'Hello', [3, 4])
:) SELECT concat((1, 2), ('Hello', [3, 4]));
(1, 2, 'Hello', [3, 4])
Developer: Nikolay Degterinsky.
SELECT hostname(), version(), uptime()
FROM clusterAllReplicas(default, system.one);
SELECT hostname(), version(), uptime()
FROM clusterAllReplicas(default);
SELECT hostname(), version(), uptime()
FROM clusterAllReplicas();
Developer: Yangkuan Liu.
:) SELECT * FROM hecks
Received exception from server (version 23.8.1):
Code: 60. DB::Exception: Received from localhost:9000. DB::Exception:
Table default.hecks does not exist.
Maybe you meant checks?. (UNKNOWN_TABLE)
:) SELECT * FROM defail.checks
Received exception from server (version 23.8.1):
Code: 81. DB::Exception: Received from localhost:9000. DB::Exception:
Database defail does not exist.
Maybe you meant default?. (UNKNOWN_DATABASE)
Developer: Yarik Briukhovetskyi.
Drop all tables in a database, but keep the database.
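A minimal sketch of how this could be used, assuming the feature is exposed as TRUNCATE DATABASE (the exact statement name is an assumption; check the 23.8 changelog):
:) TRUNCATE DATABASE my_database
-- assumption: drops every table in my_database, while the database itself remains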
Developer: Bharat Nallan.
:) SELECT * FROM azureBlobStorageCluster(my_cluster, ...);
Process or import data from Azure using the resources of all servers in a ClickHouse cluster.
Developer: Smita Kulkarni.
:) SELECT count() FROM file('hits_*.parquet')
23.7: 0.118 sec.
23.8: 0.022 sec.
5 times faster!
Works for most data formats: Parquet, ORC, TSV, CSV, JSON...
Developer: Pavel Kruglov.
SELECT count(), uniq(_file) FROM
url('https://clickhouse-public-datasets.s3.amazonaws.com/
hits_compatible/athena_partitioned/hits_{0..99}.parquet')
WHERE _file LIKE '%9%'
23.8: Elapsed: 0.068 sec. Processed 19.00 million rows
(279.93 million rows/s.)
23.7: It didn't work at all.
Developer: Pavel Kruglov.
SELECT URL, count() FROM file('hits_*.parquet')
WHERE CounterID = 155568 -- this utilizes min/max index for granules
GROUP BY 1 ORDER BY 2 DESC LIMIT 5
23.7: Elapsed: 0.722 sec. Processed 99.89 million rows, 14.71 GB
(138.37 million rows/s., 20.37 GB/s.)
23.8: Elapsed: 0.111 sec. Processed 3.52 million rows, 386.20 MB
(31.71 million rows/s., 3.48 GB/s.)
Developer: Michael Kolupaev.
Developer: Maksim Kita.
ClickHouse already supports reading and writing compressed files:
— zstd, lz4, snappy, gz, xz, bz2...
Now it can directly read multiple files inside archives:
— zip, 7z, and all kinds of tarballs.
SELECT * FROM file('path/to/archive.zip :: path/inside/archive.csv')
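Globs should also let one query read many files from many archives (a sketch; the glob-inside-archive pattern is an assumption, verify against the 23.8 docs):
SELECT count()
FROM file('backups/*.tar.gz :: data/*.csv') -- assumption: globs work for archive names and for paths inside them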
Demo
Developer: Nikita Keba, Antonio Andelic, Pavel Kruglov.
With the S3Queue table engine:
CREATE TABLE queue ENGINE = S3Queue('https://clickhouse-public-datasets.s3.amazonaws.com/
hits_compatible/athena_partitioned/hits_*.parquet') SETTINGS mode = 'unordered';
CREATE TABLE data
ENGINE = MergeTree ORDER BY ()
EMPTY AS SELECT * FROM queue;
CREATE MATERIALIZED VIEW consumer TO data AS SELECT * FROM queue;
— Consumes data from a bunch of files on S3.
— Consumes new files as soon as they appear.
— Supports parallel and distributed consumption.
Developer: Sergei Katkovskiy, Kseniia Sumarokova.
ClickHouse builds are available for
— x86_64,
— AArch64,
— RISC-V 64,
— PowerPC 64LE,
— and now s390x.
Developer: Yakov Olkhovskiy.
Makes your S3 bucket queryable with ClickHouse.
# your S3 bucket
https://bucket.s3.eu-central-1.amazonaws.com/data.csv
# queryable datasets
curl -d "SELECT * FROM table" \
--aws-sigv4 "aws:amz:eu-central-1:lambda" \
--user 'AWS_ACCESS_KEY:AWS_SECRET_KEY' \
https://abcdef.lambda-url.eu-central-1.on.aws/bucket/data.csv
Similar to S3 Select but with all ClickHouse power!
https://github.com/aws-samples/aws-lambda-clickhouse
Developer: Roman Boiko (AWS).
Example: https://play.clickhouse.com/dashboard
User: play.
SharedMergeTree: a new table engine used by default in ClickHouse Cloud,
a replacement for ReplicatedMergeTree.
Advantages:
— Better scalability;
— Better performance of INSERTs;
— Cost savings;
— Faster server startup;
— Less operational complexity.
It is only available in ClickHouse Cloud and partner cloud providers.
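As an illustration only (a sketch, assuming the engine can also be named explicitly in DDL like other MergeTree-family engines; in ClickHouse Cloud it is picked by default):
CREATE TABLE events
(
    event_time DateTime,
    user_id UInt64,
    payload String
)
ENGINE = SharedMergeTree -- assumption: explicit engine name; Cloud normally substitutes it automatically
ORDER BY (user_id, event_time);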
ClickHouse is a polyglot database :)
It has support for HTTP (RESTful), Native TCP,
gRPC, MySQL, and PostgreSQL protocols
as well as ODBC and JDBC.
Recently we added support for the MySQL protocol in ClickHouse Cloud!
It is enabled by request.
What is next? Arrow Flight? ADBC? MongoDB protocol? Redis interface?
— HuggingFace hugs ClickHouse;
— Klaviyo: how we learned to stop worrying
and love OLAP databases;
— Data Engineering and Infrastructure
at Instacart;
— Asynchronous Inserts;
— MessageBird and Ongage's experience
with ClickHouse;
Video Recordings: https://www.youtube.com/c/ClickHouseDB