1. (50 min) What's new in ClickHouse 23.6.
2. (10 min) Q&A.
ClickHouse Summer Release.
— 10 new features
— 12 performance optimizations
— 31 bug fixes
SELECT transform(
what,
[from1, from2, ...],
[to1, to2, ...],
default);
SELECT CASE what
WHEN from1 THEN to1
WHEN from2 THEN to2
ELSE default END;
23.5: implemented only for numbers, strings, and date-times;
23.6: implemented for everything;
Note: from and to values are required to be constants even in 23.6.
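As a sketch of the semantics (the column and values here are illustrative, not from the release notes):

```sql
-- Map short codes to display names; unmatched values fall back to the default.
-- Equivalent to: CASE code WHEN 'lin' THEN 'Linux' WHEN 'mac' THEN 'macOS' ELSE 'Other' END
SELECT code, transform(code, ['lin', 'mac'], ['Linux', 'macOS'], 'Other') AS os
FROM values('code String', ('lin'), ('win'));
```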
Developer: Alexey Milovidov.
Version 23.5 introduced the APPEND modifier.
Version 23.6 adds the TRUNCATE modifier.
:) SELECT * FROM numbers(10)
INTO OUTFILE 'numbers.tsv'
Ok.
:) SELECT * FROM numbers(10)
INTO OUTFILE 'numbers.tsv'
Cannot open file numbers.tsv, errno: 17,
strerror: File exists, consider using APPEND or TRUNCATE. -- this is fine
:) SELECT * FROM numbers(10)
INTO OUTFILE 'numbers.tsv' TRUNCATE
Ok.
Developer: Manas Alekar.
New settings: s3_skip_empty_files, hdfs_skip_empty_files,
engine_file_skip_empty_files, engine_url_skip_empty_files.
What for?
— empty CSV, TSV, JSONEachRow, RowBinary, and Native files
are valid empty datasets;
— but empty Parquet or ORC files are invalid — we want to skip them.
Q: Do we need a setting to skip any broken files on import?
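A minimal sketch of how one of the new settings might be used (the bucket path is a placeholder; requires a ClickHouse server or clickhouse-local):

```sql
-- Skip zero-byte objects instead of failing on them.
SET s3_skip_empty_files = 1;
SELECT count()
FROM s3('https://example-bucket.s3.amazonaws.com/data/*.parquet');
```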
Developer: Pavel Kruglov.
clickhouse-local --rename_files_after_processing '%f.processed' --query "
INSERT INTO table SELECT * FROM file('*.parquet')"
— to build simple data pipelines when new files
constantly appear in a directory;
— it can rename files by an arbitrary pattern and move them
into another directory, relative to the path;
— the pattern can include the timestamp of processing;
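A sketch of a timestamped pattern (the `processed/` directory name is illustrative, and the `%f`/`%t` placeholders are our reading of the pattern syntax; check the setting's documentation):

```shell
# Move each processed file into a subdirectory, appending the processing timestamp:
# '%f' = source file name, '%t' = timestamp of processing (assumed placeholders).
clickhouse-local --rename_files_after_processing 'processed/%f.%t' --query "
    INSERT INTO table SELECT * FROM file('*.parquet')"
```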
Demo
Developer: Aleksey Golub.
Feature requests for the next versions:
— extend it to S3 and HDFS;
— better guarantees for INSERT SELECT;
The normal way:
clickhouse-client --host myhost.clickhouse.cloud --secure --password
Alternative way (since 23.6):
clickhouse-client 'clickhouse://user:[email protected]/database?secure'
Developer: Alexey Gerasimchuck.
:) SELECT now()
┌───────────────now()─┐
│ 2023-06-29 13:11:16 │
└─────────────────────┘
:) SELECT now('America/Los_Angeles')
┌──now('America/Los_Angeles')─┐
│         2023-06-29 04:11:21 │
└─────────────────────────────┘
:) SET session_timezone = 'America/Los_Angeles';
:) SELECT now();
┌───────────────now()─┐
│ 2023-06-29 04:11:29 │
└─────────────────────┘
Developer: Andrey Zvonov.
CREATE TABLE test_mongo ... ENGINE = MongoDB(
'mongo1:27017', 'test', 'simple_table', 'testuser', 'clickhouse');
SELECT * FROM test_mongo;
SELECT * FROM mongodb(
'mongo1:27017', 'test', 'simple_table', 'testuser', 'clickhouse',
'x String, y UInt64, ...');
Was: worked perfectly up to MongoDB 5.1.
But: they changed the protocol :(
Now: everything works up to MongoDB 6.
Developer: Nikolai Degterinskiy.
How many parts are too many?
— parts_to_delay_insert, parts_to_throw_insert;
How to perform backpressure?
— should it slow down long-running inserts?
Example: inserting 400 billion records with 4 threads on a single machine,
7 million records/second, with concurrent SELECT queries for dashboards.
Decision: enable backpressure during insert
and increase the default value of max parts to 3000.
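To see the thresholds involved on your own server, a quick sketch:

```sql
-- Inspect the part-count limits that trigger delaying or rejecting INSERTs.
SELECT name, value, description
FROM system.merge_tree_settings
WHERE name IN ('parts_to_delay_insert', 'parts_to_throw_insert');
```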
Before 23.6:
— aggressive scheduling of background threads
(cleanup and merge selection) led to
high CPU usage and excessive requests to Keeper.
Release 23.6:
— introduce backoff for tables without inserts.
Result:
— moderate CPU usage on a server with 5000 ReplicatedMergeTree tables;
— three times fewer requests to Keeper and three times less network traffic!
Developer: Alexander Tokmakov.
If the data is not assumed to be sorted
but appears to be sorted in certain ranges — detect that and use it.
Demo
Developer: Maksim Kita, Alexey Milovidov.
clickhouse-local over partitioned Parquet files improved 2 times;
clickhouse-local over a single Parquet file improved 4 times.
Demo.
Developer: Michael Kolupaev. Already included in the release 23.5.
ClickHouse + Redis = ❤
CREATE TABLE test_redis (...)
ENGINE = Redis('host:6379', 0, 'password')
PRIMARY KEY k;
SELECT * FROM test_redis;
SELECT * FROM test_redis WHERE k = 'whatever';
SELECT * FROM test_redis WHERE k IN ('a', 'b');
SELECT * FROM redis(
'host:6379', k, 'k String, v String', 0, 'password');
Developer: Jacky Woo
ClickHouse + Redis = ❤
Supports SELECT and INSERT,
SELECT by a key, set of keys, or full scan,
JOIN with Redis tables,
ALTER UPDATE and DELETE!
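For instance, point updates and deletes on the table above might look like this (values are illustrative):

```sql
-- Mutations are pushed down to the corresponding Redis keys.
ALTER TABLE test_redis UPDATE v = 'new_value' WHERE k = 'a';
ALTER TABLE test_redis DELETE WHERE k = 'b';
```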
What next? Support for hash data type in Redis?
Developer: Jacky Woo
It has a chance to ship in the release 23.6.
... but it is not guaranteed.
Overlay Databases
with Filesystem, S3, and HDFS database engines!
ClickHouse is supported by Confluent Cloud:
The integration can be installed as a custom connector.
Updates for Metabase; Node.js, Python, and Go drivers.
Official Terraform provider for ClickHouse Cloud:
https://registry.terraform.io/providers/ClickHouse/clickhouse/latest
GCP
— ClickHouse Cloud is GA in GCP since Jun 20th!
Dedicated Tier
— dedicated environment;
— advanced isolation and security;
— custom compute options;
— scheduled upgrades;
— uptime SLAs;
— named lead support engineer;
"chDB is an embedded OLAP SQL engine powered by ClickHouse"
Available as a Python module and as Rust, Go, Node.js, and Bun libraries.
Similar to clickhouse-local, but embedded into your app.
Works with local and remote datasets, and Pandas dataframes.
https://chdb.io/ Developers: Auxten, Lorenzo Mangani.
"Unlimited ClickHouse URL tables. Like a pastie for SQL/JSON data."
Share your query results and small datasets
on a random server on the Internet.
No guarantees. Just a demo/experiment.
https://get.urleng.com/ Developer: Lorenzo Mangani.
If you want to demonstrate your project on the ClickHouse Webinar,
write to [email protected] or connect with us on https://clickhouse.com/slack
— Choosing the Right Join Algorithm
— Real-time event streaming with ClickHouse,
Kafka Connect and Confluent Cloud
— CDC From Postgres To ClickHouse
ClickHouse for AI - Vectors, Embedding, Semantic Search, and more...
Video Recordings: https://www.youtube.com/c/ClickHouseDB
🇫🇷 Paris, July 4th;
🇺🇸 Boston, July 18th;
🇺🇸 New York, July 19th;
🇨🇦 Toronto, July 20th;
🇸🇬 Singapore, July 27th;
🇺🇸 San Francisco, Aug 8th;
🇳🇱 Amsterdam, Aug 31st;