1. (50 min) What's new in ClickHouse 24.9.
2. (10 min) Q&A.
ClickHouse Autumn Release.
โ 23 new features ๐พ
โ 14 performance optimizations ๐
โ 76 bug fixes ๐ฆ
:) CREATE TEMPORARY TABLE test (x Tuple(a String, b Array(Tuple(c Tuple(e String), d String))), y String)
CREATE TEMPORARY TABLE test
(
`x` Tuple(
a String,
b Array(Tuple(
c Tuple(
e String),
d String))),
`y` String
)
A usual way:
CREATE TABLE IF NOT EXISTS test (x UInt8) ORDER BY ();
A new option, since 24.9:
SET create_if_not_exists = 1;
CREATE TABLE test (x UInt8) ORDER BY ();
— "out of band" if not exists specification as a setting.
Developer: Peter Nguyen.
Replaces a fragment in a string with another string in a specified position.
SELECT overlay('Hello, world!', 'test', 8, 5) AS res
โโresโโโโโโโโโโโ
1. โ Hello, test! โ
โโโโโโโโโโโโโโโโ
WITH 'Hello, world!' AS s, 'test' AS replacement, 8 AS pos, 5 AS length
SELECT concat(substring(s, 1, pos - 1),
replacement,
substring(s, pos + length)) AS res
โโresโโโโโโโโโโโ
1. โ Hello, test! โ
โโโโโโโโโโโโโโโโ
Developer: TaiYang Li.
:) CREATE TEMPORARY TABLE test (x String DEFAULT 'ClickHouse')
:) INSERT INTO test FORMAT JSONEachRow {"x":""}
โโxโโ
1. โ โ
โโโโโ
:) SET input_format_json_empty_as_default = 1
:) INSERT INTO test FORMAT JSONEachRow {"x":""}
โโxโโโโโโโโโโโ
1. โ โ
2. โ ClickHouse โ
โโโโโโโโโโโโโโ
:) ALTER TABLE test ADD COLUMN y UInt8 DEFAULT 123
:) INSERT INTO test FORMAT JSONEachRow {"x":"","y":""}
โโxโโโโโโโโโโโฌโโโyโโ
1. โ โ 0 โ
2. โ ClickHouse โ 0 โ
3. โ ClickHouse โ 123 โ
โโโโโโโโโโโโโโดโโโโโโ
Developer: Alexis Arnaud.
But you will rarely need it:
— omitted fields in JSON are interpreted as default;
— input_format_null_as_default is also enabled,
and nulls in JSON will be interpreted as default;
The new setting, input_format_json_empty_as_default is for rare cases
when an empty string should be mapped to the default expression.
DELETE FROM test IN PARTITION 202409 WHERE data LIKE '%trash%'
An option for DELETE query to explicitly limit it for a specified partition.
It avoids copying parts metadata for unrelated partitions.
Developer: Sunny.
A table in ClickHouse can contain "projections" — they represent the same data in a different physical order or an aggregation, to automatically optimize queries by using this data.
CREATE TABLE hits (CounterID UInt32, URL String
PROJECTION totals (SELECT CounterID, count(), uniq(URL) GROUP BY CounterID)
) ENGINE = MergeTree ORDER BY (CounterID, EventTime);
A new system table to introspect projections.
Demo
Developer: Jordi Villar.
Obtain a dictionary of HTTP response headers when querying a remote server.
SELECT _headers, *
FROM url('https://api.github.com/repos/ClickHouse/ClickHouse')
FORMAT Vertical
Demo
Developer: Flynn.
:) SELECT arrayZip([1, 2, 3], ['Hello', 'world'])
Received exception:
The argument 1 and argument 2 of function arrayZip have different
array sizes.
:) SELECT arrayZipUnaligned([1, 2, 3], ['Hello', 'world']) AS res
โโresโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
1. โ [(1,'Hello'),(2,'world'),(3,NULL)] โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Developer: TaiYang Li.
A partitioning of data into different directories,
when an object path contains sub-directories in the form of key=value.
Example:
s3://ookla-open-data/parquet/performance/type=fixed/year=2019/quarter=1/
2019-01-01_performance_fixed_tiles.parquet
Developer: Yarik Briukhovetskyi.
ClickHouse 24.8:
SET use_hive_partitioning = 1;
— enables virtual columns from the paths: type, year, quarter
and automatically infers their types.
ClickHouse 24.9:
— does automatic partition pruning!
Demo
Developer: Yarik Briukhovetskyi.
:) SELECT [a, b, c] FROM table
Demo
Developer: TaiYang Li.
By a low-level optimization for the case of repeating keys.
Demo
Developer: KevinyhZou.
In version 24.8, Iceberg tables were supported only for S3.
In version 24.9, it is supported for Azure Blob storage
and local filesystem.
Now there are IcebergS3, IcebergAzure, and IcebergLocal table engines,
as well as icebergS3, icebergAzure, and icebergLocal table functions.
Developer: Daniil Ivanik.
-- create a user with two alternative passwords:
CREATE USER test IDENTIFIED
BY 'abc' VALID UNTIL '2024-10-01',
BY 'def' VALID UNTIL '2024-11-01';
-- create a user with multiple authentication methods:
CREATE USER test IDENTIFIED
WITH sha256_password BY 'abc',
WITH ssh_key BY KEY '...' TYPE 'ssh-ed25519';
-- add a new method:
ALTER USER test ADD IDENTIFIED ...;
-- keep only the latest method:
ALTER USER test RESET AUTHENTICATION METHODS TO NEW;
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db.]table_name
REFRESH EVERY|AFTER interval [OFFSET interval]
RANDOMIZE FOR interval
DEPENDS ON [db.]name [, [db.]name [, ...]]
[APPEND] [TO [db.]name] [(columns)] [ENGINE = engine]
AS SELECT ...
Refreshable Materialized Views exist since 23.11.
The APPEND mode is new in 24.9.
Developer: Michael Kolupaev.
Run the SELECT query in background
and atomically
replace or append to the table with its result.
Run the refresh process using a flexible configured schedule.
Support a dependency graph with multiple materialized views.
Example: periodically request an external API with the url table function
and keep accumulating results in a table.
Developer: Michael Kolupaev.
Now ClickHouse supports automatic usage of the Variant data type
for schema inference.
Demo
Developer: Shaun Struwig.
distinctDynamicTypes,
distinctJSONPaths,
distinctJSONPathsAndTypes
SELECT DISTINCT arrayJoin(JSONAllPaths(data)) FROM website_traffic;
SELECT distinctJSONPaths(data) FROM website_traffic;
Developer: Pavel Kruglov.
ClickPipes:
— allows custom certificates for authentication in Kafka (SASL, TLS);
— shows the ingestion latency graph for Kafka and Kinesis;
— scaling controls for (private beta);
— allows to put the entire raw message into a single column.
Java connector: version 0.7.0 with memory and performance optimizations.
Rust client: improved parameter bindings; documentation.
JavaScript client: supports the experimental JSON type.
DBT: now supports projections! + role_arn for S3.
Updates for Spark and Beam, Grafana, PeerDB.
Thanks for many updates to our contributors:
mitchbregs, rjoelnorgren (DBT),
loyd, pravic, blind-oracle (Rust),
javiercj93, alxhill, BeenAxis (Java),
Kenterfie, Defman (Grafana),
achmad-dev (Go), bakwc, angusholder (Python),
AlexTheKing (Spark).
— ๐ฆ๐บ DataEngBytes, Perth, Sept 27th
— ๐ฎ๐ฉ Jakarta, Oct 1st
— ๐ฆ๐บ DataEngBytes, Melbourne, Oct 1st
— ๐ธ๐ฌ Singapore, Oct 3rd
— ๐ณ๐ฟ DataEngBytes, Auckland, Oct 4th
— ๐ช๐ธ Madrid, Oct 21th
— ๐ช๐ธ Barcelona, Oct 29th
— ๐ณ๐ด Oslo, Oct 31th
— ๐ณ๐ฑ Ghent, Nov 19th
— ๐ฆ๐ช Dubai, Nov 21th
— ๐ซ๐ท Paris, Nov 26th
I will be in person on some of these meetups :)
— How we built our DWH;
— VLDB paper + videos;
— Building apps with query API endpoints;
— Salesforce analytics with Estuary Flow;
— ePilot migrated from Redshift and Influx to ClickHouse;
— How Weights&Biases uses ClickHouse;
— ClickHouse for ML & AI;