1. (40 min) What's new in ClickHouse 24.12.
2. (15 min) Guest Talk.
2. (5 min) Q&A.
ClickHouse Christmas Release.
โ 16 new features ๐
โ 16 performance optimizations โธ๏ธ
โ 36 bug fixes โ
Table aliases can redefine column names:
SELECT a, b FROM
(
SELECT number, number * 2 FROM numbers(10)
) AS x(a, b);
CREATE VIEW my_view(a, b) AS SELECT 1, 2;
WITH t(a, b) AS (SELECT 1, 2)
SELECT a, b FROM t;
Developer: Yarik Briukhovetskyi.
:) SELECT translate('ClickHouse', 'HCc', 'm')
-- change H to m and delete characters C and c:
โโresโโโโโโโ
1. โ likhouse โ
โโโโโโโโโโโโ
And the translateUTF8 function is also supported.
Developer: Shuai Xu.
:) CREATE OR REPLACE VIEW pictures AS SELECT '...' AS img;
:) CREATE USER viewer SETTINGS http_response_headers =
$${'Content-Type': 'image/svg+xml'}$$;
:) GRANT SELECT ON pictures TO viewer;
Use-cases:
— control over client-side caching;
— content-disposition for downloadable files;
— controlling the content-type;
Demo
Developer: Alexey Milovidov.
2024 — a whole lot of optimizations for primary keys:
24.2: Lazy loading of the primary key
24.2: Optimizing primary key's memory usage
24.3: Skip useless columns in the primary key
24.4: SYSTEM UNLOAD PRIMARY KEY
24.12: primary key for tables' parts can be managed by an LRU cache:
ALTER TABLE t MODIFY SETTING primary_key_lazy_load = 1,
use_primary_key_cache = 1,
prewarm_primary_key_cache = 1;
Developer: Anton Popov.
What for?
— Tables with over quadrillion records on a cluster with shared storage.
Client-related settings, such as formatting and parsing options, can be specified on the server-side.
clickhouse-client will receive and apply these settings.
Demo
ClickHouse supports using disk as a scratch storage when there is not enough memory to perform GROUP BY or ORDER BY.
This is available since 2015, controlled by the settings max_bytes_before_external_group_by, max_bytes_before_external_sort.
It is also configured by default in ClickHouse Cloud with static thresholds.
Now it can be automatic!
Demo
Developer: Azat Khuzhin.
LIKE operator is supported for Enums:
:) CREATE TABLE test (x Enum('Hello', 'World')) ENGINE = Memory
:) INSERT INTO test VALUES ('World')
:) SELECT * FROM test WHERE x LIKE '%or%'
โโxโโโโโโ
1. โ World โ
โโโโโโโโโ
-- 24.11:
-- Code: 43. DB::Exception: Illegal type Enum8('Hello' = 1, 'World' = 2) of argument of function like: In scope SELECT * FROM test WHERE x LIKE '%or%'. (ILLEGAL_TYPE_OF_ARGUMENT)
Developers: ZhangLiStar.
Equality and IN operators accept unknown values:
:) CREATE TABLE test (x Enum('Hello', 'World')) ENGINE = Memory
:) INSERT INTO test VALUES ('World')
:) SELECT * FROM test WHERE x IN ('Goodbye', 'World')
โโxโโโโโโ
1. โ World โ
โโโโโโโโโ
-- 24.11:
-- Code: 691. DB::Exception: Unknown element 'Goodbye' for enum: while converting 'Goodbye' to Enum8('Hello' = 1, 'World' = 2). (UNKNOWN_ELEMENT_OF_ENUM)
Developers: ZhangLiStar.
CREATE TABLE ... ENGINE = MergeTree
ORDER BY (EventDate DESC, CounterID, UserID)
SETTINGS allow_experimental_reverse_key = 1;
When most of your queries will use the reverse order.
Demo
Developer: Amos Bird.
Both sides of JOIN are parallelized, with no to minimal memory overhead.
It does not harm even for short queries, and we made it as the default.
TPC-H, SF-100, Q3: 53 sec -> 2.6 sec.
TPC-H, SF-100, Q7: 94 sec -> 21 sec.
Demo
Developer: Nikita Taranov.
Using estimations of table sizes.
Controlled by the setting query_plan_join_swap_table
and enabled by default.
Demo
Developer: Vladimir Cherkasov.
SET optimize_extract_common_expressions = 1;
JOIN ... ON (a=b AND x) OR (a=b AND y) OR (a=b AND z)
JOIN ... ON a=b AND (x OR y OR z)
Allows using less number of hash tables and better conditions push-down.
Demo
Developer: Jรกnos Benjamin Antal.
-- Equi join
SELECT t1.*, t2.* FROM t1 JOIN t2 ON t1.key = t2.key;
-- Non-equi joins
SELECT t1.*, t2.* FROM t1 JOIN t2 ON t1.key != t2.key;
SELECT t1.*, t2.* FROM t1 JOIN t2 ON t1.key > t2.key;
Developer: Vladimir Cherkasov.
Developer: Alexander Gololobov.
argMax(c_arg, c_max)
— an aggregate function, that
returns the value of c_arg when the value of c_max is maximal.
Up to 10% performance increase for primitive types, such as integers.
Developer: Alexander Sapin.
From the creators of Iceberg ๐ง
CREATE DATABASE unity_demo
ENGINE = Iceberg('https://dbc-55555555-5555.cloud.databricks.com/api/2.1/unity-catalog/iceberg')
SETTINGS
catalog_type = 'rest',
catalog_credential = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:...',
warehouse = 'unity',
oauth_server_uri = 'https://dbc-55555555-5555.cloud.databricks.com/oidc/v1/token',
auth_scope = 'all-apis,sql';
SHOW TABLES FROM unity_demo;
SELECT * unity_demo."webinar.test";
Compatible with Unity, Polaris.
Developer: Kseniia Sumarokova.
Compatible with Unity, Polaris.
Demo.
Developer: Kseniia Sumarokova.
The iceberg table function supports tables with:
— columns added or removed over time;
— renamed columns;
— data types changed between primitive types;
Developer: Dan Ivanik.
Are promoted from experimental to the beta stage.
We also backported all fixes to the previous release, 24.11.
Read the blog post about the architecture of the JSON data type:
https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse.
Now we support ALTER from the deprecated Object type to JSON
to allow easy migrations.
Developer: Pavel Kruglov.
Now we support subcolumns for table's primary key and indices:
CREATE TABLE log
(
data JSON
)
ORDER BY data.time;
Developer: Pavel Kruglov.
Postgres CDC in ClickPipes, private preview:
https://clickhouse.com/cloud/clickpipes/postgres-cdc-connector
Improvements for dbt, Firetran, Kafka integrations
as well as for Python and JavaScript connectors.
— ๐ฏ๐ต Tokyo, Jan 23rd
— ๐ง๐ช FOSDEM, Brussels, Feb 1, 2
— ๐ฌ๐ง London, Feb 5th
— ๐ฆ๐ช Dubai, Feb 10th
I will be in person on some of these meetups :)
— Strategic partnership with AWS
— Medallion architecture with ClickHouse
— How we built product analytics on ClickHouse
— Query optimizations in ClickHouse
— From BigQuery to ClickHouse
— Our re:Invent announcements