YouTube stream: https://www.youtube.com/c/ClickHouseDB
Telegram chat: https://telegram.me/clickhouse_ru, clickhouse_en
T-shirts for questions!
If you asked a question via Zoom
— write to me in Telegram after the event:
your question, t-shirt size and courier delivery address.
— Compact data parts (since version 20.3).
— Data parts in memory (since version 20.6).
— Write-Ahead Log (since version 20.6).
— Durability settings (since version 20.10).
Developer: Anton Popov.
1. Wide — the classic format.
2. Compact — all columns in one file.
3. Memory — data in RAM.
Controlled by settings:
— min_bytes_for_wide_part,
min_rows_for_wide_part:
if the part is larger, use the Wide format.
— min_bytes_for_compact_part,
min_rows_for_compact_part:
if larger, use the Compact format;
if smaller, use the Memory format.
Wide > Compact > Memory
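As a sketch, the thresholds above can be set per table in the SETTINGS clause (table and column names here are hypothetical; byte values are illustrative):

```sql
-- Thresholds decide which format a newly written part gets.
CREATE TABLE hits_example
(
    event_time DateTime,
    user_id UInt64
)
ENGINE = MergeTree
ORDER BY event_time
SETTINGS
    min_bytes_for_wide_part = 10485760,   -- parts >= 10 MB: Wide
    min_bytes_for_compact_part = 1048576; -- parts >= 1 MB: Compact, below: Memory
```

Note that Memory parts are experimental and may require enabling separately.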
The classic format.
Each column and index in a separate file.
Optimal for reading data from disks, including slow ones.
Cheap ALTER for adding or removing columns.
All columns in one file.
Optimal for inserting new data.
Especially on slow filesystems.
Less optimal for reading.
Recommended for small data parts.
Not recommended for all parts.
Available since version 20.3, but disabled by default.
Since version 20.10, used by default for parts up to 10 MB.
Data in RAM.
+ Write-Ahead Log, can be disabled.
— in_memory_parts_enable_wal;
— write_ahead_log_max_bytes.
Even more optimal for inserting new data...
... if write-ahead log is disabled.
More efficient for reading...
... but data in RAM is uncompressed.
All parts are replicated as usual.
Experimental feature.
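A minimal sketch of enabling in-memory parts with their write-ahead log (table name is hypothetical; this assumes the experimental feature is available in your version):

```sql
CREATE TABLE events_example
(
    ts  DateTime,
    msg String
)
ENGINE = MergeTree
ORDER BY ts
SETTINGS
    min_bytes_for_compact_part = 1048576,  -- parts below this stay in memory
    in_memory_parts_enable_wal = 1,        -- keep the WAL for durability
    write_ahead_log_max_bytes = 1073741824;
```

With the WAL disabled, inserts are faster but in-memory parts are lost on restart.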
If all your data is on one server...
If you don't use replication...
If replication exists, but within one region...
— should we just call fsync before acknowledging the INSERT?
Eat My Data (2007):
https://www.youtube.com/watch?v=LMe7hf2G1po
Files Are Hard (2015):
https://danluu.com/file-consistency/
PostgreSQL "Fsyncgate" (2018):
https://lwn.net/Articles/752063/
If all your data is on one server...
If you don't use replication...
If replication exists, but within one region...
...then these settings are for you:
— min_rows_to_fsync_after_merge;
— min_compressed_bytes_to_fsync_after_merge;
— min_compressed_bytes_to_fsync_after_fetch;
— fsync_after_insert;
— fsync_part_directory;
— write_ahead_log_bytes_to_fsync;
— write_ahead_log_interval_ms_to_fsync;
— in_memory_parts_insert_sync.
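A sketch of a "paranoid durability" configuration using the settings listed above (table name is hypothetical; this trades insert latency for durability):

```sql
CREATE TABLE durable_example
(
    ts    DateTime,
    value Float64
)
ENGINE = MergeTree
ORDER BY ts
SETTINGS
    fsync_after_insert = 1,            -- fsync part files on every INSERT
    fsync_part_directory = 1,          -- fsync the part directory too
    min_rows_to_fsync_after_merge = 1; -- fsync after (almost) every merge
```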
Functions normalizeQuery, normalizedQueryHash.
SELECT normalizeQuery(query) FROM system.query_log
— replace literals with ?
— replace lists of literals with ?..
— replace complex aliases with `?`
Available since version 20.8.
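For example, applied to a literal query string (output shape as described above: literals become ?, lists become ?..):

```sql
SELECT normalizeQuery('SELECT a, b FROM t WHERE id = 123 AND x IN (1, 2, 3)');
-- literals are replaced with ?, the IN list with ?..
```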
A friend of mine had a slow query...
... but he didn't want to show it.
Solution:
clickhouse-format --obfuscate < query.sql
Available since version 20.10.
Data obfuscation: https://www.youtube.com/watch?v=2iR7i4akL44
CREATE TABLE hits
(
event_time DateTime,
...
) ENGINE MergeTree ORDER BY ...
TTL event_time + INTERVAL 1 MONTH
RECOMPRESS CODEC(ZSTD(1)),
event_time + INTERVAL 1 YEAR
RECOMPRESS CODEC(ZSTD(6))
Developer: Alexander Sapin. Available since version 20.10.
CREATE TABLE hits
(
event_time DateTime CODEC(Delta, Default),
...
) ENGINE MergeTree ORDER BY ...
ALTER TABLE hits MODIFY COLUMN c
REMOVE COMMENT|CODEC|TTL
|DEFAULT|MATERIALIZED|ALIAS
Developer: Alexander Sapin. Available since version 20.10.
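A concrete instance of the syntax above (table and column are the ones from the example schema):

```sql
-- Drop the column's compression codec, reverting it to the default.
ALTER TABLE hits MODIFY COLUMN event_time REMOVE CODEC;
-- Drop a column-level TTL, if one was set.
ALTER TABLE hits MODIFY COLUMN event_time REMOVE TTL;
```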
SELECT * EXCEPT(secret_column) FROM table;
SELECT table.* REPLACE(
(URL LIKE '%yandex%' ? '' : URL) AS URL) FROM table;
SELECT COLUMNS('^packet_') FROM table;
SELECT t.* APPLY(sum) FROM table AS t;
SELECT COLUMNS(x, y, z) APPLY(sum) FROM table;
Developers: Amos Bird, mfridental. Available since version 20.10.
COLUMNS('regexp'): developer mfridental. Available since version 19.12.
20.10 — testing.
20.9 — stable.
20.8 — LTS until 2021-09-30.
20.7 — stable.
20.6 ... 20.4 — obsolete.
20.3 — LTS until 2021-03-12.
...
19.14 — obsolete.
Data import from RabbitMQ.
Kerberos authentication for Kafka.
Beta unlimited storage in Yandex.Cloud.
WITH for subqueries and with global scope.
Formats Regexp, RawBLOB, JSONAsString, LinesAsString.
Running clickhouse without packages and configuration.
system.crash_log table, Sentry.
Rank correlation calculation.
256bit Decimal.
Backups.
Projections.
Move away from ZooKeeper.
OpenTracing support.
Query hedging.
Reading column slices.
Functions for geodata analysis.
Text near-duplicate analysis.
https://clickhouse.tech/docs/ru/whats-new/extended-roadmap/
~500 tasks with detailed descriptions: dependencies, assignees...