Author: Alexey Milovidov, 2021-03-11.
YouTube live stream:
https://www.youtube.com/c/ClickHouseDB:
https://youtu.be/HPJOgzQkRls
Telegram chat: https://telegram.me/clickhouse_ru, clickhouse_en
T-shirts for questions!
If you asked a question via Zoom
— write to me after the event via Telegram direct message:
your question, t-shirt size and courier delivery address.
EXPLAIN queries Compact and In-memory data parts PostgreSQL wire protocol LDAP authentication Atomic database Background data recompression Column transformers RabbitMQ integration 256 bit Decimal Kerberos for Kafka and HDFS Query obfuscation and normalization Embedded Web UI
External dictionaries improvements — Maksim Kita.
Native PostgreSQL integration — Ksenia Sumarokova.
Replicated databases — Alexander Tokmakov.
Enable in config: <grpc_port>9100</grpc_port>.
All features of the native protocol are available:
— TLS, compression, query progress,
query cancellation, sessions, external data...
Example: clickhouse-client using GRPC on Python.
utils/grpc-client/clickhouse-grpc-client.py
Developer — Vitaly Baranov. Available since version 21.1.
Allows sending a query to another replica,
if one of the replicas takes too long to respond,
and choosing the fastest replica from several.
— to eliminate tail latencies on very large clusters.
Available since version 21.3.
* The largest ClickHouse cluster at Yandex is >630 servers,
but other companies have much larger clusters.
Developer — Pavel Kruglov and Nikolai Kochetov.
For tracing ClickHouse queries within a large infrastructure:
— enabled by default;
— understands opentelemetry HTTP headers;
— many spans are already marked inside ClickHouse;
— data is written to system.opentelemetry_log;
Developer — Alexander Kuzmenkov. Available since version 20.11.
Transparent work with compressed files:
— gz, brotli;
— xz, zstd;
Example:
CREATE TABLE github_events_url ( ... ) ENGINE = URL( 'https://datasets.clickhouse.com/github_events_v2.native.xz', Native);
Developer — Abi Palagashvili. Available since version 21.1.
— UNION DISTINCT;
— REPLACE TABLE and CREATE OR REPLACE TABLE;
— aggregate_functions_null_for_empty;
— Extended CTE;
— Type cast for IN subquery;
— SHOW [CHANGED] SETTINGS;
— POSITION(needle IN haystack);
— DIV / MOD;
— SELECT ALL;
— experimental and incomplete;
— will help in developing third-party tools,
working with ClickHouse queries;
For example: syntax highlighting in code editor.
Developer — Ivan Lezhankin. Available since version 21.1.
Example: map Map(String, String)
SELECT map['hello']
Developer — Hexiaoting.
— for key-value queries;
— ideal as a dictionary source;
Developer — Sundy Li.
ALTER UPDATE / DELETE IN PARTITION:
— limiting the scope of mutations.
ALTER DROP PART:
— now also for Replicated tables.
TTL now removes empty parts:
— no confusion with remaining parts.
Text analysis using Min-Hash and Sim-Hash algorithms
to search for similar or copied pieces.
Min-Hash algorithm:
1. Extract all shingles of N consecutive words from the text.
2. Hash the shingles!
3. Select M hashes with minimum values and the same amount with maximum.
4. Hash the hashes!
5. Texts are considered duplicates if at least one of the two hashes matches.
Developer — ucasFL. Available since version 21.1.
Tests to check if distributions have different means,
useful for A/B testing.
— studentTTest;
— welchTTest;
— mannWitneyUTest;
Rank correlation:
— rankCorr.
Developer — Nikita Mikhailov and others. Available since version 21.1.
ClickHouse Roadmap is publicly available on GitHub:
https://github.com/ClickHouse/ClickHouse/issues/17623
There's too much there... I'll only tell you a little bit.
Provide alternative for ZooKeeper Nested and semistructured data Limited support for transactions Backups Hedged requests Window functions Separation of storage and compute Short-circuit evaluation Projections Lightweight DELETE/UPDATE Workload management User Defined Functions Simplify replication JOIN improvements Embedded documentation Pluggable auth with tokens
In development. Initial support in version 21.1.
SET allow_experimental_window_functions = 1
Already in release:
— OVER (PARTITION BY ... ORDER BY ...)
— aggregate functions over windows;
— WINDOW clause;
— frame specifications;
In development:
— non-aggregate window functions (rank, etc...);
Developer — Alexander Kuzmenkov.
In development. Initial support in version 21.1.
Multiple nesting:
cart Nested(
item_id UInt64,
item_price Decimal(20, 5),
features Nested(
...))
SELECT cart.item_id, cart.features.f1 FROM table
SELECT cart.* FROM table
Support for nested JSON and Protobuf.
Developer — Anton Popov. Available since version 21.1.
Multiple data representations in one table.
— different sort keys;
— column subset;
— row subset;
— pre-aggregations.
Under review.
Differences from materialized views:
— data in projections is guaranteed to be consistent;
— updated atomically with the table;
— replicated the same way as the table;
— best projections are automatically chosen for SELECT.
In development.
— ZooKeeper network protocol implemented;
— abstraction layer over ZooKeeper is used;
— ZooKeeper data model implemented;
— TestKeeperServer: server with ZooKeeper data model for tests;
— NuKeeperServer: server with distributed RAFT consensus;
— snapshots, logs, crash recovery implemented;
Advantages:
— simpler operation;
— fixing "zxid overflow";
— fixing max packet size issues;
— fixing "session expired" due to GC pauses;
— reduced memory consumption;
— compressed snapshots and logs;
— embedding into clickhouse-server.
Developer — Alexander Sapin.
SELECT IF(number = 0, 0, 123 % number) FROM numbers(10)
— division by zero.
SELECT * FROM numbers(10) WHERE number > 0 AND 10 % number > 0
— division by zero.
— both branches of IF, AND and OR are always executed.
SELECT * FROM
(
SELECT * FROM numbers(10)
WHERE number > 0
)
WHERE 10 % number > 0
— division by zero.
Considering 5 ways to implement UDF, two of them are mandatory:
1. UDF as SQL expressions.
CREATE FUNCTION f AS x -> x + 1
2. UDF as an executable program.
Interaction via pipes, data serialized in any format.
What happens if you send INSERTs to ClickHouse one row at a time?
Before: nothing good.
— you had to buffer data yourself, or use Kafka, or install clickhouse-bulk, kittenhouse, ...
Will be: all good.
— ClickHouse will automatically collect data in batches and perform asynchronous INSERT.
21.4 — testing.
21.3 — prestable, LTS until 2022-03-01.
21.2 — stable.
21.1 — stable.
20.12 — stable.
20.12 ... 20.9 — obsolete.
20.8 — LTS until 2021-09-30.
...
20.3 — obsolete.
Public roadmap 2021: