Author: Alexey Milovidov, 2022-12-02.
ClickHouse should always work.
If ZooKeeper is unavailable at server startup,
the ReplicatedMergeTree tables will start in read-only mode
and initialize asynchronously in the background
as soon as ZooKeeper becomes available.
— the same applies to ClickHouse Keeper as well;
— especially useful for embedded ClickHouse Keeper;
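To check which replicated tables are still waiting, a hedged sketch
using the system.replicas system table:
SELECT database, table, is_readonly
FROM system.replicas
WHERE is_readonly;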
Developers: Antonio Andelic. Since 22.9.
Session expired. Table is in readonly mode 😠
Never again:
SET insert_keeper_max_retries = 10;
INSERT will survive restarts of ClickHouse Keeper or ZooKeeper
and reconnections.
Developer: Igor Nikonov. Since 22.11.
Table is in readonly mode 😠
Was: around one minute for reconnection, for no reason.
Now: milliseconds 🥲
Developer: Raul Marin. Since 22.10.
DB::Exception: Too many parts (300). Merges are processing significantly slower than inserts 😠
Now: relaxing the "too many parts" threshold:
Allow a large number of parts if they are large on average.
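A hedged sketch, assuming the MergeTree setting
max_avg_part_size_for_too_many_parts added around 22.10
(verify the exact name in your version):
-- skip the "too many parts" check while the average part size
-- stays above this threshold (in bytes)
ALTER TABLE hits MODIFY SETTING
    max_avg_part_size_for_too_many_parts = 10000000000;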
Developer: Alexey Milovidov. Since 22.10.
Was: strict per-query limit, max_memory_usage = 10 GB by default.
Now: a query can use memory beyond the limit if it's available;
in case of memory shortage, the most "overcommitted" query is terminated with an exception.
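A hedged sketch of opting a query into overcommit
(setting names as of 22.x; treat them as assumptions):
SET max_memory_usage = 0; -- no hard per-query limit
SET memory_overcommit_ratio_denominator = 1000000000;
-- the soft limit: the query exceeding its soft limit by the largest
-- ratio is the first candidate for termination under memory pressure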
Developer: Dmitry Novik.
No need to use ZooKeeper anymore!
Always use ClickHouse Keeper instead of ZooKeeper.
Developer: Alexander Sapin, Antonio Andelic.
ClickHouse should support everything you expect.
SELECT
    y::String || '.' ||
        (y < toYear(today()) - 2000 - 1 ? '*' : m::String) AS Version,
    (n <= 3 OR (is_lts AND lts_n <= 2)) ? '✔️' : '❌' AS Supported
FROM
(
    SELECT y, m,
        count() OVER (ORDER BY y DESC, m DESC) AS n,
        m IN (3, 8) AS is_lts,
        countIf(is_lts) OVER (ORDER BY y DESC, m DESC) AS lts_n
    FROM
    (
        WITH extractGroups(version, 'v(\d+)\.(\d+)') AS v,
            v[1]::INT AS y, v[2]::INT AS m
        SELECT y, m
        FROM file('version_date.tsv', TSV, 'version String, date String')
        ORDER BY y DESC, m DESC
        LIMIT 1 BY y, m
    )
)
LIMIT 1 BY Version
FORMAT Markdown
Supported range for DateTime64 and Date32 data types:
Was: 1925–2283.
Now: 1900–2300.
Uses proleptic Gregorian calendar.
Motivation: store dates of birth of the customers.
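For example (a hedged check; both values should now fit in the extended range):
SELECT toDate32('1900-01-01') AS min_date,
       toDateTime64('2299-12-31 23:59:59', 3) AS max_time;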
Developer: Roman Vasin. Since 22.8.
SET mutations_sync = 1;
ALTER TABLE hits
DELETE WHERE Title LIKE '%Mongo%';
— 205 sec (for a table with 100 million records).
DELETE FROM hits
WHERE Title LIKE '%Mongo%';
— ??? sec.
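A hedged note: in 22.x-era releases this lightweight DELETE was still
experimental and had to be enabled first:
SET allow_experimental_lightweight_delete = 1;
DELETE FROM hits WHERE Title LIKE '%Mongo%';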
Developers: Alexander Gololobov, Jianmei Zhang.
The GROUPING function is used with ROLLUP, CUBE or GROUPING SETS
to distinguish the different grouping sets.
SELECT k, GROUPING(k) FROM table GROUP BY k WITH ROLLUP
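A self-contained, hedged variant (inline data via the values() table function):
SELECT k, sum(x) AS s, GROUPING(k) AS g
FROM values('k String, x UInt8', ('a', 1), ('a', 2), ('b', 3))
GROUP BY k WITH ROLLUP;
-- g tells the rolled-up total row apart from the regular per-key rows.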
Developer: Dmitry Novik.
SELECT DISTINCT repo_name, title
FROM github_events
WHERE title ILIKE (
repo_name LIKE '%ClickHouse%' ? '%fast%' : '%slow%')
AND repo_name IN ('ClickHouse/ClickHouse', 'elastic/elasticsearch')
Now I can put LIKE inside LIKE, and it looks like you're going to like it.
Developer: Robert Schulze. Since 22.6.
Examples:
SELECT now() + INTERVAL 1 MONTH;
SELECT now() + INTERVAL '1 MONTH';
SELECT now() + INTERVAL 1 MONTH - INTERVAL 2 DAY;
SELECT now() + INTERVAL '1 MONTH -2 DAY';
SELECT now() + (INTERVAL 1 MONTH - INTERVAL 2 DAY);
SELECT INTERVAL '1 MONTH -2 DAY';
SELECT (INTERVAL 1 MONTH - INTERVAL 2 DAY);
SELECT INTERVAL '1 MONTH 1 MONTH';
Developer: Nikolai Degterinsky. Since 22.11.
ClickHouse never slows down!
Improvements to ORDER BY, to inserts and merges in MergeTree,
and to window functions.
SELECT WatchID FROM hits_100m_obfuscated ORDER BY Age
Before:
Elapsed: 4.154 sec. (24.07 million rows/s., 216.64 MB/s.)
After:
Elapsed: 0.482 sec. (207.47 million rows/s., 1.87 GB/s.)
Developer: Maksim Kita.
Speed-up of SELECT with FINAL modifier.
It "simply" improves performance up to 4 times.
Especially for complex transforms like Collapsing and Replacing.
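A minimal, hedged illustration with ReplacingMergeTree
(table and data are made up):
CREATE TABLE kv (k UInt32, v String) ENGINE = ReplacingMergeTree ORDER BY k;
INSERT INTO kv VALUES (1, 'old');
INSERT INTO kv VALUES (1, 'new');
SELECT * FROM kv FINAL; -- replacing happens at read time; this path got faster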
Developer: Nikita Taranov.
Optimize ORDER BY with LIMIT.
Optimize ORDER BY with single column.
Optimize INSERT into MergeTree with composite ORDER key.
Optimize dictGetChildren, dictGetDescendants.
Optimize cleanup stage of queries with large GROUP BY.
Optimize background CPU usage for a large number of tables.
Developer: Maksim Kita, Nikita Taranov.
Optimize COUNT(DISTINCT ...) for a low number of GROUP BY keys.
Optimize GROUP BY with CPU prefetcher.
Optimize GROUP BY with better block sizes.
Developer: Nikita Taranov.
— "direct" algorithm:
to join with key-value tables by direct lookups a.k.a. nested loops.
Good if the left table is small, but the right table is like a large dictionary.
Good to use in MATERIALIZED VIEW queries.
— "parallel_hash" algorithm:
speed-up if the right-hand table is large.
— "full_sorting_merge" algorithm:
when the right-hand side is large,
does not fit in memory, and does not allow lookups.
— "grace_hash" algorithm:
since 22.12.
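The algorithm is selected per query with the join_algorithm setting;
a hedged sketch:
SET join_algorithm = 'full_sorting_merge'; -- or 'direct', 'parallel_hash', 'grace_hash'
SELECT count()
FROM numbers(1000000) AS a
JOIN numbers(1000000) AS b ON a.number = b.number;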
Developer: Vladimir Cherkasov, lgbo-ustc.
... and we are reading from object storage 100 times faster.
Developer: Ksenia Sumarokova.
ClickHouse integrates with everything!
ClickHouse can work as a server (clickhouse-server)
or as a tool without installation (clickhouse-local).
ClickHouse can store the data
or process externally stored data on the fly.
External data:
— remote databases: MySQL, PostgreSQL, MongoDB, ODBC, JDBC...
— object storages: S3, HDFS, Azure, COSN, OSS...
— URLs and local files;
All possible data formats:
— text: CSV, TSV, JSON, Values, MySQLDump, Regexp...
— binary: Parquet, Arrow, ORC, Avro, Protobuf, MsgPack...
— schemaful and schemaless;
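For example, a hedged sketch of querying an external file on the fly
(the URL is a placeholder):
SELECT count()
FROM url('https://example.com/data.csv', CSVWithNames);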
Now ClickHouse supports Apache Hudi and Delta Lake
for SELECT queries.
TODO: Apache Iceberg.
Advantages:
— these formats somewhat resemble MergeTree,
allowing incremental data insertion,
approaching ClickHouse's own data format;
Disadvantages:
— alien data formats from Apache/Hadoop/Java world;
— nothing works out of the box
unless you really know how to deal with it;
SELECT count() FROM deltaLake(
'https://clickhouse-public-datasets.s3.amazonaws.com/delta_lake/hits/')
WHERE URL LIKE '%google%'
-- 4.396 sec.
Developers: Daniil Rubin, Ksenia Sumarokova, Flynn ucasfl. Since 22.11.
— Querying MongoDB and Meilisearch with table functions.
— Streaming data consumption from NATS.
Developer: Anastasia Petrenko, Ksenia Sumarokova. Since 22.7.
Visualizations:
— official ClickHouse plugin for Grafana;
— official support for Superset;
— HEX and Deepnote support.
Data ingestion and processing:
— Kafka Connect integration;
— Airflow, dbt support.
Language drivers:
— official Node.js driver;
— optimized Go driver;
— a new Python client.
ClickHouse is easy to configure for your needs.
The simplest way to install ClickHouse:
curl https://clickhouse.com/ | sh
Single binary package. Installs the latest version. Includes debug info.
Works on every Linux (x86_64, aarch64, powerpc64le),
macOS (x86_64, M1), FreeBSD and Windows (WSL2).
Was: 2.1 GB.
Now: 446 MB, takes ~5 seconds to decompress on first run.
Developer: Arthur Filatenkov, Yakov Olkhovskiy.
So, ClickHouse supports a lot of protocols:
— HTTP
— HTTPS
— Native TCP
— Native TCP wrapped in PROXYv1
— Native TCP with TLS
— MySQL (with TLS support)
— PostgreSQL (with TLS support)
— gRPC (with TLS)
— Replication protocol over HTTP
— Replication protocol over HTTPS
— Keeper client-server protocol;
— Keeper consensus protocol;
— ...
So, ClickHouse supports a lot of protocols.
How to configure all of them? What if:
— server has multiple network interfaces?
— I want to enable one protocol on multiple ports?
— I want native TCP for localhost only and HTTPS from everywhere?
— I want different TLS certificates for different protocols?
— I want to wrap one protocol in another?
<protocols>
    <tcp>
        <type>tcp</type>
        <host>::</host>
        <port>9000</port>
        <description>native protocol</description>
    </tcp>
    <tcp_secure>
        <type>tls</type>
        <impl>tcp</impl>
        <port>9440</port>
        <description>secure native protocol</description>
    </tcp_secure>
    <tcp_endpoint>
        <impl>tcp</impl>
        <host>0.0.0.0</host>
        <port>9001</port>
        <description>native protocol, another</description>
    </tcp_endpoint>
    <tcp_proxy>
        <type>proxy1</type>
        <impl>tcp</impl>
        <port>9100</port>
        <description>native protocol with PROXYv1</description>
    </tcp_proxy>
</protocols>
The case: ClickHouse under proxy:
Envoy Proxy / HAProxy / Cloudflare.
ClickHouse server will receive connections from the proxy.
But it needs to know the source IP address for quotas, ACLs, and logging.
Solution: enable PROXYv1 protocol in the proxy
and configure it as a protocol wrapper in ClickHouse.
ClickHouse will read the header and unwrap the network packets.
Developer: Yakov Olkhovskiy. Since 22.10.
ClickHouse Cloud:
+ penetration testing, bug bounty program, audit reports...
— available since Oct 4th;
— free 14-day trial up to 10 TB of data;
Try it! https://clickhouse.cloud/.