— SQL UDF, executable UDF;
— executable table function & dictionary source;
— full disk encryption;
— column-level encryption;
Before:
SELECT IF(number = 0, 0, 123 % number) FROM numbers(10)
— division by zero.
SELECT * FROM numbers(10) WHERE number > 0 AND 10 % number > 0
— division by zero.
— both branches of IF and both sides of AND, OR are always evaluated.
SELECT * FROM
(
SELECT * FROM numbers(10)
WHERE number > 0
)
WHERE 10 % number > 0
— division by zero.
Now:
— everything simply works as expected.
Available in 21.9.
Developer — Pavel Kruglov.
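A minimal sketch of the new behavior (controlled by the short_circuit_function_evaluation setting, enabled by default):

SELECT IF(number = 0, 0, 123 % number) FROM numbers(10)
-- 123 % number is now evaluated only for rows where number != 0,
-- so there is no division by zero.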
Ability to do many frequent INSERTs.
From multiple parallel connections.
Without Kafka and without Buffer tables!
Multiple small INSERTs are combined together
into one batch in memory.
Inserts are reliable by default:
client receives a response when data is written to the table.
Available in 21.11-testing.
Developers: Anton Popov, Ivan Lezhankin.
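A sketch of enabling it per query (table t is illustrative; async_insert turns on server-side batching, wait_for_async_insert keeps inserts reliable):

INSERT INTO t SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (1), (2), (3)
-- many small inserts are buffered on the server and written as one batch;
-- the client gets a response after the data is actually in the table.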
1. clickhouse-keeper — 100% compatible with ZooKeeper
in protocol and data model.
— compressed logs and snapshots;
— no issues with zxid overflow;
— no issues with large packets;
— better memory usage;
— no issues with GC and Java heap;
2. Can be run embedded in clickhouse-server.
— no need for a separate service;
Preproduction stage, planned for Q4 2021.
Developer: Alexander Sapin.
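A sketch of the embedded setup inside the clickhouse-server config (paths are illustrative; a real multi-node setup also lists its peers in a raft_configuration section):

<keeper_server>
    <tcp_port>9181</tcp_port>
    <server_id>1</server_id>
    <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
    <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
</keeper_server>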
JSON_EXISTS, JSON_VALUE, JSON_QUERY
SELECT JSON_QUERY(
'$.array[*][0 to 2, 4]',
'{"array":[[0, 1, 2, 3, 4, 5],
[0, -1, -2, -3, -4, -5]]}');
Available in 21.8.
Developers — Ksenia Sumarokova, Konstantin Rudenskii, Denis Semenov.
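The companion functions, sketched with the same argument order as the example above:

SELECT JSON_EXISTS('$.array[0]', '{"array":[0, 1]}');
SELECT JSON_VALUE('$.array[0]', '{"array":[0, 1]}');
-- JSON_EXISTS checks whether the path matches,
-- JSON_VALUE extracts a scalar, JSON_QUERY extracts JSON fragments.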
JSON data type:
CREATE TABLE games (data JSON) ENGINE = MergeTree;
You can insert arbitrary nested JSONs.
Types are automatically inferred on INSERT and merge.
Data is stored in columnar format: columns and subcolumns.
Query nested data naturally.
Example: NBA games dataset
CREATE TABLE games (data String)
ENGINE = MergeTree ORDER BY tuple();
SELECT JSONExtractString(data, 'teams', 1, 'name')
FROM games;
— 0.520 sec.
CREATE TABLE games (data JSON)
ENGINE = MergeTree;
SELECT data.teams.name[1] FROM games;
— 0.015 sec.
DESCRIBE TABLE games
SETTINGS describe_extend_object_types = 1
name: data
type: Tuple( <-- inferred type
`_id.$oid` String,
`date.$date` String,
`teams.abbreviation` Array(String),
`teams.city` Array(String),
`teams.home` Array(UInt8),
`teams.name` Array(String),
`teams.players.ast` Array(Array(Int8)),
`teams.players.blk` Array(Array(Int8)),
`teams.players.drb` Array(Array(Int8)),
`teams.players.fg` Array(Array(Int8)),
`teams.players.fg3` Array(Array(Int8)),
`teams.players.fg3_pct` Array(Array(String)),
`teams.players.fg3a` Array(Array(Int8)),
`teams.players.fg_pct` Array(Array(String)),
`teams.players.fga` Array(Array(Int8)),
`teams.players.ft` Array(Array(Int8)),
`teams.players.ft_pct` Array(Array(String)),
`teams.players.fta` Array(Array(Int8)),
`teams.players.mp` Array(Array(String)),
`teams.players.orb` Array(Array(Int8)),
`teams.players.pf` Array(Array(Int8)),
`teams.players.player` Array(Array(String)),
`teams.players.plus_minus` Array(Array(String)),
`teams.players.pts` Array(Array(Int8)),
`teams.players.stl` Array(Array(Int8)),
`teams.players.tov` Array(Array(Int8)),
`teams.players.trb` Array(Array(Int8)),
`teams.results.ast` Array(Int8),
`teams.results.blk` Array(Int8),
`teams.results.drb` Array(Int8),
`teams.results.fg` Array(Int8),
`teams.results.fg3` Array(Int8),
`teams.results.fg3_pct` Array(String),
`teams.results.fg3a` Array(Int8),
`teams.results.fg_pct` Array(String),
`teams.results.fga` Array(Int16),
`teams.results.ft` Array(Int8),
`teams.results.ft_pct` Array(String),
`teams.results.fta` Array(Int8),
`teams.results.mp` Array(Int16),
`teams.results.orb` Array(Int8),
`teams.results.pf` Array(Int8),
`teams.results.plus_minus` Array(String),
`teams.results.pts` Array(Int16),
`teams.results.stl` Array(Int8),
`teams.results.tov` Array(Int8),
`teams.results.trb` Array(Int8),
`teams.score` Array(Int16),
`teams.won` Array(Int8))
Flexible schema.
You can have columns with strict and flexible schema in one table.
Queries work as fast as with predefined types!
Planned for Q1 2022.
Developer: Anton Popov.
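A sketch of inserting arbitrary nested JSON into such a table (the sample row is invented for illustration):

INSERT INTO games FORMAT JSONEachRow
{"data": {"teams": [{"name": "Knicks", "score": 101}]}}

SELECT data.teams.name FROM games
-- the subcolumn appears automatically; no ALTER needed.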
Before:
clickhouse-local --input-format Parquet \
--query 'SELECT * FROM table' \
--structure '
id UInt32, deleted UInt8,
type String, by String, time DateTime,
text String, dead UInt8, parent UInt32,
poll UInt32, kids Array(UInt32),
url String, score Int32, title String,
parts Array(UInt32), descendants Int32'
After:
clickhouse-local --input-format Parquet \
--query 'SELECT * FROM table'
1. No need to specify schema if data already contains it:
Native, Arrow, Parquet, ORC, Avro.
2. Ability to infer schema from text formats:
TSV, CSV, JSONEachRow.
It works for:
— clickhouse-local;
— url;
— file;
— s3.
Planned for Q1 2022.
Developer: Pavel Kruglov.
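A sketch for a text format (the file name is illustrative; both the format and the column types are inferred from the contents):

clickhouse-local --query "SELECT * FROM file('data.csv')"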
1. Tokenization, stemming, lemmatization, synonyms (21.9..21.11).
— SELECT arrayMap(word -> lemmatize('en', word), tokens(text)).
Developer: Nikolai Degterinskiy.
2. Charset and language detection (Q4 2021).
3. Semi-duplicate text search (21.1).
— min-hash and sim-hash algorithms.
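A sketch of stemming (the NLP functions are experimental and may require allow_experimental_nlp_functions = 1; the language code comes first):

SELECT arrayMap(word -> stem('en', word), tokens('It is quite an interesting subject'))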
ClickHouse over S3:
— highly concurrent asynchronous reads (21.11);
— parallel processing on multiple compute nodes (Q4);
— caching in RAM or local disk with node affinity (Q4);
— easy coordination of inserts and merges
without ReplicatedMergeTree (Q1 2022);
Lower cost and higher throughput than EBS volumes.
Bonus: "web" disk for static hosted datasets (21.10).
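A sketch of an S3 disk definition (the bucket endpoint is illustrative; credentials are omitted):

<storage_configuration>
    <disks>
        <s3>
            <type>s3</type>
            <endpoint>https://my-bucket.s3.amazonaws.com/clickhouse/</endpoint>
        </s3>
    </disks>
</storage_configuration>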
I heard you don't like XML...
Then ClickHouse will support YAML as an alternative for you.
logger:
level: trace
log: /var/log/clickhouse-server/clickhouse-server.log
errorlog: /var/log/clickhouse-server/clickhouse-server.err.log
size: 1000M
count: 10
http_port: 8123 # Don't worry, YAML configuration
tcp_port: 9000 # will also contain comments.
mysql_port: 9004 #
postgresql_port: 9005 # It's not JSON after all.
max_connections: 4096
Developer — Denis Bolonin.
Support for importing and exporting gz, xz, bz2, zst, lz4.
This works for url, file, s3, hdfs,
SELECT ... INTO OUTFILE
and INSERT ... FROM INFILE.
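A sketch (the compression method is detected from the file extension; table t is illustrative):

SELECT * FROM numbers(10) INTO OUTFILE 'numbers.tsv.gz'
INSERT INTO t FROM INFILE 'numbers.tsv.gz'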
:: operator for type casting:
SELECT x::UInt64,
['Hello', 'world!']::Array(Nullable(String))
$heredoc$ for string literals:
SELECT
$$
Hello, 'C:\WINDOWS\SYSTEM32'
world!
$$
Developers — Anton Popov, Maksim Kita.
Tons of dictionary improvements:
— array support in dictionaries;
— Nullable support in dictionaries;
— dictionaries can be created with keys only, without attributes;
— complex_key_range_hashed;
— executable_pool source;
— cache dictionary performance;
— dictGetChildren; dictGetDescendants;
— dictGetOrNull;
Developer — Maksim Kita.
— support for both IPv4/IPv6 in ip_trie;
Developer — Vladimir Cherkasov.
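Two of the new functions, sketched against a hypothetical dictionary my_dict:

SELECT dictGetOrNull('my_dict', 'value', toUInt64(42))
-- NULL instead of the default value when the key is missing.
SELECT dictGetDescendants('my_dict', toUInt64(1))
-- all keys below key 1 in a hierarchical dictionary.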
Tons of dictionary improvements:
All dictionary sources from external databases
allow specifying explicit queries:
CREATE DICTIONARY dict
... SOURCE(PostgreSQL(
DB 'db' HOST 'host' PORT 5432 USER 'postgres'
QUERY $$
SELECT ... FROM t1 INNER JOIN t2 USING (key)
$$))
Developer — Maksim Kita.
Tons of JOIN improvements:
— Type casting in ON/USING clause.
— Table filters in ON clause.
— CROSS to INNER JOIN transformations and back.
— join_use_nulls support for Array, LowCardinality.
Developer — Vladimir Cherkasov.
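A sketch of type casting in the ON clause (tables are illustrative; say t1.key is UInt16 and t2.key is Int64):

SELECT * FROM t1 INNER JOIN t2 ON t1.key = t2.key
-- the cast between UInt16 and Int64 is now inserted automatically.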
Interactive mode for clickhouse-local.
INTERSECT, EXCEPT, ANY, ALL, EXISTS.
Virtual file system over static files: web disk.
ALTER ... MATERIALIZE COLUMN.
SYSTEM RESTORE REPLICA.
SQLite database engine.
Integration with S2 Geometry.
s3Cluster table function.
INFORMATION_SCHEMA.
Storing access rights in ZooKeeper.
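A quick sketch of the new set operations:

SELECT number FROM numbers(10)
INTERSECT
SELECT number FROM numbers(5, 10)
-- rows 5..9, present in both sides.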
ClickHouse already supports window functions!
ClickHouse already supports geographical data types:
Point, Ring, Polygon, MultiPolygon and functions:
polygonsIntersection, polygonArea, polygonsWithin, ...
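A sketch (the shipped function names carry a coordinate-system suffix, e.g. polygonAreaCartesian):

SELECT polygonAreaCartesian([[(0., 0.), (0., 5.), (5., 5.), (5., 0.)]])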
MaterializedPostgreSQL — change data capture from PostgreSQL.
OPTIONS HTTP request support,
— so Grafana and Observable work without proxy.
Play UI improvements:
— graphs for EXPLAIN;
— sparklines;
— links;
ClickHouse is not a transactional DBMS, is it?
But we need transactions to:
— do atomic INSERTs into multiple partitions;
— do atomic INSERTs into multiple tables and materialized views;
— make multiple SELECT queries from one snapshot.
Planned for Q2 2022
Developer — Alexander Tokmakov.
... And Incremental Aggregation In Memory
Run GROUP BY query continuously.
Aggregated data is accumulated in memory.
It can be queried as a table.
The server can serve key-value requests over the Redis protocol.
Example applications:
— realtime antifraud;
— user profiles and personalization.
Planned for summer 2022.
And workload isolation.
Planned for summer 2022.