The ClickHouse roadmap is publicly available on GitHub:
https://github.com/ClickHouse/ClickHouse/issues/17623
I will show you only some highlights and examples.
Provide alternative for ZooKeeper
Nested and semistructured data
Limited support for transactions
Backups
Hedged requests
Window functions
Separation of storage and compute
Short-circuit evaluation
Projections
Lightweight DELETE/UPDATE
Workload management
User Defined Functions
Simplify replication
JOIN improvements
Embedded documentation
Pluggable auth with tokens
Work in progress. Initial support in version 21.1.
Data types:
Tuple(T1, T2...)
Tuple(x1 T1, x2 T2...)
Map(T1, T2)
Nested(x1 T1, x2 T2...)
Support for subcolumns:
SELECT cart.id, cart.price FROM table
— only queried subcolumns will be read from table.
Work in progress. Initial support in version 21.1.
Multiple nesting:
cart Nested(
    item_id UInt64,
    item_price Decimal(20, 5),
    features Nested(...))
SELECT cart.item_id, cart.features.f1 FROM table
SELECT cart.* FROM table
Maps naturally to nested JSON and Protobuf.
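One way to picture the mapping to nested JSON is to treat a Nested column as a set of parallel arrays, one per subcolumn. The Python sketch below is only an illustration of that idea, not ClickHouse code: the helper name `flatten_nested` and the sample field values are hypothetical, and it assumes every object has the same keys and every inner list is non-empty.

```python
import json


def flatten_nested(prefix, rows):
    """Turn a list of JSON objects into parallel arrays keyed by dotted subcolumn name.

    Assumption: all objects share the same keys and inner lists are non-empty,
    so every resulting array has one entry per input object.
    """
    columns = {}
    for row in rows:
        for key, value in row.items():
            name = f"{prefix}.{key}"
            if isinstance(value, list) and all(isinstance(v, dict) for v in value):
                # A nested list of objects becomes one inner array per subcolumn.
                for sub_name, sub_values in flatten_nested(name, value).items():
                    columns.setdefault(sub_name, []).append(sub_values)
            else:
                columns.setdefault(name, []).append(value)
    return columns


# Hypothetical document shaped like the cart example above.
doc = json.loads("""
{"cart": [
  {"item_id": 1, "item_price": "9.99", "features": [{"f1": "red"}, {"f1": "xl"}]},
  {"item_id": 2, "item_price": "5.00", "features": [{"f1": "blue"}]}
]}
""")

columns = flatten_nested("cart", doc["cart"])
for name, values in columns.items():
    print(name, values)
```

Each dotted key (`cart.item_id`, `cart.features.f1`) corresponds to one subcolumn, which is why a query that touches only some of them does not need to read the rest.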
Work in progress. Initial support in version 21.1.
SET allow_experimental_window_functions = 1
Already supported:
— OVER (PARTITION BY ... ORDER BY ...)
— aggregate functions over windows;
— WINDOW clause;
Upcoming:
— non-aggregate window functions (rank, etc.);
— frame specifications.
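To make the semantics of an aggregate function over a window concrete, here is a small Python emulation of `sum(amount) OVER (PARTITION BY user ORDER BY day)`. It is a sketch, not ClickHouse code: the function name `running_sum` and the sample rows are hypothetical, and it assumes the window runs from the start of the partition to the current row and that the ORDER BY key is unique within each partition.

```python
from itertools import groupby
from operator import itemgetter


def running_sum(rows, partition_key, order_key, value_key):
    """Emulate sum(value) OVER (PARTITION BY partition_key ORDER BY order_key)."""
    result = []
    # Sort by partition first, then by the ordering key inside each partition.
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    for _, partition in groupby(ordered, key=itemgetter(partition_key)):
        total = 0
        for row in partition:
            total += row[value_key]
            # Every input row is kept; the aggregate is attached to it.
            result.append({**row, "running_sum": total})
    return result


# Hypothetical sample data.
rows = [
    {"user": "a", "day": 2, "amount": 5},
    {"user": "b", "day": 1, "amount": 7},
    {"user": "a", "day": 1, "amount": 10},
]

windowed = running_sum(rows, "user", "day", "amount")
for row in windowed:
    print(row)
```

Unlike GROUP BY, the window keeps one output row per input row and only adds the aggregated value next to it.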
Multiple data representations inside a single table.
— different data order;
— subset of columns;
— subset of rows;
— aggregation.
Work in progress.
Differences from materialized views:
— projection data is always consistent;
— updated atomically with the table;
— replicated in the same way as the table;
— a projection can be used automatically for SELECT queries.
Work in progress.
— ZooKeeper network protocol is implemented;
— Abstraction layer over ZooKeeper is used;
— ZooKeeper data model is implemented for testing;
— TestKeeperServer: a server with ZooKeeper data model for testing;
Benefits:
— less operational complexity;
— fix "zxid overflow" issue;
— fix the issue with max packet size;
— fix "session expired" due to GC pauses;
— improve memory usage;
— allow compressed snapshots;
— allow embedding into clickhouse-server.
SELECT IF(number = 0, 0, 123 % number) FROM numbers(10)
— division by zero.
SELECT * FROM numbers(10) WHERE number > 0 AND 10 % number > 0
— division by zero.
— both branches of IF, AND, OR are always evaluated.
SELECT * FROM ( SELECT * FROM numbers(10) WHERE number > 0 ) WHERE 10 % number > 0
— division by zero.
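The failure in these queries can be reproduced with a toy model of column-at-a-time evaluation. The Python sketch below is an illustration only and does not describe how ClickHouse is implemented; the names `if_eager` and `if_short_circuit` are hypothetical. The eager version computes both branches for every row before choosing, so the modulo runs on the row where `number = 0`; the short-circuit version evaluates only the branch that the condition selects.

```python
def if_eager(cond, then_fn, else_fn, column):
    """Evaluate both branches over the whole column, then pick a value per row."""
    then_values = [then_fn(x) for x in column]
    else_values = [else_fn(x) for x in column]  # runs even on rows where cond is true
    return [t if cond(x) else e for x, t, e in zip(column, then_values, else_values)]


def if_short_circuit(cond, then_fn, else_fn, column):
    """Evaluate only the branch selected by the condition for each row."""
    return [then_fn(x) if cond(x) else else_fn(x) for x in column]


numbers = list(range(10))  # stands in for numbers(10)


def is_zero(n):
    return n == 0


def zero(n):
    return 0


def mod_123(n):
    return 123 % n


try:
    if_eager(is_zero, zero, mod_123, numbers)
    eager_error = None
except ZeroDivisionError as exc:
    eager_error = exc  # the else branch was evaluated for n == 0

lazy_result = if_short_circuit(is_zero, zero, mod_123, numbers)
print("eager:", repr(eager_error))
print("short-circuit:", lazy_result)
```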
We are considering five ways to implement UDFs; two of them are mandatory:
1. UDF as SQL expressions.
CREATE FUNCTION f AS x -> x + 1
2. UDF as executable script.
Interaction via pipes, data is serialized using supported formats.
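As a rough picture of the second option, an executable UDF could be a script that reads rows from its standard input and writes one result per row to its standard output. The Python sketch below assumes a single integer column serialized one value per line (as in a TabSeparated format); the function name `run_udf` and the `x + 1` logic, borrowed from the SQL example above, are illustrative. How such a script would be registered with the server is not covered here.

```python
import io


def run_udf(stream_in, stream_out):
    """Read one integer per line and write x + 1 per line.

    Assumption: the input is a single-column, line-oriented text format.
    """
    for line in stream_in:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        stream_out.write(f"{int(line) + 1}\n")
    stream_out.flush()  # make results visible to the reader of the pipe


# Demonstration with in-memory streams instead of real pipes.
demo_out = io.StringIO()
run_udf(io.StringIO("1\n41\n"), demo_out)
print(demo_out.getvalue(), end="")
```

In a real script the same function would be called with `sys.stdin` and `sys.stdout`, so the pipes provided by the server take the place of the in-memory streams.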
Send a distributed query to multiple replicas to mitigate tail latencies.
This is needed for distributed queries on large clusters (with large "fanout").
Work in progress.
* The largest ClickHouse cluster at Yandex has 630+ servers,
but there are many larger clusters at other companies.
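The general hedging technique can be sketched with Python's asyncio: send the request to one replica and, if no answer arrives within a short delay, send it to the next one as well and take whichever answer comes first. This is a generic illustration under stated assumptions, not the ClickHouse implementation; `query_replica`, `hedged_query`, the replica names, and the delays are all hypothetical, and the network call is simulated with a sleep.

```python
import asyncio


async def query_replica(name, delay):
    """Simulated replica: answers after `delay` seconds."""
    await asyncio.sleep(delay)
    return f"result from {name}"


async def hedged_query(replicas, hedge_after):
    """Query replicas one by one, adding another after `hedge_after` seconds of silence.

    `replicas` is a list of (name, simulated_delay) pairs.
    """
    tasks = []
    try:
        for name, delay in replicas:
            tasks.append(asyncio.ensure_future(query_replica(name, delay)))
            done, _ = await asyncio.wait(
                tasks, timeout=hedge_after, return_when=asyncio.FIRST_COMPLETED
            )
            if done:
                return next(iter(done)).result()
        # All replicas have been asked; wait for the first answer.
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        for task in tasks:
            task.cancel()  # drop requests that are no longer needed


# The first replica is slow, so the hedge to the second one wins.
hedged_result = asyncio.run(
    hedged_query([("replica-1", 0.5), ("replica-2", 0.01)], hedge_after=0.05)
)
print(hedged_result)
```

With a large fanout, the chance that at least one shard hits a slow replica grows, which is why a hedge like this mainly helps tail latency rather than the typical case.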
Native integration with PostgreSQL
— PostgreSQL table engine and table function;
— PostgreSQL dictionary source;
— PostgreSQL database engine as a view of all tables in a PostgreSQL database.
Available in version 21.2-testing.
In previous versions it was only available via ODBC with many complications.
Read the official roadmap and ask your questions: