1. (45 min) What's new in ClickHouse 22.6.
2. (15 min) Q&A.
ClickHouse "summer" release.
— 19 new features
— 27 performance optimizations
Used with ROLLUP, CUBE or GROUPING SETS.
To distinguish different sets.
SELECT k, GROUPING(k) FROM table GROUP BY k WITH ROLLUP
(demo)
Developer: Dmitriy Novik.
— JSONColumns
— JSONCompactColumns
— JSONColumnsWithMetadata
(demo)
Let's write some JavaScript... https://github.com/leeoniya/uPlot/
async function chart() {
const opts = {
width: 800,
height: 400,
series: [ { label: "time" },
{ label: "pageviews", stroke: "#8DF" } ] };
const response = await fetch(
"https://play.clickhouse.com/?user=play",
{ method: "POST", body: `
SELECT toStartOfDay(time, 'Europe/Amsterdam')::INT AS t,
sum(hits)::INT AS v
FROM wikistat
WHERE path = 'ClickHouse' GROUP BY t ORDER BY t
FORMAT JSONCompactColumns` });
const data = await response.json();
let uplot = new uPlot(opts, data, document.getElementById('chart'));
}
Did you know?
— ClickHouse has support for Open Telemetry
— data is collected in the system.opentelemetry_span_log table
— easy to visualize with Jaeger or Zipkin
— detailed instrumentation since 22.6!
The problem: it is too detailed...
A query can generate ~ 1 mln spans.
utils/trace_visualizer, developer: Sergei Trifonov.
utils/trace_visualizer, developer: Sergei Trifonov.
FPC — a codec for floating point compression.
"High Throughput Compression of Double-Precision Floating-Point Data"
— Martin Burtscher and Paruj Ratanaworabhan,
School of Electrical and Computer Engineering,
Cornell University, Ithaca, NY
Developer: Mikhail Guzov.
(demo)
SELECT DISTINCT repo_name, title FROM github_events WHERE title ILIKE ( repo_name LIKE '%ClickHouse%' ? '%fast%' : '%slow%') AND repo_name IN ('ClickHouse/ClickHouse', 'elastic/elasticsearch')
Now I can put LIKE inside LIKE and looks like you're going to like it.
Developer: Robert Schulze.
INSERT INTO system.zookeeper (name, path, value) VALUES ...
Creates new persistent nodes in ZooKeeper.
Batch INSERTs will create multiple nodes in transaction.
Please don't mess with ZooKeeper and don't use this feature.
Developer: Han Fei.
SELECT my_function('hello')(x)
Parameters will be substituted to the command template:
./myprogram --xyz {param:String}
./myprogram --xyz 'hello'
Note: only allowed for direct "exec" without a shell interpreter.
Because the interpretation of escape sequences inside string literals
in shell interpeter is different than in SQL.
Developer: Maksim Kita.
clickhouse-disks \ --config-file ... \ --disk ... \ list|read|write|remove|link|copy path
A tool for introspection and operations on ClickHouse VFS
Can manipulate files on object storage (S3, HDFS, Azure, Web)
Developer: Artyom Yurkov.
A ton of optimizations for AArch64 with ARM NEON!
(Apple M1, AWS Graviton...)
Almost all low-level vector instructions, LZ4 and ZSTD optimizations,
UTF-8 handling, string functions and sorting, etc, etc...
Now, AArch64 is a first-class citizen of ClickHouse!
Developer: Daniel Kutenin (Google), Alexandra Pilipyuk.
By the way... AArch64 builds got proper stack traces.
(exceptions, system.stack_trace, query profiler)
Developer: Maksim Kita.
Optimize COALESCE.
Optimize multiIf with single branch.
Optimize functions L1/L2/Lp/Linf Distances and Norms.
Optimize number comparison.
Optimize hasAll with AVX-512.
Optimize greatCircleAngle, greatCircleDistance, geoDistance.
Developer: Maksim Kita, Anton Popov, Alexander Gololobov.
Optimize ORDER BY with LIMIT.
Optimize ORDER BY with single column.
Optimize INSERT into MergeTree with composite ORDER key.
Optimize dictGetChildren, dictGetDescendants.
Optimize cleanup stage of queries with large GROUP BY.
Optimize background CPU usage of large number of tables.
Developer: Maksim Kita, Nikita Taranov.
Cold queries from MergeTree will start faster.
— developer: Anton Kozlov, CloudFlare.
Avoid VFS cache thrashing.
— developer: Han Shukai, ICT Beijing.
Move cases for condition pushdown into JOIN.
— developer: Amos Bird, ICT Beijing.
Filtering reads from s3/hdfs/etc with _file and _path.
— developer: Amos Bird, ICT Beijing.
Less copying in compression loop.
— developer: Jasper Zhu, Intel.
CREATE TABLE hits_insert_test
(UserID UInt64, EvenTime DateTime, WatchID UInt64)
ENGINE = MergeTree ORDER BY (UserID, EventTime);
INSERT INTO hits_insert_test
SELECT UserID, EventTime, WatchID
FROM hits_100m_obfuscated;
Before:
0 rows in set. Elapsed: 10.945 sec. Processed 100.00 million rows, 2.00 GB (9.14 million rows/s., 182.73 MB/s.)
After:
0 rows in set. Elapsed: 7.562 sec. Processed 100.00 million rows, 2.00 GB (13.22 million rows/s., 264.48 MB/s.)
ClickHouse Go Driver now supported by ClickHouse, Inc.
— added support for JSON data type;
— added support for insertion of structs and maps;
On the way: merging clickhouse-go and go-faster/ch
will add a low-level API with performance boost.
Developer: Dale McDiarmid, Anton Popov.
Official integration with dbt, passed the test suite, better perf...
— developer: Dmitriy Sokolov + Guy Kohen, Dale McDiarmid.
Official integration with Airbyte, improved for dbt :)
— developer: Mark Zitnik.
Extended Federated Queries:
— querying Redis, MongoDB, Cassandra
and external ZooKeeper clusters with table functions;
Roadmap 2022 is published:
https://github.com/ClickHouse/ClickHouse/issues/32513
— open for discussions.
... all great stuff is discussed there, don't miss.