1. (50 min) What's new in ClickHouse 23.11.
2. (10 min) Q&A.
ClickHouse November Release.
— 25 new features
— 24 performance optimizations
— 70 bug fixes
... in the same ALTER query
ALTER TABLE test
ADD INDEX ix (col) TYPE minmax,
MATERIALIZE INDEX ix
Was: DB::Exception: Unknown index: ix.
Now: It works!
Developer: Flynn.
:) SELECT concat('Hello, ', [1, 2])

┌─concat('Hello, ', [1, 2])─┐
│ Hello, [1,2]              │
└───────────────────────────┘
:) SELECT concat('Hello ', [1, 2], 3)

┌─concat('Hello ', [1, 2], 3)─┐
│ Hello [1,2]3                │
└─────────────────────────────┘
Developer: Serge Klochkov.
:) SELECT concat('Hello')

┌─concat('Hello')─┐
│ Hello           │
└─────────────────┘
Why do we need it?
Developer: Serge Klochkov.
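A sketch of why this matters: query generators can emit concat with any argument count and any argument types, with no special case for a single argument and no explicit toString() casts. The table and columns below are hypothetical:

SELECT concat('user ', user_id, ' visited ', count(), ' times')
FROM visits
GROUP BY user_id;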
In external tables: s3, url, file, hdfs, azureBlobStorage...
In addition to the existing virtual columns _file and _path.
SELECT _file, _size, count(), uniq(UserID) FROM s3(
's3://clickhouse-public-datasets/hits_compatible/athena_partitioned/hits*.parquet')
GROUP BY _file, _size ORDER BY _file
Demo
Developer: Pavel Kruglov.
A new table function and table engine, similar to
s3Cluster, urlCluster, hdfsCluster, azureBlobStorageCluster.
SELECT uniq(UserID) FROM fileCluster('partitioned/hits*.parquet')
Processes the set of files, spreading the work across the servers in a cluster.
Requires a network filesystem mounted inside the user_files path,
so that all the files are visible on every server of the cluster.
Developer: Andrey Zvonov.
SET preferred_optimize_projection_name = 'summaries';
— prefer using the specified projection if it exists
instead of automatic selection.
Acts like a hint.
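For context, a minimal sketch of a table with a projection named 'summaries' that the hint could select (the schema is hypothetical):

CREATE TABLE events
(
    time DateTime,
    user_id UInt64,
    PROJECTION summaries
    (
        SELECT toStartOfDay(time), count()
        GROUP BY toStartOfDay(time)
    )
) ENGINE = MergeTree ORDER BY time;

SET preferred_optimize_projection_name = 'summaries';
SELECT toStartOfDay(time), count() FROM events GROUP BY toStartOfDay(time);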
Developer: Yarik Briukhovetskyi.
CREATE TABLE system.blob_storage_log
(
    event_date Date,
    event_time DateTime,
    event_time_microseconds DateTime64(6),
    event_type Enum8(
        'Upload' = 1, 'Delete' = 2,
        'MultiPartUploadCreate' = 3, 'MultiPartUploadWrite' = 4,
        'MultiPartUploadComplete' = 5, 'MultiPartUploadAbort' = 6),
    query_id String,
    thread_id UInt64,
    thread_name String,
    disk_name LowCardinality(String),
    bucket String,
    remote_path String,
    local_path String,
    data_size UInt64,
    error String
)
Developer: Vladimir Cherkasov.
A four-letter command to yield leadership:
echo 'rqld' | nc localhost 2181
Allows initiating a leader election and waiting for a new leader
before removing the node.
Developer: Pradeep Chhetri.
Compression in the network protocol.
Controlled on the client side:
zookeeper:
    node:
        host: example1
        port: 2181
        use_compression: true
Developer: Smita Kulkarni.
Information about Availability Zone.
keeper_server:
    availability_zone:
        # option 1, explicit value:
        value: eu-central-1b
        # option 2, from the metadata service:
        enable_auto_detection_on_cloud: true
The info is published in /keeper/availability-zone
Developer: Jianfei Hu.
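One way to read the published value from a ClickHouse server connected to that Keeper is the system.zookeeper table (a sketch; the path must match your setup):

SELECT name, value
FROM system.zookeeper
WHERE path = '/keeper';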
:) SELECT * FROM system.numbers_mt
WHERE number BETWEEN 3 AND 5
OR number IN (1000000000, 456)
Demo
Developer: JackyWoo.
Demo
Developer: Dmitriy Novik.
How long does the server take to start up?
... if you have 10 000 replicated tables?
But having 10 000 tables is an anti-pattern...
Enable with async_load_databases in the config.xml.
ClickHouse will start instantly.
It accepts connections and starts loading the tables at the same time.
If a query requires a table that is not loaded yet, its initialization
is prioritized, and the query waits for it.
Pros: it's amazing! Cons: caveats exist.
Developer: Sergei Trifonov.
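A minimal config.xml fragment to turn it on, using the setting named above:

<clickhouse>
    <async_load_databases>true</async_load_databases>
</clickhouse>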
Since version 23.11, concurrency control is enabled by default.
It automatically adjusts the number of threads per query.
concurrent_threads_soft_limit_ratio_to_cores: 2
Result: high QPS without performance degradation.
Developer: Roman Vasin.
Slashing tail latencies.
Developer: Sema Checherinda.
If the data resides in the cache on local disks, read it synchronously
and don't hesitate to do more seeks.
Developer: Nikita Taranov.
SET allow_experimental_statistic = 1;
CREATE TABLE example
(
a Int64 STATISTIC(TDigest),
b String
) ENGINE = MergeTree ORDER BY b;
ALTER TABLE example ADD STATISTIC a TYPE TDigest;
ALTER TABLE example DROP|CLEAR STATISTIC a;
ALTER TABLE example MATERIALIZE STATISTIC a;
Developer: Han Fei.
SET allow_statistic_optimize = 1;
SELECT ... WHERE a = 123 AND ...
The selectivity of conditions is automatically estimated.
It is then used, in addition to information about column sizes,
to select the best order in the multi-stage filtering (PREWHERE) optimization.
Developer: Han Fei.
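To see which condition ends up in PREWHERE, one can inspect the rewritten query. A sketch against the example table from the previous slide (the condition on b is hypothetical):

SET allow_statistic_optimize = 1;

EXPLAIN SYNTAX
SELECT count() FROM example WHERE a = 123 AND b LIKE 'x%';
-- the more selective condition is expected to be moved into PREWHERE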
Similar to data-skipping indices
but accumulated in a single small file for each data part.
What's next?
— JOIN reordering;
— implicit LowCardinality.
Developer: Han Fei.
A table engine for streaming consumption from S3:
CREATE TABLE queue ENGINE = S3Queue('https://clickhouse-public-datasets.s3.amazonaws.com/
hits_compatible/athena_partitioned/hits_*.parquet') SETTINGS mode = 'unordered';
CREATE TABLE data
ENGINE = MergeTree ORDER BY ()
EMPTY AS SELECT * FROM queue;
CREATE MATERIALIZED VIEW consumer TO data AS SELECT * FROM queue;
— Consumes data from a bunch of files on S3.
— Consumes new files as soon as they appear.
— Supports parallel and distributed consumption.
Developer: Sergei Katkovskiy, Kseniia Sumarokova.
Spot Instance Data Feed in AWS
aws ec2 create-spot-datafeed-subscription --bucket clickhouse-spot-instance-data-feed
{
"SpotDatafeedSubscription": {
"Bucket": "clickhouse-spot-instance-data-feed",
"OwnerId": "111111111111",
"Prefix": "",
"State": "Active"
}
}
Preview the data:
SELECT *
FROM s3('https://clickhouse-spot-instance-data-feed.s3.amazonaws.com/*.gz', LineAsString)
LIMIT 10
Query id: 78026e69-1cc2-4c82-8be1-c114b1a15a09
┌─line───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ #Version: 1.0                                                                                                                                          │
│ #Fields: Timestamp UsageType Operation InstanceID MyBidID MyMaxPrice MarketPrice Charge Version                                                        │
│ 2023-09-01 03:31:20 UTC SpotUsage:c5a.8xlarge RunInstances:SV001 i-00b91fe65059b4880 sir-s2eemc1k 1.2320000000 USD 1.0130000000 USD 0.0368620570 USD 1 │
│ 2023-09-01 03:27:01 UTC SpotUsage:c5a.8xlarge RunInstances:SV001 i-02fd64d9c11effa34 sir-4ydpp3kj 1.2320000000 USD 1.0130000000 USD 0.0371436710 USD 1 │
│ 2023-09-01 03:11:06 UTC SpotUsage:c5a.8xlarge RunInstances:SV001 i-0a4c1c90ac125800e sir-haxppnmk 1.2320000000 USD 1.0130000000 USD 0.0371436710 USD 1 │
│ 2023-09-01 03:07:10 UTC SpotUsage:c5d.9xlarge RunInstances:SV006 i-04a2d1d907f42badc sir-haxpmkzg 1.7280000000 USD 0.6275000000 USD 0.2680818050 USD 1 │
│ 2023-09-01 03:15:17 UTC SpotUsage:c5d.9xlarge RunInstances:SV006 i-05c084023a8e866f6 sir-t54ppm5k 1.7280000000 USD 0.6275000000 USD 0.1809289575 USD 1 │
│ 2023-09-01 03:45:26 UTC SpotUsage:c5d.9xlarge RunInstances:SV006 i-08269560fb6a7444c sir-p6qyq3cg 1.7280000000 USD 0.6275000000 USD 0.6275000000 USD 1 │
│ 2023-09-01 03:41:11 UTC SpotUsage:c5d.9xlarge RunInstances:SV006 i-09d71d667650f5c9a sir-m9spnjsj 1.7280000000 USD 0.6275000000 USD 0.0261460425 USD 1 │
│ 2023-09-01 03:04:51 UTC SpotUsage:c5d.9xlarge RunInstances:SV006 i-0a94a38b2190e6717 sir-31feqd4g 1.7280000000 USD 0.6275000000 USD 0.0803551400 USD 1 │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Process it as is:
WITH
parseDateTimeBestEffort(c1) AS time,
extractGroups(c2, '^(\w+):(.+)$') AS item_type,
item_type[1] AS item,
item_type[2] AS type,
c4 AS id,
CAST(extract(c6, '^\S+') AS Decimal(10, 6)) AS max_price,
CAST(extract(c7, '^\S+') AS Decimal(10, 6)) AS market_price,
CAST(extract(c8, '^\S+') AS Decimal(10, 6)) AS charge,
charge / market_price AS hours
SELECT
type,
count() AS c,
sum(hours) AS h,
sum(charge) AS cost
FROM s3('https://clickhouse-spot-instance-data-feed.s3.amazonaws.com/*.gz', TSV)
GROUP BY type ORDER BY h DESC
SETTINGS input_format_tsv_skip_first_lines = 2, schema_inference_make_columns_nullable = 0
┌─type─────────────┬──────c─┬────────────h─┬─────────cost─┐
│ c6id.8xlarge     │    875 │   379.432492 │   345.468028 │
│ m7g.4xlarge      │    912 │   362.599616 │   120.957285 │
│ m6idn.2xlarge    │    712 │   331.768934 │    87.381505 │
│ m5d.2xlarge      │    693 │   262.129137 │    65.735759 │
│ r5b.2xlarge      │    483 │   232.865149 │    59.732247 │
│ r6g.2xlarge      │    647 │   195.560188 │    42.264367 │
│ c6a.4xlarge      │    343 │    172.48896 │    46.570093 │
│ r6id.4xlarge     │    247 │    152.03035 │    77.222521 │
│ m6i.8xlarge      │    338 │   151.313531 │   105.488635 │
│ m5zn.2xlarge     │    313 │   146.439172 │     37.66433 │
│ m7a.8xlarge      │    279 │   128.709207 │   131.700535 │
│ m5dn.2xlarge     │    206 │   100.624052 │    25.753932 │
│ r6gd.2xlarge     │    364 │    84.635927 │    23.849737 │
│ r6id.2xlarge     │    126 │    78.014559 │    19.587254 │
│ r5ad.2xlarge     │    143 │    69.243293 │    17.757915 │
│ m5ad.8xlarge     │    122 │    67.560744 │    69.316617 │
│ r5d.4xlarge      │    172 │    58.593963 │    29.710109 │
│ r5dn.2xlarge     │    112 │    48.394754 │    12.393548 │
│ x2gd.2xlarge     │    125 │    45.796468 │    14.833519 │
│ c6in.8xlarge     │     59 │     23.07633 │    23.274454 │
│ r6g.4xlarge      │     31 │    16.225792 │     5.742319 │
│ r5d.2xlarge      │     47 │     16.11658 │     4.121735 │
│ m5d.8xlarge      │     55 │    14.517439 │     14.38119 │
│ r6in.2xlarge     │     19 │    13.441091 │     3.444952 │
│ r5ad.4xlarge     │     40 │    12.415226 │     6.759386 │
│ c5n.9xlarge      │     28 │     9.814967 │    10.964834 │
│ m5n.8xlarge      │     27 │     8.510248 │     8.424644 │
│ m7i.8xlarge      │     33 │     8.375798 │     7.812783 │
│ m6in.8xlarge     │     14 │     6.189432 │     6.338852 │
│ m5a.8xlarge      │     19 │     5.311926 │     3.826214 │
│ c5ad.4xlarge     │      8 │      5.29304 │     1.427534 │
│ m6gd.4xlarge     │     39 │     4.673265 │     1.631628 │
│ r6idn.2xlarge    │     10 │     3.247486 │     0.831033 │
│ m5dn.8xlarge     │      7 │     2.976105 │     3.062444 │
│ m7i-flex.8xlarge │      6 │     1.698884 │     1.173879 │
│ m6id.8xlarge     │      2 │      0.27611 │      0.28293 │
│ r7g.4xlarge      │      1 │     0.254999 │     0.134818 │
│ m6in.2xlarge     │      1 │     0.148331 │     0.037958 │
└──────────────────┴────────┴──────────────┴──────────────┘
Subscribe to a queue:
CREATE TABLE queue
ENGINE = S3Queue('https://clickhouse-spot-instance-data-feed.s3.amazonaws.com/*.gz', TSV)
SETTINGS mode = 'unordered', s3queue_enable_logging_to_s3queue_log = 1,
input_format_tsv_skip_first_lines = 2, schema_inference_make_columns_nullable = 0;
CREATE TABLE feed ENGINE = MergeTree ORDER BY time
EMPTY
AS WITH
parseDateTimeBestEffort(c1) AS time,
extractGroups(c2, '^(\w+):(.+)$') AS item_type,
item_type[1] AS item,
item_type[2] AS type,
c4 AS id,
extract(c6, '^\S+')::Decimal(10, 6) AS max_price,
extract(c7, '^\S+')::Decimal(10, 6) AS market_price,
extract(c8, '^\S+')::Decimal(10, 6) AS charge,
charge / market_price AS hours
SELECT time, item, type, id, max_price, market_price, charge, hours
FROM queue;
CREATE MATERIALIZED VIEW consumer TO feed
AS WITH
parseDateTimeBestEffort(c1) AS time,
extractGroups(c2, '^(\w+):(.+)$') AS item_type,
item_type[1] AS item,
item_type[2] AS type,
c4 AS id,
extract(c6, '^\S+')::Decimal(10, 6) AS max_price,
extract(c7, '^\S+')::Decimal(10, 6) AS market_price,
extract(c8, '^\S+')::Decimal(10, 6) AS charge,
charge / market_price AS hours
SELECT time, item, type, id, max_price, market_price, charge, hours
FROM queue;
Monitor the queue:
:) SELECT sum(rows_processed) FROM system.s3queue_log
┌─sum(rows_processed)─┐
│             1849211 │
└─────────────────────┘
:) DESCRIBE system.s3queue_log
┌─name──────────────────┬─type─────────────────────────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ event_date            │ Date                                 │              │                    │         │                  │                │
│ event_time            │ DateTime                             │              │                    │         │                  │                │
│ table_uuid            │ String                               │              │                    │         │                  │                │
│ file_name             │ String                               │              │                    │         │                  │                │
│ rows_processed        │ UInt64                               │              │                    │         │                  │                │
│ status                │ Enum8('Processed' = 0, 'Failed' = 1) │              │                    │         │                  │                │
│ processing_start_time │ Nullable(DateTime)                   │              │                    │         │                  │                │
│ processing_end_time   │ Nullable(DateTime)                   │              │                    │         │                  │                │
│ ProfileEvents         │ Map(String, UInt64)                  │              │                    │         │                  │                │
│ exception             │ String                               │              │                    │         │                  │                │
└───────────────────────┴──────────────────────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
The data is continuously streamed into the table:
:) SELECT * FROM feed LIMIT 25
┌────────────────time─┬─item──────┬─type──────────┬─id──────────────────┬─max_price─┬─market_price─┬───charge─┬────hours─┐
│ 2023-09-01 03:00:16 │ SpotUsage │ r6in.4xlarge  │ i-03eaee0ffa54380c1 │     1.394 │       0.5143 │  0.47187 │ 0.917499 │
│ 2023-09-01 03:01:57 │ SpotUsage │ m7a.2xlarge   │ i-0c63ae42ec8e55a2e │     0.463 │       0.2565 │   0.2565 │        1 │
│ 2023-09-01 03:02:22 │ SpotUsage │ c5d.9xlarge   │ i-0e299eb8d2d6b6fc5 │     1.728 │       0.6275 │ 0.157921 │ 0.251666 │
│ 2023-09-01 03:03:03 │ SpotUsage │ i4g.2xlarge   │ i-0d66a79dc81c7ae43 │     0.617 │       0.2563 │ 0.020361 │ 0.079442 │
│ 2023-09-01 03:03:10 │ SpotUsage │ m6a.4xlarge   │ i-0d04c29fff9fa827f │     0.691 │        0.496 │ 0.374893 │ 0.755832 │
│ 2023-09-01 03:03:16 │ SpotUsage │ r5dn.4xlarge  │ i-08d6d2e38b6cc1d8f │     1.336 │          0.5 │ 0.207916 │ 0.415832 │
│ 2023-09-01 03:04:11 │ SpotUsage │ m6a.4xlarge   │ i-08015b849286b0d88 │     0.691 │        0.496 │ 0.347475 │ 0.700554 │
│ 2023-09-01 03:04:12 │ SpotUsage │ m6a.4xlarge   │ i-031d72c1f42ef8d3d │     0.691 │        0.509 │    0.509 │        1 │
│ 2023-09-01 03:04:12 │ SpotUsage │ r5dn.4xlarge  │ i-0dd7c5f1e4ae54981 │     1.336 │       0.5017 │ 0.014911 │  0.02972 │
│ 2023-09-01 03:04:13 │ SpotUsage │ m7a.4xlarge   │ i-0da7bb5b5cad27dfb │     0.927 │       0.5134 │   0.5134 │        1 │
│ 2023-09-01 03:04:13 │ SpotUsage │ r6in.4xlarge  │ i-09677fc629e7bb177 │     1.394 │       0.5143 │   0.5143 │        1 │
│ 2023-09-01 03:04:14 │ SpotUsage │ r6in.4xlarge  │ i-0be80b35d91b112fc │     1.394 │       0.5143 │ 0.126289 │ 0.245555 │
│ 2023-09-01 03:04:51 │ SpotUsage │ c5d.9xlarge   │ i-0a94a38b2190e6717 │     1.728 │       0.6275 │ 0.080355 │ 0.128055 │
│ 2023-09-01 03:05:13 │ SpotUsage │ m6a.4xlarge   │ i-04c756e5bc4550fdc │     0.691 │        0.509 │    0.509 │        1 │
│ 2023-09-01 03:05:14 │ SpotUsage │ m7a.4xlarge   │ i-09a01aba465ddaf89 │     0.927 │       0.5134 │   0.5134 │        1 │
│ 2023-09-01 03:05:14 │ SpotUsage │ r5n.4xlarge   │ i-0f895dcb4e96a20d4 │     1.192 │       0.5096 │   0.5096 │        1 │
│ 2023-09-01 03:05:22 │ SpotUsage │ r5n.4xlarge   │ i-057e425617cdc86b2 │     1.192 │       0.4871 │ 0.017725 │ 0.036388 │
│ 2023-09-01 03:05:32 │ SpotUsage │ r5dn.4xlarge  │ i-02c2a1a327e967b5d │     1.336 │       0.5017 │ 0.041111 │ 0.081943 │
│ 2023-09-01 03:05:52 │ SpotUsage │ m6a.4xlarge   │ i-0f05728d26088e256 │     0.691 │        0.509 │ 0.022056 │ 0.043332 │
│ 2023-09-01 03:06:10 │ SpotUsage │ r5n.4xlarge   │ i-0fa61797331b03cde │     1.192 │       0.4871 │ 0.213512 │ 0.438332 │
│ 2023-09-01 03:06:18 │ SpotUsage │ m6idn.4xlarge │ i-0a9883f45dddac9dc │     1.272 │       0.5118 │ 0.457918 │  0.89472 │
│ 2023-09-01 03:07:10 │ SpotUsage │ c5d.9xlarge   │ i-04a2d1d907f42badc │     1.728 │       0.6275 │ 0.268081 │  0.42722 │
│ 2023-09-01 03:07:15 │ SpotUsage │ r5dn.4xlarge  │ i-0fcd109ec0a5d72c9 │     1.336 │       0.4926 │ 0.026271 │ 0.053331 │
│ 2023-09-01 03:08:00 │ SpotUsage │ m5ad.2xlarge  │ i-0d03ba891a7be45ef │     0.412 │       0.2576 │ 0.010232 │  0.03972 │
│ 2023-09-01 03:08:14 │ SpotUsage │ r5n.4xlarge   │ i-07cf673d6efc02bb8 │     1.192 │       0.4871 │ 0.017995 │ 0.036943 │
└─────────────────────┴───────────┴───────────────┴─────────────────────┴───────────┴──────────────┴──────────┴──────────┘
Single binary installation:
curl https://clickhouse.com/ | sh
Was: over 500 MB :(
Now: 107 MB :)
dbt:
— compatibility with v1.5 and v1.6;
— added a macro for Materialized View;
Tableau:
— Tableau Online works with ClickHouse!
using the MySQL compatibility protocol;
Kafka Connector:
— a new StringConverter to support text-based formats (CSV, TSV);
— added support for Bytes and Decimal data types;
— recognition of retriable error codes.
Updates for Python, Node.js, and Go clients.
— RAG pipelines with ClickHouse;
— Unbundling the Cloud Data Warehouse;
— Security data platform in ClickHouse;
— Enhancing Google Analytics;
Video Recordings: https://www.youtube.com/c/ClickHouseDB