1. (55 min) What's new in ClickHouse 24.5.
2. (5 min) Q&A.
ClickHouse Spring/Summer Release.
β 19 new features πΏ
β 20 performance optimizations π΄β
β 68 bug fixes π
CREATE TEMPORARY TABLE test (
x Map(Array(Map(Float64, String)), String));
INSERT INTO test VALUES
({[]: 'Hello'}),
({[{1.1: 'test'}]: 'World'});
SELECT * FROM test1;
ββxβββββββββββββββββββββββββ
1. β {[]:'Hello'} β
2. β {[{1.1:'test'}]:'World'} β
ββββββββββββββββββββββββββββ
Developer: TaiYang Li.
It was available for data import since 23.10.
Since 24.5 it also supports data export.
SELECT [number, number * 2]
FROM numbers(10) INTO OUTFILE 'test.npy' -- or FORMAT Npy
SELECT * FROM 'test.npy'
python3 -c "
import numpy as np;
array = np.load('test.npy');
print(array)"
Demo.
Developer: HowePa.
:) SELECT greatest(5, least(10, x))
:) SELECT max2(5, min2(10, x))
:) SELECT clamp(x, 5, 10)
Developer: skyoct.
SELECT * FROM s3(
's3://umbrella-static/top-1m-2024-01-*.csv.zip :: *.csv')
LIMIT 10;
SELECT _path, c1 FROM s3(
's3://umbrella-static/top-1m-2024-01-*.csv.zip :: *.csv')
WHERE c2 = 'clickhouse.com' ORDER BY _path;
The contents of container formats, such as *.tar.*, *.zip, *.7z
can be read directly from S3.
For the local filesystem — already supported since 23.8.
Developer: Dan Ivanik.
INSERT INTO test (c1, c2,) VALUES ('Hello', 'World',);
INSERT INTO test
(
c1,
c2,
)
VALUES
(
'Hello',
'World',
);
Developer: Alexey Milovidov.
ch -q "SELECT * FROM 'test.parquet' INTO OUTFILE 'test.csv.gz'"
Since version 24.5 it works even with shell redirects!
ch -q "SELECT * FROM 'test.parquet'" > test.csv.gz
Developers: v01dXYZ.
clickhouse-local --query "SELECT 1"
clickhouse-local -q "SELECT 1"
echo "SELECT 1" | clickhouse local
ch -q "SELECT 1"
echo "SELECT 1" | ch
ch "SELECT 1" # since 24.5
Developers: Alexey Milovidov.
SELECT 'Hello, world';
SELECT βHello, worldβ;
Why??? Demo.
Developers: Alexey Milovidov.
UTF8 functions have shortcuts when a string is ASCII-only:
— substringUTF8, substringIndexUTF8, reverseUTF8,
lower/upperUTF8, left/rightPadUTF8.
Developer: TaiYang Li.
For indices of type set.
Index analysis can take a long time, delaying query processing
especially for tables with trillions of rows or with a low index_granularity.
Now up to 1.5 times faster!
Developer: Alexey Milovidov.
When DateTime is compared with a constant DateTime64.
Developer: Alexey Milovidov.
prefix{1,2,3}/file*.csv
Was: make a regular expression
and match it against all paths for a prefix.
Now: interpret {1,2,3} as lazy generators,
only match file*.csv inside prefix1/, prefix2/, prefix3/.
Faster when there is a huge number of files.
Developer: Andrey Zvonov.
A new experimental data type for semistructured data.
Similar to the Variant data type, but dynamic.
Variant(String, UInt64, Array(String)) — anything from the list of types.
Dynamic — anything, automatically extending the list of types.
Developer: Pavel Kruglov
Using Dynamic type in table's definition:
CREATE TABLE test (d Dynamic) ENGINE = Memory;
INSERT INTO test VALUES (NULL), (42), ('Hello, World!'), ([1, 2, 3]);
SELECT d, dynamicType(d) FROM test;
ββdββββββββββββββ¬βdynamicType(d)ββ
β α΄Ία΅α΄Έα΄Έ β None β
β 42 β Int64 β
β Hello, World! β String β
β [1,2,3] β Array(Int64) β
βββββββββββββββββ΄βββββββββββββββββ
Developer: Pavel Kruglov
CAST from an ordinary column:
SELECT 'Hello, World!'::Dynamic as d, dynamicType(d);
ββdββββββββββββββ¬βdynamicType(d)ββ
β Hello, World! β String β
βββββββββββββββββ΄βββββββββββββββββ
CAST from a Variant column:
SET allow_experimental_variant_type = 1, use_variant_as_common_type = 1;
SELECT multiIf(number % 3 = 0, number,
number % 3 = 1, range(number + 1), NULL)::Dynamic AS d, dynamicType(d)
FROM numbers(3)
ββdββββββ¬βdynamicType(d)ββ
β 0 β UInt64 β
β [0,1] β Array(UInt64) β
β α΄Ία΅α΄Έα΄Έ β None β
βββββββββ΄βββββββββββββββββ
Reading subcolumns of the Dynamic value:
SELECT d, dynamicType(d), d.String, d.`Array(Int64)` FROM test;
ββdββββββββββββββ¬βdynamicType(d)ββ¬βd.Stringβββββββ¬βd.Array(Int64)ββ
β α΄Ία΅α΄Έα΄Έ β None β α΄Ία΅α΄Έα΄Έ β [] β
β 42 β Int64 β α΄Ία΅α΄Έα΄Έ β [] β
β Hello, World! β String β Hello, World! β [] β
β [1,2,3] β Array(Int64) β α΄Ία΅α΄Έα΄Έ β [1,2,3] β
βββββββββββββββββ΄βββββββββββββββββ΄ββββββββββββββββ΄βββββββββββββββββ
SELECT toTypeName(d.String) FROM test;
ββtoTypeName(d.String)ββ
β Nullable(String) β
ββββββββββββββββββββββββ
SELECT d, dynamicType(d), dynamicElement(d, 'String') FROM test;
ββdββββββββββββββ¬βdynamicType(d)ββ¬βdynamicElement(d, 'String')ββ
β α΄Ία΅α΄Έα΄Έ β None β α΄Ία΅α΄Έα΄Έ β
β 42 β Int64 β α΄Ία΅α΄Έα΄Έ β
β Hello, World! β String β Hello, World! β
β [1,2,3] β Array(Int64) β α΄Ία΅α΄Έα΄Έ β
βββββββββββββββββ΄βββββββββββββββββ΄ββββββββββββββββββββββββββββββ
Type inference from data formats:
SELECT d, dynamicType(d)
FROM format(JSONEachRow, 'd Dynamic',
'
{"d" : "Hello, World!"},
{"d" : 42},
{"d" : 42.42},
{"d" : "2020-01-01"},
{"d" : [1, 2, 3]}
')
ββdββββββββββββββ¬βdynamicType(d)ββ
β Hello, World! β String β
β 42 β Int64 β
β 42.42 β Float64 β
β 2020-01-01 β Date β
β [1,2,3] β Array(Int64) β
βββββββββββββββββ΄βββββββββββββββββ
Support for "workload identity" — Vinay Suryadevara.
Optimizing backups with the server-side copy — Alexander Sapin.
Support for plain_rewritable metadata for azure blob storage — Julia Kartseva.
Azure is production ready since 24.5!
What for?
What for?
We launched ClickHouse Cloud on Azure (beta)!
https://clickhouse.com/blog/clickhouse-cloud-is-now-on-azure-in-public-beta
ClickHouse Cloud: now on AWS, GCP, and Azure,
... and a partner service on AliCloud.
INSERT INTO test FORMAT Form
name=Vasya&[email protected]&text=I%20like%20ClickHouse%20so%20much!
Inserting directly from HTML forms (x-www-form-urlencoded MIME type)
Direct integration with JavaScript libraries, such as Boomerang.js.
Supports inserting a single record
(asynchronous inserts are recommended).
Developer: Shaun Struwig.
From now on, you will see JOIN improvements in every ClickHouse release.
In-memory compression for CROSS (comma) JOIN:
— fast processing of a large table's product.
On-disk processing for CROSS (comma) JOIN:
— generate a large table's product even when
the right hand side is too large.
Developer: Maksim Alekseev.
SET allow_experimental_join_condition = 1;
And now ClickHouse supports not only equality conditions in ON.
Developer: Lgbo-USTC.
Fast processing of Pandas Dataframes.
Developer: Auxten
Custom data types: any user-provided data type could be processed:
class myReader(chdb.PyReader):
def __init__(self, data):
self.data = data
self.cursor = 0
super().__init__(data)
def read(self, col_names, count):
block = [self.data[col] for col in col_names]
block = [col[self.cursor : self.cursor + count] for col in block]
self.cursor += block[0].shape[0]
return block
reader = myReader(df_old)
ret = chdb.query("""
SELECT RegionID, SUM(AdvEngineID), COUNT(*) AS c
FROM Python(reader) GROUP BY RegionID ORDER BY c DESC LIMIT 10""",
"DataFrame")
Developer: Auxten
Now we have:
— x86_64 (production);
— aarch64 (production);
— risc-v (experimental);
— powerpc64-le (experimental);
— s390x (experimental);
— loongarch64 (experimental);
Developer: QiangXuhui
ClickPipes:
— Kinesis support;
— allow to set offsets for Kafka;
— meta-columns key, timestamp, headers, topic, offset for Kafka;
Metabase — directly import CSV into ClickHouse.
Updates for dbt and Tableau integrations,
as well as Java, Go, and Python drivers.
+ Thanks for many fixes to our contributors:
R. Joel Norgren, Guo Qiang Wang,
JiΕΓ KozlovskΓ½, Abhishek Gahlot, Frederik EycheniΓ©, Cal Herries.
— Real-world AWS Graviton 4 performance;
— The new SQL console;
— API endpoints;
— Flexible backups;
— How to become a certified ClickHouse developer;
— Why ClickHouse is faster than Elasticsearch;