1. (55 min) What's new in ClickHouse 22.9.
2. (5 min) Q&A.
ClickHouse September release.
— 12 new features
— 18 performance optimizations
— 82 bug fixes 🔥
UNION DISTINCT
INTERSECT DISTINCT
EXCEPT DISTINCT
Developer: Duc Canh Le.
SET insert_quorum = 'auto'
Wait for insertion on majority of replicas.
Developers: Sachin, Vladimir Cherkasov.
users.d/user_name.xml:
<profiles>
<user_name>
<constraints>
<max_threads>
<min>10</min>
<max>20</max>
<changeable_in_readonly/>
</max_threads>
</constraints>
</user_name>
</profiles>
SQL:
ALTER ROLE role SETTINGS max_threads = 16
MIN 10 MAX 20 CHANGEABLE_IN_READONLY
Developer: Sergei Trifonov.
Examples:
— Allow readonly user to lower the number of threads;
— additional_table_filters;
— additional_result_filter settings;
— offset and limit settings.
Developer: Sergei Trifonov.
Examples:
clickhouse-client \
--input_format_allow_errors_num 1000 \
--input_format_record_errors_file_path /tmp/parse_errors \
--errors_output_format CSV ... < data.tsv
curl https://clickhouse.cloud/\
?input_format_allow_errors_num=1000\
&input_format_record_errors_file_path=parse_errors\
&query=INSERT...
--data-binary @data.tsv
SET input_format_record_errors_file_path = 'parse_errors';
Developers: Zjial, Pavel Kruglov.
— works both on client side and on server side;
— saved to user_files directory on server-side;
— data is saved in structured format:
— time;
— database;
— table;
— error description;
— offset;
— raw data;
— data can be further analyzed with the file table function;
Developers: Zjial, Pavel Kruglov.
If ZooKeeper is unavailable at server startup,
the ReplicatedMergeTree tables will start in read-only mode
and initialize asynchronously in background
as soon as ZooKeeper will be available.
— the same applicable for ClickHouse Keeper as well;
— especially useful for embedded ClickHouse Keeper;
Developers: Antonio Andelic.
Data import from JSON, JSONCompact, JSONColumnsWithMetadata;
New controls:
— input_format_json_read_numbers_as_strings;
— output_format_json_quote_decimals;
— input_format_json_validate_utf8;
— input_format_json_validate_types_from_metadata;
A new JSONObjectEachRow format;
Developer: Pavel Kruglov.
Demo
???
Use Virtual Filesystems!
And host readonly table on HTTP server with disk web!
SELECT data_paths FROM system.tables
WHERE name = 'github_events' AND database = 'default'
sudo -u clickhouse clickhouse static-files-disk-uploader \
--metadata-path \
/var/lib/clickhouse/store/127/127f4241-4a9b-4ecd-8a84-846b88069cb5/ \
--link --output-dir github_events_web
find web -not -type d | xargs -P10 -I{} \
aws s3 cp {} s3://clickhouse-public-datasets/{}
Demo
— Asynchronous loading of marks.
Developer: Kseniia Sumarokova.
— Compressed marks and primary key.
Developer: Zhong Yuan Kai.
— Composable cache.
Developer: Kseniia Sumarokova.
— Speed up GROUP BY with CPU prefetcher:
— Speed up GROUP BY with better block sizes.
Demo
Developer: Nikita Taranov.
Approximate Nearest Neighbor Indices
for Vector Search queries.
ClickHouse is not a vector search database... but why not?
We tried to integrate Faiss, HNSW, ANNoy, DiskANN, ScaNN...
... most of the attempts failed.
INDEX idx_name embedding_column
TYPE annoy(100) GRANULARITY 1
Developers: Arthur Filatenkov, Vladimir Makarov, Danila Mishin,
Nikita Vasilenko, Alexander Piachonkin, Nikita Evsiukov, Hakob Sagatelyan.
Approximate Nearest Neighbor Indices
for Vector Search queries.
INDEX idx_name embedding_column
TYPE annoy(100) GRANULARITY 1
Speeds up:
WHERE L2Distance(embedding_column, const) < threshold LIMIT n;
ORDER BY L2Distance(embedding_column, const) LIMIT n;
Developers: Arthur Filatenkov, Vladimir Makarov, Danila Mishin,
Nikita Vasilenko, Alexander Piachonkin, Nikita Evsiukov, Hakob Sagatelyan.
<clickhouse>
<keeper_map_path_prefix>/test_keeper_map</keeper_map_path_prefix>
</clickhouse>
CREATE TABLE map (key String, value UInt32)
ENGINE = KeeperMap('/path/in/zk') PRIMARY KEY (key);
Key-value table for small volumes of data.
Prefers ClickHouse Keeper but can use ZooKeeper as well.
Developers: Antonio Andelic.
Official ClickHouse driver for Node.JS:
https://github.com/ClickHouse/clickhouse-js
Developer: Sergei Klochkov.
Grafana Plugin 2.0:
https://clickhouse.com/blog/clickhouse-grafana-plugin-2.0
Support for HTTP protocol.
Semistructured data support.
ClickHouse native filters and variables.
Developers: Andrew Hackmann, Dale McDiarmid, Ryadh Dahimene.
There are a ton of integrations for data visualization and exploration.
Dashboards: Grafana, Superset, Metabase, Redash.
Notebooks: Jupyter, Observable, Deepnote, HEX.
What about... building my own dashboard tool in a weekend? 🤔
Demo
https://pastila.nl/?00399939/04b0fc80fdaaa2dbbc1783f98f4614a6.html
https://pastila.nl/?00399939/ee598c63b98acfe4939898c686b64df5.html