ClickHouse: Release 22.9 Webinar

ClickHouse
Release 22.9

Release 22.9 Webinar

1. (55 min) What's new in ClickHouse 22.9.

2. (5 min) Q&A.

🌤

Release 22.9

ClickHouse September release.

— 12 new features

— 18 performance optimizations

— 82 bug fixes 🔥

Warm Up

UNION DISTINCT INTERSECT DISTINCT EXCEPT DISTINCT

Developer: Duc Canh Le.

Warm Up

SET insert_quorum = 'auto'

Wait for insertion on majority of replicas.

Developers: Sachin, Vladimir Cherkasov.

Settings Changeable For Readonly Users

users.d/user_name.xml:

<profiles> <user_name> <constraints> <max_threads> <min>10</min> <max>20</max> <changeable_in_readonly/> </max_threads> </constraints> </user_name> </profiles>

SQL:

ALTER ROLE role SETTINGS max_threads = 16 MIN 10 MAX 20 CHANGEABLE_IN_READONLY

Developer: Sergei Trifonov.

Settings Changeable For Readonly Users

Examples:

— Allow readonly user to lower the number of threads;

additional_table_filters;

additional_result_filter settings;

offset and limit settings.

Developer: Sergei Trifonov.

Saving Parsing Errors

Examples:

clickhouse-client \ --input_format_allow_errors_num 1000 \ --input_format_record_errors_file_path /tmp/parse_errors \ --errors_output_format CSV ... < data.tsv

curl https://clickhouse.cloud/\ ?input_format_allow_errors_num=1000\ &input_format_record_errors_file_path=parse_errors\ &query=INSERT... --data-binary @data.tsv

SET input_format_record_errors_file_path = 'parse_errors';

Developers: Zjial, Pavel Kruglov.

Saving Parsing Errors

— works both on client side and on server side;

— saved to user_files directory on server-side;

— data is saved in structured format:
  — time;
  — database;
  — table;
  — error description;
  — offset;
  — raw data;

— data can be further analyzed with the file table function;

Developers: Zjial, Pavel Kruglov.

Asynchronous Initialization Of Tables

If ZooKeeper is unavailable at server startup,
the ReplicatedMergeTree tables will start in read-only mode
and initialize asynchronously in background
as soon as ZooKeeper will be available.

— the same applicable for ClickHouse Keeper as well;

— especially useful for embedded ClickHouse Keeper;

Developers: Antonio Andelic.

New And Improved JSON Formats

Data import from JSON, JSONCompact, JSONColumnsWithMetadata;

New controls:
— input_format_json_read_numbers_as_strings;
— output_format_json_quote_decimals;
— input_format_json_validate_utf8;
— input_format_json_validate_types_from_metadata;

A new JSONObjectEachRow format;

Developer: Pavel Kruglov.

New And Improved JSON Formats

Demo

How To Make ClickHouse Slow?

???

How To Make ClickHouse Slow?

Use Virtual Filesystems!

And host readonly table on HTTP server with disk web!

SELECT data_paths FROM system.tables WHERE name = 'github_events' AND database = 'default' sudo -u clickhouse clickhouse static-files-disk-uploader \ --metadata-path \ /var/lib/clickhouse/store/127/127f4241-4a9b-4ecd-8a84-846b88069cb5/ \ --link --output-dir github_events_web find web -not -type d | xargs -P10 -I{} \ aws s3 cp {} s3://clickhouse-public-datasets/{}

How To Make ClickHouse Slow?

Demo

Performance Improvements

— Asynchronous loading of marks.

Developer: Kseniia Sumarokova.

— Compressed marks and primary key.

Developer: Zhong Yuan Kai.

— Composable cache.

Developer: Kseniia Sumarokova.

Performance Improvements

— Speed up GROUP BY with CPU prefetcher:

— Speed up GROUP BY with better block sizes.

Demo

Developer: Nikita Taranov.

Experimental Features

Approximate Nearest Neighbor Indices
for Vector Search queries.

ClickHouse is not a vector search database... but why not?

We tried to integrate Faiss, HNSW, ANNoy, DiskANN, ScaNN...
... most of the attempts failed.

INDEX idx_name embedding_column TYPE annoy(100) GRANULARITY 1

Developers: Arthur Filatenkov, Vladimir Makarov, Danila Mishin,
Nikita Vasilenko, Alexander Piachonkin, Nikita Evsiukov, Hakob Sagatelyan.

Experimental Features

Approximate Nearest Neighbor Indices
for Vector Search queries.

INDEX idx_name embedding_column TYPE annoy(100) GRANULARITY 1

Speeds up:

WHERE L2Distance(embedding_column, const) < threshold LIMIT n;

ORDER BY L2Distance(embedding_column, const) LIMIT n;

Developers: Arthur Filatenkov, Vladimir Makarov, Danila Mishin,
Nikita Vasilenko, Alexander Piachonkin, Nikita Evsiukov, Hakob Sagatelyan.

KeeperMap Table Engine

<clickhouse> <keeper_map_path_prefix>/test_keeper_map</keeper_map_path_prefix> </clickhouse>

CREATE TABLE map (key String, value UInt32) ENGINE = KeeperMap('/path/in/zk') PRIMARY KEY (key);

Key-value table for small volumes of data.

Prefers ClickHouse Keeper but can use ZooKeeper as well.

Developers: Antonio Andelic.

Integrations

Official ClickHouse driver for Node.JS:

https://github.com/ClickHouse/clickhouse-js

Developer: Sergei Klochkov.

Grafana Plugin 2.0:

https://clickhouse.com/blog/clickhouse-grafana-plugin-2.0

Support for HTTP protocol.
Semistructured data support.
ClickHouse native filters and variables.

Developers: Andrew Hackmann, Dale McDiarmid, Ryadh Dahimene.

Bonus

There are a ton of integrations for data visualization and exploration.

Dashboards: Grafana, Superset, Metabase, Redash.

Notebooks: Jupyter, Observable, Deepnote, HEX.

What about... building my own dashboard tool in a weekend? 🤔

Embedded Dashboards

Demo

https://pastila.nl/?00399939/04b0fc80fdaaa2dbbc1783f98f4614a6.html

https://pastila.nl/?00399939/ee598c63b98acfe4939898c686b64df5.html

Q&A