ClickHouse: Release 22.5 Webinar

ClickHouse
Release 22.5

Release 22.5 Webinar

1. (30 min) What's new in ClickHouse 22.5.

2. (30 min) Q&A.

Release 22.5

May's stable release.

GROUPING SETS

GROUP BY modifiers:

— WITH TOTALS;

— WITH ROLLUP;

— WITH CUBE;

— GROUPING SETS;

Developers: Dmitriy Novik, taylor12805, Maksim Sipliviy.

GROUPING SETS

GROUP BY modifiers:

(demo)

GROUPING SETS

— WITH TOTALS:
  add a row with total values;

— WITH ROLLUP;
  aggregate by all prefixes of keys:
  given k1, k2, aggregate by (), (k1), (k1, k2);

— WITH CUBE;
  aggregate by all subsets of keys;
  given k1, k2, aggregate by (), (k1), (k2), (k1, k2);

new: GROUPING SETS;
  aggregate by any sets of keys;
  generalization of all the above;

GROUPING SETS

Upcoming:

GROUPING function to distinguish different sets.

rollup_use_null setting for standard compatibility.

New Formats

MySQLDump as input format:

-- -- Table structure for table `test` -- DROP TABLE IF EXISTS `test`; /*!40101 SET @saved_cs_client = @@character_set_client */; /*!50503 SET character_set_client = utf8mb4 */; CREATE TABLE `test` ( `x` int DEFAULT NULL, `y` int DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci; /*!40101 SET character_set_client = @saved_cs_client */; -- -- Dumping data for table `test` -- LOCK TABLES `test` WRITE; /*!40000 ALTER TABLE `test` DISABLE KEYS */; INSERT INTO `test` VALUES (1,NULL),(2,NULL),(3,NULL),(3,NULL),(4,NULL), (5,NULL),(6,7); /*!40000 ALTER TABLE `test` ENABLE KEYS */; UNLOCK TABLES;

New Formats

MySQLDump as input format:

Use case:

— you cannot connect to MySQL for direct import;

— you cannot create a TSV dump from MySQL;

— all that you have is a table dump from MySQL;

Developer: Pavel Kruglov.

New Formats

MySQLDump as input format:

Options:

input_format_mysql_dump_table_name
— select one of the tables from the dump;

input_format_mysql_dump_map_column_names
— match corresponding column names;

Bonus:

MySQLDump format supports schema inference!

SELECT * FROM file('dump.sql', MySQLDump);

New Formats

Prometheus as output format

Use case:

— output data exactly as Prometheus expects;

— define a custom HTTP handler for a query with custom metrics;

Developer: Vladimir Cherkasov.

New Formats

Prometheus as output format

Result should have columns:

— name;
— value;
— type (counter, gauge, histogram, summary, untyped or empty);
— labels (optional);
— help (optional);

New Formats

Prometheus as output format

# HELP metric0 info 0 # TYPE metric0 counter metric0 0 metric1 1 # HELP metric2 info 2 metric2 2 1395066363000 # TYPE metric3 counter metric3 3

Bonus:
— histograms are supported.

New Formats

Prometheus as output format

global: scrape_interval: 3s scrape_configs: - job_name: node metrics_path: '/' params: query: ["SELECT ... FORMAT Prometheus"] static_configs: - targets: ['localhost:8123']

Security And Compliance

FIPS 140-2 compliance:

— "crypto" module of the SSL library (BoringSSL)
  is switched to FIPS 140-2 compliant version.

Developer: Meena-Renganathan.

system.certificates table:

— introspection of the root certificates that ClickHouse can use.

— check that no malicious certificates are used;
  check issues with SSL certificates.

Developer: Yakov Olkhovskiy.

Flexible Memory Limits

Previous versions:

— rigid max_memory_usage, set to 10 GB by default;
max_memory_usage_for_user can be set;
— when memory limit exceeded, exception is thrown;

Since 22.5:

max_memory_usage is not set by default;
— new settings memory_overcommit_ratio_denominator, memory_overcommit_ratio_denominator_for_user;

Developers: Dmitriy Novik.

Flexible Memory Limits

Since 22.5:

max_memory_usage is not set by default;

— new settings memory_overcommit_ratio_denominator,
memory_overcommit_ratio_denominator_for_user;

— in case of memory shortage,
  the most "overcommitted" query will be killed;

— other queries will wait for up to
memory_usage_overcommit_max_wait_microseconds;

max_memory_usage and max_memory_usage_for_user
  can act as hard limits;

Developers: Dmitriy Novik.

Flexible Memory Limits

TLDR:

Heavy query can use as much memory as available
as long as no smaller queries need memory.

Developers: Dmitriy Novik.

Performance Optimizations

avg and sum without GROUP BY:

SELECT sum(revenue), avg(price) FROM table

Up to 1.5 times faster!

Developers: Maksim Kita.

Performance Optimizations

Unary arithmetic functions
bitCount, bitNot, abs, sign, intExp*, negate, round*:

Up to 7 times faster!!!

Developers: Maksim Kita.

Performance Optimizations

JIT compilation for sorting comparator.

For ORDER BY, INSERT INTO MergeTree, Merge JOIN.

SELECT * FROM test ORDER BY value

— 0 rows in set. Elapsed: 5.299 sec. Processed 100.00 million rows, 1.70 GB (18.87 million rows/s., 320.81 MB/s.)

Performance Optimizations

SET compile_sort_description=1; SELECT * FROM test ORDER BY value

— 0 rows in set. Elapsed: 4.131 sec. Processed 100.00 million rows, 1.70 GB (24.21 million rows/s., 411.50 MB/s.)
+30% performance improvement!

Developers: Maksim Kita.

Bonus: Vector Search

Indices for vector embeddings.

Search for nearest objects: images, texts, etc...

FAISS, Annoy, HNSW, DiskAnn.

Prototype stage, expected in Autumn 2022.

What Else?

ClickHouse Cloud Private Preview

Already tested by first customers!

Sign up for wait-list: https://clickhouse.com/cloud/

Serverless operation, no infrastructure to manage.

What's Next

Roadmap 2022 is published:

https://github.com/ClickHouse/ClickHouse/issues/32513

— open for discussions.

... all great stuff is discussed there, don't miss.

Q&A