CREATE VIEW wiki
AS SELECT toStartOfMonth(time),
          sum(hits) AS h,
          bar(h, 0, max(h) OVER (), 100)
    FROM wikistat
    WHERE path = {page:String}
    GROUP BY 1
    ORDER BY 1;
SELECT * FROM wiki(page = 'ClickHouse');
Developer: Smita Kulkarni.
CREATE DICTIONARY user_agent
(
    regexp String,
    name String,
    version UInt16
)
PRIMARY KEY(regexp)
SOURCE(YAMLRegExpTree(PATH '/.../regexp_tree.yaml'))
LAYOUT(regexp_tree)
Developers: Vage Ogannisian, Han Fei.
- regexp: 'Linux/(\d+[\.\d]*).+tlinux'
  name: 'TencentOS'
  version: '\1'
- regexp: '\d+/tclwebkit(?:\d+[\.\d]*)'
  name: 'Android'
  versions:
    - regexp: '33/tclwebkit'
      version: 13
    - regexp: '3[12]/tclwebkit'
      version: 12
    - regexp: '30/tclwebkit'
      version: 11
    - regexp: '29/tclwebkit'
      version: 10
SELECT dictGet('user_agent', ('name', 'version'), UserAgent);
— Traverses the tree and determines the values of the attributes.
— The tree can be arbitrarily deep, 
  and each nested level can override the values.
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36
— All regular expressions are checked in a single pass for performance!
— Can be loaded from a YAML file or from a table in any source.
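A minimal sketch of reading both attributes from the result tuple (the requests table and its UserAgent column are assumptions for illustration):
WITH dictGet('user_agent', ('name', 'version'), UserAgent) AS ua
SELECT
    tupleElement(ua, 1) AS os_name,     -- the 'name' attribute
    tupleElement(ua, 2) AS os_version   -- the 'version' attribute
FROM requests
LIMIT 10;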
$ cat /etc/clickhouse-server/config.d/query_cache.yaml
query_result_cache:
    size: 1073741824
    max_entries: 1024
    max_entry_size: 104857600
    max_entry_records: 30000000
SET enable_experimental_query_result_cache = 1;
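With the setting above enabled, repeated queries can be served from the cache; a minimal sketch (the hits table is only an example):
SELECT count() FROM hits WHERE URL LIKE '%google%';  -- first run: computed, result stored in the cache
SELECT count() FROM hits WHERE URL LIKE '%google%';  -- repeated run: can be served from the cache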
Developers: Mikhail Stetsyuk, Robert Schulze.
Allows control on a per-query basis:
— min query runs to cache the result;
— min query runtime to cache the result;
— max result size to put into cache;
— max staleness to use the cached entry;
— passive usage of the existing entries in cache;
— caching of queries with non-deterministic functions;
— sharing the cache between users;
Developers: Mikhail Stetsyuk, Robert Schulze.
Next steps: compressed cache; on-disk cache; cache of intermediate data.
ALTER TABLE hackernews_indexed
  ADD INDEX inv_idx(text) TYPE inverted;
ALTER TABLE hackernews_indexed MATERIALIZE INDEX inv_idx;
Supports tokens and n-grams.
Optimizes hasToken, multiSearchAny, equality comparison...
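The n-gram variant takes the n-gram size as a parameter of the index type; a hedged sketch (the index name and size 4 are illustrative):
ALTER TABLE hackernews_indexed
  ADD INDEX inv_idx_ngram(text) TYPE inverted(4);
ALTER TABLE hackernews_indexed MATERIALIZE INDEX inv_idx_ngram;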
Developers: Larry Luo, Harry Lee, Robert Schulze.
:) SELECT count() FROM hackernews WHERE hasToken(text, 'ClickHouse')
┌─count()─┐
│     948 │
└─────────┘
1 row in set. Elapsed: 0.579 sec. Processed 33.95 million rows,
11.61 GB (58.63 million rows/s., 20.05 GB/s.)
:) SELECT count() FROM hackernews_indexed WHERE hasToken(text, 'ClickHouse')
┌─count()─┐
│     948 │
└─────────┘
1 row in set. Elapsed: 0.206 sec. Processed 3.71 million rows,
1.32 GB (18.00 million rows/s., 6.43 GB/s.)
Developers: Larry Luo, Harry Lee.
— full and incremental backups;
— tables, partitions, databases, all databases;
— full or partial restore;
— a bunch of files or a single archive;
— atomic snapshot for MergeTree,
  best effort snapshot for multiple tables;
Developer — Vitaliy Baranov.
BACKUP TABLE t TO File('backup_20220629');
BACKUP TABLE t TO File('backup_20220629.zip');
BACKUP TABLE t TO File('backup_20220630')
    SETTINGS base_backup = File('backup_20220629');
BACKUP TABLE system.users, system.grants TO ...
BACKUP TABLE system.functions TO ...
Developer — Vitaliy Baranov.
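Restore is symmetric; a minimal sketch that brings back the first backup above:
RESTORE TABLE t FROM File('backup_20220629');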
BACKUP|RESTORE
  TABLE [db.]table_name [AS [db.]table_name_in_backup]
    [PARTITION[S] partition_expr [,...]] |
  DICTIONARY [db.]dictionary_name [AS [db.]name_in_backup] |
  DATABASE database_name [AS database_name_in_backup]
    [EXCEPT TABLES ...] |
  TEMPORARY TABLE table_name [AS table_name_in_backup] |
  ALL TEMPORARY TABLES [EXCEPT ...] |
  ALL DATABASES [EXCEPT ...] [,...]
  [ON CLUSTER 'cluster_name']
  TO|FROM File('path/') | Disk('disk_name', 'path/') | S3(...)
  [SETTINGS base_backup = File(...) | Disk(...)]
Developer — Vitaliy Baranov.
ClickHouse should always work.
If ZooKeeper is unavailable at server startup,
the ReplicatedMergeTree tables will start in read-only mode
and initialize asynchronously in the background
as soon as ZooKeeper becomes available.
— the same applies to ClickHouse Keeper as well;
— especially useful for embedded ClickHouse Keeper;
Developer: Antonio Andelic. Since 22.9.
Session expired. Table is in readonly mode 😠
Never again:
SET insert_keeper_max_retries = 10;
INSERT will survive restarts of ClickHouse Keeper or ZooKeeper
and reconnections.
Developer: Igor Nikonov. Since 22.11.
Table is in readonly mode 😠
Was: around one minute for reconnection, for no reason.
Now: milliseconds 🥲
Developer: Raul Marin. Since 22.10.
Was: a strict per-query limit, max_memory_usage = 10 GB by default.
Now: a query can use the memory if it is available;
in case of memory shortage, the most "overcommitted" query is terminated with an exception.
Developer: Dmitry Novik.
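A hedged sketch of the related per-query knobs (values are arbitrary; names as documented for the memory overcommit feature):
-- the "soft" limit used to pick the most overcommitted query
SET memory_overcommit_ratio_denominator = 1073741824;
-- how long a query waits for memory to be freed before failing
SET memory_usage_overcommit_max_wait_microseconds = 5000000;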
ClickHouse should support everything you expect.
WITH example AS (
    SELECT '2021-01-01' AS date,
        1 AS node, 1 AS user)
SELECT extra_data FROM (
    SELECT join1.*
    FROM example
    LEFT JOIN (
        SELECT '2021-01-01' AS date,
            1 AS extra_data) AS join1
    ON example.date = join1.date
    LEFT JOIN (
        SELECT '2021-01-01' AS date) AS join2
    ON example.date = join2.date)
Was: Missing columns: 'extra_data' while processing query...
Now: SET allow_experimental_analyzer = 1;
— works perfectly.
SELECT * FROM (SELECT SUM(COALESCE(SEQ_VONR_MO_CALL_CONN_FAIL_TIMES_C, 0)) AS VONR_MO_CALL_CONN_FAIL_TIMES, MT.`102520001` AS `102520001`, MT.`181361814368` AS `181361814368`, MT.`102520102` AS `102520102`, MT.`102520101` AS `102520101`, MT.`102520104` AS `102520104`, MT.`183111861371` AS `183111861371`, MT.`102530101` AS `102530101`, MT.`102540101` AS `102540101`, MT.`102520103` AS `102520103`, MT.`102510101` AS `102510101` FROM ( SELECT COALESCE(SUM(VONR_MO_CALL_CONN_FAIL_TIMES), 0) AS SEQ_VONR_MO_CALL_CONN_FAIL_TIMES_C, COM_FAIL_CAUSE AS `102520001`, NULL AS `102520102`, COM_FAIL_CAUSE AS `102510101`, NULL AS `102520101`, D183111570684_H101.`183111861371` AS `183111861371`, NULL AS `102520104`, NULL AS `102520103`, H_COMPREHENSIVE_FAILURE_CAUSE.`102540101` AS `102540101`, H_COMPREHENSIVE_FAILURE_CAUSE.`102530101` AS `102530101`, concat('14', '-', '255', '-', '255', '-', SIP_RELEASE_CODE) AS `181361814368` FROM TEST_DATABASE.SDR_TEST LEFT JOIN ( SELECT concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID) AS `102510101`, UNIFIED_CAUSE_ID AS `183111861371`, concat(FAILCAUSE, '(', PD, ')') AS NAME_102510101 FROM TEST_DATABASE.DIM_TEST GROUP BY concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID), UNIFIED_CAUSE_ID, concat(FAILCAUSE, '(', PD, ')') ) AS D183111570684_H101 ON COM_FAIL_CAUSE = D183111570684_H101.`102510101` LEFT JOIN ( SELECT concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID) AS `102520001`, SCENE_ID AS `102540101`, CLASS_ID AS `102530101`, SCENE_NAME AS NAME_102540101 FROM TEST_DATABASE.DIM_TEST GROUP BY concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID), SCENE_ID, CLASS_ID, SCENE_NAME ) AS H_COMPREHENSIVE_FAILURE_CAUSE ON COM_FAIL_CAUSE = H_COMPREHENSIVE_FAILURE_CAUSE.`102520001` LEFT JOIN ( SELECT concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID) AS `181361814368`, CAUSE AS NAME_181361814368 FROM TEST_DATABASE.DIM_TEST GROUP BY concat(PROTOCOL_ID, '-', FIRFAILMSG_ID, '-', PD_ID, '-', CAUSE_ID), CAUSE ) AS DIM_FAILCAUSE_ALL_181361814368 ON concat('14', '-', '255', '-', '255', '-', SIP_RELEASE_CODE) = DIM_FAILCAUSE_ALL_181361814368.`181361814368` WHERE (H_COMPREHENSIVE_FAILURE_CAUSE.NAME_102540101 IS NOT NULL) AND (D183111570684_H101.NAME_102510101 IS NOT NULL) AND (DIM_FAILCAUSE_ALL_181361814368.NAME_181361814368 IS NOT NULL) GROUP BY `102520001`, `102520102`, `102510101`, `102520101`, D183111570684_H101.`183111861371`, `102520104`, `102520103`, H_COMPREHENSIVE_FAILURE_CAUSE.`102540101`, H_COMPREHENSIVE_FAILURE_CAUSE.`102530101`, `181361814368` ) AS MT GROUP BY MT.`102520001`, MT.`181361814368`, MT.`102520102`, MT.`102520101`, MT.`102520104`, MT.`183111861371`, MT.`102530101`, MT.`102540101`, MT.`102520103`, MT.`102510101` ) AS ST WHERE ST.VONR_MO_CALL_CONN_FAIL_TIMES > 0 ORDER BY VONR_MO_CALL_CONN_FAIL_TIMES DESC LIMIT 0, 5000
SELECT
    y::String || '.'
        || (y < toYear(today()) - 2000 - 1 ? '*' : m::String) AS Version,
    (n <= 3 OR (is_lts AND lts_n <= 2)) ? '✔️' : '❌' AS Supported
FROM (
    SELECT y, m,
        count() OVER (ORDER BY y DESC, m DESC) AS n,
        m IN (3, 8) AS is_lts,
        countIf(is_lts) OVER (ORDER BY y DESC, m DESC) AS lts_n
    FROM (
        WITH extractGroups(version, 'v(\d+)\.(\d+)') AS v,
             v[1]::INT AS y, v[2]::INT AS m
        SELECT y, m
        FROM file('version_date.tsv', TSV, 'version String, date String')
        ORDER BY y DESC, m DESC
        LIMIT 1 BY y, m)
)
LIMIT 1 BY Version
FORMAT Markdown
SET mutations_sync = 1;
ALTER TABLE hits
DELETE WHERE Title LIKE '%Mongo%';
— 205 sec (for a table with 100 million records).
DELETE FROM hits
WHERE Title LIKE '%Mongo%';
— ??? sec.
Developers: Alexander Gololobov, Jianmei Zhang.
SELECT DISTINCT repo_name, title
FROM github_events
WHERE title ILIKE (
  repo_name LIKE '%ClickHouse%' ? '%fast%' : '%slow%')
AND repo_name IN ('ClickHouse/ClickHouse', 'elastic/elasticsearch')
Now I can put LIKE inside LIKE, and it looks like you're going to like it.
Developer: Robert Schulze. Since 22.6.
ClickHouse never slows down!
Speed-up of SELECT with FINAL modifier.
It "simply" improves performance up to 4 times.
Especially for complex transforms like Collapsing and Replacing.
Developer: Nikita Taranov.
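For context, FINAL collapses row versions at SELECT time; a minimal sketch on an assumed ReplacingMergeTree table named events:
SELECT key, value
FROM events FINAL         -- merges/deduplicates rows on the fly
WHERE key = 42;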
Optimize COUNT(DISTINCT ...) for low number of GROUP BY keys.
Optimize GROUP BY with CPU prefetcher.
Optimize GROUP BY with better block sizes.
Developer: Nikita Taranov.
— "direct" algorithm:
  to join with key-value tables by direct lookups a.k.a. nested loops.
  Good if the left table is small, but the right table is like a large dictionary.
  Good to use in MATERIALIZED VIEW queries.
— "parallel_hash" algorithm:
  speed-up if the right hand table is large.
— "full_sorting_merge" algorithm:
  when right hand side is large
  and does not fit in memory and does not allow lookups.
— "grace_hash" algorithm:
  since in 22.12.
Developer: Vladimir Cherkasov, lgbo-ustc.
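The algorithm can be selected per query with the join_algorithm setting; a minimal sketch (table names are illustrative):
SET join_algorithm = 'grace_hash';   -- or 'direct', 'parallel_hash', 'full_sorting_merge'
SELECT l.id, r.value
FROM left_table AS l
JOIN right_table AS r ON l.id = r.id;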
 
ClickHouse can work as a server (clickhouse-server)
or as a tool without installation (clickhouse-local).
ClickHouse can store the data
or process externally stored data on the fly.
External data:
— remote databases: MySQL, PostgreSQL, MongoDB, ODBC, JDBC...
— object storages: S3, HDFS, Azure, COSN, OSS...
— from URLs and local files;
All possible data formats:
— text: CSV, TSV, JSON, Values, MySQLDump, Regexp...
— binary: Parquet, Arrow, ORC, Avro, Protobuf, MsgPack...
— schemaful and schemaless;
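For example, an ad-hoc sketch of processing a local file with clickhouse-local (the file name and columns are assumptions; CSV columns are auto-named c1, c2, ... by schema inference):
$ clickhouse-local --query "
    SELECT c1, count()
    FROM file('data.csv', CSV)
    GROUP BY c1
    ORDER BY count() DESC
    LIMIT 10"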
Now ClickHouse supports Apache Hudi, Delta Lake, and Iceberg
for SELECT queries.
SELECT count() FROM deltaLake(
  'https://clickhouse-public-datasets.s3.amazonaws.com/delta_lake/hits/')
  WHERE URL LIKE '%google%'
-- 4.396 sec.
Developers: Daniil Rubin, Ksenia Sumarokova, Flynn ucasfl. Since 22.11.
Visualizations:
— official ClickHouse plugin for Grafana;
— official support for Superset;
— HEX and Deepnote support.
Data ingestion and processing:
— Kafka Connect integration;
— Airflow, dbt support.
Language drivers:
— official Node.js driver;
— optimized Go driver;
— a new Python client.
ClickHouse is easy to configure for your needs.
Was:
storage_configuration:
  disks:
    web:
      type: web
      endpoint: 'https://clickhouse-public-datasets.s3.amazonaws.com/web/'
  policies:
    web:
      volumes:
        main:
          disk: web
ATTACH TABLE hits ...
ENGINE = MergeTree ORDER BY CounterID
SETTINGS storage_policy = 'web'
Slightly better:
storage_configuration:
  disks:
    web:
      type: web
      endpoint: 'https://clickhouse-public-datasets.s3.amazonaws.com/web/'
ATTACH TABLE hits ...
ENGINE = MergeTree ORDER BY CounterID
SETTINGS disk = 'web'
No need for "storage policy" in simple cases.
Much better:
ATTACH TABLE hits ...
ENGINE = MergeTree ORDER BY CounterID
SETTINGS disk = disk(
  type = 'web',
  endpoint = 'https://clickhouse-public-datasets.s3.amazonaws.com/web/')
100% dynamic configuration, no need to touch the configuration files.
Developer: Ksenia Sumarokova.
So, ClickHouse supports a lot of protocols:
— HTTP
— HTTPS
— Native TCP
— Native TCP wrapped in PROXYv1
— Native TCP with TLS
— MySQL (with TLS support)
— PostgreSQL (with TLS support)
— gRPC (with TLS)
— Replication protocol over HTTP
— Replication protocol over HTTPS
— Keeper client-server protocol;
— Keeper consensus protocol;
— ...
So, ClickHouse supports a lot of protocols.
How to configure all of them? What if:
— the server has multiple network interfaces?
— I want to enable one protocol on multiple ports?
— I want native TCP for localhost only and HTTPS from everywhere?
— I want different TLS certificates for different protocols?
— I want to wrap one protocol in another?
Developer: Yakov Olkhovskiy. Since 22.10.
<protocols>
  <tcp>
    <type>tcp</type>
    <host>::</host>
    <port>9000</port>
    <description>native protocol</description>
  </tcp>
  <tcp_secure>
    <type>tls</type>
    <impl>tcp</impl>
    <port>9440</port>
    <description>secure native protocol</description>
  </tcp_secure>
  <tcp_endpoint>
    <impl>tcp</impl>
    <host>0.0.0.0</host>
    <port>9001</port>
    <description>native protocol, another</description>
  </tcp_endpoint>
  <tcp_proxy>
    <type>proxy1</type>
    <impl>tcp</impl>
    <port>9100</port>
    <description>native protocol with PROXYv1</description>
  </tcp_proxy>