ClickHouse Meetup in Beijing

Author: Alexey Milovidov, 2018-10-31.

New Features of ClickHouse

JOIN enhancements

JOIN enhancements

Support for ON.

— AND chains of equality comparisons.

Support for qualified names and aliases
for both left and right tables.

Available since version 18.6.

JOIN enhancements

Before:

SELECT C_REGION, sum(LO_EXTENDEDPRICE * LO_DISCOUNT)
FROM lineorder
ANY INNER JOIN
(
    SELECT C_REGION, C_CUSTKEY AS LO_CUSTKEY
    FROM customer
) USING (LO_CUSTKEY)
WHERE (toYear(LO_ORDERDATE) = 1993) AND ((LO_DISCOUNT >= 1)
    AND (LO_DISCOUNT <= 3)) AND (LO_QUANTITY < 25)
GROUP BY C_REGION;

JOIN enhancements

After:

SELECT C_REGION, sum(LO_EXTENDEDPRICE * LO_DISCOUNT)
FROM lineorder ANY INNER JOIN customer
    ON LO_CUSTKEY=C_CUSTKEY
WHERE (toYear(LO_ORDERDATE) = 1993) AND ((LO_DISCOUNT >= 1)
    AND (LO_DISCOUNT <= 3)) AND (LO_QUANTITY < 25)
GROUP BY C_REGION;

JOIN enhancements

SET join_default_strictness = 'any';

SELECT C_REGION, sum(LO_EXTENDEDPRICE * LO_DISCOUNT)
FROM lineorder INNER JOIN customer
    ON LO_CUSTKEY=C_CUSTKEY
WHERE (toYear(LO_ORDERDATE) = 1993) AND ((LO_DISCOUNT >= 1)
    AND (LO_DISCOUNT <= 3)) AND (LO_QUANTITY < 25)
GROUP BY C_REGION;

JOIN enhancements

Correct behaviour of .

You can leave old behaviour with the 'asterisk_left_columns_only' setting.

Correct behaviour of qualified asterisks is implemented by zhang2014.

Available since 18.12.

Improved in 18.14.

DECIMAL

SELECT 1 - 0.9 - 0.1 AS x ┌───────────────────────x─┐ │ -2.7755575615628914e-17 │ └─────────────────────────┘

DECIMAL

Decimal numbers with fixed point.

Example:
— DECIMAL(16, 6);
— Decimal128(6);


Production ready. Available in 18.14.10.
(beta in 18.12)

DECIMAL

Number of digits — up to 38.

Stored as integer number Int32, Int64 or Int128 containing all the digits.

For most of the operations, performance is about the same as for integers.

There are overflow checks
— can be turned off with 'decimal_check_overflow' setting.

Performance introspection

Performance introspection

What we already have had before:

Information about single queries:
— system.processes, SHOW PROCESSLIST;
— system.query_log;
— server text log.

Global performance metrics:
— system.events;
— system.metrics;
— system.asynchronous_metrics;
— also sent to Graphite;

Performance introspection

We've added info about real CPU and IO usage:

RealTime, UserTime, SystemTime,
SoftPageFaults, HardPageFaults,
VoluntaryContextSwitches,
InvoluntaryContextSwitches

IOWait, CPUWait,
CPUVirtualTime,
ReadBytes, WriteBytes,
ReadChars, WriteChars

Performance introspection

We've added info about real CPU and IO usage:

Half of them — from getrusage,
another half — from Linux kernel, obtained through AF_NETLINK socket.

Requires Linux and CAP_NET_ADMIN capability
— set at the package install time.

Performance introspection

All performance counters became hierarchical:

— global counters
  — in system.events table;

— counters per each query
  — in system.processes and system.query_log;

— counters per each query execution thread
  — in system.query_thread_log table;

Performance introspection

All performance counters became hierarchical:

— in system.processes and system.query_log
we've added ProfileEvents nested data structure
with all the counters;

— added Settings nested structure
with all changed settings;

— added thread_numbers column
with numbers of all query executing threads;

— added system.query_thread_log table;

— settings: log_profile_events, log_query_settings, log_query_threads;

Performance introspection

Information about queries in server text log:

— every line in log related to query contains query_id.

Receive query execution log directly in clickhouse-client:

SET send_logs_level = 'trace';

It works independent of server global log level.

For distributed queries it will accumulate logs from all participating servers.

example.yandex.ru :) SET send_logs_level = 'debug'
example.yandex.ru :) SELECT SearchPhrase, count() c FROM test.hits GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10

[example.yandex.ru] 2018.08.16 02:06:17.302711 {e70872c6-30e5-4774-a965-ab9fcabedf02} [ 45 ]
<Debug> executeQuery: (from [::1]:56440) SELECT SearchPhrase, count() c FROM test.hits GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10
[example.yandex.ru] 2018.08.16 02:06:17.303662 {e70872c6-30e5-4774-a965-ab9fcabedf02} [ 45 ]
<Debug> test.hits (SelectExecutor): Key condition: unknown
[example.yandex.ru] 2018.08.16 02:06:17.303697 {e70872c6-30e5-4774-a965-ab9fcabedf02} [ 45 ]
<Debug> test.hits (SelectExecutor): MinMax index condition: unknown
[example.yandex.ru] 2018.08.16 02:06:17.303716 {e70872c6-30e5-4774-a965-ab9fcabedf02} [ 45 ]
<Debug> test.hits (SelectExecutor): Selected 1 parts by date, 1 parts by key, 1084 marks to read from 1 ranges
[example.yandex.ru] 2018.08.16 02:06:17.304632 {e70872c6-30e5-4774-a965-ab9fcabedf02} [ 45 ]
<Debug> executeQuery: Query pipeline:
Limit
 Expression
  MergeSorting
   PartialSorting
    Expression
     ParallelAggregating
      Expression × 16
       MergeTreeThread

┌─SearchPhrase────────────┬───────c─┐
│ яндекс                  │    1655 │
│ весна 2014 мода         │    1549 │
│ анджелина джоли         │    1245 │
  ...
[example.yandex.ru] 2018.08.16 02:06:17.395362 {e70872c6-30e5-4774-a965-ab9fcabedf02} [ 45 ] <Debug>
MemoryTracker: Peak memory usage (total): 86.74 MiB.

10 rows in set. Elapsed: 0.093 sec. Processed 8.87 million rows, 108.44 MB (94.92 million rows/s., 1.16 GB/s.)

Performance introspection

Available in 18.12.

Performance introspection

What we want to do:

— resource pools (workload management);

— throttling resource usage for proper share;

Performance introspection

What we want to do:

— write ClickHouse text logs into system table in ClickHouse;

— write global system metrics (like CPU, iowait, LA)
  into system table in ClickHouse;

— calculated histogram/quantile metrics for certain processes
  (example: 90% query execution time);

— add per-column and per-file statistics in query_log;

— add counters for all error codes (system.errors table);

— send all metrics to the client;

Table functions

— url;

— file;

— cluster;

— mysql;

— odbc.

Security

Encryption (https) for replication traffic

Authentication between replicas

Encryption for traffic of Distributed tables

Safe usage of ODBC drivers for external sources

How to "ruin" your data

How to "ruin" your data

SELECT substringUTF8(Title, 1, 50) AS k, count() AS c
FROM hits_1000m_transformed
WHERE match(k, '^[\x01-\x7F]+$') GROUP BY k ORDER BY c DESC LIMIT 100

┌─k──────────────────────────────────────────────────┬───────c─┐
│ Kunstkamera Club-Pickup Bai Porta.ru               │  211122 │
│ Kanye Kedax^                                       │  187249 │
│ OWAProfiles. Colletina Portal Digested Gabbas - IR │  186538 │
│ Verver Board                                       │   87895 │
│ OWAProfit-Parts, Bless Hotels Centre (Lassilev.24l │   74841 │
│ Skins Resort & France.Ru / Compilation             │   60256 │
│ Demio - Livejour ownitsubishi - World of Tanks     │   56643 │
│ Ford Megane Shot comparen Payne Money Bannette for │   54032 │
│ OWAProfit-Partner: Kalus30,eos3D.com :: Nokia!     │   53593 │
│ open shoe (Calence 2 64Gb Black (Sol               │   36314 │
│ Osunshine Player.                                  │   33316 │
│ OWAProfit-Parts Body Big Natural Dead: A Hauptmand │   28153 │
│ PAOLO CONTRACK Atlant, Buckle | WWWW.PHOTODOM.ria. │   24317 │
│ Opel (Count of Tanks - World of Tanks official for │   23525 │
│ Kanuni (Palacticarefunds official forum - Page | W │   23340 │
│ Ahmed the sidence (46E)                            │   22812 │
│ Lexus - DimKrollsky feasantech tanks official foru │   18929 │
│ Nokia Acrosoft                                     │   17973 │
│ Iveco x Hundai ; Toyota RAV-4 Black Of Thalassia.r │   17908 │
│ OWAProfitween Espace Wolf' Pacification Port and T │   16393 │

JIT expression compiler

SELECT count() FROM hits WHERE ((EventDate >= '2018-08-01') AND (EventDate <= '2018-08-03') AND (CounterID >= 34)) OR ((EventDate >= '2018-08-04') AND (EventDate <= '2018-08-05') AND (CounterID <= 101500))

— 649 533 033 rows/sec.

JIT expression compiler

SET compile_expressions = 1; SELECT count() FROM hits WHERE ((EventDate >= '2018-08-01') AND (EventDate <= '2018-08-03') AND (CounterID >= 34)) OR ((EventDate >= '2018-08-04') AND (EventDate <= '2018-08-05') AND (CounterID <= 101500))

— 865 491 052 rows/sec.
+33% speed up!

.

.

Web site: https://clickhouse.com/

Google groups: https://groups.google.com/forum/#!forum/clickhouse

Maillist: [email protected]

WeChat: ask William to join

Telegram chat: https://telegram.me/clickhouse_en

GitHub: https://github.com/ClickHouse/ClickHouse/

Twitter: https://twitter.com/ClickHouseDB

Up next: ClickHouse Meetup in Amsterdam on November 15.