ClickHouse Meetup in Saint Petersburg

Author: Alexey Milovidov, 2018-08-20.

New ClickHouse Features

JOIN Improvements

JOIN Improvements

Support for ON.

— chains of AND with expression equalities.

Support for qualified names and aliases
for both left and right tables.

Starting from version 18.6.0.

JOIN Improvements

Correct behavior of «asterisk».

You can keep the old behavior with the asterisk_left_columns_only setting.

Already in master.

DECIMAL

SELECT 1 - 0.9 - 0.1 AS x ┌───────────────────────x─┐ │ -2.7755575615628914e-17 │ └─────────────────────────┘

DECIMAL

Decimal numbers with fixed point.

Example: DECIMAL(16, 6).


Already in master, but development is in progress.

DECIMAL

Number of decimal digits — up to 38.

Stored as integer Int32, Int64 or Int128, containing all significant digits.

Performance for most operations is the same as when using integers.

There are overflow checks — possibly will be disabled.

Performance Introspection

Performance Introspection

Was

Information per individual query:
— system.processes, SHOW PROCESSLIST;
— system.query_log;
— server text log.

Global performance metrics:
— system.events;
— system.metrics;
— system.asynchronous_metrics;
— also sent to Graphite;

Performance Introspection

Adding information about actual CPU and IO consumption:

RealTime, UserTime, SystemTime,
SoftPageFaults, HardPageFaults,
VoluntaryContextSwitches,
InvoluntaryContextSwitches

IOWait, CPUWait,
CPUVirtualTime,
ReadBytes, WriteBytes,
ReadChars, WriteChars

Performance Introspection

Adding information about actual CPU and IO consumption:

Half of metrics — getrusage,
the other half — from Linux kernel, obtained through AF_NETLINK socket.

Requires Linux and CAP_NET_ADMIN capability
— set during ClickHouse installation.

Performance Introspection

Metrics become hierarchical:

— global
  — in system.events table;

— per query
  — in system.processes, system.query_log tables;

— per query execution thread
  — in system.query_thread_log table;

Performance Introspection

Metrics become hierarchical:

— in system.processes and system.query_log tables
a nested ProfileEvents data structure is added
with all metrics;

— a nested Settings data structure is added
with all changed settings;

— a thread_numbers column is added
with numbers of all threads that executed the query;

— system.query_thread_log table is added;

— settings log_profile_events, log_query_settings, log_query_threads;

Performance Introspection

Query information in regular text log:

— each log line outputs the query identifier.

Getting query execution log in command-line client:

SET send_logs_level = 'trace';

Works independently of server logging level.

example.yandex.ru :) SET send_logs_level = 'debug'
example.yandex.ru :) SELECT SearchPhrase, count() c FROM test.hits GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10

[example.yandex.ru] 2018.08.16 02:06:17.302711 {e70872c6-30e5-4774-a965-ab9fcabedf02} [ 45 ]
<Debug> executeQuery: (from [::1]:56440) SELECT SearchPhrase, count() c FROM test.hits GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10
[example.yandex.ru] 2018.08.16 02:06:17.303662 {e70872c6-30e5-4774-a965-ab9fcabedf02} [ 45 ]
<Debug> test.hits (SelectExecutor): Key condition: unknown
[example.yandex.ru] 2018.08.16 02:06:17.303697 {e70872c6-30e5-4774-a965-ab9fcabedf02} [ 45 ]
<Debug> test.hits (SelectExecutor): MinMax index condition: unknown
[example.yandex.ru] 2018.08.16 02:06:17.303716 {e70872c6-30e5-4774-a965-ab9fcabedf02} [ 45 ]
<Debug> test.hits (SelectExecutor): Selected 1 parts by date, 1 parts by key, 1084 marks to read from 1 ranges
[example.yandex.ru] 2018.08.16 02:06:17.304632 {e70872c6-30e5-4774-a965-ab9fcabedf02} [ 45 ]
<Debug> executeQuery: Query pipeline:
Limit
 Expression
  MergeSorting
   PartialSorting
    Expression
     ParallelAggregating
      ConvertColumnWithDictionaryToFull × 16
       Expression
        MergeTreeThread

┌─SearchPhrase────────────┬───────c─┐
│ yandex                  │    1655 │
│ spring 2014 fashion     │    1549 │
│ angelina jolie          │    1245 │
  ...
[example.yandex.ru] 2018.08.16 02:06:17.395362 {e70872c6-30e5-4774-a965-ab9fcabedf02} [ 45 ] <Debug>
 MemoryTracker: Peak memory usage (total): 86.74 MiB.

10 rows in set. Elapsed: 0.093 sec. Processed 8.87 million rows, 108.44 MB (94.92 million rows/s., 1.16 GB/s.)

Performance Introspection

Not yet in master. All this is being merged now.

Performance Introspection

Want to implement:

— resource pools;

— throttle resource consumption when exceeding quota;

Performance Introspection

Want to implement:

— write regular text log to system table;

— write global system metrics to system table;

— calculate histogram-type metrics;

— add metrics for all error codes;

— send all query execution metrics to client;

Table Functions

— url;

— file;

— cluster;

— mysql;

— odbc.

Security

Encryption for replication

Authentication when transferring data between replicas

Encryption for Distributed tables

Secure connection for ODBC drivers

How to «Corrupt» Data

How to «Corrupt» Data

SELECT substringUTF8(Title, 1, 50) AS k, count() AS c
FROM hits_1000m_transformed
WHERE k != '' GROUP BY k ORDER BY c DESC LIMIT 100

┌─k──────────────────────────────────────────────────┬────────c─┐
│ Photos and goods in online store                   │ 12780666 │
│ world map                                          │ 12064769 │
│ Histor Bazar automobiles with cutout, l/b, popads T│ 10990408 │
│ Blocks                                             │ 10359444 │
│ Stroy bank in rossonnia and Cat EuroCargorod geevich│  6763901 │
│ AUTO.ria.ua Bazar auto - Yandex.Weather, retail. P │  4346693 │
│ world — Yandex.Pog                                 │  4297599 │
│ Inexpensive, 21 catalog                            │  3985733 │
│ 5211 — Afisha: what is cholmskaya in translated    │  3574826 │
│ @diaries, Chaikon Autosearch                       │  3501227 │
│ Vacancies in Moscow - Pulse ce                     │  3119487 │
│ Histor Bazar automobile on Love.QIP.ru affordable  │  3065342 │
│ Dress or buy in                                    │  2839515 │
│ Spain - MIR APARTMENT, KORETTEONOVA - predacted    │  2629122 │
│ New cheap : Selection serving for work             │  2379676 │
│ LADA, LLC "AH, EVACUATORS, Greece, Ukraine Auto    │  2139439 │
│ Alexa | Buy bicycle Smart (Peugeot) PajeroII -     │  1639138 │
│ Mods and tourists                                  │  1523733 │
│ Work, handex.RU - fashion forums AQUASTOK          │  1226220 │
│ Games to secrets online store                      │  1068327 │

JIT Compilation of Expressions

SELECT count() FROM hits WHERE ((EventDate >= '2018-08-01') AND (EventDate <= '2018-08-03') AND (CounterID >= 34)) OR ((EventDate >= '2018-08-04') AND (EventDate <= '2018-08-05') AND (CounterID <= 101500))

— 649,533,033 rows per second.

JIT Compilation of Expressions

SET compile_expressions = 1; SELECT count() FROM hits WHERE ((EventDate >= '2018-08-01') AND (EventDate <= '2018-08-03') AND (CounterID >= 34)) OR ((EventDate >= '2018-08-04') AND (EventDate <= '2018-08-05') AND (CounterID <= 101500))

— 865,491,052 rows per second.
+33% performance!

.

.

Web site: https://clickhouse.com/

Google groups: https://groups.google.com/forum/#!forum/clickhouse

Maillist: [email protected]

Telegram chat: https://telegram.me/clickhouse_ru (more than 1800 participants) and https://telegram.me/clickhouse_en

GitHub: https://github.com/ClickHouse/ClickHouse/

Twitter: https://twitter.com/ClickHouseDB

Next: meetup in Paris on October 2nd.