ClickHouse: Release 21.10 Webinar

ClickHouse
Release 21.10
Webinar

Release 21.10 Webinar

1. (5 min) The news about ClickHouse, Inc.

2. (35 min) What's new in ClickHouse 21.10.

3. (20 min) Q&A.

The News

ClickHouse has grown a lot in 2020..2021.

Especially in the US and Europe.

...

We decided to create a company:

ClickHouse, Inc.

What Is ClickHouse, Inc?

International, independent company.

Incorporated in the US. Headquarters in Amsterdam and Bay Area.

Distributed team. We are hiring in Europe, the US and all over the world.

ClickHouse, Inc: People

Aaron Katz, CEO - ex. Elastic, Salesforce.
Alexey Milovidov, CTO.
Yuri Izrailevsky, President of Engineering - ex. Google, Netflix.




...
the list is incomplete -->
while I was preparing the slides,
we hired even more people

What's New in 21.10

Monthly stable release for October.

User Defined Functions

SQL User Defined Functions (since 21.10)

CREATE FUNCTION ch AS (x) -> (exp(x) + exp(-x)) / 2; CREATE FUNCTION getRegionName AS (x) -> dictGet('region', 'name', x);

— the functions are polymorphic;
— they are substituted as expressions;
— queries are optimized for CSE and JIT compilation;

Limitations:
— no loops;
— no recursion;

User Defined Functions

Introspection:

SELECT name, create_query FROM system.functions WHERE origin = 'SQLUserDefined' ┌─name─────┬─create_query────────────────────────────────────────┐ │ ch │ CREATE FUNCTION ch AS x -> ((exp(x) + exp(-x)) / 2) │ └──────────┴─────────────────────────────────────────────────────┘

Persistence:
— functions are stored in /var/lib/clickhouse/user_defined/.

Distributed queries:
— functions should be available on initiator node.

Scriptable Tables

Data generation and import:

CREATE TABLE test ... ENGINE = Executable('my_script', TSV);

— the script should return data in stdout, similar to "file" engine.

Data transformation:

CREATE TABLE test ... ENGINE = Executable('my_script', TSV, (SELECT ...));

— the script will get source stream to stdin and return data in stdout.

Custom JOIN logic:

CREATE TABLE test ... ENGINE = Executable('my_script', TSV, (SELECT ...), (SELECT ...), ...);

— the script will get source stream to stdin and additional streams in fd 3, 4...

Scriptable Tables

Two variants:

Executable: script is run on every query;

ExecutablePool: a pool of persistent processes;

Table function:

SELECT * FROM executable('my_script', TSV, 'x UInt64...'); SELECT * FROM executable('my_script', TSV, 'x UInt64...', (SELECT ...) ...);

Scriptable Tables

Applications:

— loading data from foreign data source;

— applying machine-learned models to dataset;

— custom processing, e.g. DNS resolution, whois...

User Defined Functions

Upcoming:

— Scriptable User Defined Functions (since 21.11-testing)

<function> <type>executable</type> <name>myPlus</name> <return_type>UInt64</return_type> <argument><type>UInt64</type></argument> <argument><type>UInt64</type></argument> <format>TabSeparated</format> <command>awk '{ print $1 + $2 }'</command> <execute_direct>0</execute_direct> </function>

— User Defined Types (21.11 .. 21.12)

Small and Neat Things

Transparent support for import & export of compressed data.

Now we support gz, xz, zst (21.1), bz2 (21.9) and lz4 (21.10).

It works for url, file, s3, hdfs,

HTTP Content-Encoding,

SELECT INTO OUTFILE,
and INSERT FROM INFILE (21.11).

Small and Neat Things

Partitioned export to S3:

INSERT INTO TABLE FUNCTION s3('http://test.s3.amazonaws.com/bucket/file{_partition_id}.csv.gz', CSV, 'name String, time DateTime') PARTITION BY toYYYYMMDD(time) VALUES ...

Upcoming:

Partitioned export to hdfs, file, url.

Small and Neat Things

ALTER ... MATERIALIZE COLUMN:

ALTER TABLE t MATERIALIZE COLUMN c; -- write column if it was not materialized. ALTER TABLE t MATERIALIZE COLUMN c FINAL; -- unconditionally rewrite column on disk.

Before:

ALTER TABLE t UPDATE c = c; -- unconditionally rewrite column; OPTIMIZE TABLE t FINAL; -- rewrite and merge all table parts.

Small and Neat Things

Positional arguments:

SELECT domain(URL) AS dom, uniq(UserID) AS u FROM pageviews GROUP BY dom ORDER BY u DESC; SET enable_positional_arguments = 1; SELECT domain(URL), uniq(UserID) FROM pageviews GROUP BY 1 ORDER BY 2 DESC;

What's Next?

Text Classification and NLP

1. Tokenization, stemming, lemmatization, synonims (21.9..21.11).
— SELECT arrayMap(word -> lemmatize(word), tokens(text)).

2. Charset and language detection (Q4 2021).

3. Semi-duplicate text search (21.1).
— min-hash and sim-hash algorithms.

Asynchronous INSERTs

Allow to do many small INSERTS.

From many concurrent connections.

Without Kafka and Buffer tables!

Many small INSERTs are combined together
and batched in memory.

Safe by default: every client get response when data is inserted.

Available in 21.11-testing

What Else?

Learning portal: https://clickhouse.com/learn/

— step-by-step tutorials and examples!

Full changelog:

https://github.com/ClickHouse/ClickHouse/blob/master/CHANGELOG.md

ClickHouse Is Hiring!

Calling all visionaries and builders!
ClickHouse is growing and expects to double in size. Get in early:

Apply here: clickhouse.com/careers or [email protected]

Global, fully remote roles available:
Cloud Software Engineer (Go, Kubernetes)
— Support Engineer
— Core Software Engineer (C++)

Send feedback: [email protected]
Sign up to our newsletter.

Q&A