1. (5 min) The news about ClickHouse, Inc.
2. (35 min) What's new in ClickHouse 21.10.
3. (20 min) Q&A.
ClickHouse has grown a lot in 2020..2021.
Especially in the US and Europe.
...
We decided to create a company:
International, independent company.
Incorporated in the US. Headquarters in Amsterdam and Bay Area.
Distributed team. We are hiring in Europe, the US and all over the world.
Aaron Katz, CEO - ex. Elastic, Salesforce.
Alexey Milovidov, CTO.
Yuri Izrailevsky, President of Engineering - ex. Google, Netflix.












Monthly stable release for October.
SQL User Defined Functions (since 21.10)
CREATE FUNCTION ch AS (x) -> (exp(x) + exp(-x)) / 2;
CREATE FUNCTION getRegionName AS (x)
-> dictGet('region', 'name', x);
— the functions are polymorphic;
— they are substituted as expressions;
— queries are optimized for CSE and JIT compilation;
Limitations:
— no loops;
— no recursion;
Introspection:
SELECT name, create_query
FROM system.functions
WHERE origin = 'SQLUserDefined'
┌─name─────┬─create_query────────────────────────────────────────┐
│ ch │ CREATE FUNCTION ch AS x -> ((exp(x) + exp(-x)) / 2) │
└──────────┴─────────────────────────────────────────────────────┘
Persistence:
— functions are stored in /var/lib/clickhouse/user_defined/.
Distributed queries:
— functions should be available on initiator node.
Data generation and import:
CREATE TABLE test ... ENGINE =
Executable('my_script', TSV);
— the script should return data in stdout, similar to "file" engine.
Data transformation:
CREATE TABLE test ... ENGINE =
Executable('my_script', TSV, (SELECT ...));
— the script will get source stream to stdin and return data in stdout.
Custom JOIN logic:
CREATE TABLE test ... ENGINE =
Executable('my_script', TSV, (SELECT ...), (SELECT ...), ...);
— the script will get source stream to stdin and additional streams in fd 3, 4...
Two variants:
— Executable: script is run on every query;
— ExecutablePool: a pool of persistent processes;
Table function:
SELECT * FROM
executable('my_script', TSV, 'x UInt64...');
SELECT * FROM
executable('my_script', TSV, 'x UInt64...', (SELECT ...) ...);
Applications:
— loading data from foreign data source;
— applying machine-learned models to dataset;
— custom processing, e.g. DNS resolution, whois...
Upcoming:
— Scriptable User Defined Functions (since 21.11-testing)
<function>
<type>executable</type>
<name>myPlus</name>
<return_type>UInt64</return_type>
<argument><type>UInt64</type></argument>
<argument><type>UInt64</type></argument>
<format>TabSeparated</format>
<command>awk '{ print $1 + $2 }'</command>
<execute_direct>0</execute_direct>
</function>
— User Defined Types (21.11 .. 21.12)
Transparent support for import & export of compressed data.
Now we support gz, xz, zst (21.1), bz2 (21.9) and lz4 (21.10).
It works for url, file, s3, hdfs,
HTTP Content-Encoding,
SELECT INTO OUTFILE,
and INSERT FROM INFILE (21.11).
Partitioned export to S3:
INSERT INTO TABLE FUNCTION
s3('http://test.s3.amazonaws.com/bucket/file{_partition_id}.csv.gz',
CSV, 'name String, time DateTime')
PARTITION BY toYYYYMMDD(time)
VALUES ...
Upcoming:
Partitioned export to hdfs, file, url.
ALTER ... MATERIALIZE COLUMN:
ALTER TABLE t MATERIALIZE COLUMN c;
-- write column if it was not materialized.
ALTER TABLE t MATERIALIZE COLUMN c FINAL;
-- unconditionally rewrite column on disk.
Before:
ALTER TABLE t UPDATE c = c; -- unconditionally rewrite column;
OPTIMIZE TABLE t FINAL; -- rewrite and merge all table parts.
Positional arguments:
SELECT domain(URL) AS dom, uniq(UserID) AS u
FROM pageviews
GROUP BY dom
ORDER BY u DESC;
SET enable_positional_arguments = 1;
SELECT domain(URL), uniq(UserID)
FROM pageviews
GROUP BY 1
ORDER BY 2 DESC;
1. Tokenization, stemming, lemmatization, synonims (21.9..21.11).
— SELECT arrayMap(word -> lemmatize(word), tokens(text)).
2. Charset and language detection (Q4 2021).
3. Semi-duplicate text search (21.1).
— min-hash and sim-hash algorithms.
Allow to do many small INSERTS.
From many concurrent connections.
Without Kafka and Buffer tables!
Many small INSERTs are combined together
and batched in memory.
Safe by default: every client get response when data is inserted.
Available in 21.11-testing
Learning portal: https://clickhouse.com/learn/
— step-by-step tutorials and examples!
Full changelog:
https://github.com/ClickHouse/ClickHouse/blob/master/CHANGELOG.md
Calling all visionaries and builders!
ClickHouse is growing and expects to double in size. Get in early:
Apply here: clickhouse.com/careers or [email protected]
Global, fully remote roles available:
— Cloud Software Engineer (Go, Kubernetes)
— Support Engineer
— Core Software Engineer (C++)
Send feedback: [email protected]
Sign up to our newsletter.