1. (30 min) What's new in ClickHouse 21.11.
2. (30 min) Q&A.
Monthly stable release for November.
clickhouse-local is like clickhouse-client and clickhouse-server together
but without the server.
Demo.
Allows making many small INSERTs
from many concurrent connections,
without Kafka or Buffer tables!
Many small INSERTs are combined together
and batched in memory.
Safe by default: every client gets a response once its data is inserted.
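Conceptually, the server buffers incoming rows and flushes them as a single block once a size (or timeout) threshold is reached. A toy sketch of that batching idea — the class and thresholds are illustrative, not ClickHouse internals:

```python
class InsertBatcher:
    """Combine many tiny inserts into a few large blocks."""

    def __init__(self, max_rows, flush):
        self.max_rows = max_rows
        self.flush = flush        # callback that writes one batched block
        self.buffer = []

    def insert(self, rows):
        self.buffer.extend(rows)
        # The real server also flushes on a timeout, not only on size
        if len(self.buffer) >= self.max_rows:
            self.flush(self.buffer)
            self.buffer = []

blocks = []
batcher = InsertBatcher(max_rows=4, flush=blocks.append)
for i in range(8):
    batcher.insert([i])           # eight one-row inserts...
print(blocks)                     # ...become two blocks of four rows
```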
Developer: Anton Popov.
Demo.
In 21.10 we introduced SQL UDFs
and scriptable tables.
Now we have scriptable UDFs.
<function>
<type>executable</type>
<name>myPlus</name>
<return_type>UInt64</return_type>
<argument><type>UInt64</type></argument>
<argument><type>UInt64</type></argument>
<format>TabSeparated</format>
<command>awk '{ print $1 + $2 }'</command>
<execute_direct>0</execute_direct>
</function>
Developer: Maksim Kita.
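The awk one-liner above could just as well be any program. For instance, a hypothetical Python worker implementing the same protocol: read TabSeparated rows from stdin, write one result per row to stdout.

```python
import sys

def my_plus(line):
    # One TabSeparated row "a<TAB>b" -> the string of a + b
    a, b = line.rstrip("\n").split("\t")
    return str(int(a) + int(b))

if __name__ == "__main__":
    for line in sys.stdin:
        print(my_plus(line))
    sys.stdout.flush()
```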
Two variants:
executable — a worker process is run for every block of data;
executable_pool — a pool of long-running worker processes.
Multiple processes run concurrently — as many as there are query threads.
Data is serialized/deserialized in specified format via stdin/stdout.
#!/usr/bin/python3

import sys
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

if __name__ == '__main__':
    sentiment_analyzer = SentimentIntensityAnalyzer()
    # Read chunk length
    for number in sys.stdin:
        length = int(number)
        # Read lines from chunk
        for _ in range(0, length):
            line = sys.stdin.readline()
            score = sentiment_analyzer.polarity_scores(line)
            print(str(score['compound']) + '\n', end='')
        # Flush results to stdout
        sys.stdout.flush()
<function>
<type>executable_pool</type>
<name>sentenceScore</name>
<return_type>Float64</return_type>
<argument>
<type>String</type>
</argument>
<format>TabSeparated</format>
<command>user_scripts/sentence_analyzer</command>
<send_chunk_header>1</send_chunk_header>
</function>
function:
    type: executable_pool
    name: sentenceScore
    return_type: Float64
    arguments:
        - argument:
            type: String
    format: TSV
    command: user_scripts/sentence_analyzer
SELECT sentenceScore('ClickHouse is fast') as score
┌─score─┐
│ 0 │
└───────┘
SELECT avg(sentenceScore(sentence)) AS avg_sentence_score
FROM executable('sentence_generator.py 10000',
'TabSeparated',
'sentence String')
┌───avg_sentence_score─┐
│ 0.030663238759543694 │
└──────────────────────┘
Upcoming:
— User Defined Types (21.12)
For external data sources.
Was:
SELECT * FROM mysql('myserver:3306', 'Northwind', 'Orders', 'user', 'password')
Now:
SELECT * FROM mysql(mydb, table = 'Orders')
<named_collections>
<postgres1>
<user>postgres</user>
<password>mysecretpassword</password>
<host>postgres1</host>
<port>5432</port>
<database>postgres</database>
<table>test_table</table>
</postgres1>
<postgres2>
<user>postgres</user>
<password>mysecretpassword</password>
<host>postgres1</host>
</postgres2>
</named_collections>
— Allows predefining connection parameters in the config file.
— Any parameter (address, database/table name, user/password, etc.).
— Parameters can be overridden in the query.
— Works for table engines, table functions and dictionaries.
— Works for mysql, postgresql, mongo, s3, remote, hdfs...
— Configuration is updated on the fly.
Developer: Ksenia Sumarokova.
A SQL-standard way to introspect databases
— for compatibility with tools and protocols.
Demo
Developer: Alexander Tokmakov.
1. Tokenization, stemming, lemmatization, synonyms (21.9..21.11).
— SELECT arrayMap(word -> lemmatize(word), tokens(text)).
New function: tokens.
SELECT tokens('I 💖 ClickHouse')
['I','💖','ClickHouse']
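tokens splits a string on ASCII non-alphanumeric separators, which is why the emoji survives as its own token. A rough Python approximation of that behavior (the exact separator set here is my assumption, not the documented one):

```python
import re

# Treat every ASCII character that is not a letter or digit as a separator;
# non-ASCII characters (like emoji) are kept inside tokens.
SEPARATORS = re.compile(r"[\x00-\x2f\x3a-\x40\x5b-\x60\x7b-\x7f]+")

def tokens(text):
    return [t for t in SEPARATORS.split(text) if t]

print(tokens('I 💖 ClickHouse'))  # ['I', '💖', 'ClickHouse']
```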
Developers: Nikolai Degterinskiy, Maksim Kita.
2. Unicode normalization:
normalizeUTF8NFC, normalizeUTF8NFD,
normalizeUTF8NFKC, normalizeUTF8NFKD.
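These match the standard Unicode normalization forms, so Python's unicodedata is a convenient way to see what each one does — here 'é' as one codepoint versus 'e' plus a combining accent:

```python
import unicodedata

composed = "\u00e9"        # 'é' as a single codepoint
decomposed = "e\u0301"     # 'e' followed by a combining acute accent

# NFC composes, NFD decomposes; both inputs converge to the same form
print(unicodedata.normalize("NFC", decomposed) == composed)    # True
print(unicodedata.normalize("NFD", composed) == decomposed)    # True
```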
Demo
Developer: Slava Boben.
— L1, L2, Lp, L∞ spaces, any dimension;
— distance, norm, cosine distance functions;
— scalar product on tuples and arrays.
Applications:
— analysis of scientific datasets;
— analysis of vector embeddings;
Example:
SELECT L2Norm((1, 2, 3)), LpNorm((1, 2, 3), 10), L1Distance((1, 2, 3), (3, 2, 1))
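As a sanity check, the same quantities computed in plain Python — these are the generic mathematical definitions, not ClickHouse internals:

```python
def lp_norm(v, p):
    # Lp norm: (sum of |x|^p) ^ (1/p); p = 2 gives the Euclidean (L2) norm
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def l1_distance(a, b):
    # L1 (Manhattan) distance between two points
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (lp_norm(a, 2) * lp_norm(b, 2))

print(lp_norm((1, 2, 3), 2))              # sqrt(14) ≈ 3.7417
print(l1_distance((1, 2, 3), (3, 2, 1)))  # |1-3| + |2-2| + |3-1| = 4
```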
Developer: Alexey Boykov.
— EXISTS (subquery).
— JOIN condition supports OR.
— CapnProto as an output format.
— OPTIONS HTTP request for CORS.
— Binary literals: SELECT 0b101010.
— Allow changing log levels without a restart.
— experimental feature!
Streaming consumption of log files.
It's like the Kafka and RabbitMQ engines,
but for local append-only logs, e.g. application logs.
Supports log rotation, tracks offsets...
Developers: ucasFL (flynn) and Ksenia Sumarokova.
Preemptive scheduler for background merges.
— lowers the chance of getting "Too many parts" on insertion.
Developer: Nikita Mikhaylov.
Reading from MergeTree over S3 now works properly.
— it's a ~100× performance improvement,
e.g. 40 sec -> 394 ms for a point query with 100 columns.
Developers: Alexander Sapin, Ksenia Sumarokova, Alexey Milovidov.
Learning portal: https://clickhouse.com/learn/
— step-by-step tutorials and examples!
Full changelog:
https://github.com/ClickHouse/ClickHouse/blob/master/CHANGELOG.md
Calling all visionaries and builders!
ClickHouse is growing and expects to double in size. Get in early:
Apply here: clickhouse.com/careers or [email protected]
Global, fully remote roles available:
— Cloud Software Engineer (Go, Kubernetes)
— Support Engineer
— Core Software Engineer (C++)
Send feedback: [email protected]
Sign up to our newsletter.