Author: Alexey Milovidov, 2017-05-17.
ClickHouse - distributed analytical column-oriented DBMS
How row-oriented systems work:

How column-oriented systems work:

Nothing ready-made was suitable.
So we made ClickHouse.
«Evolution of Data Structures in Yandex.Metrica»
* If you want to try ClickHouse, one server is enough.
We managed to make the system relatively user-friendly.
From the very beginning we had detailed documentation.
Within a couple of years, ClickHouse spread to other Yandex departments.
Mail, Market, Direct, Webmaster, AdFox, Infrastructure, Business Analytics...
There are cases when analysts independently installed ClickHouse on virtual machines and successfully used it without any questions.
Then we decided — ClickHouse is too good a system for us to keep it to ourselves.
To make it more fun, we need to get people outside hooked on ClickHouse, let them enjoy it. We decided to go open-source.
Apache 2.0 license — minimum restrictions.
Goal — maximum product distribution.
We want Yandex's product to be used worldwide.
See "Yandex Opens ClickHouse"
Well-structured, cleaned, immutable events.
Click stream. Web analytics. Ad networks. RTB. E-commerce.
Online game analytics. Sensor and monitoring data. Telecom data.
Financial transactions. Stock market analytics.
OLTP
ClickHouse has no UPDATE and full-fledged transactions.
Key-Value
If you need frequent update queries by key, use another solution.
Blob-store, document oriented
ClickHouse is designed for a large number of fine-grained data.
Over-normalized data
Better to make a wide fact table.
— out of desperation.
Yandex.Metrica must work.
Algorithmic optimization.
MergeTree, data locality on disk
— fast range queries.
Example: the uniqCombined function consists of a combination of three different data structures suitable for different cardinality ranges.
Low-level optimization.
Example: vectorized query execution.
Specialization and attention to detail.
Example: we have 17 different GROUP BY algorithms. The best one is chosen for your query.
Hundreds of companies in Russia and nearby
Yandex, Mail.ru, Rambler, SKB Kontur…
Dozens of companies in Europe, USA, China
Cloudflare, Wikimedia, Lifestreet…
Search engine and analytics for Bitcoin transactions:
https://blockchair.com/
"Pretty large tables are running, only one server is used and everything works very quickly — with any filters and sorting, everything is almost instantaneous."
Bioinformatics - evolutionary genetics:
https://github.com/msestak/FindOrigin
"We are exploring evolution of novel genes in genomes because if seems that genomes are far from being static as previously believed and what actually happens is that new genes are constantly being added and old genes are lost."
LHCb experiment at LHC:
https://www.yandex.com/company/press_center/press_releases/2012/2012-04-10/
Type casting for Merge-type tables
input_format_allow_errors_* settings
Ability to create more than 16 dictionaries with ODBC source
Loading part of configuration from ZK
OPTIMIZE DEDUPLICATE
ALTER of primary key: Enum, Date <-> UInt16, DateTime <-> UInt32
clickhouse --extract-from-config
Disabling lagging replicas
Disabling replicas without the table
Original query source in system.processes, system.query_log
Type casting in INSERT SELECT
INSERT SELECT: by positions instead of names
pointInEllipses
greatCircleDistance
-ForEach combinator
groupArrayInsertAt
topK (beta)
ODBC driver - build and functionality on Windows
HTTPS server
system.parts - exact row count
system.columns - uncompressed size
system.part_log
NULL for JOIN: join_use_nulls setting
NULLS FIRST, LAST for ORDER BY
NULL support in IN
NULL support in higher-order functions
if, multiIf, ifNull, nullIf, coalesce
toNullable, assumeNotNull
Nullable type support in aggregate functions
NULL as result of subquery returning empty set
ALTER ... DROP COLUMN ... FROM PARTITION
preferred_block_size_bytes setting
Ability to enable result buffering in HTTP interface
KILL QUERY
LIMIT BY
SELECT INTO OUTFILE
clickhouse-local
cross-replication
UUID and MAC encoding functions
Proper HTTP response codes
Proper comparison operation logic
Progress in HTTP headers
Cache stampede fix
system.build_options, system.graphite
Distributed query tracing in system.processes, system.query_log
Ability to skip errors in text formats
fsync_metadata setting
timezone config parameter, timezone() function
decodeURLComponent
Faster gzip in HTTP interface
max_table_size_to_drop
proper build and packages
DISTINCT speedup
Buffer table optimization
FixedString optimization
RIGHT/FULL JOIN improvements
Distributed DDL queries
Dictionary table engine, Dictionary database engine
Reloading dictionaries using user-defined state query
SYSTEM queries
Session concept in HTTP interface
Limit on concurrent replica downloads
NULLs: fix almost all remaining issues
SELECT `db`.`table`.`column`
Custom partitioning key for MergeTree
Ability to write JOIN as in regular SQL
Resource pools (CPU, disk IO, network bandwidth) for queries
Initial UPDATE/DELETE support
Website: https://clickhouse.com/
Google groups: https://groups.google.com/forum/#!forum/clickhouse
Mailing list: [email protected]
Telegram chat: https://telegram.me/clickhouse_en and https://telegram.me/clickhouse_ru (already 668 participants)
GitHub: https://github.com/ClickHouse/ClickHouse/
+ meetings. Moscow, Saint Petersburg, Novosibirsk,
Ekaterinburg, San Francisco... Next: Kiev, Minsk...