Author: Alexey Milovidov, 2017-07-07.
Alexey, ClickHouse developer.
Since 2008, I've been working on the data processing engine for Yandex.Metrica.
Yandex.Metrica is a web analytics service.
First in Russia, second in the world.

About ~25 billion events arrive daily.
We need to show reports in real time.
Everything worked great. Users could get about 50 different reports.
But there's a problem. We wanted more. For each report to be arbitrarily customizable.
We quickly made a prototype and implemented a "Report Builder" based on it.
This was in 2010.
It became clear where to move forward.
We needed a good column-oriented DBMS.
This is how row-oriented systems work:

This is how column-oriented systems work:

Nothing ready-made was suitable.
So we made ClickHouse.
"Evolution of Data Structures in Yandex.Metrica"
* If you want to try ClickHouse, one server is enough.
We managed to make a relatively convenient system.
From the very beginning, we had detailed documentation.
Within a couple of years, ClickHouse spread to other departments of Yandex.
Mail, Market, Direct, Webmaster, AdFox, Infrastructure, Business Analytics...
There are cases where analysts independently installed ClickHouse on virtual machines and successfully used it without any questions.
Then we decided — ClickHouse is too good a system for us to keep to ourselves.
To make it more fun, we need to get people outside to use ClickHouse, let them enjoy it. We decided to make it open-source.
Apache 2.0 license — minimal restrictions.
Goal — maximum product distribution.
We want Yandex's product to be used worldwide.
See "Yandex Opens ClickHouse"
Well-structured, cleaned, immutable events.
Click stream. Web analytics. Ad networks. RTB. E-commerce.
Online game analytics. Sensor and monitoring data. Telecom data.
Financial transactions. Stock market analytics.
OLTP
ClickHouse has no UPDATE and full-fledged transactions.
Key-Value
If you need frequent update queries by key, use another solution.
Blob-store, document oriented
ClickHouse is intended for large amounts of fine-grained data.
Hundreds of companies in Russia and nearby
Yandex, Mail.ru, Rambler, SKB Kontur, SMI2…
Dozens of companies in Europe, USA, China
Cloudflare, Wikimedia, Lifestreet, Vertamedia, Carto, Nvidia, Infinidat…
How do we analyze over O(100B) DNS requests daily.
"ClickHouse enables us and our customers to explore the dataset in real time to get operational insights. Due to many of the optimizations built into ClickHouse we are able to store the data for a long time allowing us to look at events in perspective and at historical trends."
"clickhouse is a columnar datastore that we are using as an aid to run complex SQL queries on the edit data "lake" that we have as a result of the edit reconstruction project. It is similar to Druid but faster for complex queries."
Search engine and analytics for Bitcoin transactions:
https://blockchair.com/
"Pretty large tables are running, only one server is used and everything works very fast — with any filters and sorting everything is almost instant."
Bioinformatics - evolutionary genetics:
https://github.com/msestak/FindOrigin
"We are exploring evolution of novel genes in genomes because it seems that genomes are far from being static as previously believed and what actually happens is that new genes are constantly being added and old genes are lost."
LHCb experiment at LHC:
https://www.yandex.com/company/press_center/press_releases/2012/2012-04-10/
— out of necessity.
Yandex.Metrica must work.
To quickly process an analytical query, the system must:
1. Read quickly.
2. Compute quickly.
1. Read quickly.
– locality by primary key;
– columns - we read only what's needed;
– strong typing;
– data compression.
2. Compute quickly.
– vectorized engine;
– specialization of data structures;
– low-level optimizations.
Algorithmic optimization.
MergeTree, data locality on disk
— fast range queries.
Example: the uniqCombined function consists of a combination of three different data structures, suitable for different cardinality ranges.
Low-level optimization.
Example: vectorized query execution.
Specialization and attention to detail.
Example: we have 17 different algorithms for executing GROUP BY. The best one is chosen for your query.
ClickHouse vs. Spark & MariaDB Column Store
https://www.percona.com/blog/2017/02/13/clickhouse-new-opensource-columnar-database/
ClickHouse vs. Redshift
https://www.altinity.com/blog/2017/6/20/clickhouse-vs-redshift
Limited JOIN support
No UPDATE, DELETE
Weak SQL standard compatibility
Timur Shenkao:
«ClickHouse is extremely fast at simple SELECTs without joins, much faster than Vertica».
Ömer Osman Koçak:
«When we evaluated ClickHouse the results were great compared to Prestodb. Even though the columnar storage optimizations for ORC and Clickhouse is quite similar, Clickhouse uses CPU and Memory resources more efficiently (Presto also uses vectorized execution but cannot take advantage of hardware level optimizations such as SIMD instruction sets because it's written in Java so that's fair) so we also wanted to add support for Clickhouse for our open-source analytics platform Rakam (https://github.com/rakam-io/rakam)»
«ClickHouse shows comparable speed on this query for 30 days and 8 times faster (!) on this query. We plan to test other queries as well, haven't gotten to it yet.
Query execution speed is stable. In Google BigQuery during peak loads, for example at 4:00 p.m. PDT or at the beginning of the month, query execution time can noticeably increase».
«This year we deployed a Druid-based build — Imply Analytics Platform, as well as Tranquility, and were ready to launch into production… But after ClickHouse came out, we immediately abandoned Druid, even though we spent two months studying and implementing it».
«结论:clickhouse速度更快!»
«In conclusion, ClickHouse is faster!»
http://verynull.com/2016/08/22/infinidb与clickhouse对比/

HTTP REST
clickhouse-client
JDBC (production), ODBC (beta)
Python, PHP, Perl, Go,
Node.js, Ruby, C++, .NET, Scala, R, Julia, Rust
Tabix (tabix.io) — developed specifically for ClickHouse.
And also:
Grafana, Redash, Apache Zeppelin,
Superset, Power BI…
FDW for PostgreSQL from Infinidat
(based on Multicorn)
Connecting external dictionary tables from PostgreSQL
using ODBC data source
Website: https://clickhouse.com/
Google groups: https://groups.google.com/forum/#!forum/clickhouse
Mailing list: [email protected]
Telegram chat: https://telegram.me/clickhouse_en and https://telegram.me/clickhouse_ru (already 805 members)
GitHub: https://github.com/ClickHouse/ClickHouse/
+ meetups. Moscow, St. Petersburg, Novosibirsk,
Yekaterinburg, San Francisco, Minsk... Next: Nizhny Novgorod, Kazan, Berlin...