ClickHouse

— open-source realtime analytical DBMS

Introduction

About me

Alexey, developer of ClickHouse.

I work on data processing engine of Yandex.Metrica since 2008.

About Yandex

Yandex is one of the largest internet companies in Europe
operating Russia’s most popular search engine.

About Yandex.Metrica

Yandex.Metrica (https://metrica.yandex.com/) is a service for web analytics.

Largest in Russia, second largest in the world (just after Google Analytics).

We are processing about ~25 billions of events (page views, conversions, etc).

We must generate and show reports in realtime.

History

How to store data?

Big data processing is not a problem.

The challenge is how to store data in that way to allow both:

- efficient ingestion of click stream in realtime;

- efficient generation of reports;

Let review our historical solutions first...

MySQL (MyISAM) 2008-2011

We have had about 50 predefined report types.

We create a table for each of them.

Each table has primary key in form of:

site_id, date, key -> aggregated statistics.

The data was inserted in mini-batches of aggregated deltas,
using ON DUPLICATE KEY UPDATE.

... but this just don't work.

Data locality on disk (artistic view)

The main concern is data locality.

MySQL (MyISAM) 2008-2011

We use HDD (rotational drives).
We cannot afford petabytes of SSDs.

Each seek is ~12 ms of latency,
usually no more than 1000 random reads/second in RAID array.

Time to read data from disk array is dependent on:
- number of seeks;
- total amount of data;

Example: read 100 000 rows, randomly scattered on disk:
- at least 100 seconds in worst case.
User won't wait hundred seconds for the report.

The only way to read data from disk array in appropriate amount of time is to minimize number of seek by maintaining data locality.

MySQL (MyISAM) 2008-2011

Fundamental problem:

Data is inserted almost in time order:
- each second we have new portion data for this second;
- but data for different web sites are comes in random order in a stream;

Data is selected by ranges for specified web site and date period:
- in ranges of completely different order;

MySQL (MyISAM) 2008-2011

MyISAM stores data in MYD and MYI files.
MYD contains data almost in order of insertion.
MYI contains B-tree index that maps a key to offset in MYD file.

Insertion of data is almost fine.
But selecting of data by range of primary key was non-practical.

Nevertheless, we made it work by:

- tricky partitioning;
- organizing data in few generations with different partitioning scheme;
- moving data between tables by scheduled scripts;
- report generation becomes ugly UNION ALL queries.

Aggregated vs. raw data

Users are not satisfied with 50 predefined reports.

Everyone wants deep ad-hoc analytics with custom reports
— to slice and dice data by any dimension.

This is only possible with non-aggregated data.

The report builder, 2010

We had quickly made a prototype of so-called "report builder".

This was 2010 year. It was just simple specialized column-oriented data structure.

It worked fine and we got understanding, what the right direction to go.

We need good column-oriented DBMS.

Why column-oriented?

This is how "traditional" row-oriented databases work:

Why column-oriented?

And this is how column-oriented databases work:

Why column-oriented?

Hypothesis:

If we have good enough column-oriented DBMS,
we could store all our data in non-aggregated form
(raw pageviews and sessions) and generate all the reports on the fly,
to allow infinite customization.

To check this hypothesis, we started to evaluate existing solutions.

MonetDB, InfiniDB, Vertica, Infobright and so on...

No appropriate solutions were exist in 2010.

ClickHouse

As an experimental project, we started to develop
our own column-oriented DBMS: ClickHouse.

In 2012 it was in production state.

In 2014 we re-lauched Yandex.Metrica as Metrica 2.

All data is stored in ClickHouse and in non-aggregated form
and every report is generated on the fly.

In Metrika 2 the user could create it's own report with
- custom dimensions, metrics, filters, user-centric segmentation...
- and to dig through data to the detail of individual visitors.

ClickHouse

The main target for ClickHouse is query execution speed.

In Yandex.Metrika, users could analyze data for their web sites of any volume.

Biggest classifieds and e-commerce sites with hundreds millions PV/day are using Yandex.Metrika (e.g. ru.aliexpress.com).

In contrast to GA*, in Yandex.Metrika, you could get data reports for large web sites without sampling.

As data is processed on the fly, ClickHouse must be able to crunch all that pageviews in sub second time.

* in Google Analytics you could get reports without sampling only in "premium" version.

The main cluster of Yandex.Metrica

22 trillions of rows (as of May 2017)
472 servers
total throughput of query processing is up to two terabytes per second

* If you want to try ClickHouse, one server or VM is enough.

ClickHouse

column-oriented
distributed (including cross-DC replication)
linearly scalable
fault-tolerant
data ingestion in realtime
realtime (sub-second) queries
support of SQL dialect + extensions
(arrays, nested data types, domain-specific functions, approximate query execution)

When to use ClickHouse

For well structured, clean, immutable events.

Click stream. Web analytics. Adv. networks. RTB. E-commerce.

Analytics for online games. Sensor and monitoring data. Telecom data.

Structured logs.

When not to use ClickHouse

OLTP
ClickHouse doesn't have UPDATE statement and full-featured transactions.

Key-Value
If you want high load of small single-row queries, please use another system.

Blob-store, document oriented
ClickHouse is intended for vast amount of fine-grained data.

Over-normalized data
Better to make up single wide fact table with pre-joined dimensions.

Open-source (since June 2016)

We think ClickHouse is too good to be used solely by Yandex.

We made it open-source. License: Apache 2.0.

https://github.com/ClickHouse/ClickHouse/

Open-source

More than 100 companies are already using ClickHouse.

In Russia: Mail.ru, Kaspersky, Kontur...

In Europe, USA, China: CloudFlare, Wikimedia Foundation...

CloudFlare

How do we analyze over O(100B) DNS requests daily.

"ClickHouse enables us and our customers to explore the the dataset in real time to get operational insights. Due to many of the optimizations built into ClickHouse we are able to store the data for a long time allowing us to look events is perspective and at historical trends."

Wikimedia.org

"clickhouse is a columnar datastore that we are using as an aid to run complex SQL queries on the edit data "lake" that we have as a result of the edit reconstruction project. It is similar to Druid but faster for complex queries."

SMI2 (news aggregator)

Throwed off InfiniDB, Cassandra and Druid
just after public release of ClickHouse.

Rakam.io Analytics Platform

"When we evaluated ClickHouse the results were great compared to Prestodb. Even though the columnar storage optimizations for ORC and Clickhouse is quite similar, Clickhouse uses CPU and Memory resources more efficiently."

Unusual applications of ClickHouse

Bioinformatics - evolutionary genetics:
https://github.com/msestak/FindOrigin

"We are exploring evolution of novel genes in genomes because if seems that genomes are far from being static as previously believed and what actually happens is that new genes are constantly being added and old genes are lost."

Search engine and analytics for Bitcoin transactions:
https://blockchair.com/

"We have quite large tables on just single server and everything works really fast &mdash with any filters and sorting everything is processed just instantly."

ClickHouse and CERN

ClickHouse was already used in CERN: in LHCb experiment back in 2012.

Back then ClickHouse was proprietary system, not available to general public.

Now ClickHouse is mature open-source product.

What a great time to live!

ClickHouse and CERN

Possible usages:

- storage and fast analytics for non-aggregated events;

- telemetry, sensor data;

- analytics on experiments metadata.

Possible drawbacks:

- no integration with Hadoop.

- don't use for large BLOBs (better to store them out of DB).

Performance

ClickHouse vs. Spark

https://www.percona.com/blog/2017/02/13/clickhouse-new-opensource-columnar-database/

ClickHouse vs. PrestoDB

Ömer Osman Koçak:

«When we evaluated ClickHouse the results were great compared to Prestodb. Even though the columnar storage optimizations for ORC and Clickhouse is quite similar, Clickhouse uses CPU and Memory resources more efficiently (Presto also uses vectorized execution but cannot take advantage of hardware level optimizations such as SIMD instruction sets because it's written in Java so that's fair) so we also wanted to add support for Clickhouse for our open-source analytics platform Rakam (https://github.com/rakam-io/rakam)»

ClickHouse vs. Vertica

Timur Shenkao:

«ClickHouse is extremely fast at simple SELECTs without joins, much faster than Vertica».

ClickHouse vs. InfiniDB

«结论：clickhouse速度更快！»

«In conclusion, ClickHouse is faster!»

http://verynull.com/2016/08/22/infinidb与clickhouse对比/

Why ClickHouse is so fast?

— we just cannot make it slower.

Yandex.Metrica must work.

Why ClickHouse is so fast?

Algorithmic optimizations.

MergeTree, locality of data on disk
— fast range queries.

Example: uniqCombined function is a combination of three different data structures, used for different ranges of cardinalities.

Low-level optimizations.

Example: vectorized query execution.

Specialization and attention to detail.

Example: we have 17 different algorithms for GROUP BY. Best one is selected for your query.

Interfaces

How to connect to ClickHouse

HTTP REST

clickhouse-client

JDBC (production), ODBC (beta)

Python, PHP, Perl, Go,
Node.js, Ruby, C++, .NET, Scala, R, Julia, Rust

Integrations with data visualization tools

Tabix (tabix.io) — specifically developed for ClickHouse.

And also:

Grafana, Redash, Apache Zeppelin,
Superset, Power BI…

Community

Web site: https://clickhouse.com/

Google groups: https://groups.google.com/forum/#!forum/clickhouse

Maillist: [email protected]

Telegram channel: https://telegram.me/clickhouse_en and https://telegram.me/clickhouse_ru (now with 701 members)

GitHub: https://github.com/ClickHouse/ClickHouse/

+ meetups.
Moscow, Saint-Petersburg, Novosibirsk, Ekaterinburg, San-Francisco...
Upcoming: Minsk.

Thank you. Questions.

Bonus

ClickHouse vs. typical row-oriented DBMS

Itai Shirav:

«I haven't made a rigorous comparison, but I did convert a time-series table with 9 million rows from Postgres to ClickHouse.

Under ClickHouse queries run about 100 times faster, and the table takes 20 times less disk space. Which is pretty amazing if you ask me».

InfluxDB

- no cluster in free version;

- not capable for complex analytical queries;

- slow ingestion performance;

- many complaints about memory usage, startup time, consistency issues.

Elasticsearch

- fine if all indices fit in RAM;

- high disk usage for data and indices;

- almost not possible to use after some data volume threshold;

- slow scans.

Spark SQL

- Spark is fine as a platform for data transformations;

- not possible to ingest data in realtime while maintaining fast range scans - need to apply tricky partitioning and manual merging;

- scans are relatively fast, though slower than ClickHouse;

Cassandra

- good for semistructured data;

- slow scans, no complex analytical queries;

Kudu

- lambda architecture - complicated data model;