ClickHouse Meetup: Introduction

ClickHouse Meetup
in Shenzhen

Source: https://sharov43.livejournal.com/1357995.html

Why do we need ClickHouse?

— interactive analytic queries;
— on constantly appended data.

«Double realtime»

The most valuable feature of ClickHouse is its speed.

How we developed ClickHouse

2008 — first commits in codebase;

2010 — start of research and development;

2012 — use in production for some tasks;

2014 — use in production as core service technology;

2015 — widespread inside Yandex;

2016 — released in open-source!

Now — more than 1000 companies are using ClickHouse.

How we developed ClickHouse

2008 — first commits in codebase;

I was a developer of Yandex.Metrica service.

I had neither ideas nor plans for ClickHouse yet,
but some commits date back to that time.

How we developed ClickHouse

2010 — start of research and development;

As an experimental side project.

We tested multiple existing solutions (as of 2010):
MonetDB, Infobright, InfiniDB...

Nothing was good enough.

Hypothesis:

If we had a good enough column-oriented DBMS,
we could store all our data in non-aggregated form
(raw pageviews and sessions) and generate all the reports on the fly,
allowing unlimited customization.

Why column-oriented?

This is how "traditional" row-oriented databases work:

Why column-oriented?

And this is how column-oriented databases work:
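
A rough illustration of the two layouts, as a minimal Python sketch rather than ClickHouse code. The table, column names, and values are made up for this example. A query that needs one column touches every value of every row in the row-oriented layout, but only a single array in the column-oriented one.

```python
# Hypothetical pageview table with three columns (all values are made up).
rows = [
    {"user_id": 1, "url": "/home", "duration_ms": 120},
    {"user_id": 2, "url": "/docs", "duration_ms": 340},
    {"user_id": 1, "url": "/docs", "duration_ms": 95},
]

# Row-oriented layout: the values of one row are stored together.
row_store = [tuple(r.values()) for r in rows]

# Column-oriented layout: the values of one column are stored together.
column_store = {name: [r[name] for r in rows] for name in rows[0]}


def total_duration_rows(store):
    """Sum duration_ms and count how many stored values had to be read."""
    values_read = 0
    total = 0
    for row in store:
        values_read += len(row)  # the whole row is fetched to reach one field
        total += row[2]          # duration_ms is the third field
    return total, values_read


def total_duration_columns(store):
    """Sum duration_ms by reading only the one column that is needed."""
    column = store["duration_ms"]
    return sum(column), len(column)


row_total, row_reads = total_duration_rows(row_store)
col_total, col_reads = total_duration_columns(column_store)
print(row_total, row_reads, col_total, col_reads)
```

Both layouts give the same answer. The difference is in how much data is read, and it grows with the number of columns in the table.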

How we developed ClickHouse

2012 — use in production for some tasks;

(for intermediate data processing)

How we developed ClickHouse

2014 — use in production as core service technology;

(for realtime reporting)

The hypothesis was proved!

The main cluster of Yandex.Metrica

* If you want to try ClickHouse, one server or VM is enough.

How we developed ClickHouse

2015 — widespread inside Yandex;

In multiple departments: search, e-commerce, advertising, personalized news, infrastructure, NOC, business analytics...

Why open-source?

  1. The technology is too good to be used just inside Yandex.
  2. To have more fun!





Our goals as ClickHouse developers

1. Development and support for use cases in our company (Yandex).

2. The widest possible adoption of ClickHouse as an open-source product.

Goal: wide adoption as an open-source product

Rules of a successful open-source project:

— the product must solve an actual problem;
— and do it better than others.

Our goal:

ClickHouse must be the default choice
as an open-source analytical DBMS

— the first and the only right solution
for appropriate use cases;

— and everyone should be aware of it :)

Three years in open-source

No time to celebrate.

Notable users

Web/App Analytics Adv Networks
E-Commerce Telecom Social News
Monitoring/Telemetry Banking/Finance
Government Adult Online Games
Info Sec Agriculture Blockchain

— VK;
— nVidia;
— Spotify;
— Yandex;
— Amadeus;
— Bloomberg;
— CloudFlare;
— ContentSquare;
— Deutsche Bank;
— A-word fruit company;
— C-word telecom company;
— e-word ecommerce company;
— National payment system of (undisclosed) country.

Notable users in China

— JD;
— Sina;
— Tencent;
— ByteDance;
— KuaiShou;
— Analysys;
— QingCloud;
— OneAPM;
— HUYA Internet;

... your company?

ClickHouse

не тормозит (Russian for "does not slow down")

ClickHouse

is not slow*

Why is ClickHouse fast?

We really need it.

Yandex.Metrica must work!

Why is ClickHouse so fast?

High level architecture:

— Scale-out shared nothing;

— Massively Parallel Processing;
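
The two points above can be sketched roughly as follows. This is a toy model in Python, not ClickHouse code: the shard contents are made up, and threads stand in for separate servers. Each shard aggregates only its own data, and a coordinator merges the small partial states.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each holds its own slice of the data (shared nothing).
shards = [
    [120, 340, 95],
    [210, 15],
    [400, 60, 75, 30],
]


def partial_aggregate(values):
    """Runs on one shard: return a partial state (sum, count)."""
    return sum(values), len(values)


def merge(states):
    """Runs on the coordinator: combine partial states into the final average."""
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return total / count


# Threads play the role of independent servers working in parallel.
with ThreadPoolExecutor(max_workers=len(shards)) as pool:
    partial_states = list(pool.map(partial_aggregate, shards))

average = merge(partial_states)
print(partial_states, average)
```

Only the partial states travel to the coordinator, not the raw rows, which is why adding shards adds capacity.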

Why is ClickHouse so fast?

Data storage optimizations:

— Column-oriented storage;

— Merge Tree;

— Sparse index;

— Data compression;
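
To make the sparse index idea concrete, here is a simplified Python model, not the actual MergeTree implementation. The keys and the tiny `GRANULE_SIZE` are assumptions chosen for illustration. Rows are kept sorted by key, and the index stores only the first key of each block (granule), so a range query reads just the blocks that may contain matching rows.

```python
from bisect import bisect_left, bisect_right

GRANULE_SIZE = 4  # assumed tiny value, for illustration only

# Rows sorted by primary key (here a single integer key; values are made up).
keys = [1, 3, 3, 5, 8, 9, 12, 12, 15, 20, 21, 30]

# Sparse index: the key of the first row of every granule.
marks = keys[::GRANULE_SIZE]


def select_range(lo, hi):
    """Return keys in [lo, hi] and the number of rows that had to be read."""
    # The granule that starts before lo may still contain lo.
    first = max(bisect_left(marks, lo) - 1, 0)
    # The last candidate is the last granule that starts at or before hi.
    last = bisect_right(marks, hi) - 1
    result = []
    rows_read = 0
    for g in range(first, last + 1):
        granule = keys[g * GRANULE_SIZE:(g + 1) * GRANULE_SIZE]
        rows_read += len(granule)
        result.extend(k for k in granule if lo <= k <= hi)
    return result, rows_read


matched, rows_read = select_range(9, 12)
print(marks, matched, rows_read)
```

The index holds one entry per granule instead of one per row, so it stays small enough to keep in memory even for very large tables.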

Why is ClickHouse so fast?

Algorithmic optimizations:

Best algorithms in the world...
... are happy to be used in ClickHouse.

— Volnitsky substring search

— Hyperscan and RE2

— SIMD JSON

— HDR Histograms

— Roaring Bitmaps

...

Why is ClickHouse so fast?

Low-level optimizations:

Optimizations for CPU instruction sets
using SIMD processing.

— SIMD text parsing

— SIMD data filtering

— SIMD decompression

— SIMD string operations

...

Why is ClickHouse so fast?

Specializations of algorithms...
... and attention to detail:

— uniq, uniqExact, uniqCombined, uniqUpTo;

— quantile, quantileTiming, quantileExact, quantileTDigest, quantileWeighted;

— 40+ specializations of GROUP BY;

— algorithms tune themselves to the data distribution:
LZ4 decompression with Bayesian Bandits.
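
A small sketch of why several variants of one function are worth having. It is written in Python, with hypothetical function names modelled on `uniqExact` and `uniqUpTo`, and it is not how ClickHouse implements them. The exact version must remember every distinct value. The "up to N" version can stop as soon as the answer exceeds N, so its memory stays bounded.

```python
def uniq_exact(values):
    """Exact distinct count: remembers every distinct value seen."""
    return len(set(values))


def uniq_up_to(values, n):
    """Distinct count that is exact up to n and reports n + 1 beyond that."""
    seen = set()
    for v in values:
        seen.add(v)
        if len(seen) > n:
            return n + 1  # stop early: at most n + 1 values are ever stored
    return len(seen)


visitors = [1, 2, 2, 3, 1, 4, 5, 5]  # made-up data
exact = uniq_exact(visitors)
capped = uniq_up_to(visitors, 3)
within_limit = uniq_up_to(visitors, 10)
print(exact, capped, within_limit)
```

When a report only needs to know whether a count is above a small threshold, the cheaper variant answers the same question with far less state.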

Advantages of open-source

— better product quality;

— a better reputation for Yandex as a tech company;

— easier hiring of developers;

— motivation of developers;


Drawbacks:

— it's hard work;

— and we really work a lot;

Interfaces

HTTP REST

clickhouse-client

JDBC, ODBC

(new) MySQL protocol compatibility

 

Python, PHP, Perl, Go,
Node.js, Ruby, C++, .NET, Scala, R, Julia, Rust
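
As a minimal sketch of the HTTP interface using only the Python standard library: the helper names below are hypothetical. The host and port are assumptions for a local server listening on the default HTTP port 8123.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_query_url(query, host="localhost", port=8123):
    """Build a URL that passes the SQL text in the `query` parameter."""
    params = urlencode({"query": query}, quote_via=quote)
    return f"http://{host}:{port}/?{params}"


def run_query(query, host="localhost", port=8123):
    """Send the query to a running server and return the response text."""
    with urlopen(build_query_url(query, host, port)) as response:
        return response.read().decode("utf-8")


url = build_query_url("SELECT 1")
print(url)
# run_query("SELECT 1") needs a reachable ClickHouse server,
# so it is not called here.
```

Any language with an HTTP client can talk to the server in the same way. The dedicated drivers listed above add connection handling and typed results on top.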

Community

Web site: https://clickhouse.com/

GitHub: https://github.com/ClickHouse/ClickHouse/

Mailing list: [email protected]

WeChat: 4 groups (ask a friend to invite you)

+ meetups. Moscow, Saint Petersburg, Novosibirsk, Ekaterinburg, Minsk...
... Berlin, Paris, Amsterdam, Madrid, Munich, San Francisco,
... Beijing, Shenzhen, Shanghai, Hong Kong, Singapore, Tokyo.

Next: ClickHouse for Machine Learning