Source: https://sharov43.livejournal.com/1357995.html
— interactive analytic queries;
— on constantly appended data.
«Double realtime»
The most valuable feature of ClickHouse is its speed.
2008 — first commits in codebase;
2010 — start of research and development;
2012 — use in production for some tasks;
2014 — use in production as core service technology;
2015 — widespread inside Yandex;
2016 — released in open-source!
Now — more than 1000 companies are using ClickHouse.
2008 — first commits in codebase;
I was a developer of Yandex.Metrica service.
I had neither ideas nor plans for ClickHouse,
but some commits date back to that time.
2010 — start of research and development;
As an experimental side project.
We tested several existing solutions (as of 2010):
MonetDB, Infobright, InfiniDB...
Nothing was good enough.
Hypothesis:
If we had a good enough column-oriented DBMS,
we could store all our data in non-aggregated form
(raw pageviews and sessions) and generate all the reports on the fly,
to allow infinite customization.
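The idea of reports computed on the fly from raw data can be sketched as follows. This is a toy illustration only: the event fields, the `report` helper and the sample data are hypothetical and are not taken from Yandex.Metrica or ClickHouse.

```python
from collections import defaultdict

# Hypothetical raw pageview events; field names are illustrative only.
PAGEVIEWS = [
    {"site": "a.example", "browser": "Firefox", "country": "DE", "duration_s": 12},
    {"site": "a.example", "browser": "Chrome", "country": "DE", "duration_s": 30},
    {"site": "b.example", "browser": "Chrome", "country": "FR", "duration_s": 7},
    {"site": "a.example", "browser": "Chrome", "country": "FR", "duration_s": 21},
]


def report(events, group_by, where=lambda event: True):
    """Count events and sum durations per group, computed at query time."""
    totals = defaultdict(lambda: {"views": 0, "duration_s": 0})
    for event in events:
        if where(event):
            key = tuple(event[column] for column in group_by)
            totals[key]["views"] += 1
            totals[key]["duration_s"] += event["duration_s"]
    return dict(totals)


# Any grouping or filter can be requested, because nothing was pre-aggregated.
by_browser = report(PAGEVIEWS, ["browser"])
by_country_for_site = report(
    PAGEVIEWS, ["country"], where=lambda event: event["site"] == "a.example"
)
```

With pre-aggregated storage, every combination of grouping and filter would have to be anticipated in advance; with raw events, a new report is just a new query.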
This is how "traditional" row-oriented databases work:
And this is how column-oriented databases work:
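The difference between the two layouts can be sketched in a few lines. This is a simplified in-memory model with a hypothetical three-column table; real databases store the data on disk, but the access pattern is the point here.

```python
# Hypothetical table with three columns: id, site, duration in seconds.
rows = [
    (1, "a.example", 12),
    (2, "b.example", 30),
    (3, "a.example", 7),
]


def total_duration_row_store(rows):
    """Row-oriented layout: each record is stored together."""
    # Every full row is touched even though only one field is needed.
    return sum(row[2] for row in rows)


# Column-oriented layout: each column is stored as its own array.
columns = {
    "id": [row[0] for row in rows],
    "site": [row[1] for row in rows],
    "duration_s": [row[2] for row in rows],
}


def total_duration_column_store(columns):
    """Column-oriented layout: only the column the query needs is read."""
    return sum(columns["duration_s"])
```

For an analytical query that reads a few columns out of many, the column layout touches far less data, and values of the same type stored together also tend to compress better.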
2012 — use in production for some tasks;
(for intermediate data processing)
2014 — use in production as core service technology;
(for realtime reporting)
The hypothesis was confirmed!
* If you want to try ClickHouse, one server or VM is enough.
2015 — widespread inside Yandex;
In multiple departments: search, e-commerce, advertising, personalized news, infrastructure, NOC, business analytics...
2016 — released in open-source!
Now — more than 1000 companies are using ClickHouse.
1. Development and support for use cases in our company (Yandex).
2. The widest possible adoption of ClickHouse as an open-source product.
Rules of successful open-source:
— the product must solve an actual problem;
— and do it better than others.
Our goal:
ClickHouse must be the default choice
as an open-source analytical DBMS
— the first and the only right solution
for appropriate use cases;
— and everyone should be aware of it :)
No time to celebrate.
— VK;
— nVidia;
— Spotify;
— Yandex;
— Amadeus;
— Bloomberg;
— CloudFlare;
— ContentSquare;
— Deutsche Bank;
— A-word fruit company;
— C-word telecom company;
— e-word ecommerce company;
— National payment system of (undisclosed) country.
— JD;
— Sina;
— Tencent;
— ByteDance;
— KuaiShou;
— Analysys;
— QingCloud;
— OneAPM;
— HUYA Internet;
... your company?
ClickHouse
не тормозит
ClickHouse
is not slow*
We really need it.
Yandex.Metrica must work!
High level architecture:
— Scale-out shared nothing;
— Massively Parallel Processing;
Data storage optimizations:
— Column-oriented storage;
— Merge Tree;
— Sparse index;
— Data compression;
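How a sparse index narrows a range query over sorted data can be sketched as follows. This is a minimal model, not ClickHouse's actual implementation: the function names are hypothetical, and the granule size of 4 is chosen only to keep the example small (the default `index_granularity` in ClickHouse is 8192 rows).

```python
from bisect import bisect_left, bisect_right

GRANULE_SIZE = 4  # illustrative only; real granules are much larger


def build_sparse_index(sorted_keys, granule_size=GRANULE_SIZE):
    """Keep only the first key of every granule instead of indexing every row."""
    return sorted_keys[::granule_size]


def granules_to_scan(index, lo, hi):
    """Return the granule numbers that may contain keys in [lo, hi]."""
    # The granule just before the first mark >= lo may still contain lo.
    first = max(bisect_left(index, lo) - 1, 0)
    # Granules whose first key is greater than hi cannot match.
    last = bisect_right(index, hi)
    return range(first, last)


def select_range(sorted_keys, index, lo, hi, granule_size=GRANULE_SIZE):
    """Scan only the candidate granules and filter rows inside them."""
    hits = []
    for granule in granules_to_scan(index, lo, hi):
        start = granule * granule_size
        for key in sorted_keys[start:start + granule_size]:
            if lo <= key <= hi:
                hits.append(key)
    return hits


keys = list(range(0, 40, 2))          # 20 sorted keys: 0, 2, ..., 38
index = build_sparse_index(keys)      # one mark per granule of 4 rows
matching = select_range(keys, index, 10, 18)
```

The index holds one entry per granule, so it stays small enough to keep in memory even for very large tables; the price is that a few extra rows at the edges of the range are read and filtered out.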
Algorithmic optimizations:
Best algorithms in the world...
... are happy to be used in ClickHouse.
— Volnitsky substring search
— Hyperscan and RE2
— SIMD JSON
— HDR Histograms
— Roaring Bitmaps
...
Low-level optimizations:
Optimizations for CPU instruction sets
using SIMD processing.
— SIMD text parsing
— SIMD data filtering
— SIMD decompression
— SIMD string operations
...
Specializations of algorithms...
... and attention to detail:
— uniq, uniqExact, uniqCombined, uniqUpTo;
— quantile, quantileTiming, quantileExact, quantileTDigest, quantileWeighted;
— 40+ specializations of GROUP BY;
— algorithms optimize themselves for the data distribution:
LZ4 decompression with Bayesian Bandits.
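The bandit idea mentioned above, choosing among several variants of an algorithm according to their measured cost, can be sketched with a simplified Thompson-sampling-style selector. This is an assumption-laden illustration, not the code used in ClickHouse: the `VariantSelector` class, the Gaussian model of the mean cost and the simulated timings are all hypothetical.

```python
import random


class VariantSelector:
    """Choose among algorithm variants by sampling from estimated mean costs."""

    def __init__(self, n_variants, seed=0):
        self.count = [0] * n_variants
        self.total = [0.0] * n_variants
        self.total_sq = [0.0] * n_variants
        self.rng = random.Random(seed)

    def choose(self):
        # Try every variant twice before trusting the estimates.
        for variant, n in enumerate(self.count):
            if n < 2:
                return variant
        samples = []
        for n, s, sq in zip(self.count, self.total, self.total_sq):
            mean = s / n
            variance = max(sq / n - mean * mean, 0.0)
            # Sample a plausible mean cost; the uncertainty shrinks as n grows.
            samples.append(self.rng.gauss(mean, (variance / n) ** 0.5))
        # Lower cost is better, so pick the smallest sampled value.
        return min(range(len(samples)), key=samples.__getitem__)

    def record(self, variant, cost):
        self.count[variant] += 1
        self.total[variant] += cost
        self.total_sq[variant] += cost * cost


# Simulated workload: hypothetical mean cost per variant plus measurement noise.
TRUE_COST = [3.0, 1.0, 2.0]
noise = random.Random(1)
selector = VariantSelector(len(TRUE_COST))
for _ in range(200):
    chosen = selector.choose()
    selector.record(chosen, TRUE_COST[chosen] + noise.gauss(0.0, 0.1))
```

In a real system the recorded cost would be the measured time of processing a block of data, so the selector adapts to whatever data distribution and hardware it actually runs on.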
— better product quality;
— a better reputation for Yandex as a tech company;
— easier hiring of developers;
— motivation of developers;
Drawbacks:
— it's hard work;
— and we really work a lot;
HTTP REST
clickhouse-client
JDBC, ODBC
(new) MySQL protocol compatibility
Python, PHP, Perl, Go,
Node.js, Ruby, C++, .NET, Scala, R, Julia, Rust
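As an illustration of the HTTP interface listed above, a query can be passed in the `query` URL parameter; the server listens for HTTP on port 8123 by default. The sketch below only builds the request URL; the `build_query_url` helper is hypothetical, and actually sending the request assumes a ClickHouse server running on `localhost`.

```python
from urllib.parse import urlencode


def build_query_url(sql, host="localhost", port=8123):
    """Build a URL that passes the SQL text in the `query` parameter."""
    return f"http://{host}:{port}/?{urlencode({'query': sql})}"


url = build_query_url("SELECT 1")

# With a server running locally, the request could be sent like this:
#   from urllib.request import urlopen
#   print(urlopen(url).read().decode())
```

For anything beyond simple experiments, one of the client libraries or drivers listed above is usually more convenient than hand-built URLs.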
Web site: https://clickhouse.com/
GitHub: https://github.com/ClickHouse/ClickHouse/
Maillist: [email protected]
WeChat: 4 groups (ask your friend to invite)
+ meetups. Moscow, Saint Petersburg, Novosibirsk, Ekaterinburg, Minsk...
... Berlin, Paris, Amsterdam, Madrid, Munich, San Francisco,
... Beijing, Shenzhen, Shanghai, Hong Kong, Singapore, Tokyo.