An open-source analytic DBMS.
Used everywhere: Microsoft, Spotify, Cloudflare, Lyft, Deutsche Bank,
+ thousands of other companies.
Open-source since 2016, currently the most popular analytic DBMS.
Scalable from a laptop or a server to datacenters.
1,482 contributors and 34,700 stars.
ClickHouse is like Postgres, but for analytics.
Fast SQL queries with low latency and high concurrency,
with real-time data insertion.
Available in ClickHouse Cloud, on AWS Marketplace.
AWS has a lot of instance types:
— m, c, r, x, z, u; t, a; i, im, d; g, p, f, inf, trn, vt, hpc, mac;
With tweaks:
— -d with local disks; -n with faster network; -flex;
Different types and generations of CPU:
— i - Intel, a - AMD; up to 7th generation as of May 2024
Different CPU architectures:
— x86_64 - Intel and AMD; AArch64 (aka ARM64) - Graviton;
Example: r6idn.24xlarge:
RAM-optimized (8 GB of RAM per vCPU), 6th generation Intel CPU,
with local SSDs, network optimized, 4 × 24 = 96 vCPUs.
AWS Graviton is an AArch64 CPU, custom-built by AWS.
2018 - 1st generation; ... 2024 - 4th generation (currently in preview!)
It is a different architecture, so not all software is going to work.
Some have to be adapted, compiled, and tested on ARM.
Good news: ClickHouse works! deb, rpm, tgz are available for AArch64
Quick installation autodetects the architecture:
curl https://clickhouse.com/ | sh
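A quick sanity check after the download (a minimal sketch; it assumes the installer placed a single self-contained ./clickhouse binary into the current directory):
uname -m                                         # prints aarch64 on Graviton, x86_64 on Intel/AMD
./clickhouse local --query "SELECT version()"    # runs the downloaded binary locally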
Step 1: run a benchmark on every generation.
ClickBench: https://benchmark.clickhouse.com/hardware
— fully automated benchmark, runs on all instance types.
— attempts to mimic a clickstream analytics workload;
Btw, you can run it yourself and submit your results.
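Roughly like this (a sketch; it assumes the layout of the ClickBench repository, where each system has its own benchmark.sh):
git clone https://github.com/ClickHouse/ClickBench.git
cd ClickBench/clickhouse
./benchmark.sh    # installs ClickHouse, loads the dataset, and runs the queries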
Results (demo): https://pastila.nl/?000b1ba6/c224ddf960900f4f2d0d9e100cef5445.html
Throughput on a single query:
— how quickly a massively-parallel query runs, e.g. 1 second vs 1.5 seconds;
— depends on the number of CPU cores, aggregate performance, and memory bandwidth,
and on how well the software is optimized for a particular instruction set;
Latency on short queries:
— how quickly a small query runs, e.g. 25 ms vs 50 ms.
— depends on the speed of a single CPU core;
Total load capacity:
— how many concurrent users and QPS can we sustain.
— depends on the aggregate CPU performance and memory bandwidth;
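To get a feel for single-query throughput and latency on a given instance, time one run of a query, then repeat it in a single stream to see the latency distribution (a sketch; the hits table and the queries are placeholders borrowed from ClickBench):
clickhouse-client --time --query "SELECT count() FROM hits WHERE URL LIKE '%google%'"
clickhouse-benchmark -c1 -i100 <<< "SELECT count() FROM hits WHERE CounterID = 62"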
Availability of the instances:
— in particular regions; in particular configurations.
— ask your AWS architect.
Software compatibility:
— and how well it is tested on this architecture.
Cost/performance:
— this also depends on which performance metric is being compared.
Step 2: estimate the total load capacity
by running all of ClickBench's queries in parallel:
clickhouse-benchmark -c32 -i1000 < queries.sql
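Here -c32 sets the number of concurrent connections and -i1000 the total number of query executions across all connections.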
machine | QPS | cost per hour |
---|---|---|
r7i.8xlarge | 2.800 | $2.0160 |
r7g.8xlarge | 3.500 (+25%) | $1.7136 (-15%) |
r8g.8xlarge | 4.595 (+64%) | preview |
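Cost/performance follows from the table: QPS is 25% higher while the hourly price is 15% lower, so QPS per dollar is roughly 1.25 / 0.85 ≈ 1.47×, i.e. about 47% better cost/performance for r7g.8xlarge over r7i.8xlarge.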
Summary comparison with contemporary Intel/AMD machines:
Graviton 1 (2018):
— low-power machines, not comparable in performance.
Graviton 2 (2020):
— comparable throughput, but single-core performance is lower.
Graviton 3 (2022):
— better throughput and comparable single-core performance.
Graviton 4 (2024):
— even better throughput, lower latency, and more cores 😋.
In ClickHouse Cloud.
It looks obvious: we can get more power for lower price!
But it is not so obvious...
Availability of disk instances
ClickHouse Cloud uses S3 for storage and local SSDs for cache, but Graviton 3 instances with local SSDs became available in the required regions only recently*.
* we are introducing a "distributed cache" to decouple disks and remove this requirement.
Live migration
A cluster should be able to run in a hybrid mode: some replicas on x86_64, some on AArch64.
Orchestration and infrastructure
All components have to be ported to AArch64 as well.
Full continuous integration with all test suites
had to be enabled on Graviton instance types.
Feature parity
Every existing feature should work on AArch64, even rarely used ones.
Especially our own debugging and introspection capabilities.
Pricing and performance consistency
We cannot randomly give 2× more powerful machines in a subset of regions,
as it could lead to surprises for customers.
We have to do it! The advantages are overwhelming.
So we prepared everything and migrated
our staging environment to Graviton :) (m7gd)
We use the staging environment for testing and for personal, internal,
and demo projects.
A cluster in the Cloud that collects logs from all builds and tests.
We run ~2,000,000 tests every day, and each test generates a lot of logs.
4.3 trillion rows, 65 TiB compressed data, 1.39 PiB uncompressed.
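Numbers like these can be pulled from each ClickHouse server (a sketch; system.parts keeps per-part statistics, and only active parts should be counted):
clickhouse-client --query "SELECT sum(rows), formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed FROM system.parts WHERE active"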
Let's run a heavy query... Scan a table with 1.08 trillion records.
cityHash64(*) hashes every column of every row, so it forces a full scan of the entire table:
clickhouse-cloud :) SELECT sum(cityHash64(*)) AS x FROM build_time_trace
┌────────────────────x─┐
│ 18424954377991503633 │
└──────────────────────┘
Elapsed: 3745.202 sec. Processed 1.07 trillion rows, 374.14 TB
(285.45 million rows/s., 99.90 GB/s.)
This is before the migration to Graviton.
How much faster is it after the migration to Graviton?
clickhouse-cloud :) SELECT sum(cityHash64(*)) AS x FROM build_time_trace
┌────────────────────x─┐
│ 18424954377991503633 │
└──────────────────────┘
Elapsed: 3395.191 sec. Processed 1.08 trillion rows, 376.63 TB
(316.99 million rows/s., 110.93 GB/s.)
— about 10% faster.
It was mostly network-bound, reading from S3,
so we should rather have used network-optimized instances.
Demo: https://adsb.exposed/
r6i.metal: 16.27 GB/sec;
r8g.24xlarge (Graviton 4): 26.71 GB/sec;
— 64% faster!
If you can use Graviton, you should already be doing so.
Graviton 4 is going to be amazing... the only question is availability.
We will make ClickHouse Cloud even faster!