What for?
— Testing.
— An internal company contest.
— I always like to play with ClickHouse and have fun :)
We will need:
— an idea.
— a nice dataset.
— a way to analyze and visialize it.
We want an "internet-scale" dataset. How to get it? π€
— get it from the Internet π€«
Forward:
$ host google.com
google.com has address 142.250.179.174
google.com has IPv6 address 2a00:1450:400e:802::200e
google.com mail is handled by 10 smtp.google.com.
Reverse:
$ host 142.250.179.174
174.179.250.142.in-addr.arpa domain name pointer
ams15s41-in-f14.1e100.net.
Forward:
$ dig @1.1.1.1 google.com -t A
;; ANSWER SECTION:
google.com. 159 IN A 142.250.179.174
Reverse:
$ dig @1.1.1.1 174.179.250.142.in-addr.arpa -t PTR
;; ANSWER SECTION:
174.179.250.142.in-addr.arpa. 81998 IN PTR
ams15s41-in-f14.1e100.net.net.
$ mtr google.com
1. 2a02-a210-1111-a500-aef8-ccff-fedc-1234.cable.dynamic.v6.ziggo.nl
2. (waiting for reply)
3. 2a02:a200:0:12::1
4. asd-tr0021-cr101-lo10.core.as9143.net
5. nl-ams14a-ri1-ae51-0.core.as9143.net
6. 2001:4860:1:1::2ac
7. 2a00:1450:8012::1
8. 2001:4860:0:1::e36
9. 2001:4860:0:f8b::c
10. 2001:4860::c:4002:c2d0
11. 2001:4860::1:0:cd13
12. 2001:4860:0:f8c::1
13. 2001:4860:0:1::5a85
14. ams17s12-in-x0e.1e100.net
... to perform reverse DNS request on all of them.
There are only 232 IP addresses.
Slightly less than 4 billion belongs to Internet.
How to do the scan?
... to perform reverse DNS request on all of them.
There are tools for that:
— MassDNS: https://github.com/blechschmidt/massdns
— GNU adns library: https://www.gnu.org/software/adns/
Advantages: they are fast :)
Disadvantages: you will be blocked :(
Where to do it?
From home?
— your ISP will be surprised by unusual traffic.
From a datacenter?
— my friend tried it from Digital Ocean and was blocked.
In the cloud?
— if you do it in AWS, it is ok if you communicate in advance.
What option did I choose?
DNS can work:
— over UDP (unreliable and not secure);
— over TCP (reliable but not secure);
— over TLS (reliable and secure, but heavyweight);
— over HTTPS (reliable and secure, but heavyweight);
— DNSCrypt over TCP or UDP (very different);
curl -H 'accept: application/dns-json' \
'https://cloudflare-dns.com/dns-query?name=174.179.250.142.in-addr.arpa&type=PTR'
curl -H 'accept: application/dns-json' \
'https://1.1.1.1/dns-query?name=174.179.250.142.in-addr.arpa&type=PTR'
$ curl -H 'accept: application/dns-json' \
'https://1.1.1.1/dns-query?name=174.179.250.142.in-addr.arpa&type=PTR' \
| jq
{
"Status": 0,
"TC": false, "RD": true, "RA": true, "AD": false, "CD": false,
"Question": [
{
"name": "174.179.250.142.in-addr.arpa",
"type": 12
}
],
"Answer": [
{
"name": "174.179.250.142.in-addr.arpa",
"type": 12,
"TTL": 77114,
"data": "ams15s41-in-f14.1e100.net."
}
]
}
Bonus: it can use HTTP 1.0, 1.1, 2, 3.
curl --http2 -H 'accept: application/dns-json' \
'https://1.1.1.1/dns-query?name=174.179.250.142.in-addr.arpa&type=PTR' \
--next --http2 -H 'accept: application/dns-json' \
'https://1.1.1.1/dns-query?name=3.121.82.140.in-addr.arpa&type=PTR'
CloudFlare, Google, OpenDNS have this service.
Using it for a mass scan could violate ToS,
but at least CloudFlare does not restrict it.
They are processing trillions of requests every day anyway :)
CREATE TABLE dns
(
time DateTime DEFAULT now(),
json String
)
ENGINE = MergeTree
ORDER BY ();
We'll write the responses as is, and parse later.
clickhouse-local --max_block_size 1 --progress --query "
WITH number DIV 65536 AS a, number DIV 256 MOD 256 AS b, number MOD 256 AS c
SELECT c::String || '.' || b::String || '.' || a::String
FROM numbers(16777216)
WHERE (a > 0 AND a < 127 AND a != 10)
OR (a >= 128 AND a < 192 AND (a != 172 OR b < 16 OR b > 31)
AND (a != 169 AND b != 254))
OR (a >= 192 AND a < 224 AND (a != 192 OR b != 168))
" | xargs -P1000 -I{} bash -c "
seq 0 255 | sed -r -e '
s@(.*)@--next --http2 -H \"accept: application/dns-json\" \
\"https://1.1.1.1/dns-query?name=\\1.{}.in-addr.arpa\\&type=PTR\"@' \
| tr '\n' ' ' | sed 's/--next/curl -sS/' | bash \
| clickhouse-client --send_timeout 60000 --receive_timeout 60000 --query '
INSERT INTO dns (json) FORMAT JSONAsString'"
2005, ZoΓ« Roth family.
I tried to be gentle and run it from a single machine...
How long did it take?
I tried to be gentle and run it from a single machine...
How long did it take?
— around ten days.
— it could be just a few hours with MassDNS.
There are open datasets of historical DNS scans:
Project Sonar: https://opendata.rapid7.com/sonar.rdns_v2/
:) SELECT formatReadableQuantity(count()) FROM dns
ββformatReadableQuantity(count())ββ
β 3.69 billion β
βββββββββββββββββββββββββββββββββββ
:) SELECT formatReadableSize(total_bytes)
FROM system.tables WHERE name = 'dns'
ββformatReadableSize(total_bytes)ββ
β 13.69 GiB β -- we are storing JSON as a string
βββββββββββββββββββββββββββββββββββ
:) SELECT
name,
formatReadableSize(data_compressed_bytes) AS compr,
formatReadableSize(data_uncompressed_bytes) AS raw,
round(data_uncompressed_bytes / data_compressed_bytes, 2) AS ratio
FROM system.columns
WHERE table = 'dns' AND name = 'json'
ββnameββ¬βcomprββββββ¬βrawβββββββββ¬βratioββ
β json β 13.66 GiB β 896.18 GiB β 65.6 β
ββββββββ΄ββββββββββββ΄βββββββββββββ΄ββββββββ
CREATE TABLE dns_parsed ENGINE = MergeTree ORDER BY ip
AS
WITH JSONExtract(json,
'Tuple(Status UInt8, TC Bool, RD Bool, RA Bool, AD Bool, CD Bool,
Question Array(Tuple(name String)),
Answer Array(Tuple(name String, type UInt8, TTL UInt32, data String)),
Authority Array(Tuple(name String, type UInt8, TTL UInt32, data String)),
Comment Array(String))') AS t
SELECT time, t.Status, t.TC, t.RD, t.RA, t.AD, t.CD,
toIPv4(byteSwap(IPv4StringToNum(extract(
tupleElement(t.Question[1], 'name'), '^(\d+\.\d+\.\d+\.\d+)\.')))) AS ip,
replaceRegexpOne(tupleElement(t.Answer[1], 'data'), '\.$', '') AS domain
FROM dns WHERE t.Answer != ''
SETTINGS allow_experimental_analyzer = 1;
CREATE TABLE dns_parsed
(
`time` DateTime,
`t.Status` UInt8,
`t.TC` Bool,
`t.RD` Bool,
`t.RA` Bool,
`t.AD` Bool,
`t.CD` Bool,
`ip` IPv4,
`domain` String,
)
ENGINE = MergeTree
ORDER BY ip
:) SELECT formatReadableSize(total_bytes) FROM system.tables
WHERE name = 'dns_parsed'
ββformatReadableSize(total_bytes)ββ
β 4.88 GiB β
βββββββββββββββββββββββββββββββββββ
:) SELECT name,
formatReadableSize(data_compressed_bytes) AS compr,
formatReadableSize(data_uncompressed_bytes) AS raw,
round(data_uncompressed_bytes / data_compressed_bytes, 2) AS ratio
FROM system.columns WHERE table = 'dns_parsed'
ORDER BY data_compressed_bytes DESC
ββnameββββββ¬βcomprβββββββ¬βrawββββββββ¬βββratioββ
β ip β 3.24 GiB β 4.78 GiB β 1.47 β
β domain β 1.54 GiB β 42.09 GiB β 27.41 β
β time β 31.68 MiB β 4.78 GiB β 154.43 β
β t.RD β 859.97 KiB β 1.19 GiB β 1456.31 β
β t.RA β 859.97 KiB β 1.19 GiB β 1456.31 β
β t.Status β 549.66 KiB β 11.28 MiB β 21.02 β
β t.AD β 151.60 KiB β 6.03 MiB β 40.7 β
β t.TC β 96.74 KiB β 1.50 MiB β 15.9 β
β t.CD β 96.74 KiB β 1.50 MiB β 15.9 β
ββββββββββββ΄βββββββββββββ΄ββββββββββββ΄ββββββββββ
:) SELECT ip, domain FROM dns_parsed ORDER BY ip LIMIT 10
ββipβββββββββ¬βdomainββββββββββββββββββββββ
β 1.0.0.1 β one.one.one.one β
β 1.0.4.4 β ns1.gtelecom.com.au β
β 1.0.4.5 β auth1.gtelecom.com.au β
β 1.0.5.5 β ns2.gtelecom.com.au β
β 1.0.6.6 β ns3.gtelecom.com.au β
β 1.0.16.70 β smtp.kakeibo.tepco.co.jp β
β 1.0.16.71 β smtp01.kakeibo.tepco.co.jp β
β 1.0.16.72 β smtp02.kakeibo.tepco.co.jp β
β 1.0.16.73 β smtp03.kakeibo.tepco.co.jp β
β 1.0.16.74 β smtp04.kakeibo.tepco.co.jp β
βββββββββββββ΄βββββββββββββββββββββββββββββ
What are the most popular TLDs? Orgs?
:) SELECT topLevelDomain(domain) AS d, count() AS c, bar(c, 0, 1e9, 50)
FROM dns_parsed GROUP BY d ORDER BY c DESC LIMIT 10
ββdββββ¬βββββββββcββ¬βbar(count(), 0, 1000000000., 50)ββ
β net β 406985266 β βββββββββββββββββββββ β
β com β 281925343 β ββββββββββββββ β
β jp β 77920764 β ββββ β
β br β 54343724 β βββ β
β de β 53843959 β βββ β
β it β 34083123 β ββ β
β fr β 24368263 β ββ β
β mx β 21451597 β β β
β cn β 18691219 β β β
β ru β 17262632 β β β
βββββββ΄ββββββββββββ΄βββββββββββββββββββββββββββββββββββ
:) SELECT cutToFirstSignificantSubdomain(domain) AS d, count() AS c,
bar(c, 0, 1e8, 50)
FROM dns_parsed GROUP BY d ORDER BY c DESC LIMIT 10
ββdβββββββββββββββββ¬ββββββββcββ¬βbar(count(), 0, 100000000., 50)ββ
β amazonaws.com β 56667330 β βββββββββββββββββββββββββββββ β
β comcast.net β 40638365 β βββββββββββββββββββββ β
β bbtec.net β 39534782 β ββββββββββββββββββββ β
β spectrum.com β 31682763 β ββββββββββββββββ β
β sbcglobal.net β 31406669 β ββββββββββββββββ β
β myvzw.com β 30122993 β βββββββββββββββ β
β t-ipconnect.de β 25722620 β βββββββββββββ β
β rr.com β 23476357 β ββββββββββββ β
β telecomitalia.it β 19438417 β ββββββββββ β
β verizon.net β 16205490 β ββββββββ β
ββββββββββββββββββββ΄βββββββββββ΄ββββββββββββββββββββββββββββββββββ
:) SELECT domain AS d, count() AS c
FROM dns_parsed WHERE domain LIKE '%clickhouse%'
GROUP BY d ORDER BY d
ββdβββββββββββββββββββββββββββββββββββββββββββββββββββ¬βcββ
β appodeal-clickhouse-staging β 1 β
β bettrade.prod.clickhouse-1 β 1 β
β bettrade.stage.clickhouse-1 β 1 β
β bhs-clickhouse-0.justuno.com β 1 β
β bhs-clickhouse-1.justuno.com β 1 β
β bhs-clickhouse-2.justuno.com β 1 β
β bhs-clickhouse-3.justuno.com β 1 β
β bhs-clickhouse-4.justuno.com β 1 β
β bi-clickhouse-playground.dev.scraperapi.com β 1 β
β bitmedia.clickhouse-distribution-nyc1 β 1 β
β bitmedia.clickhouse-log-nyc1 β 1 β
β bitmedia.clickhouse-r1s1-nyc1 β 1 β
β bitmedia.clickhouse-r1s2-nyc1 β 1 β
β bitmedia.clickhouse-r1s3-nyc1 β 1 β
β bitmedia.clickhouse-r2s1-nyc1 β 1 β
β bitmedia.clickhouse-r2s2-nyc1 β 1 β
β bitmedia.clickhouse-r2s3-nyc1 β 1 β
β bitmedia.clickhouse-r3s1-nyc1 β 1 β
β bitmedia.clickhouse-r3s2-nyc1 β 1 β
β bitmedia.clickhouse-r3s3-nyc1 β 1 β
β bpce-production-clickhouse01.kameleoon.net β 1 β
β bpce-production-clickhouse02.kameleoon.net β 1 β
β cld-clickhouse-log-new.q.smailru.net β 1 β
β cld-clickhouse-log.q.smailru.net β 1 β
β cld-clickhouse-proxy.q.smailru.net β 1 β
β cld-graphite-clickhouse.q.smailru.net β 1 β
β clickhouse-001-08.adpartner.pro β 1 β
β clickhouse-001-09.adpartner.pro β 1 β
β clickhouse-002-08.adpartner.pro β 1 β
β clickhouse-002-09.adpartner.pro β 1 β
β clickhouse-01-airflow.platform.aviata.team β 1 β
β clickhouse-01.dev.fxcintel.com β 1 β
β clickhouse-01.ele26126.jscrambler.com β 1 β
β clickhouse-01.hipler.me β 1 β
β clickhouse-01.lmd91999.jscrambler.com β 1 β
β clickhouse-01.nps00666.jscrambler.com β 1 β
β clickhouse-01.platform.aviata.team β 1 β
β clickhouse-01.staging.fxcintel.com β 1 β
β clickhouse-02.dev.fxcintel.com β 1 β
β clickhouse-02.platform.aviata.team β 1 β
β clickhouse-03.platform.aviata.team β 1 β
β clickhouse-1-fi.subh.net β 1 β
β clickhouse-1-u633.vm.elestio.app β 1 β
β clickhouse-1.cdn77.com β 1 β
β clickhouse-1.dorg.cc β 1 β
β clickhouse-1.man.exa.net.uk β 1 β
β clickhouse-1.stofa.net β 1 β
β clickhouse-12.dorg.cc β 1 β
β clickhouse-13.dorg.cc β 1 β
β clickhouse-14.dorg.cc β 1 β
β clickhouse-15.dorg.cc β 1 β
β clickhouse-16.dorg.cc β 1 β
β clickhouse-17.dorg.cc β 1 β
β clickhouse-2-de.subh.net β 1 β
β clickhouse-2.man.exa.net.uk β 1 β
β clickhouse-21.dorg.cc β 1 β
β clickhouse-22.dorg.cc β 1 β
β clickhouse-23.dorg.cc β 1 β
β clickhouse-24.dorg.cc β 1 β
β clickhouse-25.dorg.cc β 1 β
β clickhouse-3-fi.subh.net β 1 β
β clickhouse-3.man.exa.net.uk β 1 β
β clickhouse-4-de.subh.net β 1 β
β clickhouse-4.dorg.cc β 1 β
β clickhouse-4.man.exa.net.uk β 1 β
β clickhouse-5.dorg.cc β 1 β
β clickhouse-5.man.exa.net.uk β 1 β
β clickhouse-6.man.exa.net.uk β 1 β
β clickhouse-7.dorg.cc β 1 β
β clickhouse-78f2-c27b61 β 1 β
β clickhouse-8.dorg.cc β 1 β
β clickhouse-9.dorg.cc β 1 β
β clickhouse-backup0d.dt07.net β 1 β
β clickhouse-backup1.dt07.net β 1 β
β clickhouse-ch-dev-1-public.htz.appfollow.io β 1 β
β clickhouse-ch-dev-2-public.htz.appfollow.io β 1 β
β clickhouse-ch-dev-3-public.htz.appfollow.io β 1 β
β clickhouse-ch-dev-4-public.htz.appfollow.io β 1 β
β clickhouse-ch-prod-1-public.htz.appfollow.io β 1 β
β clickhouse-ch-prod-2-public.htz.appfollow.io β 1 β
β clickhouse-ch-prod-3-public.htz.appfollow.io β 1 β
β clickhouse-ch-prod-4-public.htz.appfollow.io β 1 β
β clickhouse-control.corp.cloudlinux.com β 1 β
β clickhouse-dev.do.stp.local β 1 β
β clickhouse-eval.es.net β 1 β
β clickhouse-fsn-1.tradesoft.ru β 1 β
β clickhouse-fsn-2.tradesoft.ru β 1 β
β clickhouse-fsn-3.tradesoft.ru β 1 β
β clickhouse-graphite-staging.innogames.de β 1 β
β clickhouse-graphite-testing.innogames.de β 1 β
β clickhouse-graphite.innogames.de β 1 β
β clickhouse-hel-1.tradesoft.ru β 1 β
β clickhouse-idrac.find.smapps.org β 1 β
β clickhouse-ixc-1.hoztnode.net β 1 β
β clickhouse-logs02 β 2 β
β clickhouse-logs03 β 1 β
β clickhouse-logs04 β 1 β
β clickhouse-proactive1.cc-omsk.ertelecom.ru β 1 β
β clickhouse-proactive2.cc-perm.ertelecom.ru β 1 β
β clickhouse-prodev.ele26126.jscrambler.com β 1 β
β clickhouse-scfnx-u4604.vm.elestio.app β 1 β
β clickhouse-stage.mapper β 1 β
β clickhouse-stats.eudc.cloud β 1 β
β clickhouse-tds-01.do.stp-cis.local β 1 β
β clickhouse-tds-01.do.stp-int.local β 1 β
β clickhouse-tds-01.do.stp.local β 1 β
β clickhouse-tds-02.do.stp-cis.local β 1 β
β clickhouse-tds-02.do.stp-int.local β 1 β
β clickhouse-tds-02.do.stp.local β 1 β
β clickhouse-tv-analytics-1.stofa.net β 1 β
β clickhouse-u466.vm.elestio.app β 1 β
β clickhouse-uploader01.42n.dk β 1 β
β clickhouse-west1-ipmi.es.net β 1 β
β clickhouse-west1.es.net β 1 β
β clickhouse-west2-ipmi.es.net β 1 β
β clickhouse-west2.es.net β 1 β
β clickhouse-zk-dev-1-public.htz.appfollow.io β 1 β
β clickhouse-zk-dev-2-public.htz.appfollow.io β 1 β
β clickhouse-zk-dev-3-public.htz.appfollow.io β 1 β
β clickhouse-zk-prod-1-public.htz.appfollow.io β 1 β
β clickhouse-zk-prod-2-public.htz.appfollow.io β 1 β
β clickhouse-zk-prod-3-public.htz.appfollow.io β 1 β
β clickhouse.adgoaltech.net β 1 β
β clickhouse.com β 1 β
β clickhouse.cruise.smapps.org β 1 β
β clickhouse.cryptoboom.com β 1 β
β clickhouse.db.eait.uq.edu.au β 1 β
β clickhouse.dev β 1 β
β clickhouse.do.stp.local β 1 β
β clickhouse.ecomru.ru β 1 β
β clickhouse.ellada.space β 1 β
β clickhouse.enaza.ru β 1 β
β clickhouse.equeo.de β 1 β
β clickhouse.find.smapps.org β 1 β
β clickhouse.flashphoner.com β 1 β
β clickhouse.growadvertising.com β 1 β
β clickhouse.gsn.do-00 β 1 β
β clickhouse.gsn.do-01 β 1 β
β clickhouse.h.loggie.co β 1 β
β clickhouse.lbl.gov β 1 β
β clickhouse.m.af.de β 1 β
β clickhouse.mahjong.smapps.org β 1 β
β clickhouse.outofcloud.ru β 1 β
β clickhouse.pulkovo-airport.com β 1 β
β clickhouse.qencode.com β 1 β
β clickhouse.realty.p.smailru.net β 1 β
β clickhouse.smapps.net β 1 β
β clickhouse.solitaire.smapps.org β 1 β
β clickhouse.testomat.io β 1 β
β clickhouse.toadman.io β 1 β
β clickhouse.youlahdp.p.smailru.net β 1 β
β clickhouse.zav.org.ua β 1 β
β clickhouse00.gpn β 1 β
β clickhouse001.fra.eu.mktgoo.net β 1 β
β clickhouse01-dev.node.dc01.int.timo.group β 1 β
β clickhouse01.adapex.io β 1 β
β clickhouse01.dnx.lu β 1 β
β clickhouse01.gb.smailru.net β 1 β
β clickhouse01.het.mrf.io β 1 β
β clickhouse01.ovh.mrf.io β 1 β
β clickhouse01.stat.ams2.conversify.com β 1 β
β clickhouse02.dnx.lu β 1 β
β clickhouse02.het.mrf.io β 1 β
β clickhouse02.ovh.mrf.io β 1 β
β clickhouse02.stat.ams2.conversify.com β 1 β
β clickhouse03.dnx.lu β 1 β
β clickhouse03.het.mrf.io β 1 β
β clickhouse03.ovh.mrf.io β 1 β
β clickhouse03.stat.ams2.conversify.com β 1 β
β clickhouse04.dnx.lu β 1 β
β clickhouse04.het.mrf.io β 1 β
β clickhouse05.het.mrf.io β 1 β
β clickhouse06.het.mrf.io β 1 β
β clickhouse07.het.mrf.io β 1 β
β clickhouse08.het.mrf.io β 1 β
β clickhouse09.het.mrf.io β 1 β
β clickhouse09.sdkl.info β 1 β
β clickhouse1-1.ssel2.imcmdb.net β 1 β
β clickhouse1-1.sselp15.imcmdb.net β 1 β
β clickhouse1.cluster.zebralution.it β 1 β
β clickhouse1.nyc1.crafty.ner β 1 β
β clickhouse10.het.mrf.io β 1 β
β clickhouse10.sdkl.info β 1 β
β clickhouse11.het.mrf.io β 1 β
β clickhouse11.sdkl.info β 1 β
β clickhouse12.het.mrf.io β 1 β
β clickhouse12.sdkl.info β 1 β
β clickhouse13.het.mrf.io β 1 β
β clickhouse13.sdkl.info β 1 β
β clickhouse14.het.mrf.io β 1 β
β clickhouse14.sdkl.info β 1 β
β clickhouse15.sdkl.info β 1 β
β clickhouse16.sdkl.info β 1 β
β clickhouse2.cluster.zebralution.it β 1 β
β clickhouse2.fppressa.ru β 1 β
β clickhouse2.nyc1.fontbundles.net β 1 β
β clickhouse2.r-lab.biz β 1 β
β clickhousemysqlslave01.dnx.lu β 1 β
β club.prod.clickhouse.main β 1 β
β club.stage.clickhouse.main β 1 β
β conclickhouse001.srv.conova.net β 1 β
β de-hz-clickhouse1 β 1 β
β defra1-clickhousecdn01d.sys.i3d.network β 1 β
β defra1-clickhousecdn02d.sys.i3d.network β 1 β
β defra1-clickhousecdn03d.sys.i3d.network β 1 β
β defra1-clickhousecdn04d.sys.i3d.network β 1 β
β dev-clickhouse1.nyc1.crafty.net β 1 β
β dwh-clickhouse.dc.smailru.net β 1 β
β fi-hz-clickhouse2 β 1 β
β fiddler-clickhouse-node-0.sd.spawar.navy.mil β 1 β
β fiddler-clickhouse-node-1.sd.spawar.navy.mil β 1 β
β france-production-clickhouse01.kameleoon.net β 1 β
β france-production-clickhouse02.kameleoon.net β 1 β
β france-production-clickhouse03.kameleoon.net β 1 β
β france-production-clickhouse04.kameleoon.net β 1 β
β france-production-clickhouse05.kameleoon.net β 1 β
β france-production-clickhouse06.kameleoon.net β 1 β
β fsn-clickhouse1.node.prerender.io β 1 β
β graphite-clickhouse-01.cern.ch β 1 β
β hitron-clickhouse.cloudcheck.net β 1 β
β is-graph-clickhouse.i.smailru.net β 1 β
β iva-clickhouse1.yamoney.ru β 1 β
β iva-clickhouse2.yamoney.ru β 1 β
β iva-clickhouse3.yamoney.ru β 1 β
β lidl-production-clickhouse01.kameleoon.net β 1 β
β lidl-production-clickhouse02.kameleoon.net β 1 β
β load-clickhouse.nyc1.fontbundles.net β 1 β
β lsclickhouse1.gcore β 1 β
β lsclickhouse2.gcore β 1 β
β lsclickhouse3.gcore β 1 β
β lsg-clickhouse.lbl.gov β 1 β
β mclickhouse0.freenet.de β 1 β
β mclickhouse1.freenet.de β 1 β
β mclickhouse2.freenet.de β 1 β
β nlrtm1-clickhouse01.sys.i3d.network β 1 β
β nlrtm1-clickhousecdn01d.sys.i3d.network β 1 β
β nlrtm1-clickhousecdn02d.sys.i3d.network β 1 β
β nlrtm1-clickhousecdn03d.sys.i3d.network β 1 β
β nlrtm1-clickhousecdn04d.sys.i3d.network β 1 β
β node01.clickhouse.statistics.aatkit.com β 1 β
β node02.clickhouse.statistics.aatkit.com β 1 β
β node03.clickhouse.statistics.aatkit.com β 1 β
β node03.dev.clickhouse.statistics.aatkit.com β 1 β
β node04.clickhouse.statistics.aatkit.com β 1 β
β node05.clickhouse.statistics.aatkit.com β 1 β
β node06.clickhouse.statistics.aatkit.com β 1 β
β node07.clickhouse.statistics.aatkit.com β 1 β
β node08.clickhouse.statistics.aatkit.com β 1 β
β node09.clickhouse.statistics.aatkit.com β 1 β
β node101.clickhouse.statistics.aatkit.com β 1 β
β node102.clickhouse.statistics.aatkit.com β 1 β
β node103.clickhouse.statistics.aatkit.com β 1 β
β node104.clickhouse.statistics.aatkit.com β 1 β
β node105.clickhouse.statistics.aatkit.com β 1 β
β node106.clickhouse.statistics.aatkit.com β 1 β
β node107.clickhouse.statistics.aatkit.com β 1 β
β node108.clickhouse.statistics.aatkit.com β 1 β
β node109.clickhouse.statistics.aatkit.com β 1 β
β node201.clickhouse.statistics.aatkit.com β 1 β
β node202.clickhouse.statistics.aatkit.com β 1 β
β node203.clickhouse.statistics.aatkit.com β 1 β
β node204.clickhouse.statistics.aatkit.com β 1 β
β node205.clickhouse.statistics.aatkit.com β 1 β
β node206.clickhouse.statistics.aatkit.com β 1 β
β node207.clickhouse.statistics.aatkit.com β 1 β
β node208.clickhouse.statistics.aatkit.com β 1 β
β node209.clickhouse.statistics.aatkit.com β 1 β
β prod-us-clickhouse-01.jscrambler.com β 1 β
β product-recommendations-clickhouse01.kameleoon.net β 1 β
β product-recommendations-clickhouse02.kameleoon.net β 1 β
β production-clickhouse01.kameleoon.net β 1 β
β production-clickhouse02.kameleoon.net β 1 β
β production-clickhouse03.kameleoon.net β 1 β
β production-clickhouse04.kameleoon.net β 1 β
β production-clickhouse05.kameleoon.net β 1 β
β production-clickhouse06.kameleoon.net β 1 β
β projectx-clickhouse-nl-1.itsts.net β 1 β
β pushhouse-clickhouse-1.t.push.house β 1 β
β pushhouse-clickhouse-2.t.push.house β 1 β
β pushhouse-clickhouse-3.t.push.house β 1 β
β pushhouse-clickhouse-4.t.push.house β 1 β
β pushhouse-clickhouse-5.t.push.house β 1 β
β pushhouse-clickhouse-6.t.push.house β 1 β
β rclickhouse1.gcore β 1 β
β ru-ihor-clickhouse-1.loc β 1 β
β s04.servers.clickhouse.pepper.com β 1 β
β s05.servers.clickhouse.pepper.com β 1 β
β s06.servers.clickhouse.pepper.com β 1 β
β s07.servers.clickhouse.pepper.com β 1 β
β sds-clickhouse-01.pltrm.net β 1 β
β sds-clickhouse-02.pltrm.net β 1 β
β sds-clickhouse-03.pltrm.net β 1 β
β sds-clickhouse-05.pltrm.net β 1 β
β sds-clickhouse-06.pltrm.net β 1 β
β sds-clickhouse-07.pltrm.net β 1 β
β sm-clickhouse-prod-001.brq2.ff.avast.com β 1 β
β sm-clickhouse-prod-002.brq2.ff.avast.com β 1 β
β sm-clickhouse-prod-003.brq2.ff.avast.com β 1 β
β srv-clickhouse.infomatix.tech β 1 β
β srv01.clickhouse.tonlabs.io β 1 β
β steerfox-intg-clickhouse00.evolix.net β 1 β
β steerfox-pp-clickhouse00.evolix.net β 1 β
β steerfox-prod-clickhouse00.evolix.net β 1 β
β test-clickhouse01.kameleoon.net β 1 β
β test-clickhouse02.kameleoon.net β 1 β
β test-clickhouse03.kameleoon.net β 2 β
β usa-production-clickhouse01.kameleoon.net β 2 β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββ΄ββββ
307 rows in set. Elapsed: 4.130 sec. Processed 1.30 billion rows,
56.10 GB (314.15 million rows/s., 13.58 GB/s.)
I want nice pictures out of this dataset!
What to do?
2006, xkcd.com, CC BY-NC 2.5
β΅ I want something
like this
β΅ I want this
but bigger and better
2018, Ben Cartwright-Cox
https://blog.benjojo.co.uk/post/
scan-ping-the-internet-hilbert-curve
(echo 'P3 4096 4096 255';
clickhouse client --host ... --query "
WITH 4096 AS w, 4096 AS h, w * h AS pixels,
cutToFirstSignificantSubdomain(domain) AS tld,
sipHash64(tld) AS hash,
hash MOD 256 AS r,
hash DIV 256 MOD 256 AS g,
hash DIV 65536 MOD 256 AS b,
toUInt32(ip) AS num,
num DIV (0x100000000 DIV pixels) AS idx,
mortonDecode(2, idx) AS coord
SELECT avg(r)::UInt8, avg(g)::UInt8, avg(b)::UInt8
FROM dns_parsed GROUP BY coord
ORDER BY coord.2 * w + coord.1
WITH FILL FROM 0 TO 4096*4096
" --progress
) | pnmtopng > image.png
So what?
It is just 4K resolution.
I want a full picture of 4 gigapixel, 65536β¨―65536!
It works — we can generate 4 gigapixel picture straight away.
But the picture does not open in many viewers.
Let's represent a giant picture in the same way
as Google Maps — as a set of tiles.
And make a web page that will load the tiles on demand.
But now we have to use JavaScript...
Libraries for viewing large images:
— Leaflet: https://leafletjs.com/
— OpenSeaDragon: https://openseadragon.github.io/
0-0-0.png 1-0-0.png 1-0-1.png 1-1-0.png 1-1-1.png 2-0-0.png
2-0-1.png 2-0-2.png 2-0-3.png 2-1-0.png 2-1-1.png 2-1-2.png
2-1-3.png 2-2-0.png 2-2-1.png 2-2-2.png 2-2-3.png 2-3-0.png
2-3-1.png 2-3-2.png 2-3-3.png 3-0-0.png 3-0-1.png 3-0-2.png
3-0-3.png 3-0-4.png 3-0-5.png 3-0-6.png 3-0-7.png 3-1-0.png
3-1-1.png 3-1-2.png 3-1-3.png 3-1-4.png 3-1-5.png 3-1-6.png
3-1-7.png 3-2-0.png 3-2-1.png 3-2-2.png 3-2-3.png 3-2-4.png
3-2-5.png 3-2-6.png 3-2-7.png 3-3-0.png 3-3-1.png 3-3-2.png
3-3-3.png 3-3-4.png 3-3-5.png 3-3-6.png 3-3-7.png 3-4-0.png
3-4-1.png 3-4-2.png 3-4-3.png 3-4-4.png 3-4-5.png 3-4-6.png
3-4-7.png 3-5-0.png 3-5-1.png 3-5-2.png 3-5-3.png 3-5-4.png
...
{zoom}-{x}-{y}.png
for zoom in {0..6}
do
num_tiles=$((2**zoom))
for tile_x in $(seq 0 $(($num_tiles - 1)))
do
for tile_y in $(seq 0 $(($num_tiles - 1)))
do
filename="tiles/${zoom}-${tile_x}-${tile_y}.png"
echo $filename
[ -s "$filename" ] ||
(
echo 'P3 1024 1024 255';
clickhouse client --host driel7jwie.eu-west-1.aws.clickhouse-staging.com --secure --password "$PASSWORD" --query "
WITH 1024 AS w, 1024 AS h, w * h AS pixels,
cutToFirstSignificantSubdomain(domain) AS tld,
sipHash64(tld) AS hash,
hash MOD 256 AS r, hash DIV 256 MOD 256 AS g, hash DIV 65536 MOD 256 AS b,
toUInt32(ip) AS num,
mortonDecode(2, num) AS src_coord,
bitShiftRight(65536, ${zoom}) AS crop_size,
${tile_x} * crop_size AS left,
${tile_y} * crop_size AS top,
left + crop_size AS right,
top + crop_size AS bottom,
(src_coord.1 >= left AND src_coord.1 < right) AND (src_coord.2 >= top AND src_coord.2 < bottom) AS in_tile,
(src_coord.1 - left) DIV (crop_size DIV w) AS x,
(src_coord.2 - top) DIV (crop_size DIV h) AS y
SELECT avg(r)::UInt8, avg(g)::UInt8, avg(b)::UInt8
FROM dns_parsed
WHERE in_tile
GROUP BY x, y
ORDER BY y * w + x WITH FILL FROM 0 TO 1024*1024
" --progress
) | pnmtopng > $filename
done
done
done
https://reversedns.space/ β΅ click here
Every click generates an SQL query to ClickHouse:
const response = fetch(
`https://driel7jwie.eu-west-1.aws.clickhouse-staging.com/?user=website`,
{
method: 'POST',
body: `SELECT domain FROM dns_parsed WHERE ip = '${ip_str}' FORMAT JSON`
}).then(response => response.json().then(o => {
let host = o.rows ? o.data[0].domain : 'no response';
if (ip_str == current_requested_ip) {
popup.setContent(`<b>${ip_str}</b><br/>${host}`);
}
}));
Don't forget to set up users and quotas:
CREATE USER website IDENTIFIED WITH ...
SETTINGS
add_http_cors_header = 1 READONLY,
limit = 1 READONLY,
offset = 0 READONLY,
max_result_rows = 1 READONLY
GRANT SELECT ON default.dns_parsed TO website;
CREATE QUOTA website
KEYED BY ip_address
FOR RANDOMIZED INTERVAL 1 MINUTE MAX query_selects = 100,
FOR RANDOMIZED INTERVAL 1 HOUR MAX query_selects = 3000,
FOR RANDOMIZED INTERVAL 1 DAY MAX query_selects = 30000
TO website;
Source: Eh Bee Family, 2014, https://vine.co/v/OwTOez0W6wg
Don't be afraid — you can use ClickHouse
for everything.
You will find something new in the process.
Maybe you fill find a creative way to break something.
Source code: https://github.com/ClickHouse/reversedns.space/