Author: Maksim Kita, 2021-05-11.
1. Introduction to hash tables.
2. Main design questions.
3. Benchmarks.
4. C++ hash table design.
GROUP BY
JOIN
SELECT DISTINCT
Main methods
1. Hash function.
2. Collision resolution method.
3. Resize.
4. Cell memory layout.
1. Don't use identity function for integer types.
2. Don't use hash functions for strings (CityHash) for integer types.
3. Don't use cryptographic hash functions if you're not under attack. For example, SipHash calculation ~980 MB/s. CityHash ~9 GB/s.
4. Don't use outdated hash functions. FNV1a.
By default ClickHouse uses bad hash functions.
1. CRC32-C for integer types. One processor instruction (actually two) latency 3 cycles.
2. Special hash function for strings. Can use CityHash, xxHash, wyhash as standard.
1. Chaining.
2. Open Addressing.
3. Good in theory (Cuckoo hashing, Hopscotch hashing, 2-choice hashing). Usually either hard to implement, or slow due to additional memory fetches.
Example: std::unordered_map
1. Pointer stability for key, value.
2. Ability to store large objects, non-movable objects.
3. Works well with bad hash function, high load factor.
4. Very slow. Stresses allocator (even just calling function is expensive for hot path).
Linear probing. Example: ClickHouse HashMap.
Quadratic probing. Example: Google DenseHashMap.
1. Good cache locality.
2. Need to carefully choose hash function.
3. Can't store large objects. Serialize to arena and store pointers to them.
1. By powers of two. Fast modulo division.
size_t place = hash & (size - 1)
2. To a prime number close to power of two. Slow division even with constant switch, libdivide but there's also fastrange.
0.5 is a good choice for linear probing with step 1.
ClickHouse HashMap, Google DenseHashMap uses 0.5.
Abseil HashMap uses 0.875.
Ask client to choose keys for empty and deleted values.
Handle empty value separately and don't store it in hash table.
Compressed storage of metadata and data.
Test hash tables on random integer values.
Test hash tables without accounting for maximum load factor, memory consumption.
Test and not show benchmark code.
On real scenarios and real data. In ClickHouse the real scenario is data aggregation.
Yandex.Metrica dataset.
wget https://datasets.clickhouse.com/hits/partitions/hits_100m_obfuscated_v1.tar.xz
WatchID almost all values are unique. Hash table size 20714865 elements. This is ~600 MB, doesn't fit in LL caches.
| ClickHouse HashMap: | 7.366 sec. |
| Google DenseMap: | 10.089 sec. |
| Abseil HashMap: | 9.011 sec. |
| std::unordered_map: | 44.758 sec. |
Deinitialization of std::unordered_map took longer than benchmarks of other tables.
perf stat -e cache-misses:u ./integer_hash_tables_and_hashes
| ClickHouse HashMap: | 329,664,616 |
| Google DenseMap: | 383,350,820 |
| Abseil HashMap: | 415,869,669 |
| std::unordered_map: | 1,939,811,017 |
Latency Comparison Numbers ---------------------------------- L1 cache reference 0.5 ns Branch mispredict 5 ns L2 cache reference 7 ns 14x L1 cache Mutex lock/unlock 25 ns Main memory reference 100 ns 20x L2 cache, 200x L1 cache Compress 1K bytes with Zippy 3,000 ns 3 us Send 1K bytes over 1 Gbps network 10,000 ns 10 us Read 4K randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD Read 1 MB sequentially from memory 250,000 ns 250 us Round trip within same datacenter 500,000 ns 500 us
RegionID frequently repeating values. Hash table size 9040 elements. Fits in LL caches.
| ClickHouse HashMap: | 0.201 sec. |
| Google DenseMap: | 0.261 sec. |
| Abseil HashMap: | 0.307 sec. |
| std::unordered_map: | 0.466 sec. |
1. Hash
2. Allocator
3. Cell
4. Grower (resize interface)
5. HashTable
Hash
Same as std::hash.
template <typename T>
struct Hash
{
size_t operator() (T key) const
{
return DefaultHash<T>(key);
}
};
Allocator
Uses mmap, mremap for large chunks of memory.
class Allocator
{
void * alloc(size_t size, size_t alignment);
void free(void * buf, size_t size);
void * realloc(void * buf, size_t old_size, size_t new_size);
};
There is an allocator that initially allocates memory on stack:
AllocatorWithStackMemory<HashTableAllocator, initial_bytes>
Cell
template <typename Key, typename Mapped, typename HashTableState>
struct HashTableCell
{
...
void setHash(size_t hash_value);
size_t getHash(const Hash & hash) const;
bool isZero(const State & state);
void setZero();
...
};
Grower
struct HashTableGrower
{
size_t place(size_t x) const;
size_t next(size_t pos) const;
bool willNextElementOverflow() const;
void increaseSize();
};
HashTable
template
<
typename Key,
typename Cell,
typename Hash,
typename Grower,
typename Allocator
>
class HashTable :
protected Hash,
protected Allocator,
protected Cell::State;
protected ZeroValueStorage<Cell::need_zero_value_storage, Cell>
ZeroValueStorage
template <bool need_zero_value_storage, typename Cell>
struct ZeroValueStorage;
template <typename Cell>
struct ZeroValueStorage<true, Cell>
{
...
};
template <typename Cell>
struct ZeroValueStorage<false, Cell>
{
...
};
Ability to pass custom Grower.
1. Pass Grower with fixed size, without resize and collision resolution chains we get a Lookup table.
2. Pass Grower with collision resolution step not 1.
Ability to store state in cell.
Save hash. Used for string hash tables.
struct HashMapCellWithSavedHash : public HashMapCell
{
size_t saved_hash;
void setHash(size_t hash_value) { saved_hash = hash_value; }
size_t getHash(const Hash &) const { return saved_hash; }
};
Fast-clearable hash table.
struct FixedClearableHashMapCell
{
struct ClearableHashSetState
{
UInt32 version = 1;
};
using State = ClearableHashSetState;
UInt32 version = 1;
bool isZero(const State & st) const { return version != st.version; }
void setZero() { version = 0; }
};
LRUCache.
struct LRUHashMapCell
{
static bool constexpr need_to_notify_cell_during_move = true;
static void move(LRUHashMapCell * old_loc, LRUHashMapCell * new_loc);
LRUHashMapCell * next = nullptr;
LRUHashMapCell * prev = nullptr;
};
LRUCache. Use boost::intrusive::list.
using LRUList = boost::intrusive::list
<
Cell,
boost::intrusive::value_traits<LRUHashMapCellIntrusiveValueTraits>,
boost::intrusive::constant_time_size<false>
>;
LRUList lru_list;
https://www.boost.org/doc/libs/1_76_0/doc/html/intrusive.html
SmallTable
Consists of an array for some small number of elements. Fits in L1 cache.
template <typename Key, typename Cell, size_t capacity>
class SmallTable : protected Cell::State
{
size_t m_size = 0;
Cell buf[capacity];
...
}
StringHashTable
Consists of 4 hash tables:
1. For strings of size 0-8 bytes.
2. For strings of size 9-16 bytes.
3. For strings of size 17-24 bytes.
4. For strings larger than 24 bytes.
TwoLevelHashTable
Consists of 256 hash tables:
On key insert we calculate the hash table index where we will insert the key.
size_t getBucketFromHash(size_t hash_value)
{
return (hash_value >> (32 - BITS_FOR_BUCKET)) & MAX_BUCKET;
}
We wrote a framework for hash tables for our data aggregation scenario.
https://github.com/ClickHouse/ClickHouse/blob/master/src/Common/HashTable/HashTable.h https://github.com/ClickHouse/ClickHouse/blob/master/src/Common/examples/integer_hash_tables_benchmark.cpp