GROUP BY and Memory limit exceeded

GROUP BY and Memory limit exceeded

Method 1

Simply increase max_memory_usage

Method 2

Enable external memory aggregation:

max_bytes_before_external_group_by

distributed_aggregation_memory_efficient

Method 3

Two-pass aggregation:

– in a subquery, calculate the set of keys
 that will be included in the result;

– only for them calculate all aggregate functions.

Method 3, better

Two-pass aggregation:

– in a subquery, calculate the set of keys
 that will be included in the result;

– only for them calculate all aggregate functions.

To save memory, we will calculate
a set of hashes instead of a set of strings.

Method 4

Magic options:

max_rows_to_group_by,
group_by_overflow_mode = 'any'

Method 5

Sampling

Method 6

Export data for a subset of keys.

Make multiple passes.

Method 7

Calculate data for keys
that occur frequently.

Method 7, better

Calculate data for keys
that occur frequently.

Conclusion

If a query doesn't fit within the limits,
you can still execute it.

?