All Your GUCs in a Row: enable_hashagg

The most consequential toggle in the enable_* family, because the thing it controls changed behavior in PostgreSQL 13 in a way that made some queries slower on upgrade — and enable_hashagg became, for a while, the lever people reached for to get the old behavior back. Default on, context user, same family framing as enable_async_append: a diagnostic instrument, not a tuning knob. But this one has a history worth telling, because the history is why anyone touches it.

Two ways to aggregate

A GROUP BY (or any aggregation that produces groups) has two main execution strategies. Hash aggregation builds a hash table keyed by the grouping columns: each input row probes the table and either starts a new group or updates an existing group’s running state. It’s fast, and it doesn’t care what order the input arrives in — but it holds every group in memory at once, because any future row might belong to any existing group. Group aggregation (shown as GroupAggregate in EXPLAIN) takes the opposite approach: it requires the input sorted on the grouping columns, and then walks through it accumulating one group at a time, emitting each group as soon as the next sort key appears. It only ever holds one group’s state, but it needs sorted input — which usually means paying for a Sort, unless an index already provides the order.

The planner’s choice between them turns largely on how many groups there are relative to memory. Few distinct groups: the hash table fits comfortably, and HashAggregate wins by avoiding the sort. Many distinct groups: the hash table is enormous, and Sort → GroupAggregate becomes competitive or better, because group aggregation’s memory footprint is one group regardless of how many there are. The memory budget for the hash table is work_mem × hash_mem_multiplier, the latter raised from 1.0 to 2.0 in PostgreSQL 15 precisely because hash operations are memory-hungry and harder to estimate than sorts.

The PostgreSQL 13 change, which is the whole story

Before PostgreSQL 13, hash aggregation had a sharp edge: it did not spill to disk. If the planner chose HashAggregate and the hash table turned out bigger than work_mem — because the row estimate was wrong, say — the executor didn’t fall back gracefully. It just kept allocating, blowing through work_mem without limit, occasionally until the OOM killer arrived. Group aggregation always spilled its sort to disk and stayed bounded; hash aggregation didn’t, and a misestimated HashAggregate was a genuine way to take a server down.

PostgreSQL 13 fixed that, in Jeff Davis’s disk-based hash aggregation work: when a HashAggregate exceeds its memory budget, it now enters “spill mode,” writing the overflow groups to temporary tapes in partitions and processing them in later passes, exactly the way a hash join batches. The hash table is finally memory-bounded, and the OOM footgun is gone. An unambiguous improvement — except for the upgrade surprise it created, which the developers anticipated and debated at length before shipping.

Two regressions could bite a query on the move to 13. First, a HashAggregate chosen on an underestimated number of groups, which on PostgreSQL 12 would have quietly overshot work_mem and finished fast, now spills to disk and runs much slower — the same plan, the same bad estimate, but the new graceful-spill behavior turns a memory overshoot into real I/O. Second, because the planner in 13 knows hash aggregation can spill, it considers HashAggregate in cases where 12 would have rejected it for estimated-oversize, so a query that used to get Sort → GroupAggregate might now get a spilling HashAggregate that feeds a worse plan upstream — a Hash Join where you used to get a Merge Join, say. Either way the symptom is the same: a GROUP BY query that regressed specifically across the 12-to-13 boundary, with HashAggregate now showing spill activity.

Symptoms that warrant flipping it

The tell lives in the aggregate node of EXPLAIN (ANALYZE). A HashAggregate with Batches: greater than 1 and a nonzero Disk Usage: is a hash table that spilled — it didn’t fit in work_mem × hash_mem_multiplier and went to disk, and that spill is frequently a large fraction of the query’s runtime. That is the signal that sends you to this parameter.

When you see it, SET enable_hashagg = off for the session and re-run EXPLAIN (ANALYZE). The planner falls back to Sort → GroupAggregate, and you get a direct comparison: if the sort-and-group plan is faster, you’ve confirmed the diagnosis — too many groups for the current memory budget — and now you choose the real fix. Usually that’s more memory: raise work_mem (per-role or per-session, never blindly cluster-wide, because it’s allocated per node per connection and a global raise multiplies by your concurrency), or raise hash_mem_multiplier to give hash operations specifically more room without inflating sort memory. Sometimes it’s an index: a B-tree on the grouping columns provides sorted input for free, making GroupAggregate cheap and sidestepping the hash entirely. And sometimes it’s statistics — if the group-count estimate is badly low, ANALYZE or a higher statistics target lets the planner avoid the bad hash plan on its own.

Leaving enable_hashagg = off in postgresql.conf is the family’s recurring error in an especially costly form here: hash aggregation is the right plan for the common low-cardinality GROUP BY, and disabling it cluster-wide forces every aggregation through a sort it usually doesn’t need. Diagnose with the switch, fix the actual constraint — work_mem, hash_mem_multiplier, an index, or statistics — and set it back. If you upgraded to 13-or-later and a specific report regressed, the answer is almost never to disable hash aggregation forever; it’s to give the spilling query the memory the new bounded executor is honestly telling you it needs.

Two ways to aggregate

The PostgreSQL 13 change, which is the whole story

Symptoms that warrant flipping it

Related