All Your GUCs in a Row: enable_hashjoin

The first of the three join-strategy toggles, so before we get to the parameter, a paragraph on the thing it sits inside: PostgreSQL has exactly three ways to join two tables, and for every join in every query the planner picks one of them. The three enable_* join switches — this one, enable_mergejoin, and enable_nestloop — let you take one option off the table and watch what the planner reaches for instead. Same family rule as always (enable_async_append): diagnostic instruments, not tuning knobs. Default on, context user.

The three joins, briefly

A nested loop scans the outer relation and, for each row, looks up matching rows in the inner relation — cheap when the outer side is tiny or the inner side has an index on the join key, brutal when both sides are large and unindexed, because it’s O(outer × inner). A merge join requires both inputs sorted on the join key, then walks the two sorted streams in lockstep, advancing whichever is behind — excellent when both sides are already sorted (an index produces the order for free), but it needs that sort, and sorting two large unsorted inputs is expensive. A hash join builds a hash table from one input and probes it with the other; it doesn’t care about input order at all, which makes it the workhorse for joining large, unsorted tables on an equality condition.

Hash join works in two phases. The build phase scans the smaller input — the planner deliberately picks the smaller one to minimize memory — and constructs a hash table keyed on the join columns. The probe phase streams the larger input through, hashing each row’s join key and looking it up to find matches. Because both inputs are scanned sequentially, an index on the join condition doesn’t help a hash join at all; this is why you’ll often see the planner choose a sequential scan beneath a hash join even when indexes exist, and that’s correct, not a bug. Hash join needs only an equality join condition (no hashing a <) and enough memory to hold the build side’s hash table.

That memory is the catch, and it’s the same budget as hash aggregation: work_mem × hash_mem_multiplier (the multiplier raised from 1.0 to 2.0 in PostgreSQL 15). If the build side’s hash table fits, the join runs in a single pass. If it doesn’t, PostgreSQL partitions both inputs into batches by join key and processes one batch pair at a time, spilling the rest to temporary files — which works, but the I/O hurts, and the planner will often prefer a merge join once the hash won’t fit in memory.

What disabling it actually does

Worth knowing for all three join switches: enable_hashjoin = off doesn’t truly forbid hash joins. It adds a disable_cost — a hard-coded enormous number, 1e10 — to every hash join path, so the planner will still use one if it has literally no other way to execute the join. For an equality join it’ll switch to a merge or nested loop instead, but for a join the other methods can’t handle, you’ll see a hash join in your plan despite the switch being off. The switches discourage; they don’t strictly prohibit.

Symptoms that warrant flipping it

The tell is in the Hash Join node of EXPLAIN (ANALYZE): Batches: greater than 1, meaning the build side’s hash table didn’t fit in work_mem × hash_mem_multiplier and the join spilled to disk in multiple partitions. A spilling hash join is frequently a large chunk of a slow query’s runtime, and it’s the signal that sends you here.

When you see it, SET enable_hashjoin = off for the session and re-run EXPLAIN (ANALYZE). The planner falls back to a merge join (or a nested loop), and you get the comparison: if the alternative is faster, you’ve confirmed the spilling hash join was the problem, and you choose the real fix. Most often that’s memory — raise work_mem (per-role or per-session, never blindly cluster-wide, since it’s a per-operator budget multiplied by your concurrency), or raise hash_mem_multiplier to give hash operations specifically more headroom. Sometimes it’s an index that makes a merge join genuinely cheaper by supplying pre-sorted input, or makes a nested loop into an indexed inner the better choice for a small probe side. And often the deeper cause is upstream: a hash join (or any join-method surprise) frequently traces back to a row-count misestimate — the planner built a plan for the wrong cardinality — so the join is where you spend the time but ANALYZE or better statistics is where you fix it. The classic version is the planner underestimating a row count, choosing a hash join sized for far fewer rows than actually arrive, and spilling hard.

Leaving enable_hashjoin = off in postgresql.conf is the usual family mistake, and a bad one: hash join is the correct, indispensable strategy for large unsorted equality joins, and disabling it cluster-wide pushes every such join onto a merge join’s sort or a nested loop’s repeated scans. Diagnose with the switch, fix the real constraint — work_mem, hash_mem_multiplier, an index, or statistics — and set it back. The probe tells you the hash join hurt; it doesn’t tell you to live without hash joins.

The three joins, briefly

What disabling it actually does

Symptoms that warrant flipping it

Related