The second of the three join-strategy toggles. The three algorithms were laid out in enable_hashjoin — nested loop, merge join, hash join — so here we go deeper on the middle one. Default on, context user, same family framing as enable_async_append: a diagnostic instrument, not a tuning knob.

How a merge join works

A merge join requires both of its inputs sorted on the join key, and then does something elegantly simple: it walks the two sorted streams in parallel with a cursor on each, at every step advancing whichever cursor points at the smaller value. When the two cursors land on equal values, it emits the matching rows; then it moves on. Because both inputs are sorted, it never has to look backward or hold a hash table — one linear pass over each side and the join is done. It’s the same merge step that powers merge sort, applied to two relations instead of two runs.

Two properties fall directly out of that design. Like a hash join, a merge join needs an equality condition — the “advance the smaller cursor” logic is meaningless without a defined ordering match, and the source code will flatly refuse a non-equality merge operator. And its cost is dominated entirely by one question: are the inputs already sorted? If a B-tree index already delivers each side in join-key order, the merge join skips straight to the merge and is extraordinarily cheap — cheap enough that it can beat a hash join even when the hash would have fit in memory, because it does no hashing and touches presorted data. If the inputs are not sorted, the merge join has to sort them first, and sorting two large relations is expensive enough that the planner will usually prefer a hash join instead.

That is the whole tension of merge join, and it’s worth stating plainly: merge join is what the planner reaches for when the data is already ordered — by an index, or because a sort was going to happen anyway for an ORDER BY upstream. When the order is free, merge join is often the best join there is. When the order has to be bought with a sort, it usually isn’t.

Symptoms that warrant flipping it

There are two mirror-image situations, and they point in opposite directions.

The first: a merge join that’s paying for a sort you don’t think it should. The tell is a Sort node feeding one or both sides of a Merge Join in EXPLAIN (ANALYZE), and worse, that Sort showing Sort Method: external merge Disk: — meaning the input was too big to sort in work_mem and spilled to disk. A merge join whose sorts are spilling is frequently slower than the hash join the planner passed over. SET enable_mergejoin = off for the session, re-run, and compare: if the hash join wins, the merge join’s sort cost was the problem, and the real fix is usually either more work_mem so the sort stays in memory, or — better — an index on the join key that supplies the order for free and eliminates the sort entirely, at which point the merge join might become the right choice honestly.

The second, and subtler: a case where a merge join would have been better but the planner chose something else, typically a nested loop that’s looping far too many times, or a hash join whose build side spilled. Here you’re not disabling merge join, you’re investigating why it wasn’t picked — often the answer is a missing index (so the planner saw only the expensive sort-first version of merge join and rejected it) or a row misestimate that made another plan look cheaper. Flipping other join switches off to force the merge join into view, then reading its cost, tells you what it would have cost and whether an index would make it genuinely cheaper.

One specific pathology to watch, because merge join has a failure mode the others handle differently: many-to-many joins on a duplicate-heavy key. When both sides have many rows sharing the same join value, the merge step must emit every matching pair — the Cartesian product of the two equal groups — and it also has to back up and rescan the inner group for each outer duplicate, which is where you’ll see a Materialize node buffering the inner side. If the join key has heavy duplication on both sides, the merge join’s output and rescanning can balloon; a large Materialize under a Merge Join with a duplicate-heavy key is the sign, and the fix is generally to reconsider the query or the key rather than the join method.

As with the whole family, enable_mergejoin = off is a probe, not a setting. Merge join is the correct and often optimal choice for joining presorted or indexed data, and disabling it cluster-wide forces every such join onto a hash join that has to build a table the merge join didn’t need. Diagnose with the switch, decide whether the real answer is an index, more work_mem, or better statistics, and set it back. The sort under the merge join is usually the thing to fix — not the merge join itself.