All Your GUCs in a Row: enable_indexonlyscan

The third way to use an index, after the plain index scan and bitmap scan of enable_indexscan and enable_bitmapscan — and the one with the most-misunderstood catch, because an index-only scan can be physically possible and still end up reading the heap on nearly every row. Why that happens is the whole post. Default on, context user, same family framing as enable_async_append: a diagnostic instrument, not a tuning knob.

What “index-only” promises, and the asterisk on it

A normal index scan does two reads per row: find the entry in the index, then fetch the row from the heap to get the columns the index doesn’t carry. An index-only scan skips the second read — if every column the query needs is present in the index, the answer can be assembled from the index alone, with no heap access at all. On a wide table where you select a few narrow columns, that can turn millions of random heap fetches into zero, and queries have been observed to run anywhere from twice to twenty times faster as a result. You make it possible by building a covering index, either with the columns in the key or, since PostgreSQL 11, with extra payload columns in an INCLUDE clause.

Here is the asterisk, and it catches almost everyone the first time. PostgreSQL’s MVCC means a row’s visibility — whether your transaction’s snapshot should see it — is stored in the heap, not the index. So even when an index carries every column you asked for, PostgreSQL still has to confirm each row is visible, and that information lives in exactly the place an index-only scan is trying to avoid. If it had to visit the heap to check visibility on every row, the “index-only” scan would be no faster than a plain index scan, and the feature would be pointless.

The visibility map is the whole game

What rescues it is the visibility map: a tiny per-table bitmap with one bit per heap page, set when every row on that page is old enough to be visible to all current and future transactions. When an index-only scan finds a candidate entry, it checks that page’s visibility-map bit. If the bit is set, the page is all-visible, the row is known visible without looking, and no heap access happens — the scan stays genuinely index-only. If the bit is not set, the scan must visit the heap page to check visibility, and for that row it has bought nothing over a plain index scan. The docs put the consequence plainly: an index-only scan “will be a win only if a significant fraction of the table’s heap pages have their all-visible map bits set.”

The visibility map is set by VACUUM. This is the hinge the whole feature turns on, and where it quietly fails in production. The visibility map only becomes accurate as vacuum marks pages all-visible; a page that has been modified since the last vacuum has its bit cleared, and every index-only scan touching it falls back to a heap fetch. So on a large, actively-changing table where autovacuum can’t keep up, you get a plan that proudly says Index Only Scan while doing heavy heap I/O underneath, because most of the pages it touches aren’t marked all-visible. There’s a well-documented case of exactly this: a roughly 200-million-row table whose index-only scan started silently consuming heavy I/O again weeks after it was optimized, because under default autovacuum thresholds the table was being vacuumed only about once every two months, and the visibility map had gone stale. The index was perfect. The visibility map was the problem.

Symptoms that warrant flipping it

The diagnostic signal here is unusually precise, and it’s not really a reason to flip the GUC so much as a number to read: EXPLAIN (ANALYZE) reports a Heap Fetches: count on every Index Only Scan. A low or zero Heap Fetches: is a healthy index-only scan doing what it promised. A high Heap Fetches: on a scan you expected to be cheap is the tell that your visibility map is stale — the scan is index-only in name while paying heap-access costs in reality, and the more your table changes relative to how often it’s vacuumed, the worse it gets.

When you see that, the fix is almost never this parameter. It’s vacuum: run VACUUM on the table to refresh the visibility map immediately, and then make autovacuum keep it fresh — lower autovacuum_vacuum_scale_factor (and the insert-driven threshold) for that specific table so a big, busy relation gets vacuumed far more often than the cluster defaults allow. Where flipping the GUC earns its keep is the opposite diagnosis: SET enable_indexonlyscan = off for the session, re-run EXPLAIN (ANALYZE), and the planner falls back to a plain index scan or bitmap scan, letting you measure whether the index-only scan with its heap fetches is actually beating the alternatives or whether the planner over-estimated how all-visible your table is. That comparison tells you whether to chase the visibility map or accept a different plan.

Two design notes worth carrying away. First, there’s little point padding an index with INCLUDE payload columns unless the table is static enough that index-only scans will usually skip the heap — if you’re visiting the heap anyway, the payload columns saved nothing and just bloat the index and slow writes. Second, covering indexes can suppress HOT updates, which makes the table churn its heap more and its pages go all-visible less often, feeding exactly the staleness that defeats the scan; weigh n_tup_hot_upd before adding one. As ever, leaving enable_indexonlyscan = off in postgresql.conf is the family mistake — it discards a feature that can be a twenty-fold speedup to avoid a problem that vacuuming fixes. Diagnose with the Heap Fetches: number, fix the vacuum cadence, and set it back.

What “index-only” promises, and the asterisk on it

The visibility map is the whole game

Symptoms that warrant flipping it

Related