postgresql when it's not your job

08:16

VACUUM (INDEX_CLEANUP OFF) Considered Harmful

28 January 2025

PostgreSQL version 12 introduced a new option on the VACUUM command, INDEX_CLEANUP. You should (almost) never use it.

First, a quick review of how vacuuming works on PostgreSQL. The primary task of vacuuming is to find dead tuples (tuples that still exist on disk but can’t ever be visible to any transaction anymore), and reclaim them as free space. At a high level, vacuuming proceeds as:

Most of the time in vacuuming is spend removing the dead tuples from the indexes. It has to do this first, because otherwise, you would have tuple references in the indexes that are now invalid… or, worse, look like they are valid but point to wrong tuples!

INDEX_CLEANUP OFF reduces the amount of cleanup that vacuuming can do. It does release the space that dead tuples take up, but leaves the line pointers in place (it has to, so that the indexes are still pointing at the right location).

(The documentation does explain this, but uses terminology that can be unfamiliar to a lot of PostgreSQL users.)

Why would you ever use it, then? The other thing that vacuuming does is mark any tuples which can always be visible to any transaction (until modified or deleted) as “frozen.” This is an important operation to prevent data corruption due to PostgreSQL’s 32-bit transaction ID (you can get more information about freezing here).

Sometimes, for one reason or another, PostgreSQL has been unable to completely “freeze” a table, and the database can get dangerously close to the point that the wraparound data corruption can occur. Since vacuuming the indexes takes most of the time, sometimes, you want to tell PostgreSQL to skip that step, and just vacuum the heap so that the wraparound danger has passed. Thus, the INDEX_CLEANUP option.

But you still have to do a regular VACUUM on the table.

It’s not! And the worst possible situation is to do an INDEX_CLEANUP OFF vacuum on cron or something.The indexes will end up very badly bloated, which will hit query performance.

So, unless you are sure you know what you are doing (that is, you are in a wraparound-point emergency), please pretend this option doesn’t exist. Really.

Comment