30 March 2023
In a comment on my earlier post on
max_wal_size, Lukas Fittl asked a perfectly reasonable question:
Re: “The only thing it costs you is disk space; there’s no other problem with it being too large.”
Doesn’t this omit the fact that a higher
max_wal_sizeleads to longer recovery times after a crash? In my experience that was the reason why you wouldn’t want
max_wal_sizeto e.g. be 100GB, since it means your database might take a while to get back up and running after crashes.
The answer is… as you might expect, tricky.
The reason is that there are two different ways a checkpoint can be started in PostgreSQL (in regular operations, that is; there’s a few more, such as manual
CHECKPOINT commands and the start of a backup using
pg_start_backup). Those are when PostgreSQL thinks it needs to checkpoint to avoid overrunning
max_wal_size (by too much), and when
checkpoint_timeout is reached. It starts a checkpoint on the first of those that it hits.
The theory behind my recommendations on checkpoint tuning is to increase
max_wal_size to the point that you are sure that it is always
checkpoint_timeout that fires rather than
max_wal_size. That in effect caps the checkpoint interval, so larger values of
max_wal_size don’t change the checkpoint behavior once it has reached the level that
checkpoint_timeout is always the reason a checkpoint starts.
But Lukas does raise a very good point: the time it takes to recover a PostgreSQL system from a crash is proportionate to the amount of WAL that it has to replay, in bytes, and that’s soft-capped by
max_wal_size. If crash recovery speed is a concern, it might make sense to not go crazy with
max_wal_size, and cap it at a lower level.
Pragmatically, crashes are not common and checkpoints are very common, so I recommend optimizing for checkpoint performance rather than recovery time… but if your system is very sensitive to recovery time, going crazy with
max_wal_size is probably not a good idea.