01:14
A little more on max_wal_size
30 March 2023
In a comment on my earlier post on max_wal_size
, Lukas Fittl asked a perfectly reasonable question:
Re: “The only thing it costs you is disk space; there’s no other problem with it being too large.”
Doesn’t this omit the fact that a higher
max_wal_size
leads to longer recovery times after a crash? In my experience that was the reason why you wouldn’t wantmax_wal_size
to e.g. be 100GB, since it means your database might take a while to get back up and running after crashes.
The answer is… as you might expect, tricky.
The reason is that there are two different ways a checkpoint can be started in PostgreSQL (in regular operations, that is; there’s a few more, such as manual CHECKPOINT
commands and the start of a backup using pg_start_backup
). Those are when PostgreSQL thinks it needs to checkpoint to avoid overrunning max_wal_size
(by too much), and when checkpoint_timeout
is reached. It starts a checkpoint on the first of those that it hits.
The theory behind my recommendations on checkpoint tuning is to increase max_wal_size
to the point that you are sure that it is always checkpoint_timeout
that fires rather than max_wal_size
. That in effect caps the checkpoint interval, so larger values of max_wal_size
don’t change the checkpoint behavior once it has reached the level that checkpoint_timeout
is always the reason a checkpoint starts.
But Lukas does raise a very good point: the time it takes to recover a PostgreSQL system from a crash is proportionate to the amount of WAL that it has to replay, in bytes, and that’s soft-capped by max_wal_size
. If crash recovery speed is a concern, it might make sense to not go crazy with max_wal_size
, and cap it at a lower level.
Pragmatically, crashes are not common and checkpoints are very common, so I recommend optimizing for checkpoint performance rather than recovery time… but if your system is very sensitive to recovery time, going crazy with max_wal_size
is probably not a good idea.
There is one comment.
sanar at 09:17, 3 April 2023:
Is there any chance that checkpoint is not running even after timeout.. i see this case many time.
max_wal_size=8GB
checkpoint_timeout=5min
checkpoint_completion_target=0.9