All Your GUCs in a Row: checkpoint_timeout and checkpoint_completion_target

The C cluster opens with the first two checkpoint parameters. We take them out of alphabetical order because checkpoint_completion_target is defined as a fraction of checkpoint_timeout and is unintelligible without it. The alphabet can wait one post.

A short tour of checkpoints

A checkpoint is the point at which PostgreSQL guarantees that every page that was dirty in shared buffers when the checkpoint began has been written to disk and fsync()’d; once it completes, WAL older than the checkpoint’s redo point is no longer needed for crash recovery. The checkpointer process performs checkpoints on a regular cadence governed by these parameters. The shorter the cadence, the smaller the recovery window after a crash, and the more often you pay for the expensive work of flushing dirty pages.

Two triggers fire an automatic checkpoint (a manual CHECKPOINT command or a clean shutdown also forces one):

Time-based, controlled by checkpoint_timeout.
WAL-volume-based, controlled by max_wal_size (which gets its own post in the M cluster).

Whichever happens first wins. Most well-tuned systems hit checkpoint_timeout regularly and max_wal_size only during write bursts; a system that hits max_wal_size constantly has max_wal_size set too small.
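To make “whichever happens first” concrete, here is a minimal sketch that assumes a steady WAL rate. It also assumes (PG 11+ behavior) that the WAL trigger fires at roughly max_wal_size / (1 + checkpoint_completion_target) of WAL rather than at max_wal_size itself, since the server budgets for WAL written while the checkpoint is still running. The function names are illustrative, and real write rates are bursty, so treat the answer as a rough guide.

```python
# Sketch: which automatic trigger fires first, assuming a steady WAL rate.
# Assumption: on PG 11+ the WAL trigger sits at roughly
# max_wal_size / (1 + checkpoint_completion_target), leaving room for WAL
# written while the checkpoint itself is in progress.


def wal_trigger_mb(max_wal_size_mb: float, completion_target: float) -> float:
    """Approximate WAL volume since the last checkpoint that forces a new one."""
    return max_wal_size_mb / (1.0 + completion_target)


def first_trigger(timeout_s: float, max_wal_size_mb: float,
                  completion_target: float, wal_rate_mb_s: float) -> str:
    """Return "time" if checkpoint_timeout elapses first, else "wal"."""
    if wal_rate_mb_s <= 0:
        return "time"  # no WAL traffic, so the WAL trigger can never fire
    seconds_to_wal = wal_trigger_mb(max_wal_size_mb, completion_target) / wal_rate_mb_s
    return "time" if timeout_s <= seconds_to_wal else "wal"


if __name__ == "__main__":
    # Stock settings: 5min timeout, 1GB max_wal_size, 0.9 completion target.
    for rate in (0.5, 2.0, 10.0):
        print(f"{rate:>4} MB/s -> {first_trigger(300, 1024, 0.9, rate)}")
```

Under these assumptions the stock settings put the WAL trigger near 539 MB, so a steady 2 MB/s reaches it in about 270 seconds, before the 5-minute timer.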

`checkpoint_timeout`

Maximum time between automatic checkpoints. Default is 5min. Range is 30 seconds to 1 day. Context is sighup: set it in postgresql.conf or on the server command line, and a reload (SIGHUP) applies it without a restart.

The 5-minute default is too aggressive for almost every production workload. It was reasonable when servers had 4GB of RAM and a pair of spinning disks; on a contemporary server with hundreds of gigabytes of shared_buffers, checkpointing every 5 minutes means constantly flushing pages that will be re-dirtied before the next checkpoint anyway. Much of that I/O is pure waste.

The argument for raising it:

Less write amplification. Pages dirtied repeatedly between checkpoints get written once instead of many times. On hot-counter workloads (sequence tables, session stores, queue tables) this is a substantial win.
Fewer full-page writes. With full_page_writes = on (the default), PostgreSQL writes the entire 8KB page to WAL the first time a page is modified after a checkpoint, as torn-write protection. Less frequent checkpoints mean fewer first-modifications-after-checkpoint, and therefore less WAL volume, sometimes dramatically less.
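Both arguments come down to counting checkpoint intervals. The toy model below counts how many intervals a single hot page is modified in; under its simplifying assumptions (only checkpoints flush the page, no bgwriter or backend evictions, and one full-page image per touched interval), that count is both the number of page flushes and the number of full-page images. The helper name is hypothetical.

```python
# Toy model: how many checkpoint flushes (and full-page images) one hot page
# costs over a time window. Assumptions: only checkpoints flush the page (no
# bgwriter or backend evictions), full_page_writes is on, and the first
# modification in each checkpoint interval logs one full-page image.


def intervals_touched(update_times_s, checkpoint_interval_s):
    """Count checkpoint intervals that contain at least one update to the page."""
    return len({int(t // checkpoint_interval_s) for t in update_times_s})


if __name__ == "__main__":
    updates = range(3600)  # one update per second for an hour
    for minutes in (5, 15, 30, 60):
        n = intervals_touched(updates, minutes * 60)
        print(f"{minutes:>2}min: {n} flushes, {n} full-page images")
```

With one update per second for an hour, the same 3,600 modifications cost 12 flushes and 12 full-page images at 5min, 4 at 15min and 1 at 60min.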

The argument against:

Longer crash recovery. More WAL between checkpoints means more WAL to replay on startup after a crash. A 30-minute checkpoint interval can produce minutes of recovery time on a busy system.
Diminishing returns. Going from 5 to 10 minutes is a clear win. 10 to 20 still helps. 20 to 40 helps less. By the time you’re at an hour, you’ve extracted most of the benefit and added most of the recovery-time cost.
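The recovery-time cost can be sketched on the back of an envelope. This assumes a steady WAL rate and timed checkpoints that finish on schedule; the worst case is a crash just before a checkpoint completes, so replay starts at the previous checkpoint’s redo point and covers roughly (1 + checkpoint_completion_target) × checkpoint_timeout of WAL. The replay throughput (replay_mb_s, 50 MB/s in the demo) is a hypothetical figure you would have to measure on your own hardware.

```python
# Back-of-envelope worst-case crash recovery.
# Assumptions: steady WAL rate, timed checkpoints that finish on schedule, and a
# crash just before a checkpoint completes, so replay starts at the previous
# checkpoint's redo point and covers about (1 + completion_target) intervals.
# replay_mb_s is a hypothetical replay throughput you would have to measure.


def worst_case_replay_mb(wal_rate_mb_s, timeout_s, completion_target=0.9):
    """WAL (MB) to replay in the worst case under the assumptions above."""
    return wal_rate_mb_s * timeout_s * (1.0 + completion_target)


def worst_case_recovery_s(wal_rate_mb_s, timeout_s, replay_mb_s, completion_target=0.9):
    """Seconds of replay, given an assumed replay throughput."""
    return worst_case_replay_mb(wal_rate_mb_s, timeout_s, completion_target) / replay_mb_s


if __name__ == "__main__":
    for minutes in (5, 15, 30, 60):
        wal = worst_case_replay_mb(5.0, minutes * 60)
        secs = worst_case_recovery_s(5.0, minutes * 60, replay_mb_s=50.0)
        print(f"{minutes:>2}min: ~{wal:,.0f} MB of WAL, ~{secs:,.0f} s at 50 MB/s replay")
```

At 5 MB/s of WAL and a 30-minute interval, the worst case is about 17,100 MB of WAL. The sketch ignores max_wal_size, which at that rate would force earlier checkpoints unless it were set above roughly 17GB.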

A reasonable target for most production systems: 15min. Write-heavy systems where recovery time isn’t critical: 30min or 60min. Systems where recovery time is paramount and writes are modest: stay at 5min. Set log_checkpoints = on (which it is by default on PG 15+) and watch the log entries to see whether your checkpoints are time-triggered (good) or WAL-triggered (raise max_wal_size).
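With log_checkpoints on, each checkpoint logs a “checkpoint starting:” line followed by flags; “time” marks a timer-driven checkpoint and “wal” a WAL-driven one. A minimal tally, assuming that message format and searching anywhere in the line so any log_line_prefix works; the sample lines in the demo are illustrative, not real output.

```python
import re
from collections import Counter

# Assumption: log_checkpoints emits lines containing "checkpoint starting:"
# followed by space-separated flags such as "time", "wal", "immediate",
# "force", "wait", "shutdown" or "end-of-recovery".
START_RE = re.compile(r"checkpoint starting:([a-z\- ]*)")


def classify_checkpoints(log_lines):
    """Tally checkpoint starts as "time", "wal" or "other" (manual, shutdown, ...)."""
    counts = Counter()
    for line in log_lines:
        match = START_RE.search(line)
        if match is None:
            continue  # not a checkpoint-start line
        flags = match.group(1).split()
        if "wal" in flags:
            counts["wal"] += 1
        elif "time" in flags:
            counts["time"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    sample = [  # illustrative lines; the prefix depends on log_line_prefix
        "2024-05-01 10:00:00 UTC LOG:  checkpoint starting: time",
        "2024-05-01 10:03:12 UTC LOG:  checkpoint starting: wal",
        "2024-05-01 10:07:30 UTC LOG:  checkpoint complete: wrote 1200 buffers",
        "2024-05-01 10:08:00 UTC LOG:  checkpoint starting: immediate force wait",
    ]
    print(dict(classify_checkpoints(sample)))
```

Pass it the lines of a real log file; a “wal” count that rivals the “time” count is the signal to raise max_wal_size.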

`checkpoint_completion_target`

Fraction of checkpoint_timeout over which to spread the checkpoint’s I/O. Default is 0.9. Range is 0.0 to 1.0. Context same as checkpoint_timeout.

The mechanic: rather than dumping all dirty pages to disk in one burst at checkpoint start, the checkpointer paces the writes to finish at completion_target × checkpoint_timeout into the interval. With the defaults (5min × 0.9 = 4.5min), the checkpointer writes for 4.5 minutes, leaving 30 seconds of buffer before the next checkpoint is scheduled. The pacing smooths the I/O load and prevents the periodic “everything stops every 5 minutes while we drain the buffer pool” pattern that uncontrolled checkpoints produce.
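The pacing idea can be modeled in a few lines. This sketch assumes the checkpointer’s on-schedule test works roughly like this: scale the fraction of buffers written by checkpoint_completion_target and compare it with the elapsed fraction of checkpoint_timeout. In this model the checkpointer may pause between writes while on schedule and writes without pausing when behind. The real test also checks progress against WAL consumed, which this sketch ignores, and both function names are hypothetical.

```python
# Simplified model of checkpoint write pacing, time dimension only.
# Assumption: like the checkpointer's on-schedule test, write progress is scaled
# by completion_target and compared with the elapsed fraction of
# checkpoint_timeout. The real test also compares against WAL consumed
# (relative to max_wal_size); this sketch ignores that half.


def on_schedule(written, total, elapsed_s, timeout_s, completion_target=0.9):
    """True if enough buffers are written that the checkpointer may pause."""
    progress = (written / total) * completion_target
    return progress >= elapsed_s / timeout_s


def target_written(total, elapsed_s, timeout_s, completion_target=0.9):
    """Buffers that should be written by elapsed_s to finish at target * timeout."""
    window_s = completion_target * timeout_s
    return min(total, total * elapsed_s / window_s)


if __name__ == "__main__":
    total = 10_000  # dirty buffers at checkpoint start (made-up figure)
    for elapsed in (0, 60, 135, 270, 300):
        print(f"t={elapsed:>3}s  should have written {target_written(total, elapsed, 300):,.0f}")
```

With the defaults, the model expects half the dirty buffers written 135 seconds in and all of them by 270 seconds, which is the 4.5-minute window described above.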

The default changed from 0.5 to 0.9 in PostgreSQL 14. Before that change, the recommendation in practically every tuning guide for more than a decade was “set this to 0.9.” The defaults caught up.

A few notes:

Don’t lower it. The docs are unambiguous: lowering this parameter makes checkpoint I/O more bursty, which is exactly what spreading it out is meant to prevent. There is essentially no good reason to set it below 0.9.
Don’t raise it to 1.0. The remaining 10% of the interval is buffer time for the checkpoint’s other work — fsync() calls, file truncation, status updates. Setting completion_target to 1.0 means a slow fsync() runs into the next scheduled checkpoint, and the docs warn this produces “unexpected variation in the number of WAL segments needed.” The 0.9 default is the right number.
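The kernel still has to actually write the pages. The checkpointer’s pacing controls when it issues writes to the kernel, not when the kernel commits them to disk; the fsync() calls at the end of the checkpoint force out whatever is still pending. The *_flush_after parameters govern how aggressively the kernel is told to start writeback: backend_flush_after and bgwriter_flush_after in the B cluster, and checkpoint_flush_after, which covers the checkpointer’s own writes and belongs to this C cluster. The two mechanisms are complementary.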
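The first two notes amount to a simple rule for the value itself, sketched here as a check. check_completion_target is a hypothetical helper, not anything PostgreSQL ships, and its warning strings restate this post’s advice rather than server messages.

```python
# Sketch: the "don't lower it, don't raise it to 1.0" notes as a check.
# check_completion_target is a hypothetical helper; the warning texts restate
# this post's advice, not anything PostgreSQL itself reports.


def check_completion_target(value):
    """Return a list of warnings; an empty list means the value looks fine."""
    if not 0.0 <= value <= 1.0:
        return ["out of range: PostgreSQL accepts 0.0 to 1.0"]
    warnings = []
    if value < 0.9:
        warnings.append("below 0.9: checkpoint I/O is burstier than it needs to be")
    if value == 1.0:
        warnings.append("1.0: no slack left for fsync() and cleanup before the next checkpoint")
    return warnings


if __name__ == "__main__":
    for candidate in (0.5, 0.9, 0.95, 1.0):
        print(candidate, check_completion_target(candidate) or "ok")
```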

Tuning together

The two parameters interact: checkpoint_completion_target is the fraction and checkpoint_timeout is what it’s a fraction of. Raising checkpoint_timeout to 30 minutes with completion_target at 0.9 means the checkpointer spreads its writes over 27 minutes. That’s the right behavior — same smoothness, longer interval, less write amplification.
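The arithmetic is just a product, but seeing it across several intervals shows that the write window grows with checkpoint_timeout while the fraction reserved for end-of-checkpoint work stays fixed. The helper name spread is hypothetical.

```python
# Sketch: how the paced write window and the end-of-checkpoint slack scale
# with checkpoint_timeout at a fixed completion target.


def spread(timeout_min, completion_target=0.9):
    """Return (minutes spent on paced writes, minutes of slack)."""
    write_window = timeout_min * completion_target
    return write_window, timeout_min - write_window


if __name__ == "__main__":
    for timeout in (5, 15, 30, 60):
        writes, slack = spread(timeout)
        print(f"{timeout:>2}min: writes over {writes:.1f}min, {slack:.1f}min slack")
```

At 30 minutes, the writes spread over 27 minutes with 3 minutes of slack; at the 5-minute default, 4.5 minutes and 30 seconds.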

The combination most production systems should target:

```
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9   # already the default on PG 14+
max_wal_size = 8GB                   # see future post; raise from default 1GB
log_checkpoints = on                 # already default on PG 15+
```

This gives you 15-minute checkpoint intervals, smooth I/O across 13.5 of those minutes, plenty of WAL headroom to avoid forced checkpoints, and visibility into what the checkpointer is actually doing.
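One way to sanity-check “plenty of WAL headroom” is to ask how much sustained WAL the settings tolerate before WAL-triggered checkpoints take over. This sketch assumes a steady WAL rate and, as on PG 11+, a WAL trigger at roughly max_wal_size / (1 + checkpoint_completion_target); the function name is hypothetical.

```python
# Sketch: the highest steady WAL rate at which checkpoint_timeout, not
# max_wal_size, still triggers checkpoints.
# Assumption (PG 11+): the WAL trigger fires after about
# max_wal_size / (1 + checkpoint_completion_target) of WAL.


def max_timed_wal_rate_mb_s(max_wal_size_mb, timeout_s, completion_target=0.9):
    """Steady WAL rate (MB/s) above which WAL-triggered checkpoints take over."""
    return max_wal_size_mb / (1.0 + completion_target) / timeout_s


if __name__ == "__main__":
    stock = max_timed_wal_rate_mb_s(1024, 300)        # 1GB, 5min
    suggested = max_timed_wal_rate_mb_s(8192, 900)    # 8GB, 15min
    print(f"stock: ~{stock:.1f} MB/s, suggested: ~{suggested:.1f} MB/s")
```

Under these assumptions the suggested settings stay time-triggered up to about 4.8 MB/s of sustained WAL, against about 1.8 MB/s for the stock 1GB and 5min.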

Recommendation: Raise checkpoint_timeout to at least 15min on any production system. Leave checkpoint_completion_target at 0.9 and, if you’re on PG 13 or earlier, set it explicitly, because the older default of 0.5 is wrong. Watch the ratio of timed to requested checkpoints in pg_stat_bgwriter (checkpoints_timed and checkpoints_req) or, on PG 17+, pg_stat_checkpointer (num_timed and num_requested); if you’re seeing more requested than scheduled checkpoints, raise max_wal_size rather than touching these two.
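The counters are cumulative since the last stats reset, so the ratio worth watching is the change between two samples. A minimal sketch: it picks the query for the server’s major version (run the SQL with psql or any driver) and computes the requested share between two (timed, requested) samples. Both helper names are hypothetical, and the demo numbers are made up. Requested checkpoints also include manual CHECKPOINT commands, so rule those out before blaming max_wal_size.

```python
# Sketch: pick the right view for the server version and measure how many
# checkpoints in a window were requested rather than timed.
# Counters are cumulative since the last stats reset, so compare two samples.


def checkpoint_counter_query(server_major):
    """SQL returning (timed, requested) checkpoint counters."""
    if server_major >= 17:
        return "SELECT num_timed, num_requested FROM pg_stat_checkpointer"
    return "SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter"


def requested_share(before, after):
    """Fraction of checkpoints between two (timed, requested) samples that were requested."""
    timed = after[0] - before[0]
    requested = after[1] - before[1]
    total = timed + requested
    return requested / total if total else 0.0


if __name__ == "__main__":
    # Made-up samples taken a day apart: 10 timed, 30 requested in between.
    share = requested_share((100, 10), (110, 40))
    print(f"{share:.0%} requested ->", "raise max_wal_size" if share > 0.5 else "fine")
```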
