All Your GUCs in a Row: commit_delay and commit_siblings

commit_delay is the one knob that tells PostgreSQL’s group commit to wait for a bigger group. The database already batches WAL flushes on its own: when one backend is flushing the WAL, others that become ready to commit queue up behind it and ride along on the same fsync. commit_delay makes the backend leading that flush pause for a set number of microseconds before it starts, on the theory that a few more transactions will arrive during the pause and join the batch. commit_siblings is the gate that decides whether the pause is worth taking at all.

commit_delay defaults to 0 (no delay, the feature off), is measured in microseconds, and tops out at 100000, which is a tenth of a second; if you are deliberately adding 100ms to every WAL flush, something has gone wrong well upstream of this parameter. Its context is superuser. commit_siblings defaults to 5, ranges from 0 to 1000, and has context user. Neither needs a restart.

The mechanic, in the group-commit design PostgreSQL has used since 9.3, is that the leader backend sleeps for commit_delay microseconds after taking the WAL flush lock while followers queue behind it, and then a single fsync durably commits the whole group. The leader won’t sleep if fsync is off, and it won’t sleep unless at least commit_siblings other sessions are in active transactions when the flush is about to begin. That gate is the entire point of commit_siblings: a delay is pure latency cost if nobody else is around to join, so the default of 5 declines to wait during quiet periods. (Anything written about these two from before PostgreSQL 9.3 describes an older, much weaker implementation. Ignore it.)
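The gate described above can be sketched as a small predicate. This is a model of the rule as stated in the text, not PostgreSQL’s source; the function name and its parameters are hypothetical, and active_siblings is assumed to count other sessions in active transactions, excluding the leader itself.

```python
def leader_should_sleep(
    commit_delay_us: int,
    fsync_enabled: bool,
    active_siblings: int,
    commit_siblings: int = 5,
) -> bool:
    """Return True if the flush leader would pause before flushing WAL.

    Hypothetical model of the rule in the text:
      - commit_delay = 0 means the feature is off
      - no delay is taken when fsync is disabled
      - at least commit_siblings OTHER sessions must be in active
        transactions at the moment the flush is about to begin
    """
    if commit_delay_us <= 0:
        return False
    if not fsync_enabled:
        return False
    return active_siblings >= commit_siblings


# A quiet system (2 other active sessions) skips the pause at the default gate.
quiet = leader_should_sleep(500, True, active_siblings=2)
# A busy system (12 other active sessions) takes it.
busy = leader_should_sleep(500, True, active_siblings=12)
```

Under this model, setting commit_siblings to 0 means the leader pauses on every flush whenever commit_delay is nonzero and fsync is on.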

Here is where most people get it wrong. They assume commit_delay only helps on slow storage. It is most obviously useful there, but the documentation is explicit that it can pay off even on fast SSDs and battery-backed write caches, provided you tune it against a representative workload. The trade is always the same: you amortize the fixed cost of an fsync across more transactions, at the price of adding up to commit_delay of latency to each flush. Whether that’s a win depends entirely on how expensive your flushes are relative to your latency budget, which is a thing you measure rather than guess.
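To make that trade concrete, here is a deliberately crude model, assuming flushes run back to back and each cycle costs the pause plus one fsync. The function names and the sample numbers are illustrative assumptions, not measurements; real group sizes depend on how fast commits arrive.

```python
def commits_per_second(group_size: int, flush_us: float, delay_us: float = 0.0) -> float:
    """Upper bound on commit rate in a toy model of back-to-back flushes.

    Assumption: every flush cycle takes delay_us + flush_us microseconds
    and durably commits group_size transactions.
    """
    if group_size < 0 or flush_us <= 0 or delay_us < 0:
        raise ValueError("group_size and delay_us must be >= 0, flush_us > 0")
    return group_size * 1_000_000 / (flush_us + delay_us)


def delay_pays_off(base_group: int, delayed_group: int,
                   flush_us: float, delay_us: float) -> bool:
    """True if the larger group gathered during the pause beats no pause."""
    with_delay = commits_per_second(delayed_group, flush_us, delay_us)
    without_delay = commits_per_second(base_group, flush_us)
    return with_delay > without_delay


# Illustrative numbers only: a 2000 us flush and a 1000 us pause.
doubles = delay_pays_off(base_group=3, delayed_group=6, flush_us=2000, delay_us=1000)
barely_grows = delay_pays_off(base_group=3, delayed_group=4, flush_us=2000, delay_us=1000)
```

In this model the pause wins when the group doubles from 3 to 6 and loses when it only grows to 4, which is the whole argument for measuring group growth on a real workload instead of assuming it.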

The official tuning method is the part actually worth knowing. Run pg_test_fsync, which reports the average time a single WAL flush takes on your storage, and start commit_delay at roughly half that number. The relationship to commit_siblings then runs in a direction people find backwards: on fast media, raise the sibling threshold, because a cheap flush is only worth delaying when there’s heavy concurrency to amortize across; on high-latency media, lower it, because an expensive flush is worth batching even at modest concurrency. Push commit_delay too high and you widen latency enough that throughput falls, which is the failure mode that earned this parameter its reputation as fiddly and easy to get wrong.
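The half-the-flush-time starting point is simple arithmetic. A minimal sketch follows, assuming you have already read a microseconds-per-operation figure off pg_test_fsync for the sync method you use. The helper name is hypothetical, and the result is only a first value to test, not a recommendation.

```python
COMMIT_DELAY_MAX_US = 100_000  # upper bound of commit_delay, in microseconds


def starting_commit_delay(flush_usecs: float) -> int:
    """Return half the measured flush time, clamped to commit_delay's range.

    flush_usecs is assumed to be the average time of one WAL flush as
    measured on your own storage (microseconds per operation).
    """
    if flush_usecs < 0:
        raise ValueError("flush time cannot be negative")
    return min(COMMIT_DELAY_MAX_US, int(flush_usecs // 2))


# Example: a measured flush of 437 microseconds suggests starting near 218.
first_try = starting_commit_delay(437)
```

From there, adjust in both directions against a representative workload and keep whichever value actually improves throughput without breaking your latency budget.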

Before any of that, confirm WAL syncing is actually your bottleneck. Through PostgreSQL 17 the signal lives in pg_stat_wal: wal_sync counts how many times WAL was flushed, and wal_sync_time (when track_wal_io_timing is on) tells you how long that took. In PostgreSQL 18 those numbers moved into pg_stat_io; look at the fsyncs and fsync_time columns on the rows whose object is wal instead. If sync time is a meaningful slice of your commit latency and you have genuine commit concurrency, this is worth a measured experiment. If it isn’t, you’re tuning the wrong thing.
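Those counters are cumulative, so the useful number is the average sync time over an interval, taken from two snapshots. A minimal sketch, assuming you have sampled the sync count and the total sync time (reported in milliseconds) twice yourself; the function name and sample values are hypothetical.

```python
from typing import Optional


def avg_sync_ms(syncs_before: int, time_ms_before: float,
                syncs_after: int, time_ms_after: float) -> Optional[float]:
    """Average milliseconds per WAL sync between two counter snapshots.

    Returns None when no syncs happened in the interval (or the counters
    were reset), since an average would be meaningless.
    """
    delta_syncs = syncs_after - syncs_before
    if delta_syncs <= 0:
        return None
    return (time_ms_after - time_ms_before) / delta_syncs


# Made-up snapshots: 500 syncs taking 1250 ms in total over the interval.
per_sync = avg_sync_ms(1000, 2000.0, 1500, 3250.0)
```

Compare the result with your typical commit latency: if each sync is a small fraction of it, commit_delay has little to amortize.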

For nearly everyone that experiment ends with both parameters back at their defaults, and that’s the correct outcome. Automatic group commit already collects the easy throughput without costing you any latency, and if you’re hunting for commit-cost savings the far larger lever is synchronous_commit, which trades durability rather than latency. commit_delay is a specialized complement to that, for a flush-bound, high-commit-rate workload on storage you have actually benchmarked; commit_siblings only matters once you’ve reached for it. Leave them at 0 and 5 until you have a number from pg_test_fsync and a reason.
