All Your GUCs in a Row: deadlock_timeout

The name is a small lie. deadlock_timeout is not how long PostgreSQL tolerates a deadlock before breaking it — deadlocks are broken the instant they’re found. It’s how long a process waits on a lock before PostgreSQL bothers to look for a deadlock at all. The default is 1s, the context is superuser, and the gap between what the name implies and what the parameter does is the whole post.

Why the check is deferred

Detecting a deadlock means building a wait-for graph and searching it for a cycle, and that is not free. The vast majority of lock waits are not deadlocks — they’re a transaction briefly waiting for another to commit, which it does, releasing the lock so the waiter proceeds. Running the full detection algorithm on every lock wait would burn cycles proving, over and over, that there’s no cycle.

So PostgreSQL is optimistic. When a process starts waiting on a conflicting lock, it doesn’t check anything; it arms a timer for deadlock_timeout and sleeps. If the lock comes free before the timer fires — the common case — the process wakes, takes the lock, and the deadlock check never runs. Only if the timer fires while the process is still blocked does it run detection, on the theory that a wait this long might not be ordinary contention. The docs put the rationale plainly: the check is relatively expensive, so the server doesn’t run it every time it waits for a lock.
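To make the policy concrete, here is a minimal Python sketch of "sleep first, check only if the timer fires." It is a toy model, not PostgreSQL's code: wait_for_lock, lock_released, and run_detection are hypothetical names, and a threading.Event stands in for the lock being granted.

```python
import threading

def wait_for_lock(lock_released: threading.Event,
                  deadlock_timeout: float,
                  run_detection) -> str:
    """Toy model of a lock wait with a deferred deadlock check.

    lock_released    -- set by whoever releases the lock (hypothetical stand-in)
    deadlock_timeout -- seconds to wait before paying for a detection pass
    run_detection    -- callable returning True if a deadlock was found
    """
    # Optimistic phase: sleep until the lock is released or the timer fires.
    if lock_released.wait(timeout=deadlock_timeout):
        return "acquired without check"

    # The timer fired while we were still blocked: now run the expensive check.
    if run_detection():
        return "deadlock detected"

    # No deadlock: keep waiting. This waiter does not schedule another check.
    lock_released.wait()
    return "acquired after check"
```

In the common case the event is set before the timeout and run_detection is never called, which is the whole point of the deferral.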

How the detector actually works

When the timer does fire, the waiting backend runs CheckDeadLock() and builds the wait-for graph. Each node is a process; each edge means “this process is waiting for that one.” The edges come in two flavors, and the distinction is the clever part.

A hard edge is a wait on a lock someone already holds: A wants a lock that B holds in a conflicting mode, so A genuinely cannot proceed until B is done. A soft edge is subtler — it comes from the wait queue itself. If A is queued behind B for the same lock and their requests conflict, then A waits for B not because B holds anything yet, but because B is ahead of it in line and will be woken first. Hard edges are fixed facts; soft edges are merely the current queue order.

The detector searches the combined graph for a cycle. No cycle means no deadlock, and the backend goes back to sleep without scheduling another check of its own. That is safe because every blocked process independently arms its own deadlock_timeout: if a cycle forms later, the process whose wait completed it will still be blocked when its timer fires, and its check will find the cycle. A cycle made entirely of hard edges is a true deadlock: somebody has to die. The detector picks the backend that ran the check as the victim and aborts its transaction with ERROR: deadlock detected (SQLSTATE 40P01). The abort releases that backend's locks, which lets everyone else proceed.
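The graph search can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the server's implementation: edges are plain (waiter, blocker, kind) tuples, and find_cycle and check_deadlock are hypothetical names. The search starts from the process running the check and only looks for a cycle that passes through it.

```python
from typing import Optional

# (waiter, blocker, kind), where kind is "hard" or "soft"
Edge = tuple[str, str, str]

def find_cycle(edges: list[Edge], start: str) -> Optional[list[Edge]]:
    """Return the edges of a wait-for cycle passing through `start`, or None."""
    outgoing: dict[str, list[Edge]] = {}
    for edge in edges:
        outgoing.setdefault(edge[0], []).append(edge)

    def dfs(node: str, path: list[Edge], visited: set[str]) -> Optional[list[Edge]]:
        for edge in outgoing.get(node, []):
            blocker = edge[1]
            if blocker == start:
                return path + [edge]          # walked back to ourselves: a cycle
            if blocker not in visited:
                visited.add(blocker)
                found = dfs(blocker, path + [edge], visited)
                if found is not None:
                    return found
        return None

    return dfs(start, [], {start})

def check_deadlock(edges: list[Edge], me: str) -> str:
    """Classify what the checking process `me` finds."""
    cycle = find_cycle(edges, me)
    if cycle is None:
        return "no deadlock: go back to sleep"
    if all(kind == "hard" for _, _, kind in cycle):
        return f"hard deadlock: abort {me}"
    return "soft edge in cycle: try reordering the queue first"
```

For example, check_deadlock([("A", "B", "hard"), ("B", "A", "hard")], "A") classifies the classic two-process case as a hard deadlock and names A, the process that ran the check, as the victim.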

But if the cycle includes a soft edge, PostgreSQL tries something better first: it attempts to reorder the wait queue to break the cycle without aborting anyone. Because a soft edge is only a queue-position artifact, rearranging who gets woken first can sometimes dissolve the deadlock with nobody losing their transaction. The detector only resorts to killing a victim when no reordering resolves it. This is why not every potential deadlock ends in an error — some are quietly defused by reshuffling the line, and you never know it happened.
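A toy example shows why reordering can work. The Python sketch below models one lock with a holder and a wait queue, derives hard and soft edges from that state, and looks for a queue order with no cycle. Everything here is an assumption for illustration: the two-mode conflict table, the function names, and above all the brute-force search over permutations, which is a stand-in and not how the server chooses a new order.

```python
from itertools import permutations

# Simplified two-mode conflict table (an assumption; real lock modes are richer).
CONFLICTS = {("share", "exclusive"), ("exclusive", "share"), ("exclusive", "exclusive")}

def build_edges(holders, queue, other_hard_edges):
    """Derive (waiter, blocker) edges for one lock.

    holders          -- [(proc, mode)] currently holding the lock
    queue            -- [(proc, mode)] waiting, in wake-up order
    other_hard_edges -- hard edges that come from other locks
    """
    edges = set(other_hard_edges)
    for i, (waiter, wmode) in enumerate(queue):
        for holder, hmode in holders:
            if holder != waiter and (wmode, hmode) in CONFLICTS:
                edges.add((waiter, holder))      # hard edge: lock is held
        for ahead, amode in queue[:i]:
            if (wmode, amode) in CONFLICTS:
                edges.add((waiter, ahead))       # soft edge: queue position only
    return edges

def has_cycle(edges):
    """Prune edges that point at a process that is not waiting; leftovers mean a cycle."""
    remaining = set(edges)
    while remaining:
        waiters = {w for w, _ in remaining}
        pruned = {(w, b) for w, b in remaining if b in waiters}
        if pruned == remaining:
            return True
        remaining = pruned
    return False

def find_safe_order(holders, queue, other_hard_edges):
    """Return a queue order with no wait-for cycle, or None if none exists."""
    for candidate in permutations(queue):
        if not has_cycle(build_edges(holders, list(candidate), other_hard_edges)):
            return list(candidate)
    return None

# P1 holds the lock in share mode and waits on P3 for some other lock.
# P2 wants exclusive; P3 wants share but is queued behind P2.
holders = [("P1", "share")]
queue = [("P2", "exclusive"), ("P3", "share")]
new_order = find_safe_order(holders, queue, {("P1", "P3")})
```

In this scenario the original order deadlocks (P1 waits on P3, P3 on P2, P2 on P1), but moving P3 ahead of P2 removes the soft edge: P3's share request is compatible with P1's hold, so P3 can run, and nobody has to be aborted.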

Setting it

Leave it at 1s for normal operation. The docs note it’s about the smallest value you’d want in practice, and the tuning guidance is to keep it above your typical transaction time so that ordinary lock waits resolve before any waiter wastes effort on a detection pass. On a heavily loaded server with short transactions you might even raise it, trading slightly slower reporting of genuine deadlocks for fewer needless checks.

The one time you reach for this knob is diagnosis, and it's worth knowing because it's the parameter's most useful trick: deadlock_timeout is also the threshold for log_lock_waits. With log_lock_waits = on, any lock wait that lasts longer than deadlock_timeout gets logged. So if you're hunting lock contention that isn't quite deadlocking, temporarily lowering deadlock_timeout to something like 100ms turns it into a tripwire that surfaces every meaningful lock wait in the log. Set it low while investigating, and put it back to 1s when you're done, because a low value also means more detection passes. That pairing is how you find the contention a normal deadlock error never shows you, because the waits that hurt your throughput are usually the ones that resolve a half-second before anyone would have called them a deadlock.
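The effect of the threshold is easy to see with made-up numbers. The Python sketch below is only a model of the rule "log waits longer than deadlock_timeout"; the function name and the wait durations are hypothetical, not measurements or server output.

```python
def logged_waits(wait_durations_ms, deadlock_timeout_ms):
    """Return the lock waits that would cross the logging threshold."""
    return [d for d in wait_durations_ms if d > deadlock_timeout_ms]

# Hypothetical lock waits observed during one busy minute, in milliseconds.
waits = [50, 120, 480, 950, 1500]

at_default = logged_waits(waits, 1000)   # deadlock_timeout = 1s
at_tripwire = logged_waits(waits, 100)   # deadlock_timeout = 100ms
```

With the default threshold only the 1500 ms wait would show up; at 100 ms the 120, 480, and 950 ms waits appear as well, and those are the ones the default setting hides.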
