PostgreSQL 18 shipped asynchronous I/O. PostgreSQL 19, currently in feature freeze and headed for a September release, makes it tolerable to operate.
That sounds like a snide reading. It is not. The AIO subsystem in PG18 was a serious piece of engineering, and on the workloads it covers — sequential scans, bitmap heap scans, and VACUUM — it does what it says. The problem was never the I/O path. The problem was the operator surface. PG18 gave you io_method (sync, worker, io_uring) and, for the worker mode, a single static GUC: io_workers, defaulting to 3, capped at 32. You picked a number at boot and lived with it. PG19 replaces that single dial with a self-managed worker pool, and that is the change worth understanding.
What PG18 actually did
A short refresher, because the PG18 AIO design constrains everything PG19 changed.
In PG18, a backend that wants to read a block no longer issues a synchronous pread() against the kernel and then blocks until the data arrives. It submits a read request, keeps doing other work where possible, and collects the result later. The submission and completion machinery is implemented behind io_method, which selects from three backends:
- sync — the old behavior, kept for compatibility and for environments where AIO is unsafe or unavailable. No async at all; this is “AIO off.”
- worker — the portable backend. A pool of background processes performs blocking I/O on behalf of the regular backends. Submitter and completer communicate over shared memory. Available on every platform PostgreSQL builds on.
- io_uring — the Linux-native backend. Uses the kernel’s io_uring interface for genuinely asynchronous I/O with no extra processes in the path. Faster, lower CPU per operation, only available on a sufficiently recent Linux kernel and only when the kernel hasn’t disabled io_uring for security reasons (which several distros and most managed Kubernetes runtimes have).
The choice between worker and io_uring is not academic. io_uring wins on throughput and on CPU per I/O. worker wins on availability — it works on macOS, on FreeBSD, in containers where io_uring is policy-blocked, and on RHEL kernels old enough that io_uring is either absent or known-buggy. If you operate Postgres on anything that isn’t a current Linux kernel under your direct control, worker is your AIO backend whether you wanted it to be or not.
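If you want to confirm which backend a server is actually running, the check is a one-liner. A minimal sketch; io_method is the PG18 GUC named above, and changing it takes a restart:

```sql
-- Which AIO backend is this server using?
SHOW io_method;

-- Force the portable worker pool, e.g. where io_uring is blocked by policy.
-- io_method can only change at server start, so a restart follows.
ALTER SYSTEM SET io_method = 'worker';
```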
And that brings us to PG18’s operator problem: io_workers. You set the size of the pool at startup. The default of 3 was conservative; the practical ceiling of 32 was generous; the right number for your workload was a guess. Too few workers and you bottleneck on submission. Too many and you pay for idle backends in shared memory and process slots. The number that was right for your nightly VACUUM was not the number that was right for an analytics report. You tuned it once and accepted the cost.
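In configuration terms, PG18’s entire tuning surface for the pool was this; the value here is arbitrary, not a recommendation:

```sql
SHOW io_workers;                   -- PG18 default: 3
ALTER SYSTEM SET io_workers = 16;  -- one number, per instance, for every workload
```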
What PG19 changed
PG19 deletes the static dial and replaces it with four GUCs:
- io_min_workers — the floor. The pool keeps at least this many workers alive even when idle.
- io_max_workers — the ceiling. The pool will not exceed this regardless of demand.
- io_worker_idle_timeout — how long an idle worker hangs around before exiting.
- io_worker_launch_interval — the minimum time between successive worker launches.
The pool now scales between io_min_workers and io_max_workers based on the queue depth of pending I/O submissions. Idle workers retire after io_worker_idle_timeout. Bursts spawn new workers, but no faster than io_worker_launch_interval permits.
The four-GUC shape of this is deliberate. A pure on-demand pool with no floor adds latency to the first I/O of a session — the worker has to be spawned. A pool that scales without rate-limiting can thrash on short bursts, spawning workers that exit before they do useful work. The launch interval is the obvious knob; the idle timeout is the less-obvious one, and it’s the one you’ll probably want to leave alone.
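Concretely, a configuration sketch using the GUC names above; the values and units are illustrative, not recommended defaults:

```sql
ALTER SYSTEM SET io_min_workers = 2;                     -- floor: kept warm even when idle
ALTER SYSTEM SET io_max_workers = 32;                    -- ceiling: never exceeded
ALTER SYSTEM SET io_worker_idle_timeout = '60s';         -- idle workers retire after this
ALTER SYSTEM SET io_worker_launch_interval = '100ms';    -- minimum gap between launches
```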
What this is not
A few things this change is not, because the marketing version of PG19 will inevitably get them wrong.
It is not async writes. PostgreSQL’s write path remains synchronous in the sense that matters: WAL is written and flushed before commit, and shared-buffer writes go through the bgwriter and checkpointer the way they always have. AIO covers reads and a few maintenance write paths. If your workload is bottlenecked on commit latency, none of this helps you — your bottleneck is fsync, and fsync is not negotiable.
It is not a replacement for effective_io_concurrency or maintenance_io_concurrency. Those GUCs control prefetch depth — how many outstanding I/Os a single scan or VACUUM will have in flight. AIO controls the mechanism by which those I/Os execute. They compose. A high effective_io_concurrency with io_method = sync produces nothing; the prefetcher requests blocks and the kernel happily ignores it. A high effective_io_concurrency with io_method = worker and a tight io_max_workers ceiling produces queueing inside the pool. You want both knobs aligned to your storage’s actual queue depth.
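In practice that means setting the two layers together rather than in isolation. A sketch with illustrative numbers, using the io_max_workers name from above:

```sql
-- Prefetch depth: how many I/Os a single scan or VACUUM keeps in flight...
ALTER SYSTEM SET effective_io_concurrency = 64;
ALTER SYSTEM SET maintenance_io_concurrency = 64;
-- ...and enough pool capacity to actually execute that depth under worker mode.
ALTER SYSTEM SET io_max_workers = 32;
```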
It is not a reason to skip io_uring if you have it. The PG19 worker pool is still a worker pool. It still pays context-switch and IPC costs that io_uring doesn’t. If you’re on a kernel that supports io_uring and your security posture allows it, io_uring is still the right answer.
How to tune it
In a sentence: leave it alone first, measure, then adjust.
The defaults shipping with PG19 are reasonable for the median workload. Touch them only when you see one of three signals.
You see I/O queueing under load. If pg_stat_io shows a backlog growing during your peak hours and your storage isn’t saturated, raise io_max_workers. The ceiling exists to keep a misbehaving query from spawning a thousand worker processes; on a host with the cores and RAM to support more, raising it is straightforward.
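pg_stat_io gives you cumulative read counters; for a point-in-time view of what is actually in flight, PG18 also added the pg_aios view. A sketch of both checks; read_time is only populated with track_io_timing enabled:

```sql
-- Cumulative read volume and time, by backend type.
SELECT backend_type, object, context, reads, read_time
FROM pg_stat_io
WHERE reads > 0
ORDER BY read_time DESC NULLS LAST;

-- How many asynchronous I/Os are in flight right now.
SELECT count(*) AS in_flight FROM pg_aios;
```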
You see worker-spawn churn. If you observe workers being created and exiting in tight loops — visible in process listings or in PG19’s worker-pool stats — raise io_min_workers or extend io_worker_idle_timeout. The cost of a permanent floor of two or three idle workers is trivial compared to the cost of constantly recreating them.
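Churn is easiest to spot by watching the worker processes themselves. A sketch; whether the workers report a backend_type of “io worker” in pg_stat_activity is an assumption here, and grepping the OS process list for the worker title works regardless:

```sql
-- Live I/O workers and when each one started; rapidly changing pids and very
-- recent backend_start values under steady load suggest spawn churn.
SELECT pid, backend_type, backend_start
FROM pg_stat_activity
WHERE backend_type LIKE '%io worker%'
ORDER BY backend_start DESC;
```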
Your workload is bursty in a structured way. If you have a predictable nightly batch window where I/O concurrency demand is ten times the daytime baseline, you do not need to engineer a clever scaling strategy. Set io_max_workers high enough to cover the burst, set io_min_workers low enough to not waste resources during the day, and trust the pool. The whole point of dynamic scaling is to make this a non-decision.
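In config form, the non-decision looks like this; the numbers are illustrative, and the reload call assumes these GUCs can be changed without a restart:

```sql
ALTER SYSTEM SET io_min_workers = 2;    -- cheap daytime floor
ALTER SYSTEM SET io_max_workers = 48;   -- headroom for the nightly batch window
SELECT pg_reload_conf();                -- assumption: no restart needed for these
```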
What you should not do is set io_min_workers and io_max_workers to the same value to “make it deterministic.” That recreates the PG18 static-pool behavior, which was the problem.
What to expect on what
The workloads that visibly benefit from AIO in PG18 will continue to benefit in PG19; the change is in operator overhead, not in I/O throughput. If you want to know what to expect, the rough hierarchy is:
VACUUM is the biggest beneficiary. It is the canonical example of a process that reads enormous numbers of pages in a predictable order and writes back a smaller number — exactly the access pattern AIO is built for. Bloated tables on slow storage see meaningful real-world improvement; the kind of VACUUM that used to take eight hours runs in five.
Sequential scans on tables larger than shared_buffers benefit substantially. If your analytics queries spend their time in Seq Scan on cold data, AIO is doing real work for you.
Bitmap heap scans benefit, particularly on queries where the bitmap is sparse and the heap fetches are scattered. Less spectacular than the sequential-scan case, but real.
Index scans on small selective queries benefit very little. There is not much I/O to overlap. If your workload is OLTP point lookups against indexes that fit in shared_buffers, AIO is not your problem and not your solution.
Hot OLTP write workloads benefit not at all. As noted: writes remain synchronous. If you wanted async writes you wanted a different database, which is a perfectly fine thing to want, but PostgreSQL is not it.
The deeper change
Worth saying out loud: the PG18 → PG19 AIO transition is the kind of subsystem maturation that doesn’t generate headlines. It is the version after the version that was announced. The headline feature ships, then over the next cycle the people who actually run it discover what’s awkward, and the fixes go in. PG18’s io_workers was the awkward part. PG19’s worker pool is the fix.
This is how good database features evolve. The other way is how bad ones do.