All Your GUCs in a Row: effective_cache_size

This is one of the most consistently misunderstood parameters PostgreSQL has, and the misunderstanding is always the same shape: people believe effective_cache_size does something to memory. It allocates a cache. It reserves RAM. It controls how much PostgreSQL keeps in memory. It does none of these. It allocates nothing, reserves nothing, and changes nothing about how a query executes once its plan is chosen. It is a single number whispered to the query planner, and its only effect is to change which plans the planner thinks are cheap. The default is 4GB, the context is user, and it deserves more than the usual word count because getting it wrong is so easy and so quietly costly.

What it is not

Start by clearing the wreckage. The docs are unusually blunt: this parameter “has no effect on the size of shared memory allocated by PostgreSQL, nor does it reserve kernel disk cache; it is used only for estimation purposes.” Read that twice. You can set effective_cache_size to 1TB on a machine with 16GB of RAM and nothing will crash, nothing will be allocated, no memory pressure will result; you will simply have lied to the planner, and it will make worse decisions as a consequence. Conversely, setting it to 64MB doesn’t shrink any cache; it just tells the planner to assume almost nothing stays cached, so it prices index scans as though nearly every page fetch goes to cold disk.

It is also not shared_buffers. shared_buffers genuinely allocates memory — a fixed region grabbed at startup, exclusively PostgreSQL’s. effective_cache_size is an estimate of total cache available to PostgreSQL data, which on a normal Linux host is shared_buffers plus the operating system’s page cache, since the kernel caches the same data files PostgreSQL reads. The two parameters are not alternatives, not redundant, and not the same units of thing: one buys memory, the other describes the world.

What it actually does

effective_cache_size feeds exactly one piece of machinery: the planner’s estimate of how many physical page reads an index scan will require. The relevant code in costsize.c implements the Mackert-Lohman formula — from a 1989 paper, “Index Scans Using a Finite LRU Buffer: A Validated I/O Model” — which answers a specific question: if a scan needs to touch some number of pages out of a relation, and there’s a buffer of a given size available, how many of those touches will be actual disk fetches versus cache hits? A comment in the source states the assumption directly: effective_cache_size is taken as the total buffer pages available for the whole query — “we include kernel space here” — pro-rated across the tables and index under consideration.
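
To make the formula concrete, here is a minimal Python sketch of the Mackert-Lohman approximation as the comments in costsize.c describe it. It is an illustration, not the planner's code: the function and argument names are mine, and it takes the buffer size directly as cache_pages, skipping the step where the planner pro-rates effective_cache_size across the tables and index in the query.

```python
import math


def index_pages_fetched(tuples_fetched: float, table_pages: int,
                        cache_pages: float) -> float:
    """Estimate physical page fetches for an index scan (Mackert-Lohman).

    tuples_fetched: tuples the scan is expected to fetch (N*s in the paper)
    table_pages:    pages in the table (T)
    cache_pages:    buffer pages assumed available to this table (b)
    Simplification: b is passed in directly rather than pro-rated.
    """
    T = float(max(table_pages, 1))            # never let T be zero
    b = float(max(1.0, math.ceil(cache_pages)))  # force b positive and integral
    Ns = float(tuples_fetched)

    if T <= b:
        # The whole table fits in the assumed cache: each page is read
        # from disk at most once, so the estimate is capped at T.
        pages = (2.0 * T * Ns) / (2.0 * T + Ns)
        return T if pages >= T else float(math.ceil(pages))

    # The table is larger than the assumed cache.
    lim = (2.0 * T * b) / (2.0 * T - b)
    if Ns <= lim:
        # Few enough fetches that the cache has not yet started evicting.
        pages = (2.0 * T * Ns) / (2.0 * T + Ns)
    else:
        # Past that point, further fetches miss in proportion to the
        # fraction of the table that does not fit, (T - b) / T.
        pages = b + (Ns - lim) * (T - b) / T
    return float(math.ceil(pages))


# Same scan, two different beliefs about the cache.
small_cache = index_pages_fetched(100_000, 1_000, cache_pages=100)
large_cache = index_pages_fetched(100_000, 1_000, cache_pages=10_000)
print(small_cache, large_cache)
```

With the large cache the estimate is capped at the table's 1,000 pages, because every page is fetched once and then stays put. With the small cache the same scan is charged many times that, since most repeated touches are assumed to miss. That gap is the entire lever this parameter pulls.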

The consequence flows from there. A large effective_cache_size tells the formula that repeated access to the same pages will mostly hit cache, so an index scan’s true I/O cost is low — which makes the planner more willing to choose index scans, and especially nested-loop joins with an inner index scan, where the same index and heap pages get hit over and over. A small value tells the formula that those repeated touches will miss and pay full random-read cost, so the planner shies away from index-driven nested loops and leans toward sequential scans, bitmap heap scans, and hash or merge joins that read each page once. Same data, same query, same hardware — change only this number, and the chosen plan can flip entirely. That is the whole behavior: it is a thumb on the scale between “assume things are cached” and “assume they are not,” applied to index-scan costing.

Worth noting what it deliberately does not model: the planner does not assume data survives in cache between queries, only within the costing of a single one. The cross-query caching that actually happens on a busy server — your hot tables genuinely sitting in RAM all day — is real but is not what this parameter represents. It’s a within-query I/O estimate that happens to be tuned by your sense of how much memory exists.

Setting it

Because it costs nothing and reserves nothing, you set it generously: an estimate of all the RAM that could plausibly cache PostgreSQL’s data. The standard rule of thumb is 50–75% of total system memory when PostgreSQL is the machine’s primary occupant, and that’s a fine starting point — it captures shared_buffers plus the page cache the kernel will devote to your files. It should always be larger than shared_buffers; a value smaller than your actual buffer pool is incoherent on its face.
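
The arithmetic is simple enough to write down. The following is a small Python sketch of the rule of thumb, assuming a host dedicated to PostgreSQL; the helper name and its gigabyte units are hypothetical and chosen only for illustration.

```python
def cache_size_range_gb(total_ram_gb: float,
                        shared_buffers_gb: float) -> tuple[float, float]:
    """Return a (low, high) starting range for effective_cache_size in GB.

    Applies the 50-75% of RAM rule of thumb and never suggests a value
    below shared_buffers, since the estimate is meant to include it.
    """
    low, high = 0.50 * total_ram_gb, 0.75 * total_ram_gb
    if high <= shared_buffers_gb:
        # The rule of thumb assumes shared_buffers is a modest share of RAM.
        raise ValueError("shared_buffers is at or above 75% of RAM; "
                         "the rule of thumb does not apply")
    return max(low, shared_buffers_gb), high


# Example: a 64GB host with shared_buffers = 16GB.
print(cache_size_range_gb(64, 16))
```

For that 64GB host the range works out to 32GB through 48GB, comfortably above the 16GB buffer pool.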

Err on the high side rather than the low. The classic failure mode in the field is an effective_cache_size left at a too-conservative value on a server with plenty of RAM: the planner, believing the cache is small, systematically overprices index scans and picks sequential scans and hash joins for queries that would have been far faster as index nested loops against the data that is, in reality, sitting in memory the whole time. The fix costs a SET and a re-plan. The too-high direction has a milder failure — the planner over-trusts the cache and may pick a nested loop that turns into real I/O if the assumption is wrong — but on a server genuinely dedicated to PostgreSQL with RAM to spare, generous is correct, and the 50–75% guidance lands you there.

One practical note: it’s user context, so you can SET effective_cache_size for a single session and re-run EXPLAIN to watch plans shift, which is the fastest way to see what this parameter does rather than take anyone’s word for it. Set it low, plan a join, set it high, plan again, and the thumb on the scale becomes visible. For the cluster, set it once in postgresql.conf to your 50–75%, confirm it exceeds shared_buffers, and leave it — there is nothing here to keep tuning, only a one-time estimate to get into the right order of magnitude so the planner stops mispricing your index scans.
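
The low-then-high experiment can be scripted. This Python sketch is deliberately driver-agnostic: execute is a hypothetical callable you supply that runs one SQL statement on a single session and returns its rows, and the %s placeholder style is an assumption borrowed from psycopg. It uses PostgreSQL's set_config() function rather than a literal SET so the size can be passed as a parameter.

```python
from typing import Callable, Optional, Sequence

# Hypothetical executor type: runs one statement, returns the result rows.
Executor = Callable[[str, Optional[Sequence]], list]


def explain_under(execute: Executor, query: str, cache_size: str) -> list:
    """Return EXPLAIN output lines for `query` at one effective_cache_size."""
    # set_config(name, value, is_local=false) changes the setting for the
    # rest of the session, the same scope as a plain SET.
    execute("SELECT set_config('effective_cache_size', %s, false)",
            (cache_size,))
    rows = execute("EXPLAIN " + query, None)
    return [row[0] for row in rows]  # EXPLAIN returns one text column


def compare_plans(execute: Executor, query: str,
                  low: str = "64MB", high: str = "48GB") -> dict:
    """Plan the same query under a low and a high setting."""
    return {size: explain_under(execute, query, size) for size in (low, high)}
```

With psycopg, for instance, execute could be a thin wrapper that calls cursor.execute(sql, params) and returns cursor.fetchall(). Diff the two lists of plan lines and any flip from a hash join to an index nested loop is easy to spot. Both statements must run on the same connection, because the setting is per session.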
