These two are inseparable. They combine in a single formula that decides when autovacuum runs ANALYZE against a table, and discussing one without the other gives you half the picture. So: a double-header.
The formula:
```
analyze threshold = autovacuum_analyze_threshold
                  + autovacuum_analyze_scale_factor × reltuples
```
When the number of tuples inserted, updated, or deleted since the last ANALYZE exceeds this value, autovacuum schedules an ANALYZE on the table. Defaults: autovacuum_analyze_threshold = 50, autovacuum_analyze_scale_factor = 0.1. Both have context sighup; both can be overridden per-table via storage parameters, which is where the real action is.
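If you want to confirm what your instance is actually running with, a quick sanity check against pg_settings shows both values and their context; nothing here is specific to any particular setup:

```sql
-- Current global values and context for the two ANALYZE-trigger settings
SELECT name, setting, context
FROM pg_settings
WHERE name IN ('autovacuum_analyze_threshold',
               'autovacuum_analyze_scale_factor');
```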
The threshold is the additive floor. It mostly exists to prevent constant re-analysis of tiny tables — a 5-row lookup table doesn’t need an ANALYZE every time you change a row. The scale factor is the proportional component. With the default 0.1, a table needs roughly 10% of its rows to change before ANALYZE triggers.
For tables of a few thousand rows, the defaults are correct. For larger tables, they are catastrophically wrong, and this is the whole reason the post exists.
Walk the math:
| Table size | Changes needed before ANALYZE |
|---|---|
| 1,000 rows | 150 |
| 100,000 rows | 10,050 |
| 10,000,000 rows | ~1,000,000 |
| 1,000,000,000 rows | ~100,000,000 |
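To run the same arithmetic against your own database, a rough sketch is to plug pg_class.reltuples into the formula with the default values. Note that reltuples is itself an estimate and this deliberately ignores any per-table overrides:

```sql
-- Approximate default ANALYZE trigger per table: 50 + 0.1 * reltuples.
-- reltuples is an estimate; per-table storage-parameter overrides are ignored.
SELECT c.relname,
       c.reltuples::bigint              AS approx_rows,
       (50 + 0.1 * c.reltuples)::bigint AS changes_before_analyze
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY c.reltuples DESC
LIMIT 20;
```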
A billion-row table waits for a hundred million row changes before its statistics get refreshed. In the meantime, the planner is making decisions based on data that may be weeks out of date. Queries that should use an index switch to sequential scans; nested loops that should be hash joins persist; estimated row counts diverge from reality by orders of magnitude. The slow-query escalation that arrives in your inbox at 9am Tuesday is, on inspection, “we have a billion rows and statistics from last quarter.”
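You can see how far behind a table's statistics are from pg_stat_user_tables; n_mod_since_analyze is the counter autovacuum compares against the formula above, and last_autoanalyze tells you when the statistics were last refreshed automatically:

```sql
-- Tables with the most modifications since their last ANALYZE
SELECT relname,
       n_live_tup,
       n_mod_since_analyze,
       last_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_mod_since_analyze DESC
LIMIT 20;
```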
The fix is not to lower the global scale factor — that would over-ANALYZE your small tables. The fix is per-table:
```sql
ALTER TABLE big_events SET (
    autovacuum_analyze_scale_factor = 0.01,  -- 1%
    autovacuum_analyze_threshold    = 10000  -- floor
);
```
For a 100M-row table, that triggers ANALYZE after roughly a million changes instead of ten million. For really large tables, push the scale factor lower still (0.005 or 0.001) and keep the raised threshold as an absolute floor, so a handful of changes on an otherwise quiet table doesn't trigger back-to-back runs. There is no harm in ANALYZE running more often; it samples a small fraction of the table and is cheap relative to what stale statistics cost you.
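To confirm the overrides took, per-table storage parameters land in pg_class.reloptions (they also show up in `\d+`); big_events here is just the example table from above:

```sql
-- Per-table storage parameters are stored as a text[] in pg_class.reloptions
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'big_events';
```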
Recommendation: Leave the globals alone. Identify your largest tables — anything over a million rows is a candidate, anything over ten million is mandatory — and set per-table autovacuum_analyze_scale_factor to a smaller value, with autovacuum_analyze_threshold raised to compensate. This is the single highest-leverage piece of autovacuum tuning you will ever do, and almost nobody does it.
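As a starting point for that audit, something like the following lists tables past the million-row mark along with whatever overrides they already carry (NULL means none); the row cutoff and schema filter are placeholders to adapt to your own environment:

```sql
-- Candidate tables for per-table ANALYZE tuning, largest first
SELECT n.nspname,
       c.relname,
       c.reltuples::bigint AS approx_rows,
       c.reloptions
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND c.reltuples > 1000000
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY c.reltuples DESC;
```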