How the Other Half Counts

PostgreSQL has ANALYZE. You run it (or autovacuum runs it for you), it draws a sample of 300 × default_statistics_target rows, and it writes a row per column into pg_statistic: a null fraction, an n-distinct estimate, a most-common-values list, an equi-depth histogram, and a physical-vs-logical correlation. The planner reads those numbers, multiplies selectivities together, costs a handful of join strategies, and picks one. Three join algorithms are on the menu: nested loop, merge join, hash join.

That is the entire shape of the problem, and every cost-based optimizer ever shipped solves the same one. They differ in three places, and only three: where the numbers come from, how stale the numbers are allowed to get, and which plan shapes are even legal to choose between. The algorithms are the boring part. Everybody hash-joins. The interesting part is the bookkeeping.

So: PostgreSQL has ANALYZE. What does everyone else have? Six answers, arranged from the system with the most knobs to the system with none.

A note on the menu

Before the statistics, a word about what they’re feeding, because “hash join vs. merge join” quietly assumes the database has both. Three of the six here don’t, and one of them doesn’t have either.

A row-store with B-tree indexes and OLTP ancestry tends to carry the full classical set: nested loop for small driving sets with a good index on the inner side, merge join for two already-sorted inputs, hash join for large unsorted equijoins. Oracle and Db2 are this. A system born for analytics, scanning columns in vectorized batches, finds merge join almost useless; sorting both sides to join them is a tax you pay only when the optimizer is cornered. The vectorized hash join wins nearly every analytical equijoin, so that’s what gets built and tuned. DuckDB and Snowflake are this. MySQL is its own case: it shipped for two decades with exactly one join algorithm and only gained a second in 2019. And SQLite is the limit of the argument; it has one join algorithm, nested loop, and has never had any other. There is no hash join and no merge join to weigh it against. The planner’s entire job is to pick the order of the loops.

Keep that in mind as the statistics get more elaborate. Half the precision exists to choose between options the engine may not even have, and at least one engine here spends its statistics on a decision with only one possible operator and several possible orderings.

Oracle: every feature anyone ever asked for

Oracle Database is the maximalist. If a statistics idea has appeared in a paper since 1995, Oracle has shipped a version of it, kept the old version for compatibility, and added a preference to control which one runs.

The gatherer is DBMS_STATS, a PL/SQL package, not a SQL statement. By default you don’t call it; an AutoTask job runs during the nightly maintenance window and re-gathers anything missing or stale, where “stale” means roughly 10% of rows changed since the last gather. Oracle 19c added high-frequency automatic statistics collection, a lightweight task that revisits stale objects every 15 minutes by default, so the window job isn’t the only line of defense. For bulk operations, online statistics gathering (12c) piggybacks a stats collection onto CREATE TABLE AS SELECT and direct-path inserts; you were scanning every row anyway, so you may as well count them on the way past.

The column statistics are where Oracle shows off. Histograms come in four flavors, and the optimizer picks which to build based on the data: frequency (one bucket per distinct value, when there are few enough), top-frequency (frequency for the popular values, the long tail ignored), height-balanced (equi-depth, the legacy type), and hybrid (height-balanced buckets that also track the frequency of the value at each bucket boundary, which is the type you actually want and the reason the other two mostly stopped mattering after 12c). N-distinct is computed with a HyperLogLog sketch (APPROXIMATE_NDV_ALGORITHM, defaulting to HYPERLOGLOG since 19c) rather than by sorting and counting, which is the difference between a stats gather that finishes and one that doesn’t. PostgreSQL people will recognize the move; pg_stat_statements and friends reach for the same sketch family for the same reason.

Then there is the correlation problem, which Oracle attacks from two directions. Extended statistics let you declare a column group (DBMS_STATS.CREATE_EXTENDED_STATS on (make, model)) so the optimizer stops assuming model = 'Accord' and make = 'Honda' are independent events; this is the direct analog of PostgreSQL’s CREATE STATISTICS. Expression statistics do the same for UPPER(last_name) and similar derived values.

The genuinely different machinery is adaptive. SQL plan directives are notes the optimizer leaves itself: “last time I planned a query with this predicate shape, my cardinality estimate was garbage.” The directive outlives the individual statement and triggers dynamic sampling on the next query that matches. Adaptive plans go further and defer the decision to runtime. The optimizer ships a plan with a nested-loop join and a hash join both wired up, buffers the first rows out of the driving side through a statistics collector, and if the actual row count crosses a precomputed inflection point, it switches to the hash join mid-execution. The estimate was wrong; the plan corrects itself anyway.

If that sounds like it could destabilize a plan you’d spent months getting right, Oracle agrees with you. OPTIMIZER_ADAPTIVE_STATISTICS defaults to FALSE, because the adaptive-statistics features as originally shipped in 12c caused enough plan churn that Oracle turned the aggressive parts off by default and told everyone to leave them off. The maximalist learned the hard way that more statistics is not the same as better plans. Worth remembering before you reach for the equivalent knobs anywhere else.

One footnote, because the marketing obscures it: real-time statistics, the 19c feature that gathers stats during ordinary DML instead of waiting for the next gather, is restricted to Engineered Systems. Exadata only. On a normal Oracle install it does not run.

Db2: the optimizer that makes up numbers

IBM Db2 (the LUW line, the one a PostgreSQL person would compare against) has the explicit gatherer you’d expect, RUNSTATS, invoked per table:

1 RUNSTATS ON TABLE db2inst1.sales WITH DISTRIBUTION AND DETAILED INDEXES ALL

WITH DISTRIBUTION is the part that matters. Without it you get cardinalities and index stats; with it you get distribution statistics, which are frequency values for the common entries plus quantiles for the spread. Same two-pronged idea as PostgreSQL’s MCV-list-plus-histogram split, different vocabulary.

Automatic collection is two separate mechanisms, and the distinction is the interesting bit. AUTO_RUNSTATS is the asynchronous one: a background process notices a table has drifted and schedules a RUNSTATS later, the way autovacuum schedules an autoanalyze. AUTO_STMT_STATS is the synchronous one, and it has no PostgreSQL equivalent. With real-time statistics enabled (it’s on by default for new databases), the optimizer can collect statistics at the moment it compiles your query. You submit SQL against a table whose stats are stale; rather than plan blind, Db2 pauses, samples the table right then, and plans on fresh numbers. There is a time budget so this doesn’t turn every first-run query into a maintenance job, but the mechanism is real and it fires constantly. The monitoring counters say so out loud: a busy database will report tens of thousands of “Synchronous runstats” events.

And when even a quick synchronous sample is too expensive, Db2 will fabricate statistics. It reads cheap metadata it already has (the number of pages the table occupies, the row width) and manufactures a cardinality estimate from that alone. The “Statistic fabrications” counter sits right next to the runstats counters in the snapshot. This is a database that would rather invent a number than plan with no number, and the design instinct behind it is sound: a fabricated estimate from page count beats the catalog’s belief that the table still has the 1,000 rows it had at creation.

Correlation gets the same two answers Oracle gives. Column group statistics (RUNSTATS ON COLUMNS ((c1, c2))) capture combined cardinality for correlated columns. Statistical views are the more unusual tool: you define a view over a join or a filtered subset, register it, run RUNSTATS against it, and the optimizer uses the view’s statistics to estimate the cardinality of other queries whose shape resembles the view, including cross-table relationships a single-table stat can’t express. It’s a way of teaching the optimizer about a correlation that lives in a join rather than in a table.

The join menu is the full classical three: NLJOIN, MSJOIN (merge scan join), and HSJOIN. And Db2 hands you an explicit dial on how hard to look for the best combination of them. CURRENT QUERY OPTIMIZATION runs from 0 to 9; the default of 5 uses a greedy join enumeration that’s good enough for most workloads, and 9 unleashes a near-exhaustive search you reserve for genuinely complex queries where the planning time is worth it. PostgreSQL has the analogous tradeoff buried in geqo_threshold, the point at which it abandons exhaustive search for a genetic algorithm, but Db2 makes the whole spectrum a session setting you’re expected to reach for.

MySQL: the minimalist, dragged forward

MySQL is the instructive opposite of Oracle, and the comparison flatters PostgreSQL more than anything else in this list.

Start with the join menu, because it explains everything else. For most of its history MySQL had exactly one join algorithm: nested loop, with a block-nested-loop variant to cut the inner-table rescans. No merge join, ever. No hash join until 8.0.18 in 2019, which then took over the block-nested-loop role outright by 8.0.20. So the question this whole article is built around, “hash join vs. merge join,” is one MySQL literally could not ask until recently and still can’t fully ask today; there is no merge join to weigh the hash join against. The optimizer’s job has always been narrower: pick an access method per table, pick a join order, and (now) decide whether a join with no usable index runs as a hash join. optimizer_search_depth governs how exhaustively it explores the order, defaulting to 62.

The statistics split in two, along the index boundary. InnoDB persistent statistics handle indexed access. They’ve been persistent and on by default since 5.6.6 (before that, MySQL re-estimated index cardinality from a handful of random dives on every server restart, which is exactly as unstable as it sounds). The engine samples index pages (innodb_stats_persistent_sample_pages, default 20), stores per-index cardinality in mysql.innodb_table_stats and mysql.innodb_index_stats, and re-gathers automatically when about 10% of the table has changed (innodb_stats_auto_recalc). Twenty pages is not many. The estimates are coarse, and on a large table they can be coarse enough to flip a plan.

Column histograms are the bolted-on second half, added in 8.0 and showing it. You build them by hand:

1 ANALYZE TABLE sales UPDATE HISTOGRAM ON region, channel WITH 32 BUCKETS;

You get a choice of singleton (one bucket per value) or equi-height buckets, up to 1,024 of them, stored in the data dictionary and exposed through INFORMATION_SCHEMA.COLUMN_STATISTICS. Two catches, and they’re both significant. First, histograms are not maintained automatically; you re-run ANALYZE TABLE yourself, on your own schedule, forever. Second, the optimizer largely ignores a histogram on a column that has an index, preferring the index’s own statistics, which means histograms are mostly a tool for the non-indexed columns where you’d otherwise have nothing but a guess. They fill a hole rather than upgrade the floor.

There is no synchronous fabrication, no plan directive, no adaptive runtime switch. MySQL gathers what it gathers, on a schedule you mostly drive by hand, and plans once. For the OLTP point-lookup workloads it grew up serving, that’s frequently fine. Hand it a six-way analytical join and the thinness of the statistics, and the narrowness of the menu they feed, becomes the whole story.

SQLite: ANALYZE, by hand

SQLite has an ANALYZE statement, the same keyword PostgreSQL uses. That is the most misleading similarity in this entire article. The keyword matches; almost nothing behind it does.

ANALYZE populates a table called sqlite_stat1, and the contents are about as minimal as a statistic can be. For each index, SQLite stores a short string of integers: the estimated number of rows in the table, followed by the average number of rows that share a value for each left-most prefix of the index. An index whose stat string ends in a small number is selective; one that ends in a large number is not. That’s the whole model. If an equality on column x averages 10 matching rows and an equality on y averages 3, the planner prefers the index on y, and it knows that from two integers. There is no most-common-values list and, in the default build, no histogram. Selectivity is a single averaged number per index prefix, and the planner assumes values are uniformly distributed within that average.

You can get something closer to a distribution, but you have to compile for it. With SQLITE_ENABLE_STAT4, ANALYZE also fills sqlite_stat4 with sampled index-key entries and their cumulative row counts, which lets the planner reason about skew and range constraints (WHERE price BETWEEN 100 AND 200) instead of assuming uniformity. It’s the rough analog of a histogram, built from index-key samples rather than per-column buckets. Many builds ship without it. The older sqlite_stat3 is legacy and no longer written.

Here is the part with no analog anywhere else on this list: nothing runs ANALYZE for you. There is no autoanalyze, no background task, no row-change threshold. SQLite gathers statistics exactly when you tell it to and never otherwise. The nearest thing to automation is PRAGMA optimize, which inspects the database and runs ANALYZE on tables it judges would benefit, and which since version 3.46.0 (2024-05-23) automatically bounds the scan so it stays fast on a large database. But PRAGMA optimize is a command you call, not a daemon that wakes up; the documented practice is to run it before closing a short-lived connection, or on open plus periodically for a long-lived one. Forget to, and the database plans on whatever statistics it last had, which may be none.

That last possibility produces the most SQLite thing in the whole comparison. A freshly created database has no rows, so ANALYZE on first run gathers nothing useful, which is a real problem when the database is an application’s file format and every install starts empty. The documented workaround is to build a representative database during development, run ANALYZE on it, and then ship the resulting sqlite_stat1 rows as literal INSERT statements baked into the application, so that a brand-new database in the field plans the way the prototype did in the lab. You hand-carry the statistics into the binary. No other database here would even phrase the problem that way.

The planner consuming all this is the Next-Generation Query Planner, which since 2013 has done a proper cost-based search over join orderings, replacing a legacy greedy heuristic that only ever kept the single cheapest partial plan at each step and so missed orderings that start worse and end better. But notice what it is searching: orderings, not operators. Every join is a nested loop, so the only decisions are which table drives the outer loop and which index (or scan) serves each inner one. When you do need to override it, the levers match that model exactly: a CROSS JOIN forces the loop order (SQLite never reorders one), and a unary + on a column disqualifies an index for that term. The planner finds good orderings on its own as long as no index has more than ten or twenty rows sharing a left-most value; past that, skew defeats the uniformity assumption, and running ANALYZE is the documented fix.

There is one concession to analytical workloads, added in 3.38.0 (2022): the Bloom filter optimization. For a large join, SQLite can build a Bloom filter on one table and consult it to skip inner-loop iterations that cannot possibly find a match, which materially speeds up the star-schema-shaped queries that would otherwise punish a pure nested loop. It is not a new join algorithm. The join is still a nested loop; the Bloom filter just lets it skip cheaply, and it shows up as such in EXPLAIN QUERY PLAN. The single-operator model holds; it just got a pruning trick.

None of this is a shortfall. SQLite is the genre’s minimum, executed cleanly: one join algorithm, statistics you gather by hand or compile into the application, and a planner candid about needing ANALYZE when the data is skewed. For a database that runs inside a phone, a browser, and a few billion other places where no DBA will ever log in, designing out the background machinery is the correct decision, not a missing row on a feature matrix.

DuckDB: statistics as a side effect of columns

DuckDB also has an ANALYZE statement, and having just watched SQLite’s turn out to mean so little, you might expect the same here. It means something different again. DuckDB’s documentation is blunt that the statistics ANALYZE recomputes are used only for join-order optimization, and the rest of the statistics story is automatic and structural in a way ANALYZE has nothing to do with.

Because DuckDB is columnar and writes data in row groups, it gets the cheap statistics for free as a property of the storage format. Every row group (and every Parquet row group it reads from disk, since it treats external Parquet as a first-class table) carries min/max bounds per column, null counts, and approximate distinct counts. These are zone maps, the same idea as Snowflake’s micro-partition metadata and PostgreSQL’s BRIN indexes: don’t describe the distribution in detail, just record enough per chunk to skip the chunk entirely when a predicate can’t match it. There’s no equi-depth histogram with a configurable bucket count here, because the analytical workload DuckDB targets cares far more about “can I skip this 100MB block” than about estimating the selectivity of WHERE status = 'shipped' to three significant figures.

What the optimizer does with those numbers is the better story. DuckDB runs a genuine cost-based join-order optimizer (dynamic programming over the join graph, with a greedy fallback when the table count gets large enough that exhaustive search is infeasible), and its goal is the one that actually dominates analytical query cost: minimize the size of the intermediate results flowing between joins. Put the large table in the wrong place in the join tree and you materialize billions of intermediate rows; put it right and the same query runs in seconds. That’s the decision the statistics exist to inform, far more than the choice of physical operator, because the physical operator is almost always the vectorized hash join.

The clever piece is statistics propagation. While optimizing, DuckDB walks the join graph and uses the min/max statistics to invent new filters. If it knows t1.a ranges over [25, 50] and the query joins t1.a = t2.a, it synthesizes t2.a >= 25 AND t2.a <= 50 and pushes it down into the scan of t2, pruning row groups of t2 that can’t possibly find a join partner. PostgreSQL does a limited version of this with equivalence classes and merge-join range skipping, but DuckDB leans on it hard because the zone-map statistics make the derived filter immediately actionable at the storage layer. The same lightweight min/max metadata that prunes scans also generates the predicates that drive more pruning.

The thing to internalize: in DuckDB, ANALYZE is a minor tuning command for one optimizer phase, not the foundation of planning. The foundation is written into the file format every time you insert a row.

Snowflake: no ANALYZE, because there’s nothing to analyze

Snowflake is where the question dissolves. There is no ANALYZE. There is no RUNSTATS, no DBMS_STATS, no histogram you can build or refresh, no statistics-sampling percentage to tune. Not because Snowflake plans without statistics, but because gathering them was designed out of existence.

Snowflake stores every table as a set of micro-partitions: immutable, compressed, columnar files of 50 to 500 MB of uncompressed data each, created automatically as data lands and never modified in place (an update writes new micro-partitions and retires the old ones, which is also where Time Travel and zero-copy cloning come from). When a micro-partition is written, Snowflake records its metadata as part of the write: the min and max of every column, the distinct-value count, the null count, and a handful of other properties, plus table-level aggregates above that. Statistics aren’t gathered on a schedule because they’re a byproduct of the only operation that can change the data. They are never stale, for the same reason a photograph is never out of date with respect to the moment it was taken; the storage is the statistics.

This reframes the plan-shape question for the distributed world Snowflake lives in. The micro-partition metadata’s first job is pruning: given WHERE order_date = '2026-01-15', the optimizer consults the min/max ranges and reads only the micro-partitions whose range could contain that date, skipping the rest without touching them. This is the dominant performance lever, far more than join-algorithm selection, which is why so much Snowflake tuning advice is really about clustering, that is, about keeping the data physically arranged so the min/max ranges stay narrow and pruning stays effective. The optimizer also uses the cardinality metadata to make the decisions that matter when a join runs across many compute nodes: which input is the build side and which is the probe side, and whether a join is small enough to broadcast one side to every node or large enough to require shuffling both sides by hash. “Broadcast vs. shuffle” is the cloud warehouse’s version of “nested loop vs. hash,” and it’s decided from the same per-partition counts.

Everything else is automatic and serverless. Automatic clustering re-sorts micro-partitions in the background to fight the natural decay of pruning effectiveness as DML scatters values across partitions. The search optimization service builds an auxiliary access path for selective point lookups, the closest thing Snowflake offers to a secondary index, and you pay for it as a serverless feature rather than declaring it as schema. The recently shipped Optima goes further still, watching query history for hot predicates and autonomously generating extra per-partition metadata to prune them harder, with no configuration and no statement for you to run.

The trajectory of the whole list ends here. Oracle gives you every statistics feature ever invented and a dozen knobs to manage them. Snowflake gives you zero, on the theory that the right number of statistics decisions for a user to make is none.

Where this leaves PostgreSQL

The algorithms turned out not to be the variable. Most of these systems hash-join their way through the heavy work; merge join is the rare bird, carried only by the two with full OLTP heritage (Oracle, Db2) and quietly useless to the analytical pair. MySQL waited until 2019 for a hash join and never had a merge join at all, and SQLite has only ever had the nested loop. The systems with the richest statistics spend that richness not on picking the operator, which is frequently a foregone conclusion, but on getting the cardinality estimate right enough that the pick is obvious.

The actual axis is freshness and who owns it. Snowflake made statistics intrinsic to storage and removed the decision. DuckDB made them a property of the columnar format and reads them off the row groups. Db2 will gather them synchronously while planning your query, and fabricate them from page counts when it must. Oracle will gather them every fifteen minutes, leave itself notes when it gets an estimate wrong, and rewire the plan at runtime if it still does. At the far other end sit the two that hand the job back to you: MySQL keeps its index statistics current on its own but leaves histograms entirely to your schedule, and SQLite refreshes nothing whatsoever without an explicit ANALYZE, to the point that the recommended fallback is to compile the statistics into the application.

PostgreSQL sits in the conservative middle of that range, and the position is a set of deliberate choices worth naming. It samples rather than scanning, so a gather stays cheap. It refreshes on a row-change threshold through autoanalyze rather than synchronously, which already makes it more self-maintaining than the systems that wait for a human to remember, but less aggressive than the ones that gather mid-query; planning never blocks on statistics collection, and the price is that a first-run query against a freshly-grown table plans on stale numbers, with no fabrication safety net underneath. It assumes column independence unless you explicitly build extended statistics with CREATE STATISTICS, where Oracle and Db2 will, with the right settings, notice the correlation and propose the multi-column stat for you. And it plans once, with no runtime adaptation; the plan you cost is the plan you run.

None of that is a deficiency. It’s a different point on the same tradeoff every system here is making, between the cost of keeping statistics fresh and the cost of planning with stale ones. Knowing where the others landed tells you exactly which of PostgreSQL’s defaults to reach for first when a plan goes wrong: the stale-statistics gap is a manual ANALYZE or a tighter autovacuum_analyze_scale_factor, and the independence assumption is a CREATE STATISTICS away. PostgreSQL won’t gather the multi-column stat or fabricate the row count on its own. It will, however, do precisely what you tell it to, which over a long enough horizon is the feature that ages best.