How the Other Half Plans

Statistics are the input. Planning is what the database does with them: it takes a declarative query, which describes what you want and says nothing about how, and turns it into an executable plan, which is nothing but how. There are two jobs inside that. First, rewrite the query into a logically equivalent but more tractable shape, which is where subquery flattening, predicate pushdown, and view merging live. Second, search the space of physical plans (join orders, join algorithms, access paths) for the cheapest one the cost model can find. The second job is the hard one, because the number of possible join orders for a query grows faster than anyone wants to contemplate, and every database in this article is, underneath, a strategy for not enumerating all of them.

Two questions separate the six systems here. How does each one tame that search space? And once it has an answer, how much will it let you argue with the result? Those sound like the same question. They are not, and the most useful thing this comparison does is pull them apart. A database can search brilliantly and refuse you any override at all (Snowflake), search crudely and hand you a fistful of hints anyway (MySQL until recently), or search hard and expose every lever ever machined (Oracle). Sophistication of the search and generosity of the control surface are independent axes. Knowing where a system sits on each tells you most of what its planner feels like to live with.

The common ancestor

Nearly everyone here descends from, or defines itself against, a single 1979 paper: the System R optimizer design from Selinger and colleagues at IBM. The recipe is the one you already know: estimate the cost of operations from statistics, enumerate join orders bottom-up with dynamic programming, prune any subplan that’s dominated by a cheaper subplan covering the same relations, and restrict the search to left-deep trees to keep the explosion manageable. Cost-based, bottom-up, dynamic-programming. Forty-odd years later it is still the spine of most production optimizers.

PostgreSQL is the closest living relative, and it makes a clean reference point because it implements the textbook almost without deviation. The planner enumerates join orders with bottom-up dynamic programming in standard_join_search, costs Path nodes as it goes, keeps the cheapest path for each set of relations, and assembles a Plan tree from the winner. When the number of tables in the FROM clause exceeds geqo_threshold (default 12), exhaustive search becomes infeasible and PostgreSQL switches to GEQO, a genetic algorithm that breeds and mutates candidate join orders rather than enumerating them. Above the threshold you trade the guarantee of the cheapest plan for the guarantee of finishing. Plans for prepared statements get cached, with the runtime choosing between a re-planned custom plan and a reusable generic plan according to plan_cache_mode and a running cost comparison.

And the control surface is, deliberately, almost bare. PostgreSQL ships no query hints. None. That is a policy, not an oversight, and it is the single most argued-about decision in the project’s history. Hold that thought; it’s the right note to end on, not to open with.

Oracle: the transformation engine

Oracle Database plans in two stages, and the first stage is the one people underestimate. Before it costs a single join order, the query transformation layer rewrites your SQL into other SQL it might prefer: subquery unnesting, view merging, predicate pushing, OR expansion, join factorization, star transformation for data-warehouse schemas, group-by placement, query rewrite against materialized views. Some of these transformations are heuristic (always applied because they’re always at least as good), but the interesting ones are cost-based transformations: Oracle generates the transformed and untransformed versions, costs both, and keeps the winner. The optimizer is rewriting your query and pricing the rewrites against each other before it even gets to join enumeration.

Then the plan generator does the System R job: enumerate join orders, methods, and access paths, cost them, pick the cheapest. The full classical join menu is available, and the runtime-adaptive half of Oracle’s planning (adaptive plans, SQL plan directives) I covered with the statistics, because that machinery exists to correct cardinality estimates and that’s a statistics story as much as a planning one.

The control surface is where Oracle earns the “maximalist” label. There is a rich inline hint system (LEADING, USE_HASH, INDEX, FULL, PARALLEL, and dozens more) embedded in /*+ ... */ comments. Above hints sits SQL Plan Management: SQL plan baselines capture a known-good plan for a statement and forbid the optimizer from using a different one until the new candidate has been verified, in the background, to be genuinely faster. That is plan stability as a managed lifecycle, not a single frozen plan. SQL profiles (generated by the SQL Tuning Advisor) bolt cardinality corrections onto a statement without touching its text. Plans live in the shared cursor cache, keyed by SQL text and shared across executions, and adaptive cursor sharing lets one statement keep several plans for different bind-variable selectivity profiles, so the plan for status = 'rare' and the plan for status = 'common' can coexist. Every layer of this is a place you can intervene. Oracle’s theory is that a serious DBA will need all of them eventually.

Db2: rewrite the graph, then cost it

IBM Db2 (the LUW line again) descends from Starburst, the IBM Almaden research optimizer, and it shows in the architecture. A query is parsed into a Query Graph Model, an internal graph representation, and then two distinct phases operate on it. The query rewrite phase transforms the QGM aggressively and heuristically: correlated subqueries become joins, predicates migrate to where they prune earliest, redundant joins disappear, views merge. The cost-based optimization phase then does join enumeration and physical plan selection on the rewritten graph, with all three classical join methods on the table.

The distinctive planning knob is the optimization class, set through CURRENT QUERY OPTIMIZATION and running from 0 to 9, default 5. It is a single dial on how hard the optimizer works, governing both which rewrites are attempted and how the join space is searched; lower classes use a greedy join enumeration that plans fast, and class 9 unleashes a near-exhaustive dynamic-programming search you reserve for genuinely complex queries where the planning time pays for itself. PostgreSQL buries the analogous tradeoff in geqo_threshold; Db2 makes the whole spectrum a session setting and expects you to reach for it.

Plan reuse has a wrinkle no other system here shares: static SQL. The classic Db2 model binds SQL into a package at compile time, costs and freezes the access plan then, and executes that frozen plan until you explicitly rebind. The plan is chosen once, in advance, and stored. Dynamic SQL gets the more familiar treatment, with plans held in the package cache. And the hint mechanism is characteristically structured: rather than inline comments, Db2 uses optimization guidelines, XML documents (optimization profiles) registered against the database that steer join order, access method, and rewrite choices for matching statements. It is hinting with a schema.

MySQL: a planner being replaced mid-flight

MySQL is the one system here whose planner you can watch being rebuilt in real time, which makes it the most interesting and the hardest to describe in the present tense.

The traditional optimizer is the modest one. Join enumeration is a greedy, depth-first cost-based search restricted to left-deep trees, bounded by optimizer_search_depth (default 62) and pruned heuristically by optimizer_prune_level. The cost model is serviceable and was, for years, fairly crude. MySQL 8.0 layered on real transformations (derived-table merging, subquery-to-semijoin conversion with a set of strategies, materialization, index condition pushdown, index merge), all toggled through optimizer_switch flags, which dragged the optimizer forward without changing its left-deep, greedy core.

The replacement is the hypergraph optimizer, introduced experimentally in 8.0.23 and developed steadily since. It models the query as a hypergraph (vertices are tables, hyperedges are join predicates that may touch many tables at once) and runs DPhyp, a proper dynamic-programming join-enumeration algorithm, over it. The consequences are exactly what System R left on the table: it considers bushy trees, not just left-deep ones, and it treats hash join as a first-class citizen. Until recently it was usable only in debug builds of stock MySQL Server; as of MySQL 9.7 Community Edition (April 2026) it ships in every edition as a selectable alternative, off by default, switched on per session or per statement through optimizer_switch='hypergraph_optimizer=on'. It is already the default optimizer in MySQL HeatWave, and on a normal install today you are still running the greedy left-deep optimizer unless you turn the new one on. MySQL is migrating from a 1990s join optimizer to a modern one in front of everybody, one release at a time.

On control, MySQL is generous. There are legacy index hints (USE INDEX, FORCE INDEX, STRAIGHT_JOIN) and a newer family of optimizer hints in /*+ ... */ comments added in 8.0 (JOIN_ORDER, HASH_JOIN, BKA, SET_VAR to override an optimizer flag for a single statement, and more). What it lacks is durable plan management; the old query cache was deprecated in 5.7 and removed in 8.0, and there is no Oracle-style shared plan store. Plans are made, used, and forgotten.

SQLite: small search, frozen behaviour

SQLite runs the Next-Generation Query Planner, in place since 2013, and for a database that fits in a phone it is a more principled cost-based planner than you’d guess. The search is over join loop orderings, because every join is a nested loop and the only decisions are which table drives the outer loop and which index serves each inner one. The NGQP keeps several candidate partial plans alive at each step and costs them forward, which sounds unremarkable until you learn that the legacy planner it replaced was a pure nearest-neighbour heuristic that kept only the single cheapest choice at each step and so routinely missed an ordering that starts slightly worse and finishes much better. The upgrade was from greedy-of-one to a real bounded search.

The transformation list is long and well documented: subquery flattening, pushing WHERE terms down into subqueries and views, the OR-by-union optimization, the MIN/MAX and COUNT(*) shortcuts, omitting LEFT JOINs whose columns are never used, constant propagation, and the skip-scan. The most charming entry is the automatic index: if SQLite decides a query would benefit from an index that doesn’t exist, it will build a transient one on the spot, use it for that single query, and throw it away. A database with no DBA will index itself for the duration of a statement.

Everything compiles down to VDBE bytecode, and the planner prizes stability above cleverness; the project publishes an explicit checklist for avoiding plan regressions across upgrades, because SQLite ships inside applications that cannot tolerate a query that was fast yesterday turning slow on a library bump. The control surface matches the philosophy that you should rarely need it: INDEXED BY and NOT INDEXED clauses, a unary + to disqualify a term from index use, CROSS JOIN to pin loop order (SQLite never reorders one), and the likelihood() family to correct a selectivity guess. The documentation’s posture toward all of these is the same: a last resort, and if you need one, please report the query so the planner can be fixed instead.

DuckDB: the modern textbook, in an embedded box

DuckDB is what you get when people who read the recent literature build an optimizer from scratch with no backward compatibility to honour. The pipeline is clean: parser, then binder (resolving tables and columns against the catalog), then a logical planner, then a stack of optimizer rules, then a physical planner that picks operator implementations. The current build runs over two dozen distinct optimizer rules, and the names are a tour of the modern canon: expression rewriting and constant folding, filter pushdown (which also duplicates predicates across equivalence sets and prunes provably-empty subtrees), common-subexpression elimination, statistics propagation, join filter pushdown.

The join-order optimizer is the genuine article: DPccp, the dynamic-programming enumeration from Moerkotte and Neumann’s “Dynamic Programming Strikes Back,” with a greedy fallback when the join graph gets too large for exhaustive search. That is a stronger default than PostgreSQL’s, which tips into a genetic algorithm at twelve tables; DuckDB stays in dynamic programming longer and falls back to greedy rather than to randomness. The other standout is subquery unnesting: DuckDB implements flattening of arbitrary correlated subqueries (the dependent-join elimination technique from the Neumann–Kemper line of work, internally a “delim join”), which means correlated subqueries that other systems re-execute per outer row get rewritten into a single set-based join. For analytical queries that is frequently the difference between seconds and hours.

Control is minimal and unapologetic. There is no real hint system; you can disable specific optimizer rules for debugging, and you can pin a join order by disabling the join-order optimizer and writing the joins in sequence, but these are escape hatches, not a supported tuning surface. DuckDB’s own writing on the subject is blunt to the point of cheek: if you think you’ve found a query where you can beat the optimizer by hand, make sure you’ve also hand-optimized for every possible future state of the data, because the optimizer has. The position is that the search is good enough that overriding it is usually a mistake you haven’t noticed yet.

Snowflake: maximum search, zero knobs

Snowflake plans in the cloud services layer, architecturally separate from the compute warehouses that execute, and the separation matters: planning is a centralized, managed service, not something happening inside your compute. The optimizer is cost-based, fed by the always-fresh micro-partition metadata, and it produces a plan shaped as a DAG of operators connected by links, where the links are the data-exchange edges that move and redistribute rows across the nodes of the warehouse. Cost-based join ordering picks the sequence; the genuinely distributed decision is the distribution strategy for each join, broadcast the small side to every node or hash-partition both sides and shuffle. As I noted with the statistics, “broadcast vs. shuffle” is the cloud warehouse’s version of “nested loop vs. hash,” and Snowflake now makes that choice with Adaptive Join Decisions, switching between broadcast and hash-partitioned joins using feedback from runtime execution rather than committing on the estimate alone.

And the control surface is, on purpose, empty. Snowflake exposes no inline query hints. The /*+ BROADCAST */-style syntax that circulates in blog posts is borrowed from Spark and is not a Snowflake feature; the broadcast-or-shuffle decision belongs to the optimizer and, increasingly, to the runtime. The closest thing to a hint is declarative and lives in the schema: you can define primary-key, foreign-key, and unique constraints with the RELY property, which Snowflake does not enforce but which lets the optimizer perform join elimination and similar rewrites it otherwise couldn’t justify. Everything else that looks like tuning (clustering keys, the search optimization service) is physical design, not plan steering. Layered on top, the result cache returns the stored result of an identical query for 24 hours without planning or executing anything at all, which is the most effective query optimization of all: don’t run the query.

The trajectory from the statistics piece repeats here, but along a different axis. There, Snowflake had removed the statistics decision. Here it has removed the planning decision. You write SQL and define your schema; the optimizer owns every choice from there, and it is not interested in your opinion about join order.

Where this leaves PostgreSQL

Plot the six on the two axes and the shape is clear. On search sophistication, the modern cost-based planners cluster at the top: Oracle with its cost-based transformations, Db2 with its Starburst rewrite-then-cost split, DuckDB with DPccp and arbitrary-subquery unnesting, Snowflake with its distributed cost model, and MySQL climbing fast as the hypergraph optimizer rolls out. SQLite sits lower by design, with a real but small search over a single join operator, and it is content there. On control, the spread is enormous and uncorrelated with the first axis: Oracle and Db2 hand you hints, baselines, profiles, and guidelines; MySQL hands you hints; SQLite and DuckDB hand you a few last-resort overrides and a sermon about not using them; Snowflake hands you nothing.

PostgreSQL sits high on search and almost zero on control, and the second half of that is the deliberate, contested choice. The planner is a faithful System R implementation with genuine dynamic programming, and the project has refused, for its entire history, to add query hints to the core. The reasoning is consistent and worth stating fairly: hints ossify a plan that was correct for last year’s data and last year’s optimizer, they become a crutch that hides the cardinality-estimation bugs the project would rather fix, and a hint that helps today is a latent regression the day the data shifts. The counter-position is equally real and held by competent people: sometimes you genuinely know more than the optimizer, the cost of a wrong plan in production is not theoretical, and the extension pg_hint_plan exists and is leaned on heavily in exactly the shops that can’t wait for a planner improvement. Both things are true. The PostgreSQL core’s position is that the right number of hints to bless officially is still zero, and that the search should be made good enough that you don’t reach for them; the practitioner’s position is that “good enough” is doing a lot of work in that sentence.

What the comparison settles is that PostgreSQL’s planner is not unsophisticated, which is the accusation the missing hint system invites. It searches as seriously as anything here short of the cost-based-transformation maximalists, and it then declines, on principle, to let you override the result. That combination is rare. Most systems that search this well (Oracle, Db2) also give you every lever; the only other system here that searches hard and refuses you the controls is Snowflake, and Snowflake gets away with it by owning the entire stack down to the storage. PostgreSQL refuses you the controls while running on storage and statistics you manage yourself, which is either principled or stubborn depending on the afternoon, and is the most PostgreSQL thing about it.