waxsql: Wax Fruit for Your Query Planner

waxsql generates SQL the way a bowl of wax fruit fills a centerpiece: it looks real, it nourishes nothing, and it will never spoil. Every query it produces is type-correct against a real PostgreSQL schema and computationally pointless, which is exactly what you want when you are fuzzing a query rewriter, beating on a parser, or generating a reproducible workload that has no business returning useful rows. I built it for the times you need a great deal of valid SQL and none of it has to mean anything.

from waxsql import generate_query, generate_schema, print_query

schema = generate_schema(seed=42, complexity=8)
query  = generate_query(seed=42, schema=schema, complexity=8)

print(schema.emit_ddl())   # CREATE TABLE / ALTER TABLE / CREATE INDEX
print(print_query(query))  # type-correct SELECT against that schema

(No Python required, either. The bundled CLI does the same: waxsql gen --seed 42 -c 8 | psql pipes a fresh schema and query straight into a database.)

Type-correct, not just grammar-correct

The interesting word is “valid.” Plenty of tools emit SQL the grammar accepts; that bar is low, and the result is usually nonsense the moment PostgreSQL’s type resolver looks at it. waxsql generates every expression with a target type, asking an internal catalog “what produces type T?” and building from the answer. It never emits int + text, because no operator in the catalog satisfies that request. SQL that the grammar accepts but the type system rejects (WHERE current_timestamp + 'hello', and its thousand cousins) is not merely avoided; it is structurally impossible for the generator to produce. The output clears parse analysis, not just parsing. That puts it in the lineage of SQLsmith rather than that of the yacc-driven fuzzer.
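
The type-directed idea can be sketched in a few lines. This is a toy illustration, not waxsql's internals: the catalog contents, the Op record, and gen_expr are all hypothetical names invented for the example. It shows only why an ill-typed expression cannot come out: every operator is chosen by looking up the type that is wanted, and its operands are generated with the types that operator demands.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One catalog entry: an operator and the operand types it accepts."""
    symbol: str
    left: str
    right: str


# Hypothetical catalog, keyed by RESULT type. There is no entry that takes
# (int, text), so "int + text" can never be selected.
CATALOG = {
    "int": [Op("+", "int", "int"), Op("*", "int", "int")],
    "text": [Op("||", "text", "text")],
    "bool": [Op("=", "int", "int"), Op("=", "text", "text"), Op("<", "int", "int")],
}

# Illustrative leaves (columns and literals) available for each type.
LEAVES = {
    "int": ["t.id", "42"],
    "text": ["t.name", "'wax'"],
    "bool": ["true"],
}


def gen_expr(target: str, rng: random.Random, depth: int = 2) -> str:
    """Build an expression whose result type is `target`."""
    # Stop at a leaf when the depth budget runs out, or at random before that.
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(LEAVES[target])
    # Ask the catalog what produces `target`, then recurse on operand types.
    op = rng.choice(CATALOG[target])
    left = gen_expr(op.left, rng, depth - 1)
    right = gen_expr(op.right, rng, depth - 1)
    return f"({left} {op.symbol} {right})"


rng = random.Random(7)
print("SELECT t.id FROM t WHERE " + gen_expr("bool", rng))
```

A real catalog would also have to cover functions, casts, aggregates, and nullability, but the shape of the lookup stays the same.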

One dial, ten notches

A single complexity argument, 0 through 10, decides how much of the language is in play. At 0 you get SELECT col FROM t. The notches climb through joins, aggregates, subqueries, CTEs, and window functions, topping out at WITH RECURSIVE, ROLLUP, CUBE, and GROUPING SETS. The same dial scales the generated schema, from a couple of tables up to a dozen, with self-referential and, at the high end, cyclic foreign keys.
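
One plausible way to wire such a dial is a table of minimum levels per feature. The sketch below is an assumption throughout: the text above gives the features and their rough order, but not which notch unlocks which, so the thresholds and the names FEATURE_MIN_LEVEL and enabled_features are made up for illustration.

```python
# Hypothetical thresholds; only the ordering is taken from the description.
FEATURE_MIN_LEVEL = {
    "join": 1,
    "aggregate": 2,
    "subquery": 4,
    "cte": 5,
    "window_function": 7,
    "recursive_cte": 10,
    "grouping_sets": 10,
}


def enabled_features(complexity: int) -> set[str]:
    """Return the features in play at a given complexity (0 through 10)."""
    if not 0 <= complexity <= 10:
        raise ValueError("complexity must be between 0 and 10")
    return {name for name, level in FEATURE_MIN_LEVEL.items() if level <= complexity}


print(sorted(enabled_features(0)))   # nothing beyond a bare SELECT
print(sorted(enabled_features(10)))  # every feature in the table
```

A table like this makes the dial monotonic by construction: raising the level can only add features, never remove them.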

Same seed, same bytes

Determinism is the property I care about most. The same (seed, complexity) pair produces byte-identical SQL across runs and across Python versions. That is what makes a fuzzer usable in practice: when seed 91128 breaks your tool, you file the seed, not a multi-kilobyte blob, and it reproduces on anyone’s machine. It also makes golden-output testing tractable.

Schema and query generation run on independent RNG streams, so the natural idiom is to pin one schema seed and sweep the query seed across a few thousand values. The result is a stream of unrelated, valid queries against one stable target: the right shape for soak-testing a planner, a query-rewriting layer, or a connection-pooling proxy.
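
A minimal sketch of stream separation, assuming each stream is derived by hashing the seed together with a purpose label. The stream helper and the stand-in "schema" and "query" below are hypothetical and much simpler than anything waxsql does; the point is only that drawing from one stream never shifts the other.

```python
import hashlib
import random


def stream(seed: int, purpose: str) -> random.Random:
    """Derive an RNG for one purpose from a user-facing seed.

    Mixing the purpose label into the seed gives the schema and query
    generators separate state, even when the user passes the same number.
    """
    digest = hashlib.sha256(f"{purpose}:{seed}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


# Pin one schema seed: a stable target (here, just three table names).
schema_rng = stream(42, "schema")
tables = [f"t{schema_rng.randrange(1000)}" for _ in range(3)]

# Sweep the query seed; the schema above does not change as we go.
for query_seed in range(3):
    query_rng = stream(query_seed, "query")
    print(f"SELECT * FROM {query_rng.choice(tables)}")
```

With the API shown at the top of this post, the same idiom would be one generate_schema(seed=42, complexity=8) call followed by a loop over generate_query(seed=s, schema=schema, complexity=8).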

It checks its own work

“Generates valid SQL” is a claim, so waxsql ships the means to check it: three validators, each strictly stronger than the last. SYNTAX runs the SQL through pglast (a Python binding that bundles libpg_query) in microseconds, no database required. PARSE runs PREPARE against a live cluster, catching the name- and type-resolution errors that syntax checking cannot see. PLAN runs EXPLAIN, exercising the planner on top. The test suite holds every SQL-emitting path to these checks across many seeds, so the claim is enforced rather than asserted.

It is pure Python with zero runtime dependencies; the PostgreSQL-facing pieces are optional extras, and it wants Python 3.10 or newer. pip install waxsql, and you have realistic, meaningless SQL on tap.
