coddpiece: Watch Relational Algebra Become SQL

coddpiece is a relational-algebra teaching library for Python. You build an algebra expression by method-chaining; it compiles that expression to real SQL and runs it on any DB-API 2.0 connection, including the sqlite3 module that already ships with Python. The thesis is simple: learn the algebra first, and most of SQL’s apparent complexity turns out to be surface syntax over a small, closed set of operations. It’s aimed at people who have written SQL for years and never studied the theory underneath it.

The name is for Edgar F. Codd, who introduced the relational model in 1970. The other word is spelled differently, and I’ll assume that’s not what brought you here.

The whole point is the side-by-side

Every expression can explain itself. .explain() renders four things at once: the algebra notation, the expression tree, the compiled SQL, and a plain-English reading.

>>> print(s.select(s.city == "London").project("sname").explain())
Algebra:
  π(sname)(σ(city="London")(s))

Tree:
  Project(sname)
  └─ Selection(city="London")
     └─ s

SQL:
  SELECT DISTINCT "sname"
  FROM "s"
  WHERE "city" = ?
  -- params: ['London']

Reading:
  Keep only rows where city="London". Keep only columns sname.

That is mildly interesting for a selection and a projection. It becomes the entire sales pitch for division. Relational division answers “find the X associated with every Y” (which suppliers ship all the red parts?), and it is one operator: sp.project("sno", "pno").divide(red_parts). The SQL it compiles to is a nested NOT EXISTS ... EXCEPT double negation that is famously unpleasant to write and worse to read. Put the clean algebra and the gnarly SQL next to each other and the double negation finally clicks. That correspondence is the product: the algebra you see rendered and the SQL that actually runs are guaranteed to denote the same query, or the library has failed at its one job.
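To make the double negation concrete, here is a minimal sketch in plain sqlite3. It is not coddpiece's actual output: the two-column tables, the trimmed-down rows, and the exact shape of the SQL are assumptions for illustration, and the library may emit something different. Read the query as "keep a supplier if there is no required part left over once you subtract the parts that supplier ships."

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sp (sno TEXT, pno TEXT);
    CREATE TABLE red_parts (pno TEXT);
    -- Hand-typed illustrative rows, not the full textbook dataset.
    INSERT INTO sp VALUES
        ('S1', 'P1'), ('S1', 'P4'), ('S1', 'P6'),
        ('S2', 'P1'),
        ('S4', 'P4');
    INSERT INTO red_parts VALUES ('P1'), ('P4'), ('P6');
""")

# Division as a double negation: NOT EXISTS (required EXCEPT shipped).
division_sql = """
    SELECT DISTINCT sno
    FROM sp AS candidate
    WHERE NOT EXISTS (
        SELECT pno FROM red_parts        -- every part that is required
        EXCEPT
        SELECT pno FROM sp               -- minus what this supplier ships
        WHERE sp.sno = candidate.sno
    )
    ORDER BY sno
"""
suppliers = [row[0] for row in conn.execute(division_sql)]
print(suppliers)
```

With these rows only S1 ships all three red parts, so S1 is the only supplier that survives the inner EXCEPT coming back empty.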

A couple of deliberate choices

The relational model is set-based: no duplicate rows, ever. SQL is not (even if it sometimes claims otherwise); it defaults to bags. coddpiece sides with the algebra and puts DISTINCT on every query, so the results match the theory. When you want to see the gap, .bags() turns it off: sp.project("sno").count() is 4 distinct suppliers, sp.project("sno").bags().count() is 12 shipments.
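The set-versus-bag gap is easy to reproduce without the library. The sketch below uses sqlite3 directly, with shipment rows typed in by hand to mirror the classic shipments table (an assumption; the bundled dataset may differ in detail). It shows what adding or dropping DISTINCT does to a projection on sno.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sp (sno TEXT, pno TEXT)")

# Twelve shipments spread across four suppliers (illustrative rows).
shipments = (
    [("S1", p) for p in ("P1", "P2", "P3", "P4", "P5", "P6")]
    + [("S2", p) for p in ("P1", "P2")]
    + [("S3", "P2")]
    + [("S4", p) for p in ("P2", "P4", "P5")]
)
conn.executemany("INSERT INTO sp VALUES (?, ?)", shipments)

# Bag semantics: SQL's default, one output row per input row.
bag_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT sno FROM sp)"
).fetchone()[0]

# Set semantics: what the algebra's projection means.
set_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT sno FROM sp)"
).fetchone()[0]

print(set_count, bag_count)
```

The first number counts suppliers, the second counts shipments; only the first is a projection in the relational sense.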

The other choice you’ll notice immediately is that comparisons don’t return booleans. s.city == "London" builds a predicate node, because Python lets you overload == to return whatever you like. It does not let you overload and, or, and not, so those are off-limits; you compose predicates with &, |, and ~. Write and out of habit and coddpiece raises an error telling you to use & instead, which is the only humane thing to do with a deliberate Liskov violation.
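The mechanism is small enough to sketch. The Attr and Pred classes below are hypothetical stand-ins, not coddpiece's real classes; they only show how == can build a node, how &, | and ~ compose nodes, and why and can be caught at all: Python calls bool() on the left operand, so raising from __bool__ is the one available hook.

```python
class Pred:
    """Hypothetical predicate node that just carries its own rendering."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        return Pred(f"({self.text} AND {other.text})")

    def __or__(self, other):
        return Pred(f"({self.text} OR {other.text})")

    def __invert__(self):
        return Pred(f"(NOT {self.text})")

    def __bool__(self):
        # `and`, `or` and `not` all ask for the operand's truth value first.
        raise TypeError("use &, | and ~ to combine predicates, not and/or/not")


class Attr:
    """Hypothetical attribute reference whose == builds a Pred."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return Pred(f"{self.name} = {value!r}")


city, status = Attr("city"), Attr("status")

combined = (city == "London") & ~(status == 20)
print(combined.text)

error_message = None
try:
    (city == "London") and (status == 20)
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

Note the parentheses around each comparison: & and | bind tighter than ==, so leaving them out changes what gets compared.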

Backends, honestly

coddpiece talks to anything that implements PEP 249. It sniffs the connection’s parameter style and identifier quoting and introspects schemas from the database, so the same expression compiles for SQLite, PostgreSQL, or MySQL. One caveat, stated plainly because you’d find out anyway: only the SQLite path is exercised against a live database in CI. The PostgreSQL and MySQL branches exist and are wired up, but until the Postgres job is asserting on real round-trips, treat them as well-founded inference rather than proven. If you run it on PG or MySQL and something is off, the issue tracker is open.
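For a sense of what parameter-style sniffing involves, here is a minimal sketch. PEP 249 defines paramstyle as a module-level attribute with five possible values; how coddpiece gets from a connection to that attribute is not described here, so the placeholder helper below and its naming scheme are purely hypothetical. Only the sqlite3 branch is actually executed.

```python
import sqlite3


def placeholder(paramstyle, position, name="p"):
    """Return the marker for the position-th parameter (1-based).

    Hypothetical helper covering the five styles PEP 249 lists.
    """
    markers = {
        "qmark": "?",
        "numeric": f":{position}",
        "named": f":{name}{position}",
        "format": "%s",
        "pyformat": f"%({name}{position})s",
    }
    return markers[paramstyle]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s (sno TEXT, sname TEXT, city TEXT);
    INSERT INTO s VALUES
        ('S1', 'Smith', 'London'),
        ('S2', 'Jones', 'Paris'),
        ('S4', 'Clark', 'London');
""")

style = sqlite3.paramstyle  # module-level attribute, per PEP 249
sql = (
    'SELECT DISTINCT "sname" FROM "s" '
    f'WHERE "city" = {placeholder(style, 1)} ORDER BY "sname"'
)
rows = conn.execute(sql, ["London"]).fetchall()
print(style, rows)
```

A real compiler would also have to switch between positional and mapping parameters for the named styles, and between double quotes and backticks for identifiers, which is exactly the kind of branch that wants a live database behind it.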

Install with pip install coddpiece. It needs Python 3.10 or newer, and because the sqlite3 module ships with Python there is nothing else to install to get going. MIT licensed. It ships with the suppliers-and-parts dataset from C.J. Date, so you can run from coddpiece.datasets import suppliers_and_parts and start composing operators against the same three tables Date's textbooks use.

