15 October 2009
When I learned SQL, I was taught that the way to do a join was with a WHERE clause. For example, if you have orders:
CREATE TABLE orders ( order_id bigserial PRIMARY KEY, customer_id bigint REFERENCES customers(customer_id), ... )
and order lines:
CREATE TABLE order_lines ( ordinal integer NOT NULL, order_id bigint REFERENCES orders(order_id), item_id bigint REFERENCES items(item_id), quantity decimal(10,2) NOT NULL CHECK(VALUE >= 0), ... PRIMARY KEY(order_id, ordinal) )
… to get a list of
SELECT orders.customer_id, order_lines.item_id FROM orders, order_lines WHERE orders.order_id = order_lines.order_id;
And that works great. However, there is another syntax, which is to use a JOIN clause in the FROM list:
SELECT orders.customer_id, order_lines.item_id FROM orders NATURAL JOIN order_lines;
NATURAL JOIN, in the words of the wonderful PostgreSQL documentation, “is shorthand for a USING list that mentions all columns in the two tables that have the same names,” so it’s equivalent to:
SELECT orders.customer_id, order_lines.item_id FROM orders JOIN order_lines USING(order_id);
which in turn is equivalent to:
SELECT orders.customer_id, order_lines.item_id FROM orders JOIN order_lines ON orders.order_id=order_lines.order_id;
NATURAL JOIN is nifty, but it does have a signficant danger. Suppose later we decided to add notes to orders:
ALTER TABLE orders ADD COLUMN note text;
And to order lines, too:
ALTER TABLE order_lines ADD COLUMN note text;
Suddenly, our natural join is much less, well, natural; it’s equivalent to:
SELECT orders.customer_id, order_lines.item_id FROM orders JOIN order_lines ON (orders.order_id=order_lines.order_id) AND (orders.note=order_lines.note);
… which is almost certainly not what you want. In absence of amazingly good naming discipline, NATURAL is a bit too surprising.
That being said, I’ve gotten very fond of USING and ON. I find it very intuitive to separate join conditions from other filtering conditions. For example, if we only wanted order lines with a quantity greater than zero:
SELECT orders.customer_id, order_lines.item_id FROM orders JOIN order_lines USING(order_id) WHERE order_lines.quantity > 0;
I like the separation of the join condition that assembles the product of the input tables from the filtration conditions, as I tend to think of those independently when designing a query.
This starts coming apart when you have extremely complicated join conditions, or are joining over many tables (more than three or four), it’s often clearer to move the join conditions into the WHERE clause. Otherwise, see if separating out the join conditions from the other conditions makes your queries easier to read; it did for me.