09:32
I’ve been doing a lot more development with Django and PostgreSQL lately. The logical thing to do is to blog about it, of course…
On my blog running WordPress?
Well, that won’t do at all.
So, I’m going to be migrating this blog to use Django and PostgreSQL, and will blog about the adventure here. I’ll be starting with a bare VPS “slice” at Slicehost, running Centos 5.3, and document each step.
Onwards to step 1: Installing PostgreSQL on the slice.
20:51
The default installation of PostgreSQL listens on 127.0.0.1 (the local loopback address) and on Unix sockets. The controlling parameter, listen_addresses
, isn’t documented to have a setting that just listens on sockets, and not the loopback address.
As it happens, such a setting exists:
listen_addresses=''
Among other things, this appears to prevent local instances of pgAdmin from connecting (although psql
works fine), so its utility is not clear, but there it is if you wish it.
19:52
When I learned SQL, I was taught that the way to do a join was with a WHERE clause. For example, if you have orders:
CREATE TABLE orders (
order_id bigserial PRIMARY KEY,
customer_id bigint REFERENCES customers(customer_id),
...
)
and order lines:
CREATE TABLE order_lines (
ordinal integer NOT NULL,
order_id bigint REFERENCES orders(order_id),
item_id bigint REFERENCES items(item_id),
quantity decimal(10,2) NOT NULL CHECK(VALUE >= 0),
...
PRIMARY KEY(order_id, ordinal)
)
… to get a list of customer_id
, item_id
pairs:
SELECT orders.customer_id, order_lines.item_id
FROM orders, order_lines
WHERE orders.order_id = order_lines.order_id;
And that works great. However, there is another syntax, which is to use a JOIN clause in the FROM list:
SELECT orders.customer_id, order_lines.item_id
FROM orders NATURAL JOIN order_lines;
NATURAL JOIN, in the words of the wonderful PostgreSQL documentation, “is shorthand for a USING list that mentions all columns in the two tables that have the same names,” so it’s equivalent to:
SELECT orders.customer_id, order_lines.item_id
FROM orders JOIN order_lines USING(order_id);
which in turn is equivalent to:
SELECT orders.customer_id, order_lines.item_id
FROM orders JOIN order_lines ON orders.order_id=order_lines.order_id;
NATURAL JOIN is nifty, but it does have a signficant danger. Suppose later we decided to add notes to orders:
ALTER TABLE orders ADD COLUMN note text;
And to order lines, too:
ALTER TABLE order_lines ADD COLUMN note text;
Suddenly, our natural join is much less, well, natural; it’s equivalent to:
SELECT orders.customer_id, order_lines.item_id
FROM orders JOIN order_lines ON (orders.order_id=order_lines.order_id)
AND (orders.note=order_lines.note);
… which is almost certainly not what you want. In absence of amazingly good naming discipline, NATURAL is a bit too surprising.
That being said, I’ve gotten very fond of USING and ON. I find it very intuitive to separate join conditions from other filtering conditions. For example, if we only wanted order lines with a quantity greater than zero:
SELECT orders.customer_id, order_lines.item_id
FROM orders JOIN order_lines USING(order_id)
WHERE order_lines.quantity > 0;
I like the separation of the join condition that assembles the product of the input tables from the filtration conditions, as I tend to think of those independently when designing a query.
This starts coming apart when you have extremely complicated join conditions, or are joining over many tables (more than three or four), it’s often clearer to move the join conditions into the WHERE clause. Otherwise, see if separating out the join conditions from the other conditions makes your queries easier to read; it did for me.
08:36
“The locking on this application is not so much pessimistic as clinically depressed.” — me