postgresql when it's not your job

15 October 2009

19:52

Only NATURAL: JOIN clauses vs WHERE conditions

When I learned SQL, I was taught that the way to do a join was with a WHERE clause. For example, if you have orders:

CREATE TABLE orders (
   order_id bigserial PRIMARY KEY,
   customer_id bigint REFERENCES customers(customer_id),
   ...
)

and order lines:

CREATE TABLE order_lines (
   ordinal integer NOT NULL,
   order_id bigint REFERENCES orders(order_id),
   item_id bigint REFERENCES items(item_id),
   quantity decimal(10,2) NOT NULL CHECK(VALUE >= 0),
   ...
   PRIMARY KEY(order_id, ordinal)
)

… to get a list of customer_id, item_id pairs:

SELECT orders.customer_id, order_lines.item_id
   FROM orders, order_lines
   WHERE orders.order_id = order_lines.order_id;

And that works great. However, there is another syntax, which is to use a JOIN clause in the FROM list:

SELECT orders.customer_id, order_lines.item_id
   FROM orders NATURAL JOIN order_lines;

NATURAL JOIN, in the words of the wonderful PostgreSQL documentation, “is shorthand for a USING list that mentions all columns in the two tables that have the same names,” so it’s equivalent to:

SELECT orders.customer_id, order_lines.item_id
   FROM orders JOIN order_lines USING(order_id);

which in turn is equivalent to:

SELECT orders.customer_id, order_lines.item_id
   FROM orders JOIN order_lines ON orders.order_id=order_lines.order_id;

NATURAL JOIN is nifty, but it does have a signficant danger. Suppose later we decided to add notes to orders:

ALTER TABLE orders ADD COLUMN note text;

And to order lines, too:

ALTER TABLE order_lines ADD COLUMN note text;

Suddenly, our natural join is much less, well, natural; it’s equivalent to:

SELECT orders.customer_id, order_lines.item_id
   FROM orders JOIN order_lines ON (orders.order_id=order_lines.order_id)
      AND (orders.note=order_lines.note);

… which is almost certainly not what you want. In absence of amazingly good naming discipline, NATURAL is a bit too surprising.

That being said, I’ve gotten very fond of USING and ON. I find it very intuitive to separate join conditions from other filtering conditions. For example, if we only wanted order lines with a quantity greater than zero:

SELECT orders.customer_id, order_lines.item_id
   FROM orders JOIN order_lines USING(order_id)
   WHERE order_lines.quantity > 0;

I like the separation of the join condition that assembles the product of the input tables from the filtration conditions, as I tend to think of those independently when designing a query.

This starts coming apart when you have extremely complicated join conditions, or are joining over many tables (more than three or four), it’s often clearer to move the join conditions into the WHERE clause. Otherwise, see if separating out the join conditions from the other conditions makes your queries easier to read; it did for me.

13 October 2009

13:39

SFPUG: Statistics and Postgres — How the Planner Sees Your Data

The video for the September 8, 2009 SFPUG meeting is now available: Read the rest of this entry »

27 August 2009

08:36

Quote of the Day

“The locking on this application is not so much pessimistic as clinically depressed.” — me

14 August 2009

16:23

Quote of the Day

“Hovercraft are just as happy hovering over the surface of land as of water. The authorities forbid their use on roads. And when you appreciate that they have no brakes, tend to slither off the camber down into the gutter and are deflected sideways by the slightest breeze, you can see that the authorities have made a wise move for once.” — Chris Roper of the Steam Boat Willy project, dedicated to developing a human-powered hovercraft

13 August 2009

22:22

SFPUG: Windowing and Common Table Expressions

The video for the August 11, 2009 SFPUG meeting is now available: Read the rest of this entry »

10 August 2009

22:43

Quote of the Day

“The question of whether a machine can think is no more interesting than the question of whether or not a submarine can swim.” — Edsger Dijkstra

9 August 2009

01:51

Resume Building

The trailer for the sequel to Tron looks awesome.

And this gives me an opportunity to vent one of my pet peeves about the original Tron.

In Tron, you have (in essence) a battle between three programmers: Flynn (hero), Bradley (hero), and Dillinger (villain). Now, what actual programs have these three delivered, per the movie?

Flynn: Some games (and those he wrote after hours on the company’s computer).
Bradley: A pretty bad-ass security program.
Dillinger: An operating system which becomes sentient.

Unsporting though it may be of me, I know which one I’d hire. With appropriate code reviews, of course.

8 August 2009

16:59

Dot Dot Dot

In a thoughtful post at the Big Nerd Ranch blog, Joe Conway talks about the relatively new dot notation in Objective-C for invoking messages on objects.

The executive summary is, he doesn’t like it.

He’s making two arguments against dot notation:

While it is now legal Objective-C, it’s an uneasy fit to an enormous history of Objective-C code, and existing coding patterns in the language.
It blurs the lines between accessing fields in a structure and invoking code, and that’s bad in and of itself.

I have no argument at all with point 1. I personally dislike the brackets notation, but that’s just years of writing C++. Objective-C is a different language, different syntax, get over it, write some code: no problem.

Point 2… well. Maybe this is also my years of C++, but I really don’t see the problem. For example, he writes:

In Objective-C, the square brackets ([]) were added for message sending. While the square brackets had previously only been used for indexing an array, the intent of the brackets can be easily determined by the context.

That’s true, but the original reason that Objective-C used brackets was not to differentiate message sending from structure access; it was because Objective-C was (and largely still is) a preprocessor on C, and using brackets in that way made it easy to parse and do the substitutions. I’m entirely in favor of making a virtue of that necessity, but let’s not forget that the original reason was arbitrary, and unique in the history of object-oriented extensions to C.

He goes on to say, in the example:

int x = foo.value;

What does that mean? Are we getting the value field out of the structure object foo? Are we executing a simple method that returns a value? Or, in this case, are we creating a network connection, pulling a value from a web server, turning that data in to an integer and then returning it?

Well, yes, but we really don’t any more in:

int x = [foo value];

except that we’re running some code. He writes:

Our first glance tells us we are definitely sending a message. That clues us in that more work is being done, not just a simple memory offset and an assignment.

OK, good, we do know that we’re sending a message, but… well, we still don’t know anything at all about the nature of the message than we did in the first example. Calling these things “foo” and “value” is stacking the deck a bit, too; if the example was either:

int remoteMemoryTotal = remoteConnectionPartner.getRemoteMemoryUse;

or

int remoteMemoryTotal = [remoteConnectionPartner getRemoteMemoryUse];

it’s a bit harder to argue that we have zero information about what’s going on.

My preference for dot notation largely comes from the desire for encapsulation. I like the idea that if I need to swap out an implementation detail, I can do so without a syntax change. For me, the fact that accessing a field off of an object, and invoking a method on that object use the same syntax is a feature, not a bug.

Part of it is cultural, I suspect. Objective-C is an interesting hybrid language. The object model is very dynamic and fluid, very Smalltalk-y, but that’s added on top of C. Not a “C-like language,” but C. K&R C. Thus, there’s a huge difference (in philosophy as well as performance) between reading an int out of a structure and sending a message to an object that returns an int, and the syntax reflects that. In other languages, like C++ and Java, the philosophical difference is reduced (somewhat in C++, greatly in Java), and a founding principle of those languages was to make “get something from an object” be the fundamental operation.

He does make some good points about returning l-values from functions (something that C++ wrestled with unhappily), which is a requirement for writing really nice setters as well as gettings using dot notation. All in all, I agree with his fundamental point that:

You aren’t at home. You’re working with another language.

… even if I don’t accept the premise that using dot notation to invoke methods is fundamentally a bad idea.

12:41

Which Side Are You On?

I have multiple reactions when I read that extremely talented Macintosh developers are boycotting the iPhone App Store.

First, I agree completely with the essential issues. The rejection and approval process is not one that inspires the least bit of confidence, especially for those developers who might write exactly the kind of high-investment, sophisticated application that is exactly (a) the kind of application that makes the iPhone the leader, and (b) is likely to run afoul of an expansive or pedantic interpretation of one of the “rules.” (I put the term in quotes because the rules, as applied, are simply too vague to be considered such.)

As an analogy, suppose Apple told Adobe that it could not ship Photoshop for Macintosh OS X because it duplicates the functionality of Preview. That’s what they did with Google Voice. I agree that this particular scenario is unlikely, while observing that code signing for Mac OS X is going to be mandatory at some point in the future.

When I was at Apple during the early 1990s, the company culture was still one of absolute self-assurance and arrogance, even though those were the wilderness years. I can only imagine what it is like there now. One of the things that struck me about Phil Schiller’s response regarding the Ninjawords situation was that Apple responded, not in a case where they were clearly, undeniably wrong to reject an application, but in one where they felt they were clearly, undeniable right, and were being treated unfairly by that portrayal. It is a mark of how extremely arrogant Apple appears (and is) right now that Schiller can say, in effect, “No, you have it all wrong, we were correct and the developer is wrong, we did everything exactly right and have no case to answer,” and it is considered an step forward in good communication.

Let’s consider Schiller’s closing:

Apple’s goals remain aligned with customers and developers—to create an innovative applications platform on the iPhone and iPod touch and to assist many developers in making as much great software as possible for the iPhone App Store. While we may not always be perfect in our execution of that goal, our efforts are always made with the best intentions, and if we err we intend to learn and quickly improve.

Gruber considers this statement as the “first proof I’ve seen that Apple’s leadership is trying to make the course correction that many of us see as necessary for the long-term success of the platform.” I think he’s being far too generous. Schiller’s statement contains absolutely nothing concrete. Of course they’re not perfect, and one would hope that “if they err” they would do something about it. Those are platitudes, not steps.

The only rational conclusion is: What you see now in the App Store is what you are going to get, absent the FCC or FTC slapping Apple around (and please do not hold your breath for anything substantive in that regard).

Second, I doubt that this is enough for me to stop developing for the iPhone. (The easy answer is “no,” because I have signed agreements to deliver iPhone applications, but what about my own spec work?) The reality is that if you are developing for a mobile, you develop for the iPhone and then worry about any other platform. (Microsoft has pretty much admitted this themselves.) It’s an impossibly large market and pile of money, and it is very difficult to consider walking away from it.

Third, we are watching an important shift happening in the software industry. There is a class of developer, console game developers, who must be regarding this with considerable amusement, since they’ve always developed in a walled garden with constant supervision from the platform developer. Desktop developers aren’t used to this. The only kind of retribution we’ve ever been given was the occasional breaking of a private API in the OS (and even that was often treated as a major betrayal on the OS vendor’s part).

From the point of view of desktop developers, the open development model is a natural law. From the point of view of the platform owners, it is a historical accident, like the lack of DRM on CDs, that they wish they could go back in time and fix. Anyone who thinks that Steve Jobs is not irritated that Adobe can sell Creative Suite Design Premium for $1,800 without paying Apple a penny does not understand Steve Jobs.

2 August 2009

16:42

NULL By Mouth

Jeff Davis has written a superb article about the problems with NULL in SQL. He has it exactly right when he says:

I think the best way to think about NULL is as a Frankenstein monster of several philosophies and systems stitched together by a series of special cases.

The closest single operational definition of NULL I can think of is, “This could be any value, so I-the-database will not treat it as being any particular value.” Of course, that immediately breaks given that aggregates ignore nulls.

This is, sadly, one of those things that you just have to learn. Painfully.