01:59
PostgreSQL for Servoy Developers
The slides from my presentation on PostgreSQL for Servoy Developers, presented at ServoyWorld 2011, are available here.
13:50
tl;dr: If you make a tradeoff, be honest about it. Don’t lie to yourself that you are making a positive architectural decision when you make a negative tradeoff.
15:54
tl;dr: If you are doing a .distinct() query and restricting the returned fields using .values() or .values_list(), you may be in for a surprise if your model has a default ordering set via Meta.ordering. You probably want to clear the ordering by calling .order_by() with no parameters.
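For example, a minimal sketch of the gotcha as it would appear in an app's models.py (the Author model and its fields are hypothetical): the default ordering column is quietly pulled into the query, so DISTINCT applies to more columns than you asked for.

# A minimal sketch; the Author model and its fields are hypothetical.
from django.db import models

class Author(models.Model):
    name = models.CharField(max_length=100)
    created = models.DateTimeField(auto_now_add=True)

    class Meta:
        ordering = ['created']   # default ordering applied to every queryset

# Surprise: 'created' is added to the SELECT to satisfy the default ordering,
# so DISTINCT is computed over (name, created) and duplicate names come back.
names = Author.objects.values_list('name', flat=True).distinct()

# Fix: clear the default ordering first, so DISTINCT sees only 'name'.
names = Author.objects.order_by().values_list('name', flat=True).distinct()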
00:00
tl;dr: Don’t retrieve a whole row just to get the primary key you had anyway. Don’t iterate in the app; let the database server do the iteration for you.
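A hedged sketch of both points in Django terms (the Order model, its fields, and mark_stale() are hypothetical):

# A minimal sketch; Order, its fields, and mark_stale() are hypothetical.

# Wasteful: fetches every column of every matching row just to read the
# primary key, and does the looping in Python.
for order in Order.objects.filter(status='pending'):
    mark_stale(order.pk)

# Less wasteful: ask only for the primary keys.
pending_ids = Order.objects.filter(status='pending').values_list('pk', flat=True)

# Best: push the whole operation into a single statement and let the
# database server do the iteration.
Order.objects.filter(status='pending').update(status='stale')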
11:35
There’s a very nasty root exim exploit in the wild.
Updated: To be fair to the hard-working exim team, this bug was fixed some time ago.
09:48
tl;dr: You can’t compare NULLs. A nullable primary key is a contradiction in terms. You can’t join on NULL, so a NULL foreign key refers to nothing, by definition. NULL doesn’t do what you think it does, no matter what you think it does.
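A small illustration of the first point, as a sketch assuming a database reachable with psycopg2 (the connection string is hypothetical):

# A minimal sketch; the connection string is hypothetical.
import psycopg2

conn = psycopg2.connect('dbname=test')
cur = conn.cursor()

# NULL = NULL is not TRUE; it evaluates to NULL, which psycopg2 returns as None.
cur.execute("SELECT NULL = NULL")
print(cur.fetchone()[0])    # None, not True

# A condition involving NULL never matches, which is why a NULL foreign key
# joins to nothing.
cur.execute("SELECT 1 WHERE NULL = 1")
print(cur.fetchall())       # [] -- no rows

conn.close()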
17:02
This is a follow-up to the previous post, in which we talked about ways of handling huge result sets in Django.
Two commenters (thanks!) pointed out that psycopg2 has built-in support for server-side cursors via the name parameter of the .cursor() method.
To use this in Django requires a couple of small gyrations.
First, Django wraps the actual database connection inside the django.db.connection object, as the connection property. So, to create a named cursor, you need:
cursor = django.db.connection.connection.cursor(name='gigantic_cursor')
If this is the first call you are making against that connection wrapper object, it’ll fail; the underlying database connection is created lazily. As a rather hacky solution, you can do this:
from django.db import connection

if connection.connection is None:
    cursor = connection.cursor()
    # This is required to populate the connection object properly

cursor = connection.connection.cursor(name='gigantic_cursor')
You can then iterate over the results using the standard cursor iterator or the cursor.fetchmany() method, and psycopg2 will fetch rows from the server in appropriately sized chunks.
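A quick sketch of the fetchmany() loop against the named cursor above (the table name, chunk size, and process() are hypothetical):

# A minimal sketch; the table name, chunk size, and process() are hypothetical.
cursor.execute("SELECT * FROM some_enormous_table")

while True:
    rows = cursor.fetchmany(2000)    # one round trip per 2000 rows
    if not rows:
        break
    for row in rows:
        process(row)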
20:10
tl;dr: Don’t use Django to manage queries that have very large result sets. If you must, be sure you understand how to keep memory usage manageable.
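For what it's worth, a hedged sketch of the usual first attempt (LogEntry and handle() are hypothetical): queryset.iterator() keeps Django from caching every object, but with an ordinary client-side cursor psycopg2 can still buffer the whole result set in memory, which is exactly why the follow-up above reaches for a named server-side cursor.

# A minimal sketch; LogEntry and handle() are hypothetical.
# iterator() avoids Django's queryset cache, but the adapter may still
# buffer the entire result set on the client side.
for entry in LogEntry.objects.filter(level='ERROR').iterator():
    handle(entry)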
19:04
Christmas just came early for me. psycopg2 2.3, now in beta, includes named tuples as return values from queries.
If you are tired of writing result[4], and would much prefer to write result.column_name, you now can.
Yay!
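A minimal sketch of what that looks like, assuming psycopg2 2.3+ and a reachable database (the connection string and table are hypothetical), using the NamedTupleCursor factory from psycopg2.extras:

# A minimal sketch; the connection string and table are hypothetical.
import psycopg2
from psycopg2.extras import NamedTupleCursor

conn = psycopg2.connect('dbname=test')
cur = conn.cursor(cursor_factory=NamedTupleCursor)

cur.execute("SELECT id, column_name FROM some_table")
for result in cur:
    print(result.column_name)   # attribute access instead of result[4]

conn.close()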
11:24
Yesterday, I commented on a post about how widespread uptake on 9.0 replication will be. I disagreed with the assessment that “users” (by which we mean small installations of PostgreSQL, defined however you care to) will not be interested in 9.0’s hot standby/streaming replication.
Ultimately, of course, we’ll find out. But I strongly feel that 9.0’s streaming replication will be a big deal for small PostgreSQL installations… indeed, I think it will be a much bigger deal for them than big ones.
First, I’ll happily exclude hobbyist and developer installs of PostgreSQL. I don’t back up my development PG databases more often than once a day, and I certainly don’t have any kind of replication set up for them (unless that’s what I’m developing). The important part, the code, lives in a DVCS, and if I had to reconstruct the db from scratch, no big deal… indeed, I do it all the time.
I’m talking about small installations of PG that are used as the authoritative record of business-critical information: Web site transactions, for example. The fact that, traditionally, these users of PG haven’t been all that into replication solutions has nothing to do with their actual need for replication; instead, it has to do with the solutions they had available.
So, they make do with pg_dumpall and hope for the best… and then call someone like us if that doesn’t work.
But it is fallacious to conclude that because they are not using replication right now, they have no use for it. Ask a corner liquor store if they could afford to have an entire day’s worth of electronic transactions just vanish; I’ll bet a bottle of something cheap that they carry that the answer would be, “Of course not.” It might not be worth a $15,000 consulting engagement to set it up, but it’s worth something, possibly quite a bit.
Indeed, this is one of the things that’s driving adoption of “cloud computing”: The (sometimes erroneous) idea that the cloud provider is managing disaster recovery and high availability for you, included in the cost of your monthly service charge.
tl;dr: PG 9.0’s streaming replication will be widely adopted by smaller installations that use PG to manage business-critical data, specifically because it makes replication something a casual DBA can set up, something we’ve not had before with PG.