postgresql when it's not your job

00:00

Getting the ID of Related Objects in Django

22 December 2010

tl;dr: Don’t retrieve a whole row just to get the primary key you had anyway. Don’t iterate in the app; let the database server do the iteration for you.


There’s a couple of bad habits I see a lot in Django code (including, sadly, my own), which is abuse of a ForeignKey field. Let’s take the classic example:

class Publisher(Model):
    # We accept the default 'id' column
    name = TextField()
    ...

class Book(Model):
    # Likewise
    title = TextField()
    topic = TextField()
    publisher = ForeignKey(Publisher)
        # Remember this creates a publisher_id column

Now, let’s say we have a book:

b = Book.objects.get(title="Interior Landscapes")

And we want the ID of the publisher.

Don’t do this:

pub_id = b.publisher.id

This works, but it’s absurd: It does a separate select to fetch the entire Publisher object, and then extracts the ID. But, of course, it already had the ID, because that’s how it retrieved the publisher object. Instead, just go straight to the created ID field:

pub_id = b.publisher_id

Next, don’t use iteration to build lists if you can get the data directly out of the database. For example, suppose we want the list of publishers who publish books with topic “Surreal Architecture”. Far too often, I see this:

surreal_books = Books.objects.filter(topic="Surreal Architecture")

surreal_publishers = set([book.publisher.id for book in surreal_books])

In this case, Django will send one query to get the list of books, and then do a separate query for each publisher to get the publisher id… even though they’re already in memory.

surreal_publishers = set([book.publisher_id for book in surreal_books])

This is better, since it doesn’t have to retrieve each publisher, but far better is to make the database do all the work:

surreal_publishers_qs = Books.objects.filter(topic="Surreal Architecture").values('publisher_id').distinct()

The result set, in this case, is a bit of an odd duck: It’s a list of dictionaries, each dict being of the form { 'publisher_id': <id value> }. Of course, Python being Python, it’s not hard to transform that into a set:

surreal_publishers = set([entry['publisher_id'] for entry in surreal_publishers_qs])

And we didn’t have to do any raw SQL!

Tom at 06:14, 22 December 2010:

You can also do

surreal_publishers_qs = Books.objects.filter(topic=”Surreal Architecture”).values_list(‘publisher_id’, flat=True).distinct()

the result will look something like

[122,22,31, …, 77]

Extra columns when doing .distinct() in a Django QuerySet at 15:54, 22 December 2010:

[…] we want to get the list of IDs of the publishers, and we’re using the cool optimization that I described earlier, with the optimization a commenter suggested […]

An easy Django related-object performance +1 | Seek Nuance at 21:15, 22 December 2010:

[…] Today in his blog, Christophe Pettus wrote about an easy way to get better database performance from a ForeignKey reference. […]

Xof at 16:28, 23 December 2010:

Some more tips, including your first one, are here: http://docs.djangoproject.com/en/dev/topics/db/optimization/

Thanks! That’s good, because official documentation on it makes it feel much less hacky.

VidJa at 13:30, 28 December 2010:

cool, but what if you need the publisher name and not the id? You still would have to retrieve the row anyway?