4 November 2009
I love object-oriented programming. The first time I ever encountered an OO language or framework (Object Pascal and MacApp, thank you for asking), it was as if the heavens opened. It simply made sense to me as a way of structuring programs, and I’ve been wedded to it ever since. (As it turns out, if you do the right paperwork, you can marry a concept.)
So, I think object-oriented programming is the bee’s knees. And, in this post, I’m going to tell you not to use it.
I was recently working on a Django application, using PostgreSQL as the database. (This application has been heavily anonymized.) Among many other things, it has a table that is, in essence, a work flow queue. The queue is a list of orders, based on their (integer) primary key, order_pk. Periodically, a task comes through, reads the queue into a Python list, and runs down it, calling a method on each object:
    for queue_entry in order_queue:
        order = Orders.objects.get(pk=queue_entry.order_pk)
        order.allocate_inventory()
What does allocate_inventory() do? It calls a stored procedure, and returns:
    def allocate_inventory(self):
        cursor = connection.cursor()
        cursor.execute("""
            SELECT allocate_inventory(%s)
            """, [self.pk])
Lovely! Let’s just look at the database traffic it creates and… oh, gross.
The program is reading in and instantiating an entire order object, just so it can get the primary key out of it. But the only reason it was able to instantiate the object at all is that it already had the primary key. In this application, order objects have lots and lots of fields, so it’s dragging all of those fields across the wire to build the object, only to use the one field we already knew before we even created the object.
If everything were in memory, no problem. Instantiating a Python object isn’t free, but it’s no big deal, and this keeps everything nicely encapsulated. But this is one of the dangers of ORMs: things that would be cheap in an in-memory world suddenly become much pricier when you have to go out to the database, and it can be quite non-obvious when you have to go out to the database, how many times, and in what particular way.
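To make the hidden round trips concrete, here is a deliberately toy sketch (invented table and class, with sqlite3 standing in for PostgreSQL, so it runs anywhere): a fake model that lazy-loads each column with its own SELECT, the kind of pattern a real ORM can produce under the hood.

```python
import sqlite3

# Toy setup: a tiny in-memory "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (pk INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.execute("INSERT INTO orders VALUES (1, 'a', 10.0)")

statements = []
conn.set_trace_callback(statements.append)  # log every statement issued

class LazyOrder:
    """Toy model: each attribute access triggers its own query."""
    def __init__(self, pk):
        self.pk = pk

    def __getattr__(self, name):
        # One SELECT per attribute touched -- invisible at the call site.
        value = conn.execute(f"SELECT {name} FROM orders WHERE pk = ?",
                             (self.pk,)).fetchone()[0]
        setattr(self, name, value)
        return value

order = LazyOrder(1)
summary = f"{order.customer}: {order.total}"  # looks free; it isn't
print(len(statements))  # two separate round trips for one line of code
```

The call site gives no hint that two queries just happened, which is exactly why watching the statement log (below) matters. In Django itself, something like Orders.objects.values_list('pk', flat=True) will pull only the key column instead of whole rows.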
(In one particularly pathological example, in this same application, the Django ORM was doing seven calls to the database in order to populate one pop-up menu … which had exactly one entry in it.)
The fix, of course, is to pass the primary key around rather than the object:
    for queue_entry in order_queue:
        Orders.allocate_inventory(queue_entry.order_pk)

    @staticmethod
    def allocate_inventory(order_pk):
        cursor = connection.cursor()
        cursor.execute("""
            SELECT orders.allocate_inventory(%s)
            """, [order_pk])
… even though this isn’t nearly as clean from an OO perspective as creating the object.
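A self-contained sketch of the difference, again using sqlite3 as a stand-in for PostgreSQL, an UPDATE as a stand-in for the allocate_inventory() stored procedure, and a made-up three-column table; it simply counts statements for each version of the loop:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (pk INTEGER PRIMARY KEY, customer TEXT, allocated INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, 0)", [(1, "a"), (2, "b"), (3, "c")])

statements = []
conn.set_trace_callback(statements.append)

queue = [1, 2, 3]  # the work-flow queue: just primary keys

def allocate_inventory(order_pk):
    # Stand-in for: SELECT orders.allocate_inventory(%s)
    conn.execute("UPDATE orders SET allocated = 1 WHERE pk = ?", (order_pk,))

# Object-style loop: fetch the whole row first, then call the procedure.
for order_pk in queue:
    row = conn.execute("SELECT pk, customer, allocated FROM orders WHERE pk = ?",
                       (order_pk,)).fetchone()
    allocate_inventory(row[0])
object_style = len(statements)

statements.clear()

# Fixed loop: pass the key straight through -- no fetch at all.
for order_pk in queue:
    allocate_inventory(order_pk)
pk_style = len(statements)

print(object_style, pk_style)  # the object-style loop does twice the round trips
```

On a three-entry queue the object-style loop issues six statements to the pk-style loop’s three, and the gap only widens as the rows get fatter.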
Whenever you’re developing against any kind of ORM (or, really, doing any database application programming at all), it’s very wise to set the logging on the development PostgreSQL server to “log every statement,” and keep a tail -f of the log file going in another window. That way, if an operation that looks like the proverbial “just a few lines of code” generates an unexpectedly large number of database calls, you can address that before it falls over in production.
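A sketch of that setup, assuming a stock PostgreSQL install (the data directory and log file paths are placeholders; substitute your own, and the log file name depends on your log_filename setting):

```shell
# In postgresql.conf on the development server:
#
#   log_statement = 'all'        # log every statement
#   logging_collector = on       # write the log to files
#
# Then reload the server and keep the log open in another window:
pg_ctl reload -D /path/to/data
tail -f /path/to/data/log/postgresql-*.log
```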
ORMs can be very powerful, but the typical usage patterns in a nicely structured OO program can interact with them very badly, and generate a lot more database traffic than you’d expect.