postgresql when it's not your job

19:23

Django, PostgreSQL, and Autocommit

7 November 2009

In part 1, we ran down a list of the standard Django features for controlling transactions. Now, we’re going to look at some ways to optimize how these tranactions happen.

Let’s look at the SQL that our create_order() view function generated, with the transaction middleware turned on, and no transaction decorators on the function:

LOG:  statement: SET DATESTYLE TO 'ISO'
LOG:  statement: SHOW client_encoding
LOG:  statement: SHOW default_transaction_isolation
LOG:  statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ COMMITTED
LOG:  statement: SET TIME ZONE E'America/Los_Angeles'
LOG:  statement: SELECT version()
LOG:  statement: INSERT INTO "example_address" ("street_address", "city", "state", "zip") VALUES (E'1313 Mockingbird Lane', E'Mockingbird Heights', E'CA', E'90026')
LOG:  statement: SELECT CURRVAL('"example_address_id_seq"')
LOG:  statement: INSERT INTO "example_order" ("customer_name", "shipping_address_id") VALUES (E'Herman Munster', 5)
LOG:  statement: SELECT CURRVAL('"example_order_id_seq"')
LOG:  statement: COMMIT

Setting aside the overhead statements (and, good grief, there are a lot of those), what’s wrong with this? Well, not much, except that as of 8.2, PostgreSQL has a much better way of getting the primary key (or any other field, for that matter) back from an INSERT statement, which is the INSERT … RETURNING construct. It’s perfect for just this situation, when you’re inserting a record and creating the primary key at the same time, and want to et the primary key back for further operations.

Now, let’s look at the SQL we get from our show_order() function:

LOG:  statement: SET DATESTYLE TO 'ISO'
LOG:  statement: SHOW client_encoding
LOG:  statement: SHOW default_transaction_isolation
LOG:  statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ COMMITTED
LOG:  statement: SET TIME ZONE E'America/Los_Angeles'
LOG:  statement: SELECT version()
LOG:  statement: SELECT "example_order"."id", "example_order"."customer_name", "example_order"."shipping_address_id" FROM "example_order" WHERE "example_order"."id" = 1 
LOG:  statement: ROLLBACK

In this case, we don’t need that transaction at all.

At it happens, there’s a Django feature that can address both these problems at once, as long as you are running PostgreSQL 8.2 with the postgresql_psycopg2 backend…

Enter the autocommit

This functionality is kind of buried in the Django documentation, almost apologically stuck in the Notes about supported databases section. It’s an option you add to your settings.py file:

DATABASE_OPTIONS = {
    "autocommit": True,
}

What does this do? The documentation, I’m afraid, is pretty obscure about the actual effect. Groveling around in the code, we discover that what it does is prevent Django from automatically starting a new transaction on the first database operation. If we switch it on, our SQL traffic for show_order() now looks like:

LOG:  statement: SET DATESTYLE TO 'ISO'
LOG:  statement: SHOW client_encoding
LOG:  statement: SHOW default_transaction_isolation
LOG:  statement: SET TIME ZONE E'America/Los_Angeles'
LOG:  statement: SELECT version()
LOG:  statement: SELECT "example_order"."id", "example_order"."customer_name", "example_order"."shipping_address_id" FROM "example_order" WHERE "example_order"."id" = 1 

That’s much better! Well, how about for create_order?

Environment:

Request Method: GET
Request URL: http://127.0.0.1:8000/createorder/
Django Version: 1.1
Python Version: 2.5.2
Installed Applications:
['djangotrans.example']
Installed Middleware:
('django.middleware.common.CommonMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware')


Traceback:
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/django/core/handlers/base.py" in get_response
  92.                 response = callback(request, *callback_args, **callback_kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/django/db/transaction.py" in _commit_on_success
  240.                 res = func(*args, **kw)
File "/Users/xof/Documents/Dev/examples/djangotrans/../djangotrans/example/views.py" in create_order
  9.     address.save()
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/django/db/models/base.py" in save
  410.         self.save_base(force_insert=force_insert, force_update=force_update)
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/django/db/models/base.py" in save_base
  495.                     result = manager._insert(values, return_id=update_pk)
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/django/db/models/manager.py" in _insert
  177.         return insert_query(self.model, values, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/django/db/models/query.py" in insert_query
  1087.     return query.execute_sql(return_id)
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/django/db/models/sql/subqueries.py" in execute_sql
  324.             return self.connection.ops.fetch_returned_insert_id(cursor)
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/django/db/backends/__init__.py" in fetch_returned_insert_id
  171.         return cursor.fetchone()[0]

Exception Type: ProgrammingError at /createorder/
Exception Value: no results to fetch

Hm. That’s not what we expected, at all.

As it happens, there’s a bug in Django 1.1 when using autocommit; it blows up if the very first thing that you do on a new database connection is an INSERT that expects a return value back. As a workaround, we can add a line to our create_order() function to force it to do some traffic before the first insert:

def create_order(request):
    cur = connection.cursor()
        # Work around Django 1.1 autocommit bug

    address = Address(street_address="1112 E Broad St", city="Westfield", state="NJ", zip="07090")
    address.save()

    order = Order(customer_name="Gomez Addams", shipping_address=address)
    order.save()

    return HttpResponse("Created: " + unicode(order))

And now, we get…

LOG:  statement: SET DATESTYLE TO 'ISO'
LOG:  statement: SHOW client_encoding
LOG:  statement: SHOW default_transaction_isolation
LOG:  statement: SET TIME ZONE E'America/Los_Angeles'
LOG:  statement: SELECT version()
LOG:  statement: INSERT INTO "example_address" ("street_address", "city", "state", "zip") VALUES (E'1112 E Broad St', E'Westfield', E'NJ', E'07090') RETURNING "example_address"."id"
LOG:  statement: INSERT INTO "example_order" ("customer_name", "shipping_address_id") VALUES (E'Gomez Addams', 82) RETURNING "example_order"."id"

Wow, that’s great! Notice the nice, efficient RETURNING clause, and notice… uh, the complete lack of a transaction block around the INSERTs. This is why this particular function is called “autocommit”: since we’re not running inside an enclosing transaction, each statement is “autocommitted” upon completion. (Calling a feature “autocommit” when the practical effect is that the framework does not automatically generate COMMIT statements is perhaps not the best choice of nomenclature.)

So, what are our options? Let’s add the @transaction.commit_on_success decorator, and see if that improves matters:

LOG:  statement: SET DATESTYLE TO 'ISO'
LOG:  statement: SHOW client_encoding
LOG:  statement: SHOW default_transaction_isolation
LOG:  statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ COMMITTED
LOG:  statement: SET TIME ZONE E'America/Los_Angeles'
LOG:  statement: SELECT version()
LOG:  statement: INSERT INTO "example_address" ("street_address", "city", "state", "zip") VALUES (E'1112 E Broad St', E'Westfield', E'NJ', E'07090') RETURNING "example_address"."id"
LOG:  statement: INSERT INTO "example_order" ("customer_name", "shipping_address_id") VALUES (E'Gomez Addams', 84) RETURNING "example_order"."id"
LOG:  statement: COMMIT

Perfect.

Wait. What’s going on?

The following is a digression, and isn’t really required to use autocommit… but you might find it interesting.

You might well be thinking, “That’s all a very nifty trick, but what in the world does INSERT ... RETURNING have to do with autocommit? And why is autocommit limited to PostgreSQL 8.2, when that behavior has been standard with PostgreSQL since ever?”

First, note how Django gets the result of an INSERT if it does have the INSERT … RETURNING syntax available:

LOG:  statement: INSERT INTO "example_address" ("street_address", "city", "state", "zip") VALUES (E'1313 Mockingbird Lane', E'Mockingbird Heights', E'CA', E'90026')
LOG:  statement: SELECT CURRVAL('"example_address_id_seq"')

That is, it calls the currval function. This works great… as long as you are assured that you are the same connection as you were when you issued the insert. But why wouldn’t you be?

Good question, and the answer is: Pooling.

Pretty much any site with any degree of load will, at some point, need pooling software like pgpool-II or PgBouncer to multiplex connections between the application and database server.

The base mode on both of these fine products simulates as closely as possible being directly connected to the database: Each time the client connects to the intermediate connection pooler, it appears that a new, isolated connection has opened up.

However, more aggressive forms of connection pooling are possible. For example, PgBouncer has transaction pooling, in which the connection to the database is only assigned to a client while it has a transaction open. Between transactions, the connection between PgBouncer and the database could switch, which would cause (really strange and hard-to-reproduce) errors.

“But,” you say, having read the PgBouncer docs, “it’s inside a transaction! That keep it from switching to a new connection, even when you’re using transaction pooling!”

That’s true… unless you are running with autocommit turned on, and take no special precautions to make sure that your INSERT is inside of a transation block (remember, it’s not by default, as we saw above).

Note that it is never wrong to use INSERT … RETURNING; it’s just required if you are using transaction pooling (Django doesn’t know if you are or not, so it assumes that you are), with autocommit turned on, and if you do not take special pains to put each INSERT or group thereof into its own transaction.

Back to Reality.

I have to say that it appears that Django’s autocommit support isn’t quite fully baked, but it’s still a very useful feature for getting rid of the redundant BEGIN/ROLLBACK blocks around read-only view functions (which will tend to be the majority of them), while still giving the programmer control over the commit model for those view functions that do write to the database. And, as a nice little bonus, you get the more efficient INSERT ... RETURNING operation as well, as long as you are using PostgreSQL 8.2 or higher. And, of course, you should be!

Django, PostgreSQL, and Transaction Management at 19:24, 7 November 2009:

[…] isn’t actually required at all, setting aside the other overhead. Can we get rid of it? In part two, we’ll see how we […]

Miguel Angel at 03:18, 12 November 2009:

Great job with this posts, very good information that it is usually difficult to find, thanks.

Christophe Pettus: Django, PostgreSQL, and Autocommit | VishwaTech IT News at 17:44, 3 December 2009:

[…] here to read the rest: Christophe Pettus: Django, PostgreSQL, and Autocommit  Leave a […]

VidJa at 02:20, 19 March 2010:

Thanks for this great article. I was wondering about this and couldn’t find nice documentation. I implemented your methods and it works great.

Philip Gatt at 12:41, 27 October 2011:

Great article. I was stumped but a google search brought me here and you saved me a lot of time. Thank you!