My presentation from PGCon 2012, PostgreSQL on AWS with Reduced Tears, is now up.
My presentation from PGCon 2012, PostgreSQL on AWS with Reduced Tears, is now up.
Pickup trucks are great.
No, really. They are great vehicles. You can use them for all sorts of really useful things: Bringing your tools out to a construction gig. Delivering refrigerators. Helping your friend move a sofa. Carting away a reasonable amount of construction debris.
But if you need to deliver 75,000 pounds of steel beams to a construction site, in a single run? A pickup truck will not do it. Not even a big pickup. Not even if you add a new engine. Not even if you are willing to get three pickups. You need equipment designed for that. (And, as a note, the equipment that could handle delivering the steel beams would be a terrible choice for helping a friend move their sofa.)
“But,” I hear you say, “I already know how to drive a pickup! And we have a parking space for it. Can’t we just use the pickup? You’re a truck expert; tell us how to get our pickup to pull that load!”
And I say, “Being a truck expert, I will tell you again, a pickup the wrong kind of truck. There are other trucks that will handle that load with no trouble, but a pickup isn’t one of them. The fact that you have a pickup doesn’t make it the right truck.”
We have many clients that run PostgreSQL, happily, on Amazon Web Services.
Some clients, however, are not happy. They are attempting to haul tractor-trailer loads (such as high volume data warehouses) using pickup trucks (Amazon EC2 instances). They wish us to fix their problem, but are not willing to move off of Amazon in order to get the problem fixed.
I like AWS for a lot of things; it has many virtues, which I will discuss in detail soon. However, AWS is not the right solution for every problem. In particular, if you require a high read or write data rate in order to get the performance you need from your database, you will ultimately not be happy on AWS. AWS has a single block-device storage mechanism, Elastic Block Storage, which simply does not scale up to very high data rates.
That doesn’t mean that AWS is useless, it just means it isn’t the right tool for every job. The problem arises when AWS is considered the fixed point, like the pickup was the fixed point above. At some point, you have to decide:
Sadly, even the best of consultants do not have the magic engine in our back room that will cause EBS to perform as well as high-speed direct attached storage.
Well, I’m not going; are you? This year’s Apple World-Wide Developer’s Conference was sold out by 8am Pacific Time, having gone on sale around 6am. (I missed the boat in 2011 and 2010, too.) I can’t imagine anyone except perhaps Apple thinks that the mad scramble to the keyboard that we’ve experienced in the last few years is a rational way to allocate tickets.
It’s time for Apple to admit that the traditional model of a singular WWDC either requires a venue that can handle the crowd, or split it into multiple, regional events. It would lose the “gathering of the tribe” aspect that has always been one of the best parts of WWDC, but that’s lost now, anyway; the “tribe” is not defined on who was by their keyboards for 90 minutes at 6am on a Wednesday.
If Apple views the WWDC as a way to pack people into seats to create excitement for their announcements early in the week, then I suppose the current system is as good as anything. From any other perspective, it’s time to find another way of doing this.
First, read this essay about the disaster that is PHP. Every word is correct.
Then, view this photo set.
I’ll be speaking at the following conferences through July:
… or, inexcusable things I am tired of seeing in postgresql.conf files.
There are two valid styles for overriding the default values in postgresql.conf: Putting your changes as a cluster at the end, or uncommenting the defaults and overriding in place. Both have advantages and disadvantages. Having some settings one way and some another is pure disadvantage. Do not do this.
Quick, what is
log_min_duration_statement set to here?
log_min_statement_duration = 2000
Now, what is it set to here?
log_min_statement_duration = 2s
Always use units with numeric values if a unit is available.
If you strip out all of the defaults, it becomes impossible to tell what a particular value is set to. Leave the defaults in place, and if you comment out a setting, reset the value to the default (or at least include comments that make it clear what is going on).
If you need to move postgresql.conf (and the other configuration files) to a different location from where the package for your system puts it, don’t leave the old, dead postgresql.conf lying around. Delete any trace of the old installation hierarchy.
I have some acquaintance with the Instagram people, and they are among the smartest technologists I’ve met. Really nice, too. (Of course, they mention this blog in the post, so I’m biased.)
As noted before, Django has a lot of facilities for handling transactions, and it’s not at all clear how to use them. In an attempt to cut through the confusion, here’s a recipe for handling transactions sensibly in Django applications on PostgreSQL.
The goals are:
The bits of the recipe are:
xact()decorator / context manager below, using it like you would the
The quick reasons behind each step:
xact()decorator will set up the connection so that a transaction is started in the relevant block, which is what we want for database-modifying operations.
This recipe a few other nice features:
commit_on_success(), in that it will issue a rollback if an exception escapes from the block or function it is wrapping.
xact()ignores the dirty flag on the Django connection. Since we’re deliberately wrapping stuff that modifies the database with it, the chance of it being dirty is near 100%, and a commit on a transaction that did not modify the database is no more expensive in PostgreSQL than a rollback. It also means you can do raw SQL inside an
xact()block without the foot-gun of forgetting to call
xact()can be nested, giving us nested transactions! If it sees that there is already a transaction open when it starts a new block, it will use a savepoint to set up a nested transaction block. (PostgreSQL does not have nested transactions as such, but you can use savepoints to get 99.9% of the way there.)
xact()works around an outstanding bug in Django’s transaction handling on psycopg2.
xact() also supports the
using parameter for multiple databases.
Of course, a few caveats:
postgresql_psycopg2backend, and PostgreSQL 8.2 or higher. It’s possible it can be hacked to work on other backends that support savepoints.
xact()works just the way you want if it is nested inside a
commit_on_success()block (it will properly create a savepoint insted of a new transaction). However, a
commit_on_success()block nested inside of
xact()will commit or rollback the entire transaction, somewhat defeating the outer
xact(). To the extent possible, use only
xact()in code you write.
xact()block; otherwise, the automatic rollback will be defeated. Allow the exception to escape the
xact()block, and then catch it. (Of course, if the intention is to always commit and to defeat the rollback, by all means catch the exception inside the block.)
To use, just drop the source (one class definition, one function) into a file somewhere in your Django project (such as the omni-present
utils application every Django project seems to have), and include it.
from utils.transaction import xact @xact def my_view_function1(request): # Everything here will be in a transaction. # It'll roll back if an exception escapes, commits otherwise. def my_view_function2(request): # This stuff won't be in a transaction, so don't modify the database here. with xact(): # This stuff will be, and will commit on normal completion, roll back on a exception def my_view_function3(request): with xact(): # Modify the database here (let's call it "part 1"). try: with xact(): # Let's call this "part 2." # This stuff will be in its own savepoint, and can commit or # roll back without losing the whole transaction. except: # Part 2 will be rolled back, but part 1 will still be available to # be committed or rolled back. Of course, if an exception # inside the "part 2" block is not caught, both part 2 and # part 1 will be rolled back.
My presentation from SCALE 10x, “PostgreSQL Performance When It’s Not Your Job” is now available for download.