2 July 2013
06:27
Advisory locks are a very useful feature in PostgreSQL, and they just aren’t used enough.
Here’s a scenario: You have a bulk import job. While that job is running, there’s an analysis job you want to prevent from starting, and you don’t want a bulk import to start while the analysis job is running. But any number of bulk importers can run at the same time. How do you communicate this?
With an advisory lock!
Each of the bulk importers can take a shared advisory lock. Those locks don’t block each other, so the importers can run freely. But the analysis job takes an exclusive advisory lock on the same key. It will wait until all the importers are done, and while it is waiting or running, new importers will not be granted their shared locks until it completes.
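Here’s a minimal sketch of that scheme using psycopg2; the lock key and the connection string are arbitrary placeholders.

import psycopg2

IMPORT_LOCK = 42  # any application-chosen 64-bit lock key

# Each bulk importer takes the lock in shared mode; shared holders
# don't block one another.
conn = psycopg2.connect('dbname=mydb')
cur = conn.cursor()
cur.execute('SELECT pg_advisory_lock_shared(%s)', (IMPORT_LOCK,))
try:
    ...  # run the bulk import
finally:
    cur.execute('SELECT pg_advisory_unlock_shared(%s)', (IMPORT_LOCK,))

# The analysis job takes the same lock exclusively; this blocks until
# all the shared holders have released it.
cur.execute('SELECT pg_advisory_lock(%s)', (IMPORT_LOCK,))
try:
    ...  # run the analysis
finally:
    cur.execute('SELECT pg_advisory_unlock(%s)', (IMPORT_LOCK,))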
Even better, PostgreSQL cleans them up for you when a session terminates; you don’t have to worry about a lock lingering when you didn’t mean it to.
To make using advisory locks easier in Django, I have a small context manager that can be used to wrap code that should run with an advisory lock held. You can find it on GitHub, and it installs using pip.
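In use, it looks something like this (a sketch only; the module and argument names here are illustrative, so see the README for the real interface):

# Illustrative sketch; the import path and keyword arguments are
# placeholders for the package's actual interface.
from django_pglocks import advisory_lock

with advisory_lock('bulk-import', shared=True):
    run_bulk_import()  # your import code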
04:51
My presentation from FOSDEM 2013, PostgreSQL as a Schemaless Database, is now posted (sorry for the delay!).
23 June 2013
11:32
I often find that I’m in the middle of a loop or something and discover an error. I want to exit the loop in a way that causes the database work I’ve done within it to be rolled back, but I don’t want that exception to propagate further.
This usually looks like:
try:
    with xact():
        for thing in things:
            ...  # etc. etc.
except Rollback:
    pass
Having noticed this pattern a lot, I’ve added it as a feature to Xact. Xact now defines a Rollback exception. It processes it like any other exception, rolling back the transaction, but then swallows it and exits the function or with clause normally. If you feel motivated, you can subclass Rollback, although the utility of that escapes me at the moment.
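With that in place, the try/except wrapper above disappears. A sketch, assuming xact and Rollback are imported from the module:

from xact import xact, Rollback

# Raising Rollback inside the block rolls the transaction back; xact
# swallows the exception, and execution continues after the with.
with xact():
    for thing in things:
        if broken(thing):   # hypothetical error check
            raise Rollback
        process(thing)      # hypothetical per-item work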
When Django 1.6 is released, Xact will be deprecated in favor of new functionality there… but for now, have fun with it!
15 May 2013
23:59
The Call for Papers for DjangoCon US 2013 is now open.
7 April 2013
13:17
psycopg2, the Python PostgreSQL interface library, is now up to version 2.5. This includes built-in support for the JSON and range types… yay!
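A quick taste of both, as a sketch (table and connection details are placeholders):

import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect('dbname=mydb')
cur = conn.cursor()

# Python objects can be passed to json columns via the Json adapter...
cur.execute('INSERT INTO docs (body) VALUES (%s)', [Json({'a': 1})])

# ...and json values come back as parsed Python objects.
cur.execute('SELECT body FROM docs LIMIT 1')
print(cur.fetchone()[0])  # {'a': 1}

# Built-in range types are returned as Range objects.
cur.execute('SELECT numrange(1, 10)')
r = cur.fetchone()[0]
print(r.lower, r.upper, r.isempty)  # 1 10 False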
13 March 2013
21:14
tl;dr: Don’t give tables the same name as built-in PostgreSQL types, even though PostgreSQL will let you.
It’s interesting how synchronicity can occur. In my talk about custom PostgreSQL types in Python, I mentioned that any time you create a table in PostgreSQL, you’re also creating a type: the row type of the table.
While I was presenting the talk, a client sent me email wondering why a pg_restore of an expression index was failing, because the return type text was not the same as pg_catalog.text. OK, that’s strange!
What had happened was that the database had a table named text, which PostgreSQL will happily let you create:
postgres=# CREATE TABLE text (
postgres(# i INTEGER
postgres(# );
CREATE TABLE
And both types appear in pg_type:
postgres=# SELECT typname, typnamespace FROM pg_type WHERE typname='text';
 typname | typnamespace
---------+--------------
 text    |           11
 text    |         2200
(2 rows)
Needless to say, this isn’t a great idea. (Namespace 11 is pg_catalog; 2200 is the public schema.) Although PostgreSQL seems to keep the two types straight most of the time, there are times, like a pg_restore processing an expression index, that it can get confused.
So, don’t do this.
16:57
The slides for my presentation, “PostgreSQL, Python and Squid” (otherwise known as, “using Python in PostgreSQL and PostgreSQL from Python”) presented at PyPgDay 2013 at PyCon 2013, are available for download.
10 March 2013
16:17
tl;dr: Each and every tablespace is critical to the operation of your PostgreSQL database. If you lose one, you’ve lost the entire database.
This one can be short and sweet: If you use tablespaces in PostgreSQL, each and every one of them is a critical part of your database. If you lose one, your database is corrupted, probably irretrievably. Never, ever think that if you lose a tablespace, you’ve just lost the data in the tables and indexes on the tablespace; you’ve lost the whole database.
In a couple of cases, clients have had what they thought were clever uses of tablespaces. In each case, it could have led to disaster:
“We’ll keep cached and recent data on a tablespace on AWS instance storage on SSDs, and the main database on EBS. If the instance storage goes away, we’ll just recreate that data from the main database.”
“We’ll keep old historical data on a SAN, and more recent data on local storage.”
In each case, there was the assumption that if the tablespace was lost, the rest of the database would be intact. This assumption is false.
Each and every tablespace is a part of your database. They are the limbs of your database. PostgreSQL is an elephant, not a lizard; it won’t regrow a limb it has to leave behind. Don’t treat any tablespace as disposable!
4 October 2012
13:04
A client of ours recently had me log into their server to set up a tablespace scheme for them. While I was in, I noticed that the secondary of the streaming replication pair wasn’t connecting to the primary. A quick check showed that the primary had been moved from one internal IP address to another, and in doing so everything had been updated except the pg_hba.conf file… so the secondary wasn’t able to connect.
This had happened several weeks prior.
The good news is that in addition to streaming replication, we had set up WAL archiving from the primary to the secondary, so the secondary was staying up to date using the archived WAL segments. We didn’t have to reimage the secondary; just fixing pg_hba.conf and reloading the primary fixed the problem. And thanks to pg_archivecleanup, neither side was building up WAL segments.
There are several good reasons for including WAL archiving in your streaming replication setup. This kind of accidental problem is one of them.
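For reference, the moving parts of such a setup look roughly like this; the hostnames and paths below are placeholders, and the recovery.conf syntax is the 9.x-era one:

# postgresql.conf on the primary: archive each completed WAL segment
# to the secondary in addition to streaming it.
archive_mode = on
archive_command = 'rsync -a %p standby:/var/lib/postgresql/archive/%f'

# pg_hba.conf on the primary: the replication entry that has to match
# the secondary's (current!) address.
host  replication  replica  10.0.0.2/32  md5

# recovery.conf on the secondary: stream from the primary, fall back
# to the archive, and trim segments that are no longer needed.
standby_mode = 'on'
primary_conninfo = 'host=10.0.0.1 user=replica'
restore_command = 'cp /var/lib/postgresql/archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /var/lib/postgresql/archive %r'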
4 September 2012