12 April 2012
12:09
Instagram has been in the news lately. In this really great post on Tumblr, Instagram talks about its technology stack.
I have some acquaintance with the Instagram people, and they are among the smartest technologists I’ve met. Really nice, too. (Of course, they mention this blog in the post, so I’m biased.)
19 March 2012
12:33
As noted before, Django has a lot of facilities for handling transactions, and it’s not at all clear how to use them. In an attempt to cut through the confusion, here’s a recipe for handling transactions sensibly in Django applications on PostgreSQL.
The goals are:
- Database operations that do not modify the database aren’t wrapped in a transactions at all.
- Database operations that modify the database are wrapped in a transactions.
- We have a lot of fine-grained control over sections that modify the databse vs those that don’t.
The bits of the recipe are:
The quick reasons behind each step:
- Turning on autocommit prevents psycopg2 from automatically starting a new transaction on the first database operation on each connection; this means that the transaction only starts when we want it to.
- Similarly, the transaction middleware will set the connection state to “managed,” which will defeat the autocommit option above, so we leave it out.
- The
xact()
decorator will set up the connection so that a transaction is started in the relevant block, which is what we want for database-modifying operations.
This recipe a few other nice features:
xact()
operates like commit_on_success()
, in that it will issue a rollback if an exception escapes from the block or function it is wrapping.
xact()
ignores the dirty flag on the Django connection. Since we’re deliberately wrapping stuff that modifies the database with it, the chance of it being dirty is near 100%, and a commit on a transaction that did not modify the database is no more expensive in PostgreSQL than a rollback. It also means you can do raw SQL inside an xact()
block without the foot-gun of forgetting to call set_dirty
.
- Like the built-in Django transaction decorators, it can be used either as a decorator or as a context manager with the
with
statement.
xact()
can be nested, giving us nested transactions! If it sees that there is already a transaction open when it starts a new block, it will use a savepoint to set up a nested transaction block. (PostgreSQL does not have nested transactions as such, but you can use savepoints to get 99.9% of the way there.)
- By not wrapping operations that do not modify the database, we get better behavior when using pgPool II (more on that in a future post).
xact()
works around an outstanding bug in Django’s transaction handling on psycopg2.
xact()
also supports the using
parameter for multiple databases.
Of course, a few caveats:
xact()
requires the postgresql_psycopg2
backend, and PostgreSQL 8.2 or higher. It’s possible it can be hacked to work on other backends that support savepoints.
xact()
works just the way you want if it is nested inside a commit_on_success()
block (it will properly create a savepoint insted of a new transaction). However, a commit_on_success()
block nested inside of xact()
will commit or rollback the entire transaction, somewhat defeating the outer xact()
. To the extent possible, use only xact()
in code you write.
- Be sure you catch exceptions outside of the
xact()
block; otherwise, the automatic rollback will be defeated. Allow the exception to escape the xact()
block, and then catch it. (Of course, if the intention is to always commit and to defeat the rollback, by all means catch the exception inside the block.)
To use, just drop the source (one class definition, one function) into a file somewhere in your Django project (such as the omni-present utils
application every Django project seems to have), and include it.
Examples:
from utils.transaction import xact
@xact
def my_view_function1(request):
# Everything here will be in a transaction.
# It'll roll back if an exception escapes, commits otherwise.
def my_view_function2(request):
# This stuff won't be in a transaction, so don't modify the database here.
with xact():
# This stuff will be, and will commit on normal completion, roll back on a exception
def my_view_function3(request):
with xact():
# Modify the database here (let's call it "part 1").
try:
with xact():
# Let's call this "part 2."
# This stuff will be in its own savepoint, and can commit or
# roll back without losing the whole transaction.
except:
# Part 2 will be rolled back, but part 1 will still be available to
# be committed or rolled back. Of course, if an exception
# inside the "part 2" block is not caught, both part 2 and
# part 1 will be rolled back.
The source is available on GitHub. It’s licensed under the PostgreSQL License.
24 January 2012
23:03
My presentation from SCALE 10x, “PostgreSQL Performance When It’s Not Your Job” is now available for download.
30 September 2011
22:39
I’d like to recommend an interesting post, “Sharding & IDs at Instagram”, about sharding using Postgres.
27 September 2011
22:25
If you are not familiar with it already, Bucardo is a nifty multi-master replication system for PostgreSQL, written by Greg Sabino Mullane. Written in Perl, it is great if you need replication that doesn’t have the restrictions associated with PG 9’s streaming replication.
To keep your Bucardo installation clean and tidy, a few regular cron jobs are required. One of them cleans up the archived replicated data (stored in a separate database by Bucardo) once you know you are done with it.
The Bucardo page above has a recommended script using all sorts of bash
ing, but I wanted something a bit more pure-PostgreSQL; it also doesn’t purge more than one old table at a time. So, I whipped up the following PL/pgSQL function.
(Note that this is for Bucardo 4.4. I haven’t played with the forthcoming Bucardo 5, so I’m not sure if this is still required.)
CREATE OR REPLACE FUNCTION bucardo.purge_frozen_child_qs(far_back interval)
RETURNS SETOF TEXT AS
$purge_frozen_child_qs$
DECLARE
t TEXT;
qt TEXT;
BEGIN
IF far_back IS NULL THEN
RAISE EXCEPTION 'Interval cannot be null.'
USING HINT = 'So, do not do that.';
END IF;
IF (now() + far_back) > now() THEN
RAISE EXCEPTION 'Interval must be negative.'
USING HINT = 'Consider using the "ago" form of intervals.';
END IF;
FOR t IN
SELECT tablename
FROM pg_tables
WHERE schemaname='freezer'
AND tablename like 'child_q_%'
AND (replace(tablename, 'child_q_', '')::timestamp with time zone) < now() + far_back::interval
ORDER BY tablename
LOOP
qt := 'freezer.' || t;
EXECUTE 'DROP TABLE ' || qt;
RETURN NEXT qt;
END LOOP;
DELETE FROM bucardo.q
WHERE (started < now() + far_back::interval
OR ended < now() + far_back::interval
OR aborted < now() + far_back::interval
OR cdate < now() + far_back::interval)
AND (ended IS NULL OR aborted IS NULL);
RETURN;
END
$purge_frozen_child_qs$
LANGUAGE plpgsql
VOLATILE;
To use it, just call it repeatedly from a cron job with the appropriate argument, along the lines of:
SELECT * FROM bucardo.purge_frozen_child_qs('7 days ago'::interval);
It returns the names of the tables it deleted.
This particular function doesn’t need to be run more often than once a day. And it keeps your Bucardo goats nice and clean.
(A “bucardo” is a now-extinct species of goat. For why Bucardo is goat-related, ask Greg.)
26 July 2011
14:14
My tutorial at OSCON 2011, Unbreaking Your Django Application, is now available for download.
18 May 2011
11:31
The slides from my talk at PGCon 2011 are now available.
28 March 2011
09:49
DjangoCon Europe and the Apple WWDC are at the exact same time. This is going to be a tough call.
Update: Well, that was quick. WWDC sold out in 10 hours, while I was dithering.
18 March 2011
18:08
To bring everyone up to date:
- Justin Vincent wrote a post offering an opinion about the downsides of the chase of tech entrepreneurs over VC funding.
- Amy Hoy wrote a post expanding on Mr Vincent’s post.
- Alex Payne wrote a post criticizing this position, while finding it necessary to describe “long time acquaintance” Amy Hoy’s product as “duping credulous customers into overpaying for a time-tracking tool styled with this month’s CSS trends.”
- Unaccountably, he seems to have been surprised by the negative reaction this post generated, so he posted an explanation and partial retraction here.
Sadly, I find his last post as incoherent as his first one is vitriolic.
Rather than go through it point by point, the crux of his argument is:
Building a business around maximizing your individual happiness is not particularly useful or admirable. That is my position, and I’m well aware that it may be unpopular with some.
I am pleased to report, then, that Mr Payne has absolutely nothing to worry about, because no business that is built around the happiness of the owner as a primary goal has a hope of every getting anywhere, unless the business consists of the owner taking money out of one pocket and putting it in the other. Any business, unless it is operating in a grotesquely distorted marketplace, is primarily about pleasing its customers in exchange for their money.
I’m really not sure what these vaguely masturbatory companies Mr Payne is talking about do for a living, but every (successful) micro-business I know of is insanely, intensely focused on pleasing its customers. They have to be, because they don’t have an installed base, government-granted advantages, or (yes) piles of venture capital in the bank to fall back on if they fail to do so.
Mr Payne wants to run a big company. I wish him all the best. He seems to have his young heart in the right place. I have to say, though, that his emotional overreaction to the idea that someone might want to run a micro-business instead strikes me as the Puritan reacting to the idea that someone, somewhere, might be happy.
13 March 2011
15:42
Advisory locks are one of the cool unsung features of PostgreSQL. In 9.1, they are getting even cooler with transaction level locks. Many details here.