09:34
“Corruption War Stories” from PGConf US 2017
The slides from my presentation, Corruption War Stories, are now available.
09:33
The slides from my presentation, Django and PostgreSQL, are now available.
07:08
The slides from my PGConf US 2017 tutorial, PostgreSQL When It’s Not Your Job, are now available.
03:14
The slides for my talk Corruption War Stories are now available.
12:00
(One of an occasional series of posts about strange errors you can get out of Django, and what it is trying to tell you.)
This exception was being thrown when I was trying to render a template:
dictionary update sequence element #0 has length 1; 2 is required
It turned out to be a slightly bad url() in a urlconf: I had forgotten the name= before the URL’s name. Oops!
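For the record, here’s a reconstruction of the mistake (the pattern and view are stand-ins, not the actual project’s code). url()’s third positional argument is the kwargs dictionary, so a bare string there eventually gets fed to a dict update, which produces the error above:

# Broken: the string lands in url()'s kwargs slot.
url(r'^orders/(?P<pk>\d+)/$', views.order_detail, 'order-detail')

# Fixed: pass the name as a keyword argument.
url(r'^orders/(?P<pk>\d+)/$', views.order_detail, name='order-detail')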
10:00
So, what’s weird about this plan, from a query on a partitioned table? (PostgreSQL 9.3, in this case.)
test=> explain select distinct id from orders where order_timestamp > '2016-05-01';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=347341.56..347343.56 rows=200 width=10)
Group Key: orders.id
-> Append (cost=0.00..337096.10 rows=4098183 width=10)
-> Seq Scan on orders (cost=0.00..0.00 rows=1 width=178)
Filter: (order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone)
-> Index Scan using orders_20160425_order_timestamp_idx on orders_20160425 (cost=0.43..10612.30 rows=120838 width=10)
Index Cond: (order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone)
-> Seq Scan on orders_20160502 (cost=0.00..80539.89 rows=979431 width=10)
Filter: (order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone)
-> Seq Scan on orders_20160509 (cost=0.00..74780.41 rows=909873 width=10)
Filter: (order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone)
-> Seq Scan on orders_20160516 (cost=0.00..68982.25 rows=845620 width=10)
Filter: (order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone)
-> Seq Scan on orders_20160523 (cost=0.00..65777.68 rows=796054 width=10)
Filter: (order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone)
-> Seq Scan on orders_20160530 (cost=0.00..36403.57 rows=446366 width=10)
Filter: (order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone)
(17 rows)
That estimate on the HashAggregate certainly looks wonky, doesn’t it? Just 200 rows even with a huge number of rows below it?
What if we cut down the number of partitions being hit?
test=> explain select distinct id from orders where order_timestamp > '2016-05-01' and order_timestamp < '2016-05-15';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=169701.02..169703.02 rows=200 width=10)
Group Key: orders.id
-> Append (cost=0.00..165026.92 rows=1869642 width=10)
-> Seq Scan on orders (cost=0.00..0.00 rows=1 width=178)
Filter: ((order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone) AND (order_timestamp < '2016-05-15 00:00:00'::timestamp without time zone))
-> Index Scan using orders_20160425_order_timestamp_idx on orders_20160425 (cost=0.43..10914.39 rows=120838 width=10)
Index Cond: ((order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone) AND (order_timestamp < '2016-05-15 00:00:00'::timestamp without time zone))
-> Seq Scan on orders_20160502 (cost=0.00..82988.46 rows=979431 width=10)
Filter: ((order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone) AND (order_timestamp < '2016-05-15 00:00:00'::timestamp without time zone))
-> Index Scan using orders_20160509_order_timestamp_idx on orders_20160509 (cost=0.42..71124.06 rows=769372 width=10)
Index Cond: ((order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone) AND (order_timestamp < '2016-05-15 00:00:00'::timestamp without time zone))
(11 rows)
Still 200 exactly. OK, that’s bizarre. Let’s select exactly one partition:
test=> explain select distinct id from orders where order_timestamp > '2016-05-14' and order_timestamp < '2016-05-15';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=14669.26..14671.26 rows=200 width=10)
Group Key: orders.id
-> Append (cost=0.00..14283.05 rows=154481 width=10)
-> Seq Scan on orders (cost=0.00..0.00 rows=1 width=178)
Filter: ((order_timestamp > '2016-05-14 00:00:00'::timestamp without time zone) AND (order_timestamp < '2016-05-15 00:00:00'::timestamp without time zone))
-> Index Scan using orders_20160509_order_timestamp_idx on orders_20160509 (cost=0.42..14283.05 rows=154480 width=10)
Index Cond: ((order_timestamp > '2016-05-14 00:00:00'::timestamp without time zone) AND (order_timestamp < '2016-05-15 00:00:00'::timestamp without time zone))
(7 rows)
Still 200 exactly. What happens if we select from the child directly?
test=> explain select distinct id from orders_20160509;
QUERY PLAN
-------------------------------------------------------------------------------------
HashAggregate (cost=74780.41..75059.51 rows=27910 width=10)
Group Key: id
-> Seq Scan on orders_20160509 (cost=0.00..72505.73 rows=909873 width=10)
(3 rows)
A much more reasonable estimate. So, what’s going on?
That 200 should be something of a flag: it’s a compiled-in constant that PostgreSQL uses when it doesn’t have ndistinct information for a particular table, usually because no statistics have been collected on that table.
In this case, the issue was that an ANALYZE had never been done on the parent table. This isn’t surprising: autovacuum would never hit that table, since (like most parent tables in a partition set) it has no rows and never receives inserts or updates. That lack of information gets passed up through the Append node, and the HashAggregate just uses the default of 200.
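One quick way to confirm this (my addition, not part of the original investigation) is to ask pg_stat_user_tables when the parent was last analyzed:

SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'orders';
-- NULL in both columns means the parent has never been analyzed.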
Sure enough, when the parent table was ANALYZE’d, the estimates became much more reasonable.
So: It can pay to do an initial ANALYZE on a newly created partitioned table so that the planner gets statistics for the parent table, even if those statistics are “no rows here.”
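Concretely, for the partition set above, that’s just:

-- Run once after creating the partition set; autovacuum will never
-- analyze the empty parent table for you.
ANALYZE orders;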
16:53
I love Django a lot; it still surprises me how productive I can be in it. And, especially in more recent versions, the error handling for weird configuration problems is much, much better than it used to be.
But sometimes, you get an error whose origin is slightly mysterious. Thus, it can be helpful to have a log of what Django says, and what it means.
Today’s is:
Site matching query does not exist
This usually means one of two things:
1. You don’t have a row in the django_site table for your site. If so, create a data migration or fixture!
2. You’re missing the SITE_ID setting from your settings.py entirely. I did this while moving a project from an old version of Django.
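For the first case, a data migration along these lines creates the missing row (the domain and name values here are placeholders, not from any real project):

from django.db import migrations

def create_site(apps, schema_editor):
    Site = apps.get_model('sites', 'Site')
    Site.objects.get_or_create(
        id=1,  # must match SITE_ID in settings.py
        defaults={'domain': 'example.com', 'name': 'example.com'},
    )

class Migration(migrations.Migration):
    dependencies = [('sites', '0001_initial')]
    operations = [migrations.RunPython(create_site)]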
13:50
The one thing that everyone knows about composite indexes is: if you have an index on (A, B, C), it can’t be used for queries on (B), (B, C), or (C), only for (A), (A, B), or (A, B, C), right? I’ve said that multiple times in talks. It’s clearly true, right?
Well, no, it’s not. It’s one of those things that isn’t technically true, but is still good advice.
The documentation on multi-column indexes is pretty clear:
A multicolumn B-tree index can be used with query conditions that involve any subset of the index’s columns, but the index is most efficient when there are constraints on the leading (leftmost) columns. The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned.
Let’s try this out!
First, create a table and index:
xof=# CREATE TABLE x (
xof(# i integer,
xof(# f float,
xof(# g float
xof(# );
CREATE TABLE
xof=# CREATE INDEX ON x(i, f, g);
CREATE INDEX
And fill it with some test data:
xof=# INSERT INTO x SELECT 1, random(), random() FROM generate_series(1, 10000000);
INSERT 0 10000000
xof=# INSERT INTO x SELECT 2, random(), random() FROM generate_series(1, 10000000);
INSERT 0 10000000
xof=# INSERT INTO x SELECT 3, random(), random() FROM generate_series(1, 10000000);
INSERT 0 10000000
xof=# ANALYZE x;
ANALYZE
And away we go!
xof=# EXPLAIN ANALYZE SELECT SUM(g) FROM x WHERE f BETWEEN 0.11 AND 0.12;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=599859.50..599859.51 rows=1 width=8) (actual time=91876.057..91876.057 rows=1 loops=1)
-> Index Only Scan using x_i_f_g_idx on x (cost=0.56..599097.71 rows=304716 width=8) (actual time=1820.699..91652.409 rows=300183 loops=1)
Index Cond: ((f >= '0.11'::double precision) AND (f <= '0.12'::double precision))
Heap Fetches: 300183
Planning time: 3.384 ms
Execution time: 91876.165 ms
(6 rows)
And sure enough, it uses the index, even though we didn’t include column i in the query. In this case, the planner thinks that this will be more efficient than just doing a sequential scan on the whole table, even though it has to walk the whole index.
Is it right? Let’s turn off index scans and find out.
xof=# SET enable_indexonlyscan = 'off';
SET
xof=# EXPLAIN ANALYZE SELECT SUM(g) FROM x WHERE f BETWEEN 0.11 AND 0.12;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=599859.50..599859.51 rows=1 width=8) (actual time=39691.081..39691.081 rows=1 loops=1)
-> Index Scan using x_i_f_g_idx on x (cost=0.56..599097.71 rows=304716 width=8) (actual time=1820.676..39624.144 rows=300183 loops=1)
Index Cond: ((f >= '0.11'::double precision) AND (f <= '0.12'::double precision))
Planning time: 0.181 ms
Execution time: 39691.128 ms
(5 rows)
PostgreSQL, you’re not helping!
xof=# SET enable_indexscan = 'off';
SET
xof=# EXPLAIN ANALYZE SELECT SUM(g) FROM x WHERE f BETWEEN 0.11 AND 0.12;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=689299.60..689299.61 rows=1 width=8) (actual time=40593.427..40593.428 rows=1 loops=1)
-> Bitmap Heap Scan on x (cost=513444.70..688537.81 rows=304716 width=8) (actual time=37901.773..40542.900 rows=300183 loops=1)
Recheck Cond: ((f >= '0.11'::double precision) AND (f <= '0.12'::double precision))
Rows Removed by Index Recheck: 8269763
Heap Blocks: exact=98341 lossy=53355
-> Bitmap Index Scan on x_i_f_g_idx (cost=0.00..513368.52 rows=304716 width=0) (actual time=37860.366..37860.366 rows=300183 loops=1)
Index Cond: ((f >= '0.11'::double precision) AND (f <= '0.12'::double precision))
Planning time: 0.160 ms
Execution time: 40593.764 ms
(9 rows)
Ugh, fine!
xof=# SET enable_bitmapscan='off';
SET
xof=# EXPLAIN ANALYZE SELECT SUM(g) FROM x WHERE f BETWEEN 0.11 AND 0.12;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Aggregate (cost=641836.33..641836.34 rows=1 width=8) (actual time=27270.666..27270.666 rows=1 loops=1)
-> Seq Scan on x (cost=0.00..641074.54 rows=304716 width=8) (actual time=0.081..27195.552 rows=300183 loops=1)
Filter: ((f >= '0.11'::double precision) AND (f <= '0.12'::double precision))
Rows Removed by Filter: 29699817
Planning time: 0.157 ms
Execution time: 27270.726 ms
(6 rows)
It turns out the seq scan is faster, which isn’t that much of a surprise. Of course, what’s really fast is using the index properly:
xof=# RESET ALL;  -- reset the planner settings we changed above
RESET
xof=# EXPLAIN ANALYZE SELECT SUM(g) FROM x WHERE i IN (1, 2, 3) AND f BETWEEN 0.11 AND 0.12;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=92459.82..92459.83 rows=1 width=8) (actual time=6283.162..6283.162 rows=1 loops=1)
-> Index Only Scan using x_i_f_g_idx on x (cost=0.56..91698.03 rows=304716 width=8) (actual time=1.295..6198.409 rows=300183 loops=1)
Index Cond: ((i = ANY ('{1,2,3}'::integer[])) AND (f >= '0.11'::double precision) AND (f <= '0.12'::double precision))
Heap Fetches: 300183
Planning time: 1.264 ms
Execution time: 6283.567 ms
(6 rows)
And, of course, a dedicated index for that particular operation is the fastest of all:
xof=# CREATE INDEX ON x(f);
CREATE INDEX
xof=# EXPLAIN ANALYZE SELECT SUM(g) FROM x WHERE f BETWEEN 0.11 AND 0.12;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=188492.00..188492.01 rows=1 width=8) (actual time=5536.940..5536.940 rows=1 loops=1)
-> Bitmap Heap Scan on x (cost=4404.99..187662.16 rows=331934 width=8) (actual time=209.854..5466.633 rows=300183 loops=1)
Recheck Cond: ((f >= '0.11'::double precision) AND (f <= '0.12'::double precision))
Rows Removed by Index Recheck: 8258716
Heap Blocks: exact=98337 lossy=53359
-> Bitmap Index Scan on x_f_idx (cost=0.00..4322.00 rows=331934 width=0) (actual time=163.402..163.402 rows=300183 loops=1)
Index Cond: ((f >= '0.11'::double precision) AND (f <= '0.12'::double precision))
Planning time: 5.586 ms
Execution time: 5537.235 ms
(9 rows)
Although, interestingly enough, PostgreSQL doesn’t quite get it right here:
xof=# SET enable_bitmapscan='off';
SET
xof=# EXPLAIN ANALYZE SELECT SUM(g) FROM x WHERE f BETWEEN 0.11 AND 0.12;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=203875.29..203875.30 rows=1 width=8) (actual time=2178.215..2178.216 rows=1 loops=1)
-> Index Scan using x_f_idx on x (cost=0.56..203045.45 rows=331934 width=8) (actual time=0.161..2110.903 rows=300183 loops=1)
Index Cond: ((f >= '0.11'::double precision) AND (f <= '0.12'::double precision))
Planning time: 0.170 ms
Execution time: 2178.279 ms
(5 rows)
So, we conclude:
- A multicolumn index can be used by a query that only constrains non-leading columns, so the folk rule isn’t literally true.
- But walking the entire index that way is generally slower than a plain sequential scan, so as practical advice, the folk rule stands.
- If you regularly filter on a non-leading column by itself, give that column its own index.
And there we are.
xof=# DROP TABLE x;
DROP TABLE
20:24
On a PostgreSQL primary / secondary pair, it’s very important to monitor replication lag. Increasing replication lag is often the first sign of trouble, such as a network issue, the secondary disconnecting for some reason (or for no reason at all, which does happen rarely), disk space issues, etc.
You can find all kinds of complex scripts that do math on the various WAL positions available from the secondary and from pg_stat_replication on the primary.
Or you can do this. It’s very cheap and cheerful, and for many installations, it gets the job done.
First, on the primary (and thus on the secondary), we create a one-column table:
CREATE TABLE replication_monitor (
    last_update TIMESTAMPTZ
);
Then, we insert a single row into the table (you can probably already see where this is going):
INSERT INTO replication_monitor VALUES(now());
Having that, we can start a cron job that runs every minute, updating that value:
* * * * * /usr/bin/psql -U postgres -c "update replication_monitor set last_update=now()" postgres > /dev/null
On the secondary (whose clock is kept in sync with the primary’s via NTP, so make sure ntpd is running on both!), we have a script, also run from cron, that complains if the value in the table has fallen more than a certain amount behind now(). Here’s a (pretty basic) Python 2 version:
#!/usr/bin/python
# Complain to stderr (which cron will mail us) if the monitor row
# has fallen more than five minutes behind the local clock.
import sys

import psycopg2

conn = psycopg2.connect("dbname=postgres user=postgres")
cur = conn.cursor()
cur.execute("select (now()-last_update)>'5 minutes'::interval from replication_monitor")
problem = cur.fetchone()[0]
if problem:
    print >>sys.stderr, "replication lag over 5 minutes."
We make sure we get the output from stderr for cron jobs on the secondary, set it up to run every so often, and we’re done!
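For example, a crontab entry like this (the script path is hypothetical) runs the check every five minutes, and cron mails us anything the script prints to stderr:

*/5 * * * * /usr/local/bin/replication_lag_check.py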
Of course, this has its limitations:
It only has ±2 minutes of resolution, based on how often cron runs. For a basic “is replication working?” check, this is probably fine.
It creates traffic in the replication stream and WAL, but if you are really worried about a TIMESTAMPTZ’s worth of update once a minute, you probably have other things to worry about.
On the plus side, it works even if the server is otherwise not taking traffic, since it creates traffic all by itself.
As a final note, check_postgres has this check integrated as one of the (many, many) checks it can do, as the replicate_row check. If you are using check_postgres, by all means use that one!
06:51
The slides for my PGConf EU 2016 talk, Securing PostgreSQL, are now available.