postgresql when it's not your job

18 January 2023

00:00

A foreign key pathology to avoid

There’s a particular anti-pattern in database design that PostgreSQL handles… not very well.

For example, let’s say you are building something like Twitch. (The real Twitch doesn’t work this way! At least, not as far as I know!) So, you have streams, and you have users, and users watch streams. So, let’s do a schema!

CREATE TABLE stream (stream_id bigint PRIMARY KEY);

CREATE TABLE "user" (user_id bigint PRIMARY KEY);

CREATE TABLE stream_viewer ( stream_id bigint REFERENCES stream(stream_id), user_id bigint REFERENCES "user"(user_id), PRIMARY KEY (stream_id, user_id));

OK, schema complete! Not bad for a day’s work. (Note the double quotes around "user". USER is a keyword in PostgreSQL, so we have to put it in double quotes to use as a table name. This is not great practice, but more about double quotes some other time.)

Let’s say we persuade a very popular streamer over to our platform. They go on-line, and all 1,252,136 of our users simultaneously log on and start following that stream.

So, we now have to insert 1,252,136 new records into stream_viewer. That’s pretty bad. But what’s worse is now we have 1,252,136 records with a foreign key relationship to a single record in stream. During the operation of the INSERT statement, the transaction that is doing the INSERT will take a FOR KEY SHARE lock on that record. This means that at any one moment, several thousand different transactions will have a FOR KEY SHARE lock on that record.

This is very bad.

If more than one transaction at a time has a lock on a single record, the MultiXact system handles this. MultiXact puts a special transaction ID in the record that’s locked, and then builds an external data structure that holds all of the transaction IDs that have locked the record. This works great… up to a certain size. But that data structure is of fixed size, and when it fills up, it spills onto secondary storage.

As you might imagine, that’s slow. You can see this with lots of sessions suddenly waiting on various MultiXact* lightweight locks.

You can get around this in a few ways:

Not many systems have this particular design issue. (You would never actually build a streaming site using that schema, just to start.) But if you do, this particular behavior is a good thing to avoid.

16 January 2023

00:00

OK, sometimes you can lock tables.

Previously, I wrote that you should never lock tables. And you usually shouldn’t! But sometimes, there’s a good reason to. Here’s one.

When you are doing a schema-modifying operation, like adding a column to a table, PostgreSQL needs to take an ACCESS EXCLUSIVE lock on the table while it is modifying the system catalogs. Unless it needs to rewrite the table, this lock isn’t held for very long.

However, locks in PostgreSQL are first-come, first-served. If the system is busy, there may be conflicting locks on the table that you are attempting to modify. (Even just a SELECT statement takes lock on the tables it is operating on; it just doesn’t conflict with much.) If the ALTER TABLE statement can’t get the lock right away, it enters a queue, waiting to get to the front and get the lock.

However, now, every lock after that enters the queue, too, behind that ALTER TABLE. This can create the result of a long-running ACCESS EXCLUSIVE lock, even though it’s not granted. On a busy table on a busy system, this can shut things down.

So, what to do?

You can do this:

DO $$ BEGIN FOR i IN 1 .. 1000 LOOP BEGIN LOCK TABLE t NOWAIT; ALTER TABLE t ADD COLUMN i BIGINT; RETURN; EXCEPTION WHEN lock_not_available THEN PERFORM pg_sleep(1); CONTINUE; END; END LOOP; RAISE lock_not_available; END; $$;

This loops until it can acquire the lock, but doesn’t sit in the queue if it can’t. Once it acquires the lock, it does the modification and exits. If it can’t acquire the lock after a certain number of cycles, it exits with an error (you can set the number of cycles to anything, and you can adjust time it sleeps after failing to get the lock).

15 January 2023

00:00

How slow is DECIMAL, anyway?

In PostgreSQL, NUMERIC is a variable length type of fixed precision. You can have as many digits as you want (and you want to pay the storage for). DOUBLE PRECISION is a floating point type, with variable precision.

Sometimes, the question comes up: How much slower is NUMERIC than DOUBLE PRECISION, anyway?

Here’s a quick, highly unscientific benchmark:

Doing a simple test (100 million rows), a straight SUM() across a NUMERIC was about 2.2x slower than OUBLE PRECISION. It went up to about 4x slower if there was a simple calculation, SUM(n*12). It was about 5x slower if the calculation involved the same type, SUM(n*n). Of course, these are just on my laptop, but I would expect that the ratios would remain constant on other machines.

Inserting the 100 million rows took 72.2 seconds for DOUBLE PRECISION, 146.2 seconds for NUMERIC. The resulting table size was 3.5GB for DOUBLE PRECISION, 4.2GB for NUMERIC.

So, yes, NUMERIC is slower. But it’s not absurdly slower. NUMERIC is much slower than bigint (exercise left to the reader), so using NUMERIC for things like primary keys is definitely not a good idea.

15 November 2019

10:00

Why submit a paper to PgDay 2020?

If you have something interesting to day about PostgreSQL, we [would love to get a proposal from you]. Even if you have never spoken before, consider responding to the CfP! PgDay 2020 is particularly friendly to first-time and inexperienced speakers. You’re among friends! If you use PostgreSQL, you almost certainly have opinions and experiences that others would love to hear about… go for it!

13 November 2019

13:26

PgDaySF 2020!

The very first PgDay San Francisco is coming to the Swedish-American Hall on January 21, 2020. It’s going to be an amazing event.

If you have something to say about PostgreSQL…

… the Call for Proposals is now open through November 22, 2019. We are looking for 40 minute talks about anything related to PostgreSQL. First-time speakers are particularly encouraged to send in proposals.

If you are interested in or use PostgreSQL…

Early-Bird Tickets are now available! Attendance is limited, so be sure to get your seat now.

If your company uses PostgreSQL…

consider sponsoring the event! We can’t do it without our sponsors, and it is a great way to recruit PostgreSQL people. Show off your company to the PostgreSQL community!

20 March 2019

06:34

“Look It Up: Practical PostgreSQL Indexing” at Nordic PGDay 2019

The slides from my presentation at PGDay Nordic 2019 are now available.

11 February 2019

15:44

What’s up with SET TRANSACTION SNAPSHOT?

A feature of PostgreSQL that most people don’t even know exists is the ability to export and import transaction snapshots.

The documentation is accurate, but it doesn’t really describe why one might want to do such a thing.

First, what is a “snapshot”? You can think of a snapshot as the current set of committed tuples in the database, a consistent view of the database. When you start a transaction and set it to REPEATABLE READ mode, the snapshot remains consistent throughout the transaction, even if other sessions commit transactions. (In the default transaction mode, READ COMMITTED, each statement starts a new snapshot, so newly committed work could appear between statements within the transaction.)

However, each snapshot is local to a single transaction. But suppose you wanted to write a tool that connected to the database in multiple sessions, and did analysis or extraction? Since each session has its own transaction, and the transactions start asynchronously from each other, they could have different views of the database depending on what other transactions got committed. This might generate inconsistent or invalid results.

This isn’t theoretical: Suppose you are writing a tool like pg_dump, with a parallel dump facility. If different sessions got different views of the database, the resulting dump would be inconsistent, which would make it useless as a backup tool!

The good news is that we have the ability to “synchronize” various sessions so that they all use the same base snapshot.

First, a transaction opens and sets itself to REPEATABLE READ or SERIALIZABLE mode (there’s no point in doing exported snapshots in READ COMMITTED mode, since the snapshot will get replaced at the very next statement). Then, that session calls pg_export_snapshot. This creates an identifier for the current transaction snapshot.

Then, the client running the first session passes that identifier to the clients that will be using it. You’ll need to do this via some non-database channel. For example, you can’t use LISTEN / NOTIFY, since the message isn’t actually sent until COMMIT time.

Each client that receives the snapshot ID can then do SET TRANSACTION SNAPSHOT ... to use the snapshot. The client needs to call this before it does any work in the session (even SELECT). Now, each of the clients has the same view into the database, and that view will remain until it COMMITs or ABORTs.

Note that each transaction is still fully autonomous; the various sessions are not “inside” the same transaction. They can’t see each other’s work, and if two different clients modify the database, those modifications are not visible to any other session, including the ones that are sharing the snapshot. You can think of the snapshot as the “base” view of the database, but each session can modify it (subject, of course, to the usual rules involved in modifying the same tuples, or getting serialization failures).

This is a pretty specialized use-case, of course; not many applications need to have multiple sessions with a consistent view of the database. But if you do, PostgreSQL has the facilities to do it!

8 February 2019

18:52

Do not change autovacuum age settings

PostgreSQL has two autovacuum-age related settings, autovacuum_freeze_max_age, and vacuum_freeze_table_age.

Both of them are in terms of the transaction “age” of a table: That is, how long it has been since the table has been scanned completely for “old” tuples that can be marked as “frozen” (a “frozen” tuple is one that no open transaction can cause to disappear by a rollback). In short, the “oldest” a table can become in PostgreSQL is 2^31-1 transactions; if a table were ever to reach that, data loss would occur. PostgreSQL takes great pains to prevent you from eaching that point.

The “vacuum freeze” process is the process that scans the table and marks these tuples as frozen.

vacuum_freeze_table_age causes a regular autovacuum run to be an “autovacuum (to prevent xid wraparound)” run, that is, an (auto)vacuum freeze, if the age of the table is higher than vacuum_freeze_table_age.

autovacuum_freeze_max_age will cause PostgreSQL to start an “autovacuum (to prevent xid wraparound)” run even if it has no other reason to vacuum the table, should a table age exceed that setting.

By default, vacuum_freeze_table_age = 100000000 (one hundred million), and autovacuum_freeze_max_age = 200000000 (two hundred million).

Do not change them.

In the past, I made a recommendation I now deeply regret. Because, before 9.6, each autovacuum freeze run scanned the entire table, and (on its first pass) potentially rewrote the entire table, it could be very high I/O, and when it woke up suddenly, it could cause performance issues. I thus recommended two things:

  1. Increase autovacuum_freeze_max_age and vacuum_freeze_table_age, and,
  2. Do manual VACUUM FREEZE operations on the “oldest” tables during low-traffic periods.

Unfortunately, far too many installations adopted recommendation #1, but didn’t do #2. The result was that they cranked up autovacuum_freeze_max_age so high that by the time the mandatory autovacuum freeze operation began, they were so close to transaction XID wraparound point, they had no choice but to take the system offline and do the operation in single-user mode.

Thus, I am forever rescinding that advice. Especially now that 9.6 is incremental, the I/O penalty of an autovacuum freeze is greatly reduced, and the consequences of not doing it are severe.

Don’t increase those parameters. Let autovacuum freeze do its job. If you want to stay ahead of it, we have a script to do opportunistic freezing that might be helpful.

4 February 2019

14:00

“Breaking PostgreSQL at Scale” at FOSDEM 2019

The slides for my talk, “Breaking PostgreSQL at Scale” at FOSDEM 2019 are available.

26 September 2018

11:06

Find the value of all database sequences

Upon occasion, you want to get the current value of all the sequences in the database. For example, you may have fields that are integer rather than bigint, and you’re concerned how close you are to overflowing one of them (since sequences are bigint and will happily crash through the size of a 32-bit integer).

Unfortunately, currval() requires that you have actually accessed the sequence value using nextval() in the current session… but you don’t want to increment the value, just test it!

Here’s a cheap and cheerful PL/pgSQL function that returns all of the current sequence values:

CREATE OR REPLACE FUNCTION sequence_values() RETURNS TABLE(name text, value bigint) AS $sequence_values$
DECLARE
   nsp_name TEXT;
   seq_name TEXT;
BEGIN
   FOR nsp_name, seq_name IN
       SELECT nspname::text, relname::text
          FROM pg_class 
          JOIN pg_namespace
          ON pg_class.relnamespace = pg_namespace.oid WHERE relkind='S'
   LOOP
       RETURN QUERY EXECUTE 'SELECT ''' || nsp_name || '.' || seq_name || '''::text, last_value FROM "' || nsp_name || '"."' || seq_name || '"';
   END LOOP;
END;
$sequence_values$
LANGUAGE plpgsql;

« Older Entries