postgresql when it's not your job

15 February 2023

22:47

“Database Antipatterns and How to Find Them” at SCaLE 2023

I’ll be speaking on Database Antipatterns and How to Find Them at SCaLE 2023, March 9-12, 2023 in Pasadena, CA.

18:30

“Extreme PostgreSQL” at PgDay/MED

I’m very happy that I’ll be presenting “Extreme PostgreSQL” at PgDay/MED in Malta (yay, Malta!) on 13 April 2023.

8 February 2023

16:52

Extreme PostgreSQL!

The slides from my talk at the February 2023 SFPUG Meeting are now available.

30 January 2023

00:00

Nordic PgDay 2023

I’m very pleased to be talking about real-life logical replication at Nordic PgDay 2023, in beautiful Stockholm.

18 January 2023

00:00

A foreign key pathology to avoid

There’s a particular anti-pattern in database design that PostgreSQL handles… not very well.

For example, let’s say you are building something like Twitch. (The real Twitch doesn’t work this way! At least, not as far as I know!) So, you have streams, and you have users, and users watch streams. So, let’s do a schema!

CREATE TABLE stream (stream_id bigint PRIMARY KEY);

CREATE TABLE "user" (user_id bigint PRIMARY KEY);

CREATE TABLE stream_viewer (
    stream_id bigint REFERENCES stream(stream_id),
    user_id bigint REFERENCES "user"(user_id),
    PRIMARY KEY (stream_id, user_id)
);

OK, schema complete! Not bad for a day’s work. (Note the double quotes around "user": USER is a reserved keyword in PostgreSQL, so we have to put the name in double quotes to use it as a table name. This is not great practice, but more about double quotes some other time.)

Let’s say we persuade a very popular streamer over to our platform. They go on-line, and all 1,252,136 of our users simultaneously log on and start following that stream.

So, we now have to insert 1,252,136 new records into stream_viewer. That’s pretty bad by itself. But what’s worse is that we now have 1,252,136 records with a foreign key relationship to a single record in stream. While each INSERT statement runs, the transaction doing it takes a FOR KEY SHARE lock on that record in stream; since each viewer’s INSERT is its own transaction, at any one moment several thousand different transactions will hold a FOR KEY SHARE lock on that one record.
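
Under the hood, the foreign key check behaves roughly like this query against the parent table (an illustrative sketch of what the referential-integrity triggers do internally; 42 stands in for our popular stream’s ID):

SELECT 1 FROM stream WHERE stream_id = 42 FOR KEY SHARE;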

This is very bad.

If more than one transaction at a time has a lock on a single record, the MultiXact system handles it. MultiXact puts a special transaction ID in the locked record, and then builds an external data structure that holds all of the transaction IDs that have locked that record. This works great… up to a point. The in-memory structure that tracks these is of fixed size, and when it fills up, the data spills onto secondary storage.

As you might imagine, that’s slow. You can see this with lots of sessions suddenly waiting on various MultiXact* lightweight locks.
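
A quick way to spot this is a query like the following (a sketch against pg_stat_activity; the exact wait event names vary by PostgreSQL version):

SELECT pid, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE wait_event_type = 'LWLock'
  AND wait_event LIKE 'MultiXact%';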

You can get around this in a few ways: batch the inserts, so that many stream_viewer rows are created in each transaction (a single transaction takes only one FOR KEY SHARE lock on the parent row, no matter how many rows it inserts; see the sketch below), or drop the foreign key constraint on the hot parent table and enforce the relationship some other way.
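
A minimal sketch of the batching idea (pending_signup is a hypothetical staging table that queues up sign-ups; it’s not part of the schema above):

-- One statement, one transaction, one FOR KEY SHARE lock on the
-- parent row, no matter how many viewers are in the batch.
WITH batch AS (
    DELETE FROM pending_signup
    WHERE stream_id = 42
    RETURNING stream_id, user_id
)
INSERT INTO stream_viewer (stream_id, user_id)
SELECT stream_id, user_id FROM batch;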

Not many systems have this particular design issue. (You would never actually build a streaming site using that schema, for a start.) But if yours does, this is a pathology worth designing around.

16 January 2023

00:00

OK, sometimes you can lock tables.

Previously, I wrote that you should never lock tables. And you usually shouldn’t! But sometimes, there’s a good reason to. Here’s one.

When you are doing a schema-modifying operation, like adding a column to a table, PostgreSQL needs to take an ACCESS EXCLUSIVE lock on the table while it is modifying the system catalogs. Unless it needs to rewrite the table, this lock isn’t held for very long.

However, locks in PostgreSQL are first-come, first-served. If the system is busy, there may be conflicting locks on the table that you are attempting to modify. (Even a plain SELECT statement takes a lock on the tables it operates on; it just doesn’t conflict with much.) If the ALTER TABLE statement can’t get its lock right away, it enters a queue, waiting to get to the front and acquire the lock.

Now every lock request after that enters the queue, too, behind the ALTER TABLE. This creates the effect of a long-running ACCESS EXCLUSIVE lock, even though that lock has never actually been granted. On a busy table on a busy system, this can shut things down.
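
You can watch the pile-up happen with something like this (a sketch; pg_blocking_pids() is available in PostgreSQL 9.6 and later):

SELECT pid, pg_blocking_pids(pid) AS blocked_by, wait_event_type, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;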

So, what to do?

You can do this:

DO $$
BEGIN
   FOR i IN 1 .. 1000 LOOP
      BEGIN
         -- NOWAIT: if the lock isn't immediately available, raise
         -- lock_not_available instead of joining the lock queue.
         -- LOCK TABLE defaults to ACCESS EXCLUSIVE mode.
         LOCK TABLE t NOWAIT;
         ALTER TABLE t ADD COLUMN x BIGINT;
         RETURN;
      EXCEPTION WHEN lock_not_available THEN
         -- Didn't get the lock; sleep briefly and try again.
         PERFORM pg_sleep(1);
         CONTINUE;
      END;
   END LOOP;
   -- Never got the lock after 1000 tries; give up with an error.
   RAISE lock_not_available;
END;
$$;

This loops until it can acquire the lock, but doesn’t sit in the queue if it can’t. Once it acquires the lock, it does the modification and exits. If it can’t acquire the lock after a certain number of cycles, it exits with an error. (You can set the number of cycles to anything, and you can adjust the time it sleeps after failing to get the lock.)

15 January 2023

00:00

How slow is DECIMAL, anyway?

In PostgreSQL, NUMERIC is a variable-length, exact numeric type: you can have as many digits as you want (and are willing to pay the storage for). DOUBLE PRECISION is an inexact floating-point type with variable precision.

Sometimes, the question comes up: How much slower is NUMERIC than DOUBLE PRECISION, anyway?

Here’s a quick, highly unscientific benchmark:

Doing a simple test (100 million rows), a straight SUM() across a NUMERIC column was about 2.2x slower than DOUBLE PRECISION. That rose to about 4x slower with a simple calculation, SUM(n*12), and to about 5x slower when the calculation involved two values of the same type, SUM(n*n). Of course, these numbers are just from my laptop, but I would expect the ratios to stay roughly constant on other machines.

Inserting the 100 million rows took 72.2 seconds for DOUBLE PRECISION, 146.2 seconds for NUMERIC. The resulting table size was 3.5GB for DOUBLE PRECISION, 4.2GB for NUMERIC.
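
For reference, here is a minimal sketch of this kind of test (the table and column names are mine, for illustration; use \timing in psql to see execution times):

CREATE TABLE t_numeric (n numeric);
CREATE TABLE t_double (n double precision);

INSERT INTO t_numeric SELECT i FROM generate_series(1, 100000000) AS g(i);
INSERT INTO t_double SELECT i FROM generate_series(1, 100000000) AS g(i);

SELECT sum(n) FROM t_numeric;       -- straight aggregate
SELECT sum(n * 12) FROM t_numeric;  -- calculation with a constant
SELECT sum(n * n) FROM t_numeric;   -- calculation between two NUMERICs
-- ...and the same three queries against t_double.

SELECT pg_size_pretty(pg_table_size('t_numeric'));
SELECT pg_size_pretty(pg_table_size('t_double'));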

So, yes, NUMERIC is slower. But it’s not absurdly slower. NUMERIC is much slower than bigint (exercise left to the reader), so using NUMERIC for things like primary keys is definitely not a good idea.

15 November 2019

10:00

Why submit a paper to PgDay 2020?

If you have something interesting to say about PostgreSQL, we would love to get a proposal from you. Even if you have never spoken before, consider responding to the CfP! PgDay 2020 is particularly friendly to first-time and inexperienced speakers. You’re among friends! If you use PostgreSQL, you almost certainly have opinions and experiences that others would love to hear about… go for it!

13 November 2019

13:26

PgDaySF 2020!

The very first PgDay San Francisco is coming to the Swedish-American Hall on January 21, 2020. It’s going to be an amazing event.

If you have something to say about PostgreSQL…

… the Call for Proposals is now open through November 22, 2019. We are looking for 40 minute talks about anything related to PostgreSQL. First-time speakers are particularly encouraged to send in proposals.

If you are interested in or use PostgreSQL…

Early-Bird Tickets are now available! Attendance is limited, so be sure to get your seat now.

If your company uses PostgreSQL…

… consider sponsoring the event! We can’t do it without our sponsors, and it is a great way to recruit PostgreSQL people. Show off your company to the PostgreSQL community!

20 March 2019

06:34

“Look It Up: Practical PostgreSQL Indexing” at Nordic PGDay 2019

The slides from my presentation at Nordic PGDay 2019 are now available.
