All Your GUCs in a Row: data_directory

data_directory names the location of the cluster’s data: the directory people mean when they write $PGDATA, the one holding base/, pg_wal/, global/, and the rest. Its context is postmaster, so it can be set in postgresql.conf or on the command line, but it only takes effect at server start, never at runtime. And like a handful of others, it cannot be set with ALTER SYSTEM, for the obvious reason that ALTER SYSTEM writes to postgresql.auto.conf, which lives in the data directory, which is the thing you’d be trying to locate. You can’t use a file inside the box to tell PostgreSQL where the box is.

That circularity is the whole story of this parameter, so let’s pull on it.

The bootstrapping puzzle

If data_directory can be set in postgresql.conf, and postgresql.conf normally lives in the data directory, how does PostgreSQL find the config file in order to read where the data directory is? The answer is that in the ordinary case it doesn’t need to. postgres is started with -D (or the PGDATA environment variable) pointing at the data directory, the config files are found sitting inside it, and data_directory is never set at all — the -D value is the data directory, full stop. The parameter exists for the other case.

That other case is keeping configuration somewhere other than the data. Then the roles split: -D (or PGDATA) points at the directory holding the config files, PostgreSQL reads postgresql.conf from there, and data_directory inside that file tells it where the actual data lives. The bootstrap resolves because -D always identifies where the config is, and data_directory, once read, overrides -D for where the data is — but not for where the config is. Those are two separate questions and PostgreSQL answers them from two separate places.
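
The two-question rule can be written down as a toy model. This is a sketch of the rule as described here, not PostgreSQL's actual startup code: the function name resolve_locations, its parameters, and the /srv/... paths are all made up for illustration, and it deliberately ignores the case where config_file is given on the command line.

```python
import posixpath


def resolve_locations(dash_d=None, pgdata_env=None, conf_settings=None):
    """Toy model of the startup rule: return (config_file, data_dir).

    dash_d        -- value of -D, if given (hypothetical parameter name)
    pgdata_env    -- value of PGDATA, consulted only when -D is absent
    conf_settings -- settings postgresql.conf would contain, as a dict
    """
    conf_settings = conf_settings or {}

    # Question 1: where is the config? -D wins over PGDATA.
    config_dir = dash_d or pgdata_env
    if config_dir is None:
        raise ValueError("neither -D nor PGDATA given; config cannot be found")
    config_file = posixpath.join(config_dir, "postgresql.conf")

    # Question 2: where is the data? data_directory, if set, overrides -D.
    data_dir = conf_settings.get("data_directory", config_dir)
    return config_file, data_dir


# All-in-one layout: data_directory is never set.
print(resolve_locations(dash_d="/srv/pgdata"))

# Split layout: -D names the config directory, the file names the data.
print(resolve_locations(
    dash_d="/srv/pgconf",
    conf_settings={"data_directory": "/srv/pgdata"},
))
```

In the second call the config file is still looked up under /srv/pgconf even though the data ends up in /srv/pgdata, which is the point: the override applies to one question only.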

This is exactly how Debian and Ubuntu lay things out. Their packaging puts postgresql.conf under /etc/postgresql/<version>/<cluster>/ and the data under /var/lib/postgresql/<version>/<cluster>/, with data_directory in the config file wiring the two together. Config belongs to the sysadmin’s world under /etc; data belongs under /var/lib. Red Hat’s packaging, by contrast, keeps the traditional all-in-one layout with config inside the data directory. Neither is wrong; they’re different conventions, and data_directory is the hinge that makes the split-out one possible.

The companion parameters config_file, hba_file, and ident_file finish the job. config_file can only be given on the command line — same circularity, it’s the file that would have to contain its own location — while hba_file and ident_file can live inside postgresql.conf. Set all of those plus data_directory explicitly and you needn’t pass -D or PGDATA at all, though almost nobody assembles a cluster that way by hand.
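
As a sketch of that fully explicit setup, the snippet below assembles the two pieces: the lines that would go into postgresql.conf, and the command line that has to carry config_file. The helper name explicit_layout and the /srv/... paths are hypothetical; -c name=value is the usual way to pass a setting on the postgres command line.

```python
def explicit_layout(conf_dir, data_dir):
    """Return (conf_lines, argv) for a cluster started without -D or PGDATA."""
    # These three settings can live inside postgresql.conf.
    conf_lines = [
        f"data_directory = '{data_dir}'",
        f"hba_file = '{conf_dir}/pg_hba.conf'",
        f"ident_file = '{conf_dir}/pg_ident.conf'",
    ]
    # config_file cannot: it has to arrive on the command line.
    argv = ["postgres", "-c", f"config_file={conf_dir}/postgresql.conf"]
    return conf_lines, argv


conf_lines, argv = explicit_layout("/srv/pgconf", "/srv/pgdata")
print("\n".join(conf_lines))  # contents for postgresql.conf
print(" ".join(argv))         # the matching server command line
```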

What to do with it

For the overwhelming majority of installations: nothing. If you installed from a distribution package, the packaging already set data_directory (or didn’t, on Red Hat-style layouts) to match its filesystem conventions, and the init script or systemd unit passes the right -D. Changing it means moving the data directory, which is a stop-the-server, move-the-files, fix-the-unit-file operation — and the parameter is the last and easiest part of it, not the hard part.

If you’re building a cluster by hand and want config under /etc in the Debian style, this is the parameter that does it: point -D at the config directory, set data_directory to the data location, done. Otherwise, leave it to the packaging, and when you need to know where your data actually is, SHOW data_directory will tell you without your having to reason about which of -D, PGDATA, and the config file won.
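
If you want that answer from a script rather than an interactive session, one possible wrapper around psql looks like this. The function name show_data_directory is made up, psql is assumed to be on the PATH with connection details coming from the usual environment or the extra arguments, and the connecting role needs to be a superuser or a member of pg_read_all_settings to see this particular setting.

```python
import subprocess


def show_data_directory(extra_psql_args=()):
    """Ask a running server for its data directory via psql."""
    # -X skips psqlrc, -A and -t drop alignment and headers, -c runs one command.
    cmd = ["psql", "-X", "-A", "-t", *extra_psql_args, "-c", "SHOW data_directory"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

A call such as show_data_directory(["-d", "postgres"]) returns the path as a plain string, or raises subprocess.CalledProcessError if psql exits with an error.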
