postgresql when it's not your job

07:50

An Interlude: Why you do not want to use C/POSIX locale.

30 October 2024

(A quick interlude in our series on locales and character encodings in PostgreSQL.)

C/POSIX locale is only useful if you know that you will never have characters in your strings other than 7-bit ASCII, or you are 100% sure that you are never going to have mixed client encodings. If you do have non-7-bit-ASCII characters with mixed client encodings, zaniness can result:

c=# \l
                                                         List of databases
   Name    |      Owner       | Encoding | Locale Provider |   Collate   |    Ctype    |  Locale   | ICU Rules | Access privileges 
-----------+------------------+----------+-----------------+-------------+-------------+-----------+-----------+-------------------
 c         | xof              | UTF8     | libc            | POSIX       | POSIX       |           |           | 
 (17 rows [othes deleted])

c=# set client_encoding='UTF8';
SET
c=# CREATE TABLE m (v varchar);
CREATE TABLE
c=# INSERT INTO m VALUES('Í');
INSERT 0 1
c=# set client_encoding='LATIN1';
SET
c=# INSERT INTO m VALUES('é');
INSERT 0 1
c=# SELECT * FROM m;
 v  
----
 ?
 é
(2 rows)

c=# set client_encoding='UTF8';
SET
c=# SELECT * FROM m;
 v  
----
 Í
 é
(2 rows)

c=# INSERT INTO m VALUES('‰');
INSERT 0 1
c=# SELECT * FROM m;
 v  
----
 Í
 é
 ‰
(3 rows)

c=# set client_encoding='LATIN1';
SET
c=# SELECT * FROM m;
ERROR:  character with byte sequence 0xe2 0x80 0xb0 in encoding "UTF8" has no equivalent in encoding "LATIN1"

Even worse (in this regard) than C locale is SQL_ASCII locale. Please never use this locale.

Comment