All Your GUCs in a Row: client_encoding

client_encoding declares what character encoding the client speaks. The server uses it to convert between the database’s own encoding (fixed per database at creation time, defaulting to whatever initdb chose, and often UTF-8) and whatever the client is sending and expecting. The default is the database encoding, which means no conversion. Context is user, with PGCLIENTENCODING as the environment variable that libpq honors automatically, mirroring application_name’s pattern.

In a UTF-8 world the parameter rarely surfaces. Server is UTF-8, client is UTF-8, no conversion happens, life is good. But “rarely surfaces” is not the same as “doesn’t exist,” and the cases where it does surface are the ones worth understanding.

What conversion actually means

When client_encoding differs from the server encoding, every character of every value passing between client and server goes through a conversion table. UTF-8 in, LATIN1 out, or vice versa. The conversion is honest about its limits: if the source contains a character the target encoding cannot represent, the operation fails with an error like:

ERROR: character with byte sequence 0xe1 0x83 0xa5 in encoding "UTF8"
       has no equivalent in encoding "LATIN1"

This is the right behavior. The wrong behavior — silent substitution or data loss — is what you get from systems that try too hard. PostgreSQL would rather refuse than corrupt.
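The difference between refusing and corrupting can be sketched in a few lines of Python. This is an illustration only: Python’s built-in codecs stand in for PostgreSQL’s conversion tables, and the function names convert and convert_lossy are hypothetical, not part of any PostgreSQL client.

```python
# Illustrative only: Python's codecs stand in for PostgreSQL's conversion tables.
PG_TO_PYTHON = {"UTF8": "utf-8", "LATIN1": "latin-1"}


def convert(payload: bytes, source: str, target: str) -> bytes:
    """Re-encode payload, raising if the target cannot represent a character."""
    text = payload.decode(PG_TO_PYTHON[source])
    # errors="strict" refuses rather than substitutes, like the server does.
    return text.encode(PG_TO_PYTHON[target], errors="strict")


def convert_lossy(payload: bytes, source: str, target: str) -> bytes:
    """The 'tries too hard' variant: unrepresentable characters become '?'."""
    text = payload.decode(PG_TO_PYTHON[source])
    return text.encode(PG_TO_PYTHON[target], errors="replace")


if __name__ == "__main__":
    georgian = b"\xe1\x83\xa5"  # the byte sequence from the error message above
    try:
        convert(georgian, "UTF8", "LATIN1")
    except UnicodeEncodeError as exc:
        print("refused:", exc.reason)
    print("lossy result:", convert_lossy(georgian, "UTF8", "LATIN1"))
```

The strict version fails loudly on the first character LATIN1 cannot hold. The lossy version returns a question mark, and nothing in the result tells you what was lost.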

The pathologies

Three failure modes account for most of the production trouble:

1. Mismatched defaults during connection setup. A client library decides on a client_encoding based on locale environment variables, the connection-string default, or a hard-coded constant. If the resulting encoding can’t represent some character in the database, you get the error above on the row that contains it. Usually that row was inserted six months ago by a user in a different locale, and now an attachment with a Cyrillic file name crashes the application. This is a real category of bug, not a hypothetical.

2. SQL_ASCII server encoding. PostgreSQL clusters initialized under the C or POSIX locale without an explicit encoding (common on hand-rolled installs and some older distribution packages) end up with SQL_ASCII as the server encoding. SQL_ASCII is a special case: it disables encoding validation entirely and stores bytes as-is. The server cannot convert to anything because it doesn’t actually know what is in the database; it’s bytes all the way down. Setting client_encoding = 'UTF8' on an SQL_ASCII server tells PostgreSQL “trust me, the bytes are UTF-8,” which is true exactly as often as the operator believes it is. Years-old SQL_ASCII databases full of mixed Latin-1 and UTF-8 content are a real and frustrating archaeological problem.

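3. Restore-time encoding confusion. A dump file carries its data in whatever client encoding pg_dump was using: the database encoding by default, or something else if PGCLIENTENCODING or --encoding said so. A UTF-8 dump of a UTF8 server restores cleanly into another UTF8 server. Restoring it into a LATIN1 server fails on the first character LATIN1 cannot represent: the data in the dump is UTF-8, and the server can’t store it. A dump produced by a LATIN1 client, by contrast, was converted at dump time, and anything that later reads it as UTF-8 will misinterpret it. PGCLIENTENCODING=UTF8 pg_restore ... and explicit SET client_encoding statements at the top of dumps exist for this reason.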
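For the SQL_ASCII case, the first step in any cleanup is finding out what the bytes actually are. A minimal sketch of such an audit follows, assuming you have already pulled the raw column values out as Python bytes objects. The function names classify and summarize are hypothetical, and the check is only a heuristic: some Latin-1 byte sequences happen to be valid UTF-8, so a "utf8" verdict means "probably", not "certainly".

```python
from collections import Counter
from typing import Iterable


def classify(raw: bytes) -> str:
    """Bucket one value as pure ASCII, valid UTF-8, or something else."""
    if raw.isascii():
        return "ascii"  # identical in ASCII, Latin-1 and UTF-8; safe either way
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf8"  # likely Latin-1 or another single-byte encoding
    return "utf8"


def summarize(values: Iterable[bytes]) -> Counter:
    """Count how many values fall into each bucket."""
    return Counter(classify(value) for value in values)


if __name__ == "__main__":
    sample = [b"plain", "caf\u00e9".encode("utf-8"), "caf\u00e9".encode("latin-1")]
    for bucket, count in sorted(summarize(sample).items()):
        print(bucket, count)
```

If the "not-utf8" bucket is non-empty, declaring client_encoding = 'UTF8' against that database is a promise the data cannot keep.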

Setting it

Three scopes, in order of preference:

1. Connection string or environment variable. Set PGCLIENTENCODING in the application’s environment, or pass client_encoding=UTF8 in the connection URL. This is the right place: the encoding is a property of the client, not the database.

2. Per-role. ALTER ROLE legacy_app SET client_encoding = 'LATIN1'; if one application genuinely speaks a different encoding from the rest.

3. Per-session. SET client_encoding = 'UTF8'; for one-off conversions or debugging.

Avoid setting it in postgresql.conf. It’s a per-client concern; baking it into server config makes the next client’s connection silently inherit a setting that has nothing to do with that client.
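The preferred scope, connection string or environment variable, might look like this from application code. This is a sketch using only the Python standard library. The host, database, and user names are placeholders, and build_uri and env_with_encoding are hypothetical helpers, not part of libpq or any driver.

```python
from typing import Dict, Mapping
from urllib.parse import urlencode


def build_uri(host: str, dbname: str, user: str, client_encoding: str = "UTF8") -> str:
    """Build a libpq-style connection URI that pins client_encoding."""
    query = urlencode({"client_encoding": client_encoding})
    return f"postgresql://{user}@{host}/{dbname}?{query}"


def env_with_encoding(base_env: Mapping[str, str], client_encoding: str = "UTF8") -> Dict[str, str]:
    """Return a copy of an environment with PGCLIENTENCODING set for a child process."""
    env = dict(base_env)
    env["PGCLIENTENCODING"] = client_encoding
    return env


if __name__ == "__main__":
    # Placeholder names; substitute your own host, database and role.
    print(build_uri("db.example.internal", "appdb", "app"))
    print(env_with_encoding({"PATH": "/usr/bin"})["PGCLIENTENCODING"])
```

Either form keeps the setting with the client that needs it. The URI can be handed to a libpq-based driver, and the environment mapping can be passed to a subprocess that runs psql or pg_restore.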

What modern clients actually do

Most current client libraries (psycopg, JDBC, the Ruby pg gem that ActiveRecord uses, node-postgres) set client_encoding to UTF-8 by default or derive it from the application’s locale. If your server is UTF-8 and your clients are reasonably modern, this parameter is invisible to you. The cases where it isn’t invisible are: legacy applications still emitting LATIN1, SQL_ASCII servers carrying the sins of past installations, and migration scenarios involving dumps that crossed encoding boundaries.

Recommendation: Leave it unset and let libpq negotiate. If you have an SQL_ASCII server, you have a bigger problem than this GUC; the long-term fix is a dump-and-restore into a UTF-8 cluster, which the docs cover at length. If you genuinely need a non-UTF-8 client encoding, set it per-role or in the connection string, document the reason, and put a calendar reminder to revisit it in a year — same pattern as bytea_output.
