A photorealistic image of a blue elephant wearing a tabard with the large letters "RDS" on the side.

First in a series of dispassionate surveys of the major managed-Postgres offerings. This post is about Amazon RDS for PostgreSQL — what AWS calls “traditional RDS,” as distinct from Aurora PostgreSQL, which is a separate product with a separate architecture and will get its own post.

Overview

Amazon RDS for PostgreSQL is the oldest and most widely deployed managed-Postgres product in the market. The architecture is straightforward: AWS runs community PostgreSQL, unmodified, on an EC2 instance that you don’t have shell access to, backed by EBS storage, with a control plane that handles provisioning, patching, backups, monitoring, and failover. It’s a managed wrapper around the upstream database, not a fork.

Two things flow from that. First, community Postgres features work the way the documentation says they work; this is not the case for every managed offering. Second, the operational envelope is defined by the constraints AWS imposes on the wrapper — no shell access, no filesystem access, no true superuser, a curated list of available extensions, parameter changes via parameter groups rather than postgresql.conf edits, and backups/HA built around AWS’s infrastructure rather than Postgres’s own facilities.

flowchart TB
  subgraph AZ1["Availability Zone A"]
    P["PostgreSQL (primary)<br/>on EC2 instance"]
    P --> EBS1["EBS volume<br/>(gp3 / io1 / io2 Block Express)"]
  end
  subgraph AZ2["Availability Zone B"]
    S["PostgreSQL (standby)<br/>on EC2 instance"]
    S --> EBS2["EBS volume<br/>(mirror)"]
  end
  EBS1 <-. "synchronous EBS-layer<br/>block replication" .-> EBS2
  subgraph CP["AWS control plane"]
    CW["CloudWatch / Enhanced Monitoring"]
    PI["Performance Insights"]
    BK["Automated backups → S3"]
  end
  P --> CW
  P --> PI
  P --> BK

What RDS actually is, architecturally

The database process

Each RDS for PostgreSQL instance is one EC2 VM running one postgres process tree, exactly as you’d expect on any other Linux host. You don’t see the VM. You don’t see the host. You interact with it via the Postgres wire protocol on a hostname AWS gives you, and via the AWS API for anything operational.

AWS does not fork PostgreSQL. The binary on an RDS instance is upstream Postgres with a small set of AWS-specific extensions (rds_tools, IAM integration glue, the rds_superuser role, a couple of internal modules), plus whatever curated extensions AWS has bundled. Query behavior, planner behavior, on-disk format — all identical to community Postgres of the same version.

Storage

RDS storage is Amazon EBS. You pick a volume type:

  • gp3 — general-purpose SSD, baseline 3,000 IOPS and 125 MB/s, provisionable higher. Good default.
  • io1 / io2 — provisioned IOPS SSD, up to 64,000 IOPS per volume. Older.
  • io2 Block Express — up to 256,000 IOPS and 4,000 MB/s per volume, on supported instance families (r5b, r6i, r7i, x2idn, and a few others).

Storage can be resized up (online) but not down. Storage autoscaling is available, but automatic increases are throttled: the next increase happens only once six hours have passed since the last storage modification, or once storage optimization has completed on the instance, whichever is longer. There is no choice of filesystem: no XFS, no ZFS, no filesystem-level tuning, no direct-I/O toggling. The filesystem underneath is AWS's problem.
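
A minimal sketch of scaling storage up through the API, assuming boto3 and an existing instance named mydb (the identifier and the numbers are illustrative, not recommendations):

  import boto3

  rds = boto3.client("rds")

  # Grow storage online and raise gp3 provisioned IOPS/throughput.
  # AllocatedStorage can only increase; there is no path back down.
  rds.modify_db_instance(
      DBInstanceIdentifier="mydb",          # hypothetical instance name
      AllocatedStorage=500,                 # GiB, must be >= current size
      StorageType="gp3",
      Iops=12000,
      StorageThroughput=500,                # MiB/s, gp3 only
      ApplyImmediately=True,
  )

  # Or set a ceiling and let storage autoscaling handle increases.
  rds.modify_db_instance(
      DBInstanceIdentifier="mydb",
      MaxAllocatedStorage=1000,             # GiB ceiling for autoscaling
  )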

High availability: the part most people get wrong

Amazon RDS Multi-AZ for PostgreSQL is not PostgreSQL streaming replication. It is synchronous block-level replication at the storage layer, implemented by a replication layer that sits between the database process and EBS. Writes go to the primary’s EBS volume and to the standby’s EBS volume synchronously; commit does not acknowledge until both have durably written. The AWS Database Blog’s “Amazon RDS Under the Hood: Multi-AZ” post is the clearest primary source on this.

This has three consequences worth understanding:

  1. Commit latency is worse than single-AZ. Two EBS writes, cross-AZ, per commit. AWS documents a typical commit overhead in the 2–5 ms range; in practice I’ve measured more on busier instances, but the shape of the overhead is what it is.
  2. The standby is not readable. There is no running Postgres serving queries on the standby; there is a hot EBS volume ready to be attached to a Postgres process during failover. The standby’s instance exists but is not accepting connections.
  3. Failover is fast but not instant. AWS publishes “typically 60–120 seconds” for Multi-AZ failover. During failover, the DNS endpoint cuts over; existing connections are severed and must reconnect.

The alternate HA model, Multi-AZ DB Cluster, is a different architecture. It uses PostgreSQL’s own streaming replication, semisynchronously — commit requires ack from at least one of two readable standbys, across three AZs. The two standbys are readable. Failover is reportedly faster than the single-standby model. This is a newer option, available on a narrower set of instance classes, and is PostgreSQL-native in the way the single-standby option is not.

You pick between the two models up front, when the instance or cluster is created. The trade-off is roughly: single-standby Multi-AZ is cheaper, older, more widely supported, readable-standby-unavailable, and uses block replication; Multi-AZ DB Cluster is more expensive, newer, limited in instance options, readable-standby-available, and uses streaming replication.
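
To make the creation-time choice concrete, here is a hedged boto3 sketch of both paths. Identifiers, versions, sizes, and instance classes are placeholders, and the Multi-AZ DB cluster parameters in particular should be checked against the current API reference:

  import boto3

  rds = boto3.client("rds")

  # Single-standby Multi-AZ: one flag on the instance, block-level replication.
  rds.create_db_instance(
      DBInstanceIdentifier="app-db",            # hypothetical
      Engine="postgres",
      EngineVersion="16.4",                     # illustrative version
      DBInstanceClass="db.r6i.xlarge",
      AllocatedStorage=200,
      MasterUsername="postgres",
      MasterUserPassword="change-me",           # use Secrets Manager in practice
      MultiAZ=True,
  )

  # Multi-AZ DB cluster: two readable standbys, PostgreSQL streaming replication.
  rds.create_db_cluster(
      DBClusterIdentifier="app-db-cluster",     # hypothetical
      Engine="postgres",
      EngineVersion="16.4",
      DBClusterInstanceClass="db.r6gd.xlarge",  # cluster-level instance class
      AllocatedStorage=200,
      StorageType="io1",
      Iops=3000,
      MasterUsername="postgres",
      MasterUserPassword="change-me",
  )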

Read replicas

Separate from HA. RDS read replicas use PostgreSQL streaming replication (physical, asynchronous). You can create up to 15 read replicas per primary. Cross-region replicas are supported. Replicas can be promoted to standalone primaries. You can create read replicas of read replicas (chained replication).

Read replica lag is the usual async-replication lag; there’s no sync option on the traditional single-standby architecture. On Multi-AZ DB Cluster, the two reader instances are the replication targets, and they are semisynchronous — so “read replica” in that context has tighter consistency semantics than it does in the older model.
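
Adding a replica is one API call; a boto3 sketch with placeholder names (the same call with SourceRegion handles the cross-region case):

  import boto3

  rds = boto3.client("rds", region_name="us-east-1")

  # Asynchronous physical (streaming) replica of an existing primary.
  rds.create_db_instance_read_replica(
      DBInstanceIdentifier="app-db-replica-1",   # hypothetical replica name
      SourceDBInstanceIdentifier="app-db",       # hypothetical primary
      DBInstanceClass="db.r6i.large",
      AvailabilityZone="us-east-1b",
  )

  # Later, the replica can be promoted to a standalone primary.
  rds.promote_read_replica(DBInstanceIdentifier="app-db-replica-1")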

The rds_superuser role

You do not get PostgreSQL SUPERUSER. The master user account created when you provision the instance is a member of rds_superuser, which is a role AWS maintains that has most of what you want — CREATE DATABASE, CREATE ROLE, most CREATE EXTENSION calls, the ability to terminate other sessions — but a hard set of things it cannot do:

  • CREATE EXTENSION on extensions not in the approved list.
  • Access the filesystem (COPY FROM / TO PROGRAM, pg_read_server_files, lo_import/export on server-side paths).
  • Change shared_preload_libraries except via parameter groups.
  • Alter the reserved rdsadmin superuser role.
  • Take a physical base backup via pg_basebackup against the instance.
  • Create untrusted procedural languages (plpython3u, for instance, is not available on RDS; neither is plperlu).
  • Modify system catalogs directly.

Anyone who has run a DBA playbook from a pre-managed-service decade will hit several of these in their first week. Most of them have workarounds. A few of them don’t.
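
A quick way to see what the master user actually is, sketched with psycopg2 (the endpoint and credentials are placeholders): confirm it is not a real superuser, then list the roles it belongs to.

  import psycopg2

  conn = psycopg2.connect(
      host="app-db.abc123.us-east-1.rds.amazonaws.com",  # placeholder endpoint
      dbname="postgres", user="postgres", password="...", sslmode="require",
  )
  with conn, conn.cursor() as cur:
      # usesuper will be false: the master user is not SUPERUSER.
      cur.execute("SELECT usesuper FROM pg_user WHERE usename = current_user")
      print("superuser:", cur.fetchone()[0])

      # But it is a member of rds_superuser (among other roles).
      cur.execute("""
          SELECT r.rolname
          FROM pg_roles r
          WHERE pg_has_role(current_user, r.oid, 'member')
      """)
      print("member of:", [row[0] for row in cur.fetchall()])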

Features that matter

Extensions

The current supported set includes the extensions you’d expect from a serious platform: pgvector, postgis, pg_cron, pg_partman, pg_repack, pg_stat_statements, pg_trgm, hstore, pgaudit, pglogical, postgres_fdw, tds_fdw, oracle_fdw, plpgsql, plperl, pltcl, and most of the trusted-extension set. The authoritative list lives in the PostgreSQL extensions supported on Amazon RDS release notes and in your instance’s default parameter group.

Two gotchas worth knowing:

  1. Extension versions lag upstream. RDS qualifies extension versions against RDS-supported PostgreSQL versions and ships them on its own schedule. If you need the newest pgvector release the day it comes out, RDS is not where you get it. The lag is typically weeks to months.
  2. rds.allowed_extensions is a parameter you can set to restrict which of the supported extensions can actually be installed. Useful for compliance postures; easy to misconfigure if you don’t know it exists.

Unsupported extensions are unsupported. plpython3u is not on the list, for example. Extensions that require shared-library code AWS has not qualified are not on the list. If an extension you need is not available, you are not on RDS.
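
Checking what an instance actually offers before depending on it is cheap. A psycopg2 sketch (placeholder DSN; note that pgvector's extension name is vector):

  import psycopg2

  conn = psycopg2.connect("host=... dbname=postgres user=postgres sslmode=require")  # placeholder DSN
  with conn, conn.cursor() as cur:
      # What the instance can install, and at which (possibly lagging) version.
      cur.execute("""
          SELECT name, default_version, installed_version
          FROM pg_available_extensions
          WHERE name IN ('vector', 'postgis', 'pg_cron', 'pg_partman')
      """)
      for name, available, installed in cur.fetchall():
          print(f"{name}: available {available}, installed {installed}")

      # Installs only work for extensions on the approved list
      # (and, if set, permitted by rds.allowed_extensions).
      cur.execute("CREATE EXTENSION IF NOT EXISTS vector")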

Backups and point-in-time recovery

Automated backups are daily snapshots to S3 plus continuous WAL archiving, also to S3. Point-in-time recovery restores to any second within your retention window (1–35 days). Restore creates a new instance; there is no in-place restore.

Manual snapshots also go to S3 and are kept until you delete them. Cross-region snapshot copy is supported. Snapshot-to-S3 export (exporting snapshot data to Parquet in S3 for analytics queries via Athena) is a useful feature most people don’t know exists.

Backup restore speed is roughly proportional to database size; AWS doesn’t publish a hard number and the actual time varies with EBS throughput provisioning on the restored volume.
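
A point-in-time restore through the API, sketched with boto3; note that it always produces a new instance (identifiers and the timestamp are placeholders):

  import boto3
  from datetime import datetime, timezone

  rds = boto3.client("rds")

  # Restore to a specific second within the retention window.
  rds.restore_db_instance_to_point_in_time(
      SourceDBInstanceIdentifier="app-db",              # hypothetical source
      TargetDBInstanceIdentifier="app-db-pitr",         # new instance is created
      RestoreTime=datetime(2025, 1, 15, 9, 30, 0, tzinfo=timezone.utc),
      DBInstanceClass="db.r6i.xlarge",
  )

  # Or restore to the latest restorable time instead.
  rds.restore_db_instance_to_point_in_time(
      SourceDBInstanceIdentifier="app-db",
      TargetDBInstanceIdentifier="app-db-pitr-latest",
      UseLatestRestorableTime=True,
  )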

Major version upgrades

Two models. The older one is in-place upgrade: AWS runs pg_upgrade on the instance with downtime. The newer one is Blue/Green Deployments: AWS builds a new green cluster at the target version, replicates to it via logical replication, and lets you cut over with a switchover when ready. Blue/Green is the right default for anything production; the downtime is the switchover window rather than the full upgrade window. Logical replication caveats still apply (sequences, tables without PKs, DDL during replication).
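
The Blue/Green flow through the API, sketched with boto3; the ARN, names, and target version are placeholders:

  import boto3

  rds = boto3.client("rds")

  # Build the green environment at the target major version.
  bg = rds.create_blue_green_deployment(
      BlueGreenDeploymentName="pg15-to-pg16",                  # hypothetical
      Source="arn:aws:rds:us-east-1:123456789012:db:app-db",   # placeholder ARN
      TargetEngineVersion="16.4",                              # illustrative target
  )
  deployment_id = bg["BlueGreenDeployment"]["BlueGreenDeploymentIdentifier"]

  # Once replication has caught up and green is validated, cut over.
  rds.switchover_blue_green_deployment(
      BlueGreenDeploymentIdentifier=deployment_id,
      SwitchoverTimeout=300,                                   # seconds
  )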

Performance Insights and Enhanced Monitoring

Performance Insights is AWS’s query-level performance monitoring, built on pg_stat_activity sampling. The free tier retains seven days of data; longer retention is a paid add-on. It surfaces wait events, top queries, and load over time. It is genuinely useful and not a trivial marketing feature.

Enhanced Monitoring gives you OS-level metrics (CPU, memory, disk, network) at configurable granularity down to 1 second. Regular CloudWatch metrics are at 1-minute granularity by default. For a database the 1-second data is often what you want during an incident.
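
Performance Insights data is also queryable through its own pi API; a sketch pulling database load grouped by wait event over the last hour (the identifier is the instance's DbiResourceId, shown here as a placeholder):

  import boto3
  from datetime import datetime, timedelta, timezone

  pi = boto3.client("pi")

  end = datetime.now(timezone.utc)
  resp = pi.get_resource_metrics(
      ServiceType="RDS",
      Identifier="db-ABCDEFGHIJKLMNOPQRSTUVWXYZ",   # placeholder DbiResourceId
      StartTime=end - timedelta(hours=1),
      EndTime=end,
      PeriodInSeconds=60,
      MetricQueries=[{
          "Metric": "db.load.avg",
          "GroupBy": {"Group": "db.wait_event"},
      }],
  )
  for series in resp["MetricList"]:
      print(series["Key"], len(series["DataPoints"]), "points")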

Networking, security, and auth

RDS is VPC-only. You control network access via security groups and NACLs. Encryption at rest is KMS-backed and on by default for most recent configurations; encryption in transit is TLS with a rotating AWS-signed cert chain (rds-ca-rsa2048-g1, most recently; rotation is an operational event — clients need the updated CA bundle). Authentication supports the standard Postgres SCRAM-SHA-256/MD5 flow plus IAM authentication (ephemeral tokens via the AWS SDK) and Kerberos via AWS Managed Microsoft AD.
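
IAM authentication in practice, sketched with boto3 and psycopg2: the token is a short-lived password, the database user must have been granted the rds_iam role, and the connection should verify TLS against the RDS CA bundle. The endpoint, user, and bundle path are placeholders:

  import boto3
  import psycopg2

  host = "app-db.abc123.us-east-1.rds.amazonaws.com"   # placeholder endpoint
  user = "iam_app_user"                                # placeholder user with rds_iam granted

  token = boto3.client("rds", region_name="us-east-1").generate_db_auth_token(
      DBHostname=host, Port=5432, DBUsername=user, Region="us-east-1",
  )

  conn = psycopg2.connect(
      host=host, port=5432, dbname="postgres", user=user,
      password=token,                                  # token is short-lived
      sslmode="verify-full",
      sslrootcert="/path/to/global-bundle.pem",        # placeholder CA bundle path
  )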

RDS Proxy is a separate connection-pooler service that sits in front of RDS and multiplexes connections; it is pgbouncer-shaped but is AWS’s own code and has its own quirks (cursor handling, session pinning triggers, transaction-mode constraints). Worth using on Lambda-heavy workloads; worth reading the docs carefully about what it doesn’t handle the way pgbouncer does.

The non-brochure concerns

These are the things that matter in practice that you will not learn from the AWS landing page.

Block-level replication means your Multi-AZ standby cannot be queried. This is a recurring surprise for people coming from any Postgres-native HA setup — Patroni, repmgr, pg_auto_failover — where the standby is a running Postgres you can point read traffic at. On single-standby Multi-AZ RDS, it is not. If you want readable replicas, you provision them separately as read replicas, which are async, or you use Multi-AZ DB Cluster.

Parameter groups replace postgresql.conf, and the default groups are immutable. You don’t edit a config file. You create a custom parameter group, associate it with your instance, and change values through the API. Some parameters are “dynamic” (apply immediately), some are “static” (sit pending-reboot until the instance restarts), and some are effectively immovable once set. Associating a different parameter group with an existing instance likewise doesn’t take full effect until a reboot. The documentation marks which parameters are dynamic vs static; the UI does not always make this obvious.
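
The parameter-group workflow through the API, sketched with boto3 (group name, family, and values are illustrative); the ApplyMethod field is where the dynamic-versus-static distinction shows up:

  import boto3

  rds = boto3.client("rds")

  # Custom group, because the default group can't be edited.
  rds.create_db_parameter_group(
      DBParameterGroupName="app-pg16",             # hypothetical
      DBParameterGroupFamily="postgres16",
      Description="tuned parameters for app-db",
  )

  rds.modify_db_parameter_group(
      DBParameterGroupName="app-pg16",
      Parameters=[
          # Dynamic parameter: can apply immediately.
          {"ParameterName": "log_min_duration_statement",
           "ParameterValue": "250", "ApplyMethod": "immediate"},
          # Static parameter: only takes effect after a reboot.
          {"ParameterName": "shared_preload_libraries",
           "ParameterValue": "pg_stat_statements,pg_cron",
           "ApplyMethod": "pending-reboot"},
      ],
  )

  # Associating the group with an instance still needs a reboot to fully apply.
  rds.modify_db_instance(DBInstanceIdentifier="app-db",
                         DBParameterGroupName="app-pg16")
  rds.reboot_db_instance(DBInstanceIdentifier="app-db")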

Storage is monotonic. You can scale storage up; you cannot scale storage down. If you over-provisioned storage and want to reduce the bill, you move the data to a new, smaller instance and cut over; a snapshot restore cannot allocate less storage than the source, so in practice this means a logical dump/restore or logical replication.

Logical replication is supported but has setup constraints. rds.logical_replication = 1 requires a reboot. Subscriptions from RDS to an external downstream generally work; subscriptions into RDS have historically had more issues with things like CREATE SUBSCRIPTION permissions and missing ALTER SUBSCRIPTION ... DISABLE guardrails. Row-level filtering on publications (PG15+) does work on RDS.
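
Once rds.logical_replication is on (via the parameter group) and the instance has rebooted, the Postgres side is standard. A psycopg2 sketch, with the DSN and table names as placeholders:

  import psycopg2

  conn = psycopg2.connect("host=... dbname=app user=postgres sslmode=require")  # placeholder DSN
  with conn, conn.cursor() as cur:
      # wal_level reflects rds.logical_replication after the reboot.
      cur.execute("SHOW wal_level")
      print("wal_level:", cur.fetchone()[0])          # expect 'logical'

      # Publication on the RDS side for an external subscriber.
      cur.execute("CREATE PUBLICATION app_pub FOR TABLE orders, customers")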

pg_dump/pg_restore work; pg_basebackup does not. You cannot take a physical base backup of an RDS instance from the outside. If you need a physical copy (for forensics, for migrating to self-hosted, for a detailed I/O analysis), your options are snapshot restore to a new instance, or logical dump. This is a design choice, not a bug.

COPY FROM and COPY TO on the server side are constrained. Client-side COPY via \copy or COPY FROM STDIN works. Server-side COPY FROM '/path/to/file' doesn’t, because there’s no filesystem you can write files to. S3 integration via the aws_s3 extension (aws_s3.table_import_from_s3 and aws_s3.query_export_to_s3) is the replacement; it works but it is not the same interface.
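
The aws_s3 replacement path, sketched via psycopg2; the DSN, bucket, key, and table names are placeholders, and the instance also needs an IAM role that allows it to reach the bucket:

  import psycopg2

  conn = psycopg2.connect("host=... dbname=app user=postgres sslmode=require")  # placeholder DSN
  with conn, conn.cursor() as cur:
      cur.execute("CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE")

      # Server-side import from S3 instead of COPY FROM '/path/to/file'.
      cur.execute("""
          SELECT aws_s3.table_import_from_s3(
              'orders',                       -- target table (placeholder)
              '',                             -- column list: all columns
              '(format csv, header true)',    -- COPY options
              aws_commons.create_s3_uri('my-bucket', 'exports/orders.csv', 'us-east-1')
          )
      """)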

Autovacuum tuning matters more than usual. AWS defaults are fine for small instances and wrong for big ones; autovacuum_max_workers, autovacuum_vacuum_cost_limit, and the scale-factor thresholds all need scrutiny on any instance with a heavy write workload. The RDS default behavior around autovacuum is conservative in ways that produce bloat on busy tables. This is not specific to RDS — it’s specific to community Postgres defaults — but RDS does not save you from it.
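
Per-table overrides are the usual lever, since the cluster-wide thresholds live in the parameter group. A psycopg2 sketch with a placeholder DSN and table name and illustrative values:

  import psycopg2

  conn = psycopg2.connect("host=... dbname=app user=postgres sslmode=require")  # placeholder DSN
  with conn, conn.cursor() as cur:
      # Vacuum the hot table after ~1% churn instead of the 20% default.
      cur.execute("""
          ALTER TABLE orders SET (
              autovacuum_vacuum_scale_factor  = 0.01,
              autovacuum_vacuum_threshold     = 1000,
              autovacuum_analyze_scale_factor = 0.02
          )
      """)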

The certificate rotation is a real event. AWS rotates the server certificate root periodically. Clients that pin the old CA bundle must be updated before the cutover or they will fail to connect. Most client libraries handle this gracefully if the SSL mode is permissive; some don’t. This is the kind of thing that silently breaks CI on a Tuesday if you aren’t paying attention to AWS operational announcements.

Performance Insights has a retention cliff. Seven days is free; longer retention is a separate line item. If your incident postmortem process depends on being able to go back 30 days to a query-level view, you’re paying for extended retention or you don’t have that data. Many organizations discover this during a postmortem.

There is no equivalent to pgbench running from inside the instance. You can’t SSH in. Benchmarks against RDS measure the client-to-RDS path, which includes the network. Plan your benchmark harness accordingly.

Positives

The boring, critical ones: it works. It has for over a decade. The control plane is mature, the feature set is broad, the documentation is good, AWS support actually knows the product, and the integration with the rest of AWS (VPC, IAM, KMS, CloudWatch, Secrets Manager, S3) is the deepest of any managed Postgres offering.

The engineering team at AWS has done serious work on the things underneath the wrapper: Multi-AZ Cluster is a genuinely well-designed HA topology, Blue/Green Deployments for major version upgrades are better than any manual pg_upgrade procedure you’re likely to run yourself, the extension curation is conservative-but-wide and includes most of what a serious workload needs, and the region/AZ coverage is larger than any competitor.

If you are already on AWS, the integration cost of RDS is lower than any other managed Postgres. Other services can be made to work inside an AWS VPC; RDS is AWS.

Negatives

The constraints come from the “managed” side and are not specific to RDS; every managed Postgres has most of them. The ones that bite hardest on RDS specifically:

  • No true superuser. Fine for most workloads, painful for the minority that need it.
  • The single-standby Multi-AZ model is storage-layer replication, which means the standby cannot be queried and the HA model is different from what a Postgres DBA’s instincts expect. Multi-AZ DB Cluster addresses this but is newer and narrower.
  • Extension curation lags. If you want the current upstream pgvector or a niche extension AWS hasn’t qualified, you’re out of luck.
  • Parameter group tooling is clunky. Immutable copies, reboot-to-apply semantics, and a UI that doesn’t clearly distinguish dynamic from static changes conspire to create preventable incidents.
  • No filesystem or OS access. Server-side COPY, pg_basebackup, pg_read_server_files, filesystem-level diagnostics (iostat, perf, bpftrace) — none available. Some of this can be proxied through AWS tools; some cannot.
  • Cost scales predictably but not cheaply on large instances. EBS io2 Block Express storage with high provisioned IOPS is not inexpensive, and the Multi-AZ doubling applies to the full instance cost, not just the database data.

Best-fit workloads and organizations

Amazon RDS for PostgreSQL is the right choice for:

  • Organizations already on AWS that want managed Postgres with deep platform integration and don’t need any of the constraints listed above to be different.
  • Workloads within the extension set — which, to be fair, is most of them.
  • Mid-sized OLTP where Multi-AZ + a couple of read replicas is the full HA/read-scale story, and the size of the data and the I/O pattern fits comfortably inside EBS.
  • Teams without dedicated DBAs who value AWS handling the operational surface.
  • Compliance-bound environments where AWS’s posture (HIPAA, FedRAMP, SOC, ISO) is part of the reason the project is on AWS in the first place.

It is not the right choice for:

  • Workloads that need a specific extension AWS hasn’t qualified, and where rewriting the workload to avoid that extension isn’t an option.
  • Workloads where physical-replication access matters (logical decoding into a non-AWS destination, or replication-based cross-cloud DR, or any architecture that depends on running pg_basebackup against the primary).
  • Workloads that outgrow EBS. There are Postgres deployments whose I/O profiles need something that EBS is not; those workloads either go to Aurora, to self-hosted with local NVMe, or to a specialized platform.
  • Workloads that need OS-level access for diagnostic or tuning reasons that cannot be served by Performance Insights and Enhanced Monitoring.
  • Shops where the AWS cost model is structurally wrong — typically predictable, steady workloads on very large instances where a self-hosted EC2 with reserved pricing or a non-AWS platform is meaningfully cheaper and the operational delta isn’t worth the spread.

RDS is the default. It’s not always the right default, but it is the default — and understanding why it might not be right for your specific workload is the main reason to write posts like this one.

Next in this series: Amazon Aurora PostgreSQL, which sits in the same AWS console next to RDS and is architecturally unlike anything else on the list.