Third in a series of dispassionate tours of managed PostgreSQL services. Previously: Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL. After Aurora’s distributed-storage architecture, Cloud SQL reads like a return to first principles — a conventional PostgreSQL instance in a VM with a regional disk under it.

Overview

Cloud SQL for PostgreSQL is Google Cloud’s managed PostgreSQL service. A Cloud SQL instance is a community PostgreSQL process running in a Google-managed VM (a Compute Engine instance that the user does not have direct access to), backed by a Persistent Disk for data storage, with the usual managed-service apparatus around it: automated backups, point-in-time recovery, optional high availability, read replicas, maintenance windows, and an IAM-integrated connection path.

Cloud SQL ships in two editions: Enterprise (the classic offering) and Enterprise Plus (introduced more recently, with a different compute/storage model and meaningful operational differences). The distinction matters: some of the features that read as “Cloud SQL features” are Enterprise Plus-only, and the two editions have different architectures under the hood for HA and caching. This post covers both, noting which is which.

Compared to AWS’s two-service split (RDS traditional vs. Aurora), Google’s two-edition split is narrower in scope — Enterprise Plus is not a different storage architecture in the way Aurora is, but it does have distinct behaviors around failover, caching, and maintenance that are worth understanding before choosing between them.

Architecture

Enterprise edition: the default

A Cloud SQL Enterprise instance is a single PostgreSQL process running in a Google-managed VM. Data lives on a Persistent Disk attached to that VM. Persistent Disks come in several flavors — pd-standard (HDD), pd-balanced, pd-ssd, and pd-extreme — with different throughput and IOPS characteristics.

For a single-zone (non-HA) instance, this is effectively “PostgreSQL on a VM with a disk,” plus Google’s operational surface. The instance can be stopped and restarted, can be resized (with downtime), can have its disk grown (without downtime), and has its backups and logs managed by the service rather than by the user.

Enterprise edition: HA with regional Persistent Disk

Cloud SQL Enterprise HA uses synchronous block-level replication at the storage layer, via Regional Persistent Disks. A Regional Persistent Disk is a Google Cloud storage feature that replicates every block write synchronously to a second zone within the same region. The primary VM in one zone writes to its local half of the disk; the write is acknowledged only after the replica half in the other zone has also durably accepted it.

On failover, the standby VM in the second zone attaches to the regional disk and takes over. There is no PostgreSQL streaming replication in this configuration; the replication is happening at the disk level, below the PostgreSQL process.

This is the same architectural pattern as RDS Multi-AZ single-standby: synchronous block-level replication, a cold-until-needed standby, and a non-queryable HA peer. The important operational consequence is the same: the HA standby is not a read replica. If a workload needs read scaling in addition to HA, that requires separate read replicas, which do use PostgreSQL streaming replication.

Enterprise Plus edition

Enterprise Plus introduces a different compute/storage profile. It uses a machine family tuned for database workloads and a local data cache (memory/NVMe) to reduce the number of reads that traverse the Persistent Disk. The data cache is a read-through buffer tier that sits between PostgreSQL’s shared buffers and the Persistent Disk.

Enterprise Plus also has materially different HA behavior. Google documents “near-zero-downtime planned maintenance” for Enterprise Plus, achieved by warm-standby switchover rather than restart-in-place. Failover times for unplanned events are also documented as shorter for Enterprise Plus than for Enterprise.

The data cache is a genuinely distinctive feature. A workload whose hot set fits comfortably in PostgreSQL shared_buffers will not benefit from it meaningfully. A workload whose hot set is larger than shared_buffers but smaller than the cache tier — a common situation — will see meaningful latency improvements because cache hits avoid a Persistent Disk round trip.
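A back-of-envelope model makes the sizing argument concrete. The latency figures below are illustrative assumptions, not Google-published numbers:

```python
# Expected read latency across three tiers: shared_buffers, the Enterprise
# Plus data cache, and Persistent Disk. All latency figures are assumptions.

def avg_read_latency_us(buffers_hit, cache_hit,
                        buffers_us=1, cache_us=100, pd_us=1000):
    """Expected latency given the hit fraction at each tier."""
    disk_fraction = 1 - buffers_hit - cache_hit
    return (buffers_hit * buffers_us
            + cache_hit * cache_us
            + disk_fraction * pd_us)

# Hot set fits in shared_buffers: the cache tier barely matters.
print(avg_read_latency_us(0.99, 0.00))   # ~11 us either way
# Hot set spills past shared_buffers: without the cache tier, every miss
# is a Persistent Disk round trip; with it, most misses are not.
print(avg_read_latency_us(0.70, 0.00))   # ~301 us
print(avg_read_latency_us(0.70, 0.29))   # ~40 us
```

The middle case is the one described above: an order-of-magnitude difference driven entirely by where the miss lands.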

```mermaid
flowchart TB
    subgraph Zone1[Zone A]
        Primary[(Primary Instance<br/>PostgreSQL process)]
        DiskA[Regional PD<br/>Zone A half]
        Primary -- writes --> DiskA
    end
    subgraph Zone2[Zone B]
        Standby[(Standby Instance<br/>cold until failover)]
        DiskB[Regional PD<br/>Zone B half]
        Standby -. attaches on failover .-> DiskB
    end
    DiskA <-- synchronous<br/>block replication --> DiskB
    subgraph Zone3[Zone C - optional]
        Replica[(Read Replica<br/>streaming replication)]
    end
    Primary -- WAL stream --> Replica
    classDef primary fill:#c3e0ff,stroke:#2a6fb3
    classDef standby fill:#f5d5d5,stroke:#a52a2a
    classDef replica fill:#d5f0d5,stroke:#2a8a3e
    classDef disk fill:#ffe9b3,stroke:#b58900
    class Primary primary
    class Standby standby
    class Replica replica
    class DiskA,DiskB disk
```

Cloud SQL Enterprise HA uses regional Persistent Disk for synchronous block replication between two zones. The standby VM in the second zone is not queryable and only takes over on failover. Read scaling is provided by separate read replica instances using PostgreSQL streaming replication.

Read replicas

Cloud SQL supports read replicas via PostgreSQL streaming replication (asynchronous by default). Replicas can be in the same region or a different region. A read replica can itself be promoted to a standalone instance — a path sometimes used for region cutover.

Cross-region read replicas introduce the usual cross-region replication lag. A cross-region replica can also be promoted to break the replication relationship and become a standalone primary — useful for DR but, once promoted, a one-way operation; rebuilding the replica relationship from the new primary back to the old region is a separate exercise.

Cascading replicas (a replica of a replica) are supported, with the usual caveats about lag amplification.

Permissions and cloudsqlsuperuser

Cloud SQL does not grant the true PostgreSQL SUPERUSER role to customers. Instead, a cloudsqlsuperuser role is provisioned, with most of the privileges a typical operator needs but without the ability to do things that would break the managed-service contract — direct filesystem access, loading arbitrary C extensions, certain replication-slot operations, and so on.

This is the same pattern as rds_superuser on RDS. The practical implications are similar: most migrations work without code changes; a small fraction that depend on true superuser privileges require accommodation.

Networking

Cloud SQL instances can be configured with public IP, private IP, or both. Private IP comes in two flavors with meaningfully different operational characteristics:

Private Services Access is the older model. Cloud SQL exposes the instance on an IP in a Google-managed service producer VPC, peered to the customer’s VPC. This is the model most existing Cloud SQL deployments use.

Private Service Connect (PSC) is the newer model. The instance is exposed via a PSC endpoint inside the customer’s VPC directly, giving finer-grained control over routing, service attachment, and cross-VPC connectivity. Customers with complex networking requirements — cross-project, cross-region, or hub-and-spoke VPC topologies — will generally prefer PSC.

The migration path from Private Services Access to PSC for an existing instance is not seamless. A new instance created with PSC is straightforward; converting an existing one is closer to a re-create than a reconfiguration.

Cloud SQL Auth Proxy

The Cloud SQL Auth Proxy is a client-side binary (or sidecar) that establishes an authenticated, encrypted connection to Cloud SQL using Google Cloud IAM credentials, without the application needing to manage TLS certificates or IP allow-lists. It is the recommended connection path from GKE, Cloud Run, Compute Engine, or any workload that has Google Cloud credentials available.

This is a small piece of operational surface that has an outsized effect on day-to-day developer experience: connecting to a Cloud SQL instance from a laptop is as simple as running the proxy locally, without any VPN or network-level access.

Features

Backups and PITR

Cloud SQL takes automated daily backups (configurable window) and retains WAL archives for point-in-time recovery. PITR is enabled by default for new instances in most cases, but it can be turned off, and the retention period (measured in days of WAL retention) is a separate setting that is worth checking explicitly.

The current default PITR retention is seven days; longer retention is available but requires explicit configuration.

Backups can be on-demand or scheduled. Manual backups can be retained indefinitely; automated backups are subject to the retention-count setting.

Cross-region backup copies are supported — a backup stored in a different region from the instance, for regional-disaster-recovery scenarios.

Major version upgrades

Cloud SQL supports in-place major version upgrades. The upgrade process involves downtime — historically on the order of tens of minutes for a typical database, though the duration depends on database size and upgrade complexity. For Enterprise Plus, the downtime is shorter due to the warm-standby switchover mechanism, but a major version upgrade still involves more than a trivial restart.

For databases where even tens of minutes of downtime is unacceptable, the supported zero-downtime path is logical-replication-based migration: provision a new instance at the target version, replicate from old to new, cut over. Google’s Database Migration Service (DMS) supports this path directly.

Cloud SQL’s PostgreSQL major-version support tracks upstream with a lag. The lag is generally shorter than Aurora’s (Cloud SQL Enterprise is closer to the stock Postgres release cadence, because the engine is not forked to the degree Aurora’s is), but it is not zero.

Extensions

Cloud SQL maintains an allow-list of extensions that can be enabled via CREATE EXTENSION. The list is well-documented and includes the commonly needed extensions — PostGIS, pg_stat_statements, pgcrypto, pg_trgm, pgvector, pg_partman, and many others. The list differs from RDS's in small but occasionally important ways; an extension present on RDS is not automatically present on Cloud SQL, and vice versa.

Custom extensions — third-party C extensions not on the allow-list — cannot be installed on Cloud SQL. There is no Cloud SQL equivalent of RDS’s “Trusted Language Extensions” (pg_tle) path for customer-defined SQL-level extensions.

Query Insights

Cloud SQL’s Query Insights is the GCP equivalent of RDS Performance Insights. It provides top-query aggregation, wait-event breakdown, and query plan inspection. The query plan inspection — surfacing EXPLAIN ANALYZE output for specific query executions — is a feature RDS Performance Insights historically did not have in the same form.

Query Insights is generally available on both Enterprise and Enterprise Plus, with longer retention on Enterprise Plus.

IAM authentication

Cloud SQL supports PostgreSQL authentication via Google Cloud IAM identities — a user can connect with a short-lived token derived from their IAM credentials rather than a password. This works for both human users (via the gcloud CLI or the Auth Proxy) and service accounts (for workloads running as a service account on GCE, GKE, Cloud Run, etc.).

IAM auth coexists with password-based auth; the two can be mixed on the same instance. For service accounts, this eliminates the usual problem of managing database passwords in a secrets store.

Cloning

Cloud SQL supports cloning an instance via the clone operation, which produces a new instance from a point-in-time snapshot of an existing one. This is a full physical clone — it does not use copy-on-write in the way Aurora cloning does, so the clone takes time proportional to database size and consumes the storage equivalent of the source.

It is still operationally useful — a clone-for-test workflow is straightforward — but it is not the same rapid-clone primitive Aurora offers.

Database Migration Service

Google’s DMS is the supported path for migrating into Cloud SQL from self-hosted PostgreSQL, from RDS, from Aurora, or from one Cloud SQL instance to another at a different major version. DMS performs continuous logical replication and supports a cutover workflow with minimal downtime.

Migrating out of Cloud SQL uses the same mechanisms in reverse: either logical replication (Cloud SQL supports publications and subscriptions) or pg_dump/pg_restore.

Physical streaming replication to an external (non-Cloud SQL) PostgreSQL server is not supported in general. A Cloud SQL instance can be a source for logical replication to an external subscriber; it cannot be a physical source via pg_basebackup or streaming WAL to an external follower.

Non-brochure concerns

The HA standby is not a read replica

The most commonly misunderstood Cloud SQL fact: enabling HA on an Enterprise instance does not give you a queryable warm replica. The standby is a standby. It exists to fail over to. If the application also needs to scale reads, that requires separate read replica instances, each of which is a full additional PostgreSQL instance billed accordingly.

This is the same architectural reality as RDS Multi-AZ single-standby, and it catches teams off guard at about the same rate.

Regional PD replication is synchronous at the block level

Because HA replication is block-level and synchronous, the primary’s write latency is bounded by the round-trip to the second zone’s disk half. In most cases this is low single-digit milliseconds and not noticeable. Under degraded inter-zone network conditions — which do happen — write latency increases accordingly, and the application will observe the degradation directly as elevated commit times.

This is not a Cloud SQL bug; it is the durability guarantee the architecture provides. It is worth being aware of because an inexperienced observer will look at elevated write latency and start hunting for a Postgres configuration problem, when the issue is inter-zone networking.
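Because the round trip sits directly on the commit path, its effect on latency and per-session throughput is simple arithmetic. The figures below are illustrative assumptions, not measured Cloud SQL numbers:

```python
# Synchronous block replication: a commit is acknowledged only after the
# second zone's disk half has durably accepted the write, so the
# inter-zone RTT is additive. Latency figures are assumptions.

def commit_latency_ms(local_write_ms, inter_zone_rtt_ms):
    return local_write_ms + inter_zone_rtt_ms

def max_serial_commits_per_sec(latency_ms):
    # Upper bound for one session committing back-to-back.
    return 1000.0 / latency_ms

healthy = commit_latency_ms(1.0, 1.5)     # 2.5 ms: unremarkable
degraded = commit_latency_ms(1.0, 20.0)   # 21.0 ms under inter-zone trouble

print(max_serial_commits_per_sec(healthy))    # 400 commits/s per session
print(max_serial_commits_per_sec(degraded))   # ~48 commits/s per session
```

The same degradation that shows up as elevated commit times also caps per-session write throughput, which is why the symptom often presents as an application-side slowdown rather than a database error.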

Storage is monotonic (by default)

Like RDS and Aurora, Cloud SQL storage grows on request (or automatically, if storage auto-resize is enabled) but does not shrink. Deleting data does not reclaim the provisioned disk space. Reclaiming provisioned storage requires creating a smaller instance and migrating data to it — logical replication, pg_dump/pg_restore, or DMS.

Storage auto-resize is a useful feature; it is also a commitment to monotonic growth. For workloads where data volume fluctuates significantly, it pays to set a sensible maximum.
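A toy simulation of the ratchet, assuming a simplified grow-at-90%-in-25-GB-steps policy (not Cloud SQL's documented algorithm):

```python
# Storage auto-resize only ever ratchets up. The 90% threshold and 25 GB
# step here are simplifying assumptions, not Cloud SQL's exact policy.

def provisioned_after(usage_gb_series, start_gb=100, max_gb=500):
    provisioned = start_gb
    for used_gb in usage_gb_series:
        # Grow while usage crowds the disk; never shrink.
        while used_gb > provisioned * 0.9 and provisioned < max_gb:
            provisioned = min(provisioned + 25, max_gb)
    return provisioned

# Usage spikes to 400 GB, then the data is deleted down to 50 GB:
print(provisioned_after([120, 400, 50]))  # 450 -- provisioned stays put
```

The 50 GB of surviving data still pays for 450 GB of provisioned disk; reclaiming it means the migrate-to-a-smaller-instance exercise described above.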

Enterprise vs. Enterprise Plus: failover semantics differ

On Enterprise, failover is regional PD detach-and-reattach; the standby boots fresh and the cache is empty. On Enterprise Plus, failover uses a warm-standby mechanism that preserves a meaningful portion of the cache state. This affects post-failover latency profiles materially — an Enterprise failover on a large database can produce a minutes-long window of elevated read latency as the buffer cache refills; Enterprise Plus typically does not.

If a workload has tight p99 latency requirements that must hold across failover events, Enterprise Plus is worth the premium for this reason alone.
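The cold-cache window on Enterprise is straightforward to estimate. Both figures below are illustrative assumptions, not measured values:

```python
# After an Enterprise failover, the standby starts with an empty buffer
# cache and refills it from Persistent Disk. Figures are assumptions.

def refill_minutes(hot_set_gb, disk_read_mb_per_sec):
    return hot_set_gb * 1024 / disk_read_mb_per_sec / 60

# A 60 GB hot set behind a disk sustaining 400 MB/s of reads:
print(round(refill_minutes(60, 400), 1))  # ~2.6 minutes, best case;
                                          # random-read patterns do worse
```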

Extension version lag

Cloud SQL’s available versions of specific extensions can lag upstream by meaningful amounts. pgvector, during its period of rapid development, was a frequent example — the latest upstream version would not land on Cloud SQL for weeks or months. Teams whose workloads depend on a specific new extension feature should verify the available version on Cloud SQL before committing.

Connection limits and Cloud SQL Auth Proxy

Cloud SQL instances have connection limits that scale with instance size, and the Auth Proxy itself holds connections on behalf of clients. A naive application architecture that spawns an Auth Proxy per worker process and opens many connections per proxy can exhaust the instance connection limit faster than intuition suggests.
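The multiplication is worth writing down. All numbers here are illustrative; check the actual max_connections for your instance tier:

```python
# Per-worker connection pools multiply out across workers and app
# instances. Figures are illustrative assumptions, not Cloud SQL limits.

def total_backend_connections(app_instances, workers_per_instance,
                              pool_size_per_worker):
    return app_instances * workers_per_instance * pool_size_per_worker

max_connections = 400  # assumed limit for a mid-size instance
demand = total_backend_connections(app_instances=8,
                                   workers_per_instance=4,
                                   pool_size_per_worker=15)
print(demand)                    # 480
print(demand > max_connections)  # True -- over the limit before any
                                 # superuser or maintenance sessions
```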

PgBouncer is supported as an external pooler against Cloud SQL. There is no equivalent of RDS Proxy provided by Cloud SQL itself; pooling is the customer’s problem.

The logical_decoding_work_mem and max_replication_slots story

Cloud SQL’s defaults for replication-related parameters have historically been conservative. Teams using logical replication in production — for CDC to BigQuery, for heterogeneous replication, for zero-downtime migrations — should review these parameters explicitly and tune them rather than relying on defaults. The consequences of under-tuning are abandoned replication slots, ballooning disk usage from retained WAL, and eventually a primary that stops accepting writes because the WAL held by lagging slots has filled the disk.

This is true of any Postgres deployment using logical replication; it is called out here because the Cloud SQL defaults are not well-suited to heavy CDC workloads out of the box.
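The retained-WAL failure mode compounds quickly. A back-of-envelope sketch, with an assumed WAL rate:

```python
# An abandoned replication slot retains WAL at the primary's generation
# rate until the disk fills. The rate here is an illustrative assumption.

def hours_until_disk_full(wal_mb_per_min, free_disk_gb):
    retained_gb_per_hour = wal_mb_per_min * 60 / 1024
    return free_disk_gb / retained_gb_per_hour

# A modest 50 MB/min of WAL against 200 GB of free disk:
print(round(hours_until_disk_full(50, 200), 1))  # ~68.3 hours
```

Roughly one long weekend between a consumer silently stopping and the primary refusing writes.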

Maintenance windows and the “disable maintenance” feature

Cloud SQL has a maintenance window configuration and, for Enterprise Plus, the ability to defer maintenance beyond the standard window. The maintenance window controls when Google can apply updates, not whether updates happen. Maintenance will be applied eventually; teams that assume it can be deferred indefinitely will be surprised.

For Enterprise, maintenance causes a brief instance restart — on the order of a minute or two. For HA instances, this is typically invisible to applications because the failover handles the transition. For non-HA instances, it is observable downtime.

Log routing and Cloud Logging cost

Cloud SQL instances emit Postgres logs to Cloud Logging. Under default settings, this is typically low-volume. With log_min_duration_statement enabled at a low threshold — a reasonable thing to want for query analysis — log volume increases substantially, and Cloud Logging ingestion is billed separately from Cloud SQL. A team that turns up query logging without adjusting Cloud Logging sinks can produce a noticeable Cloud Logging bill.

This is the sort of operational edge the service does not advertise and that becomes a line item on a finance-team ticket somewhere around month three.
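To put rough numbers on the logging bill: the rates and line sizes below are assumptions, and current Cloud Logging pricing should be checked before relying on any figure.

```python
# Ingestion volume from logging every statement at a low
# log_min_duration_statement threshold. All figures are assumptions.

def monthly_log_gib(queries_per_sec, avg_log_line_bytes):
    seconds_per_month = 30 * 24 * 3600
    return queries_per_sec * avg_log_line_bytes * seconds_per_month / 2**30

# 500 logged statements/s at ~400 bytes per log line:
print(round(monthly_log_gib(500, 400)))  # ~483 GiB/month ingested,
                                         # billed by Cloud Logging,
                                         # not by Cloud SQL
```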

No direct access to the underlying VM

As with RDS and Aurora, there is no SSH access to the VM running the PostgreSQL instance. Everything observable about the instance comes through the Cloud SQL API, Cloud Monitoring, Cloud Logging, or PostgreSQL’s own system views. For teams used to being able to log into a database server and run top, iostat, or perf, this is a limitation to be prepared for — not necessarily a dealbreaker, but an adjustment.

Private Service Connect is the future

Google is clearly investing in PSC as the go-forward networking model. New features (cross-region PSC, more granular IAM bindings) tend to land on PSC first. Teams starting a new Cloud SQL deployment today should strongly prefer PSC over Private Services Access unless there is a specific reason not to. Teams on the older model should have a migration plan in mind, because the capability gap will continue to widen.

Positives

The architecture is straightforward. Cloud SQL Enterprise is a PostgreSQL process on a VM with a disk under it. The mental model required to operate it is the mental model of operating PostgreSQL, plus a small amount of Google-specific operational surface. This is a real advantage for teams whose DBA knowledge is generic PostgreSQL knowledge.

Enterprise Plus data cache. The read cache tier is genuinely useful for workloads whose hot set exceeds shared_buffers. Latency improvements are measurable and real.

IAM authentication and Cloud SQL Auth Proxy. The authenticated-connection-without-managing-passwords story is cleaner than the AWS equivalent for most common deployment patterns, particularly from GKE and Cloud Run.

Query Insights with plan capture. The ability to see actual EXPLAIN ANALYZE output for specific slow queries, integrated into the managed-service UI, is a non-trivial operational improvement over digging it out manually.

Cross-region read replicas with in-place promotion. Simple, straightforward DR path for regional-failure scenarios. Not as exotic as Aurora Global Database, and for many workloads exactly what is needed.

Database Migration Service. DMS is a capable migration tool and the supported, documented path for getting in and out of Cloud SQL. It is not magic, but it is there, and it works for the standard patterns.

Versions track upstream more closely than Aurora. Cloud SQL Enterprise typically supports new PostgreSQL major versions faster than Aurora does, because the engine is not as heavily modified. Teams that value being on a current major version will find Cloud SQL more accommodating.

PSC networking model. For organizations with sophisticated VPC topologies, PSC is a meaningfully better networking primitive than the peered-service-VPC models typical of other clouds.

Negatives

HA standby is not queryable. A workload that needs both HA and read scale-out pays for the standby and pays for separate read replicas. This is not unique to Cloud SQL, but it is a surprise for teams who expected a warm-readable HA peer.

Enterprise Plus has meaningfully better operational behavior than Enterprise, at a premium. For latency-sensitive workloads, the choice is effectively forced toward Enterprise Plus, and the Enterprise edition becomes a development-or-cost-optimized option rather than a serious production target.

Extension allow-list and extension version lag. Common extensions are supported. Uncommon ones are not, and common ones sometimes trail upstream.

No customer-defined extension mechanism. No Cloud SQL equivalent of pg_tle. Customer-specific SQL-level extensions are not a supported artifact.

Clone is a full physical copy. Fine, but not a copy-on-write primitive. Workflows that would benefit from frequent, large clones (staging environments, per-feature-branch databases) are less natural on Cloud SQL than on Aurora.

No RDS Proxy equivalent. Connection pooling is the customer’s responsibility. PgBouncer works; there is no built-in managed pooler.

No true SUPERUSER. Same limitation as RDS/Aurora. Most workloads are unaffected; a minority are not.

Cloud Logging cost for verbose query logging. Can be a surprise. Requires explicit sink configuration to manage.

Networking migration from PSA to PSC is non-trivial. Teams starting today have a simple choice; teams with existing PSA deployments have a migration on the list.

Best-fit workloads and organizations

General-purpose OLTP on Google Cloud. Cloud SQL is the obvious choice for applications running on GKE, Cloud Run, Compute Engine, or App Engine that need a managed PostgreSQL with conventional semantics. The integration with GCP IAM and networking is clean, the operational model is standard-Postgres, and the path into and out of the service is well-defined.

Workloads where the hot set exceeds shared_buffers but fits in a larger cache. Enterprise Plus’s data cache is a distinctive advantage here.

Teams with strong standard-PostgreSQL expertise who want to bring that expertise directly to a managed service without learning a forked variant. Cloud SQL Enterprise stays close to stock PostgreSQL behavior; most DBA intuition transfers.

Organizations with sophisticated VPC architectures — multiple projects, shared VPCs, hub-and-spoke topologies, cross-region service exposure. PSC is a better primitive for these scenarios than most managed-service networking alternatives.

Applications that use DMS for migration or continuous cross-system replication. DMS integrates naturally with Cloud SQL and handles the common patterns well.

Poor fits

Workloads requiring custom C extensions that are not on Google’s allow-list. Cloud SQL has no path for these; a self-hosted deployment or a different managed service with TLE-style support is a better match.

Applications with hard requirements for physical replication to external followers. Logical replication out is supported; physical is not.

Scale-out analytical workloads requiring many read replicas across regions with minimal lag. Aurora’s shared-storage read-replica model or AlloyDB (covered later in this series) will generally be better matches than Cloud SQL’s streaming-replication-based replicas.

Workloads that need to clone fast and often. The lack of a copy-on-write cloning primitive makes this operationally heavier on Cloud SQL.

Teams with zero tolerance for maintenance-driven restarts on non-HA instances. This is a Cloud SQL Enterprise operational reality; the workaround is HA, which brings back the “standby isn’t queryable” tradeoff.

Verification notes for this post

Items flagged for verification before external publication:

  • Current supported Persistent Disk types for Cloud SQL
  • Current HA implementation for Cloud SQL Enterprise PostgreSQL (regional PD vs. failover replica — confirm current default)
  • Current documented failover times for Enterprise vs. Enterprise Plus
  • Current PITR maximum retention
  • Current DMS capabilities for PostgreSQL major version upgrades
  • Current status of any customer-extensible mechanism analogous to pg_tle
  • Current feature parity between Query Insights and RDS Performance Insights
  • Current status of PgBouncer / managed connection pooling on Cloud SQL
  • Current PSC vs. PSA feature matrix and conversion path
  • Current pgvector version on Cloud SQL vs. upstream
  • Enterprise Plus current minimum/maximum instance sizes and machine families

The architectural claims — VM-backed, Persistent Disk storage, regional-PD-based HA, streaming replication for read replicas, PostgreSQL allow-list extension model, cloudsqlsuperuser role — are well-documented and do not require verification. The items above are the product details that shift with Google Cloud’s release cadence.

Next in this series: Google AlloyDB for PostgreSQL — Google’s answer to Aurora, architecturally distinct from Cloud SQL, and the one managed PostgreSQL that comes from a vendor that has materially contributed to upstream PostgreSQL.