pgvector 0.8.2 and the Trouble With Parallel HNSW

pgvector 0.8.2 is out. It fixes CVE-2026-3172, a heap buffer overflow in parallel HNSW index builds that can leak data from other relations or crash the backend. If you run pgvector and have it pinned to a version below 0.8.2, upgrade. If you are on a managed provider, check which pgvector version they actually ship — a non-trivial number of them lag the upstream release by weeks, and “we support pgvector” does not mean “we are on the version that fixed the parallel HNSW bug.”

What broke

HNSW is a layered graph. Building one means, for each new vertex, drawing a top level from the level distribution, choosing an entry point, finding the closest neighbors among a candidate set, pruning that set down to at most M per layer, and repeating the search layer by layer down to the bottom of the graph. The construction logic is not complicated, but it is dense, and there is a lot of bookkeeping per vertex. pgvector does that bookkeeping in shared memory when the build is parallel, which means the bookkeeping is now subject to all the usual constraints of leader-worker coordination in PostgreSQL.
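To make the per-vertex bookkeeping concrete, here is a minimal sketch of two of those steps: drawing a vertex's top level and pruning a candidate set. It is not pgvector's code. The level formula is the one from the original HNSW paper, the pruning is the simple "keep the closest" variant rather than the paper's heuristic selection, the doubled cap on layer 0 follows the paper's usual default, and the function names are made up for illustration.

```python
import math
import random

def sample_level(m, rng):
    """Draw a top level as floor(-ln(U) * mL), with mL = 1 / ln(m)."""
    ml = 1.0 / math.log(m)
    u = 1.0 - rng.random()  # shift into (0, 1] so log(0) cannot occur
    return int(-math.log(u) * ml)

def max_neighbors(m, layer):
    """Neighbor cap per layer; layer 0 is commonly allowed 2 * m."""
    return 2 * m if layer == 0 else m

def prune_neighbors(query, candidates, m, layer):
    """Keep the candidates closest to query (L2 distance), up to the cap."""
    ranked = sorted(candidates, key=lambda v: math.dist(query, v))
    return ranked[:max_neighbors(m, layer)]

rng = random.Random(0)
levels = [sample_level(16, rng) for _ in range(10_000)]
# Most vertices stay on layer 0; each higher layer is much sparser.
print({lvl: levels.count(lvl) for lvl in sorted(set(levels))})
```

Every vertex carries a level plus one bounded neighbor list per layer it lives on. That is the state a parallel build has to size and write correctly in shared memory.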

A PostgreSQL parallel index build runs a leader and N workers, all attached to a dynamic shared memory segment that holds the shared state for the build. The workers process tuples in parallel and write into shared structures; the leader stitches the results together at the end. This works well for access methods where the per-worker state is small and the merge step is straightforward, which is to say, for B-tree. For HNSW, where the per-worker state is a partial graph and the merge step is “weave several graphs together while respecting layer-by-layer neighbor selection,” it is a more interesting problem.

CVE-2026-3172 is, by the description in the advisory, a buffer overflow in this construction path. The advisory’s wording (“leak data from other relations or crash the database server”) is consistent with a write that spills out of a per-worker buffer into adjacent shared memory that another backend happens to have mapped. The exact failure mode depends on what the merge or per-worker write code got wrong, and the upstream fix commit is the authoritative place to check. What matters operationally is the impact and the patch.

Why this one matters more than the bug count suggests

pgvector is the default vector extension in production RAG stacks. Every managed PostgreSQL provider supports it, several of them ship it preinstalled, and a meaningful number of customer deployments treat CREATE INDEX ... USING hnsw as a routine operation. Parallel HNSW builds, in particular, are exactly the path you take when you have a few million vectors and want the build to finish before lunch. The bug lives in the path production users hit, not the path you would only hit if you were specifically trying to find a bug.

The threat model is also broader than the usual extension CVE. The “leak data from other relations” part is not about a malicious user — it is about the shared memory layout. A parallel build by user A that overflows its buffer can corrupt or read memory that user B’s backend has mapped. There is no SQL-level privilege boundary that prevents this; the shared memory model assumes the writes are correct, and when they are not, the corruption goes wherever it lands.
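As a toy illustration only, the sketch below models one flat segment holding two adjacent regions and shows what an unchecked copy into the first region does to the second. It is not pgvector's code and not PostgreSQL's real shared memory layout; the region sizes and function names are invented for the example.

```python
def unchecked_write(segment, offset, data):
    """Copy data into segment at offset, trusting the caller about size."""
    for i, byte in enumerate(data):
        segment[offset + i] = byte

def checked_write(segment, offset, capacity, data):
    """Same copy, but refuse anything larger than the region's capacity."""
    if len(data) > capacity:
        raise ValueError("write exceeds region capacity")
    unchecked_write(segment, offset, data)

# Region A is bytes 0-7, region B is bytes 8-15 and belongs to someone else.
segment = bytearray(b"\x00" * 8 + b"B" * 8)
unchecked_write(segment, 0, b"A" * 10)  # two bytes too long for region A
print(bytes(segment))  # region B now starts with two of A's bytes
```

Nothing at the SQL level takes part in that copy, which is why no privilege check stands between the oversized write and the neighboring region.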

What to do

Three steps, in order.

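Upgrade pgvector to 0.8.2 or later. The diff is small. Once the new binary is in place, the upgrade is ALTER EXTENSION vector UPDATE, run in each database where the extension is installed. Note that the extension is registered as vector, not pgvector. No reindex is required for the fix itself.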

Audit your provider. If you are on RDS, Aurora, AlloyDB, Cloud SQL, Azure Database for PostgreSQL, Tiger Cloud, Lakebase, Snowflake Postgres, or any of the other managed services that ship pgvector, check the version actually installed in your database. Running SELECT extversion FROM pg_extension WHERE extname = 'vector' will tell you, regardless of what the provider’s release notes claim.

If you cannot upgrade promptly and you are about to rebuild an HNSW index, set max_parallel_maintenance_workers = 0 for the duration of the build. It will be slower. It will not be wrong.
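The three steps above can be sketched as a few helpers around any Python DB-API cursor. This is a sketch under stated assumptions, not a definitive runbook: the function names are hypothetical, the connection is assumed to be in autocommit mode (so a failed CREATE INDEX does not block the RESET), and extversion is assumed to be a purely numeric dotted string.

```python
def parse_version(text):
    """Turn '0.8.2' into (0, 8, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def installed_version(cursor):
    """Step 2: the pgvector version recorded in this database, or None."""
    cursor.execute("SELECT extversion FROM pg_extension WHERE extname = 'vector'")
    row = cursor.fetchone()
    return row[0] if row else None

def is_patched(extversion, minimum="0.8.2"):
    """True if extversion is at or above the fixed release."""
    return parse_version(extversion) >= parse_version(minimum)

def upgrade_extension(cursor):
    """Step 1: run once the new binaries are installed on the server."""
    cursor.execute("ALTER EXTENSION vector UPDATE")
    return installed_version(cursor)

def build_index_serially(cursor, create_index_sql):
    """Step 3: run one index build with parallel maintenance workers off."""
    cursor.execute("SET max_parallel_maintenance_workers = 0")
    try:
        cursor.execute(create_index_sql)
    finally:
        # Assumes autocommit; restores the session's configured value.
        cursor.execute("RESET max_parallel_maintenance_workers")
```

A call might look like build_index_serially(cur, "CREATE INDEX items_embedding_idx ON items USING hnsw (embedding vector_l2_ops)"), where the table and index names are placeholders. The numeric comparison matters for fleet audits: compared as plain strings, "0.10.0" sorts below "0.8.2".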

The broader picture

Vector indexes inside relational databases are a relatively new combination of two well-understood things that did not use to interact. HNSW construction is a heuristic graph algorithm whose reference implementations are written for ML workloads, not for shared-memory parallelism inside a transactional server. Bolting one onto the other requires getting the coordination layer right, and the coordination layer is exactly where this bug lived.

Expect more of these. pgvector is the most carefully maintained vector extension in the ecosystem and it still shipped a parallel-build buffer overflow. The less carefully maintained ones are sitting on more.
