Why I chose pgvector over a dedicated vector database
On the first RAG project I worked on, the team set up Qdrant as a separate service — yet another thing to deploy, monitor, and back up. After a few months of running it, the question came up: if PostgreSQL is already running reliably, why do we need an entirely separate system just to store vectors?
I’ve worked with MySQL, PostgreSQL, and MongoDB across many different projects — tools like DBeaver make switching between them seamless. PostgreSQL with the pgvector extension lets you store vector embeddings directly in your existing database, use familiar SQL, and take advantage of transactions, backups, and replication you already have. For small teams or mid-sized projects, this is a far more pragmatic choice than maintaining an additional standalone system.
pgvector supports:
- Storing vectors as arrays of float4 via the vector(n) data type
- Nearest neighbor search (L2, cosine, inner product)
- IVFFlat and HNSW indexes for fast search on large datasets
- Coexisting with pgbouncer, pg_stat_statements, and the rest of the PostgreSQL ecosystem, like any other extension
Installing pgvector
Installing on Ubuntu/Debian
If you’re using PostgreSQL from the official PGDG repository, a single command is all you need:
# PostgreSQL 16
sudo apt install postgresql-16-pgvector
# Restart to load the extension
sudo systemctl restart postgresql
No PGDG repo? Build from source:
sudo apt install -y postgresql-server-dev-16 git build-essential
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install
Installing with Docker
The fastest way to spin up a test instance — the pgvector/pgvector image comes with the extension bundled, no additional setup required:
docker run -d \
--name pgvector-dev \
-e POSTGRES_PASSWORD=secret \
-e POSTGRES_DB=vectordb \
-p 5432:5432 \
pgvector/pgvector:pg16
Enabling the extension in a database
One nice thing about PostgreSQL: extensions are installed at the system level but enabled or disabled per database. No risk of one database’s extension affecting another:
-- Connect to the target database
\c vectordb
-- Enable the extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Verify
SELECT * FROM pg_extension WHERE extname = 'vector';
Configuration and detailed usage
Creating a table to store vector embeddings
Say you’re building a RAG system for an internal knowledge base — documents are chunked into smaller passages, each with a 1536-dimensional embedding vector (the output of OpenAI’s text-embedding-3-small):
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
content TEXT NOT NULL,
source TEXT,
metadata JSONB,
embedding vector(1536),
created_at TIMESTAMPTZ DEFAULT NOW()
);
The dimension must exactly match the embedding model you’re using — a dimension mismatch will throw an error immediately on insert:
- text-embedding-3-small: 1536 dims
- text-embedding-3-large: 3072 dims
- nomic-embed-text: 768 dims
- all-MiniLM-L6-v2: 384 dims
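Dimension choice also drives storage cost: per the pgvector docs, each vector takes 4 bytes per element plus 8 bytes of overhead. A quick back-of-envelope calculator (embedding column only — TOAST, indexes, and other columns not included):

```python
# Rough storage estimate for the embedding column alone, using
# pgvector's documented layout: 4 bytes per float4 element + 8 bytes
# of per-vector overhead.
def embedding_column_bytes(dims: int, rows: int) -> int:
    return (4 * dims + 8) * rows

for dims in (384, 768, 1536, 3072):
    mb = embedding_column_bytes(dims, 100_000) / 1_000_000
    print(f"{dims:>4} dims x 100k rows ~ {mb:.0f} MB")
```

At 384 dims, 100k rows comes out around 154 MB — which lines up with the ~150MB figure mentioned in the lessons-learned section below.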
Inserting data from Python
import psycopg2
from openai import OpenAI
import json
client = OpenAI()
def get_embedding(text: str) -> list[float]:
response = client.embeddings.create(
model="text-embedding-3-small",
input=text
)
return response.data[0].embedding
def insert_document(conn, content: str, source: str, metadata: dict):
embedding = get_embedding(content)
with conn.cursor() as cur:
cur.execute(
"""
INSERT INTO documents (content, source, metadata, embedding)
VALUES (%s, %s, %s, %s)
""",
(content, source, json.dumps(metadata), embedding)
)
conn.commit()
# Connect
conn = psycopg2.connect(
host="localhost",
dbname="vectordb",
user="postgres",
password="secret"
)
insert_document(
conn,
content="pgvector allows you to store vector embeddings directly in PostgreSQL",
source="docs/pgvector.md",
metadata={"chapter": 1, "topic": "overview"}
)
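Passing a Python list as the parameter works here because psycopg2 renders it as a SQL array and pgvector can cast arrays to vector. If you'd rather be explicit, you can serialize to pgvector's text input format yourself — `to_vector_literal` is a hypothetical helper, shown as a sketch; the official pgvector Python package also ships proper adapters (e.g. register_vector for psycopg2):

```python
# Hypothetical helper: serialize a list of floats into pgvector's text
# input format, e.g. [0.1,0.2,0.3], which can be passed as an ordinary
# string parameter and cast with %s::vector.
def to_vector_literal(values: list[float]) -> str:
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

# Usage: cur.execute("INSERT ... VALUES (%s::vector)", (to_vector_literal(emb),))
print(to_vector_literal([0.1, 0.2, 0.3]))  # [0.1,0.2,0.3]
```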
Semantic search
Find the 5 most relevant document chunks for a user query — using cosine distance (best suited for text embeddings):
def semantic_search(conn, query: str, top_k: int = 5) -> list[dict]:
query_embedding = get_embedding(query)
with conn.cursor() as cur:
cur.execute(
"""
SELECT
id,
content,
source,
metadata,
1 - (embedding <=> %s::vector) AS similarity
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s
""",
(query_embedding, query_embedding, top_k)
)
rows = cur.fetchall()
return [
{
"id": row[0],
"content": row[1],
"source": row[2],
"metadata": row[3],
"similarity": float(row[4])
}
for row in rows
]
results = semantic_search(conn, "How to install pgvector on Ubuntu")
for r in results:
print(f"[{r['similarity']:.3f}] {r['content'][:100]}")
pgvector provides 3 distance operators — choosing the wrong one directly impacts result quality:
- `<=>` — cosine distance (use for text embeddings)
- `<->` — Euclidean / L2 distance
- `<#>` — negative inner product (use when vectors are already normalized)
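To make the three metrics concrete, here is what they compute, sketched in plain Python (not pgvector's implementation, just the math). Note that for unit-length vectors, cosine distance and negative inner product rank results identically:

```python
import math

def l2(a, b):  # what <-> computes: Euclidean distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):  # <#> returns the NEGATIVE of this, so ORDER BY ASC works
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):  # what <=> computes: 1 - cosine similarity
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1 - inner_product(a, b) / (na * nb)

a, b = [1.0, 0.0], [0.0, 1.0]
print(cosine_distance(a, b))  # 1.0 — orthogonal vectors
print(l2(a, b))               # ~1.414
```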
Creating indexes to speed up search
Below 50,000 rows, sequential scans are still fast — latency is typically under 10ms. Beyond that threshold without an index, queries slow down noticeably, especially under concurrent load.
HNSW index (recommended for production — faster queries than IVFFlat, no training required upfront):
-- HNSW index with cosine distance
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Increase ef_search at query time to balance accuracy vs. speed
SET hnsw.ef_search = 100;
IVFFlat index (good when the dataset is stable with few new inserts):
-- Data must exist before creating an IVFFlat index
-- lists ~ sqrt(number of rows), minimum 100
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Increase probes at query time (default 1, higher = more accurate)
SET ivfflat.probes = 10;
I generally go with HNSW for production because you don’t need to know the data volume upfront, and accuracy stays consistent as the dataset grows. IVFFlat is better suited for cases where the dataset is nearly static and you need faster index build times.
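If you do pick IVFFlat, the tuning knobs can be derived from the row count. A small sketch following the rule of thumb from the comment above (lists ~ sqrt(rows), minimum 100) plus a common starting point of probes ~ sqrt(lists) — treat both as starting values to validate against your own recall measurements:

```python
import math

def ivfflat_params(row_count: int) -> dict:
    # lists ~ sqrt(rows), floored at 100, per the rule of thumb above;
    # probes ~ sqrt(lists) is a reasonable starting point for recall tuning.
    lists = max(100, int(math.sqrt(row_count)))
    probes = max(1, int(math.sqrt(lists)))
    return {"lists": lists, "probes": probes}

print(ivfflat_params(1_000_000))  # {'lists': 1000, 'probes': 31}
```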
Filtering with metadata
This is where pgvector really shines compared to a dedicated vector database — you can filter and JOIN with plain SQL, no new API to learn:
-- Search within documents belonging to chapter 1
SELECT content, 1 - (embedding <=> '[0.1, 0.2, ...]'::vector) AS sim
FROM documents
WHERE metadata->>'chapter' = '1'
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 5;
-- Search and join with another table
SELECT d.content, u.name AS author, sim
FROM (
SELECT id, content, 1 - (embedding <=> %s::vector) AS sim
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT 10
) d
JOIN document_authors u ON d.id = u.document_id
WHERE sim > 0.7;
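When the filters come from application code, it helps to assemble them as parameterized SQL rather than string-interpolating values. `build_filtered_search` is a hypothetical helper (not part of pgvector), assuming equality filters on top-level JSONB keys against the documents table above:

```python
def build_filtered_search(embedding: list[float], filters: dict, top_k: int = 5):
    # Build a parameterized semantic-search query with JSONB equality
    # filters. Returns (sql, params) ready for cursor.execute().
    clauses, params = [], [embedding]
    for key, value in filters.items():
        clauses.append("metadata->>%s = %s")
        params += [key, str(value)]
    where = ("WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = (
        "SELECT id, content, 1 - (embedding <=> %s::vector) AS sim "
        "FROM documents " + where +
        " ORDER BY embedding <=> %s::vector LIMIT %s"
    )
    return sql, params + [embedding, top_k]
```

Usage: `cur.execute(*build_filtered_search(query_embedding, {"chapter": "1"}))` — the keys and values travel as parameters, so user-supplied filters can't inject SQL.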
Inspection and monitoring
Verifying the index is being used
Reading the query plan with EXPLAIN ANALYZE is essential for confirming your index is actually being used — look for “Index Scan” instead of “Seq Scan”:
-- Check query plan — you should see "Index Scan" instead of "Seq Scan"
EXPLAIN (ANALYZE, BUFFERS)
SELECT content
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 5;
-- View index information
SELECT
indexname,
pg_size_pretty(pg_relation_size(indexname::regclass)) AS index_size
FROM pg_indexes
WHERE tablename = 'documents';
Performance monitoring
-- Table and index sizes
SELECT
pg_size_pretty(pg_table_size('documents')) AS table_size,
pg_size_pretty(pg_indexes_size('documents')) AS indexes_size,
pg_size_pretty(pg_total_relation_size('documents')) AS total_size;
-- Slow queries (requires pg_stat_statements)
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
WHERE query LIKE '%embedding%'
ORDER BY mean_exec_time DESC
LIMIT 5;
Routine maintenance
HNSW requires no rebuilding. IVFFlat is a different story — after many deletes and updates, accumulated dead tuples gradually degrade accuracy. Periodic REINDEX resolves this:
-- Rebuild index (no table lock — use CONCURRENTLY)
REINDEX INDEX CONCURRENTLY documents_embedding_idx;
-- Vacuum to reclaim dead tuples after heavy updates
VACUUM ANALYZE documents;
Quick benchmark
import time
import random
def benchmark_search(conn, n_queries=100):
# Fetch random existing embeddings to use as test queries
with conn.cursor() as cur:
cur.execute("SELECT embedding FROM documents ORDER BY random() LIMIT %s", (n_queries,))
sample_embeddings = [row[0] for row in cur.fetchall()]
start = time.time()
for emb in sample_embeddings:
with conn.cursor() as cur:
cur.execute(
"SELECT id FROM documents ORDER BY embedding <=> %s::vector LIMIT 5",
(emb,)
)
cur.fetchall()
elapsed = time.time() - start
print(f"{n_queries} queries in {elapsed:.2f}s")
print(f"Average: {elapsed/n_queries*1000:.1f}ms per query")
# HNSW on ~100k rows typically achieves 5-20ms/query depending on ef_search
benchmark_search(conn)
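Averages hide tail latency; for per-query timings it's worth reporting p95 as well. A small stdlib helper you could feed with per-query samples collected in the loop above (a sketch, not part of the benchmark as written):

```python
import statistics

def latency_report(samples_ms: list[float]) -> dict:
    # quantiles(n=20) yields 19 cut points; index 18 is the 95th percentile.
    return {
        "avg_ms": statistics.fmean(samples_ms),
        "p95_ms": statistics.quantiles(samples_ms, n=20)[18],
        "max_ms": max(samples_ms),
    }

print(latency_report([5.0, 6.0, 7.0, 8.0, 50.0]))
```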
Practical lessons learned
After running pgvector in production for several months, here’s what I wish I had known earlier:
- Choose the right dimension: Smaller embedding models (384–768 dims) are both faster and more storage-efficient — 100k docs at 384 dims only takes about 150MB. Only go to 1536+ when you genuinely need higher accuracy.
- HNSW vs IVFFlat: If your dataset is continuously growing, go with HNSW — no debate. IVFFlat only makes sense when the data is nearly static and you need faster index build times on first run.
- No backup headaches: pgvector data is included in a regular pg_dump — no extra tooling needed, no additional steps to add to your existing PostgreSQL backup workflow.
- Connection pooling: pgbouncer in front works perfectly fine with pgvector, just like any other PostgreSQL workload — nothing special to configure here.
- When to consider Qdrant or Weaviate: When your vector dataset exceeds tens of millions of records and you need horizontal sharding or specialized multi-tenancy. Under 5 million records, pgvector is more than sufficient — and far simpler to operate.
