Why I chose pgvector over a dedicated vector database
On the first RAG project I worked on, the team set up Qdrant as a separate service — yet another thing to deploy, monitor, and back up. After a few months of running it, the question came up: if PostgreSQL is already running reliably, why do we need an entirely separate system just to store vectors?
I’ve worked with MySQL, PostgreSQL, and MongoDB across many different projects — tools like DBeaver make switching between them seamless. PostgreSQL with the pgvector extension lets you store vector embeddings directly in your existing database, use familiar SQL, and take advantage of transactions, backups, and replication you already have. For small teams or mid-sized projects, this is a far more pragmatic choice than maintaining an additional standalone system.
pgvector supports:
- Storing vectors as arrays of float4 via the vector(n) data type
- Nearest neighbor search (L2, cosine, inner product)
- IVFFlat and HNSW indexes for fast search on large datasets
- Coexisting with pgbouncer, pg_stat_statements, and the rest of the PostgreSQL ecosystem, like any other extension
Installing pgvector
Installing on Ubuntu/Debian
If you’re using PostgreSQL from the official PGDG repository, a single command is all you need:
# PostgreSQL 16
sudo apt install postgresql-16-pgvector
# Restart to load the extension
sudo systemctl restart postgresql
No PGDG repo? Build from source:
sudo apt install -y postgresql-server-dev-16 git build-essential
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install
Installing with Docker
The fastest way to spin up a test instance — the pgvector/pgvector image comes with the extension bundled, no additional setup required:
docker run -d \
--name pgvector-dev \
-e POSTGRES_PASSWORD=secret \
-e POSTGRES_DB=vectordb \
-p 5432:5432 \
pgvector/pgvector:pg16
Enabling the extension in a database
One nice thing about PostgreSQL: extensions are installed at the system level but enabled or disabled per database. No risk of one database’s extension affecting another:
-- Connect to the target database
\c vectordb
-- Enable the extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Verify
SELECT * FROM pg_extension WHERE extname = 'vector';
Configuration and detailed usage
Creating a table to store vector embeddings
Say you’re building a RAG system for an internal knowledge base — documents are chunked into smaller passages, each with a 1536-dimensional embedding vector (the output of OpenAI’s text-embedding-3-small):
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
content TEXT NOT NULL,
source TEXT,
metadata JSONB,
embedding vector(1536),
created_at TIMESTAMPTZ DEFAULT NOW()
);
The dimension must exactly match the embedding model you’re using — a dimension mismatch will throw an error immediately on insert:
- text-embedding-3-small: 1536 dims
- text-embedding-3-large: 3072 dims
- nomic-embed-text: 768 dims
- all-MiniLM-L6-v2: 384 dims
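Dimension choice also drives storage cost: per the pgvector docs, each vector takes 4 bytes per element plus 8 bytes of overhead. A quick back-of-envelope calculator (embedding column only — TOAST, indexes, and other columns not included):

```python
# Rough storage estimate for the embedding column alone, using
# pgvector's documented layout: 4 bytes per float4 element + 8 bytes
# of per-vector overhead.
def embedding_column_bytes(dims: int, rows: int) -> int:
    return (4 * dims + 8) * rows

for dims in (384, 768, 1536, 3072):
    mb = embedding_column_bytes(dims, 100_000) / 1_000_000
    print(f"{dims:>4} dims x 100k rows ~ {mb:.0f} MB")
```

At 384 dims, 100k rows comes out around 154 MB — which lines up with the ~150MB figure mentioned in the lessons-learned section below.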
Inserting data from Python
import psycopg2
from openai import OpenAI
import json
client = OpenAI()
def get_embedding(text: str) -> list[float]:
response = client.embeddings.create(
model="text-embedding-3-small",
input=text
)
return response.data[0].embedding
def insert_document(conn, content: str, source: str, metadata: dict):
embedding = get_embedding(content)
with conn.cursor() as cur:
cur.execute(
"""
INSERT INTO documents (content, source, metadata, embedding)
VALUES (%s, %s, %s, %s)
""",
(content, source, json.dumps(metadata), embedding)
)
conn.commit()
# Connect
conn = psycopg2.connect(
host="localhost",
dbname="vectordb",
user="postgres",
password="secret"
)
insert_document(
conn,
content="pgvector allows you to store vector embeddings directly in PostgreSQL",
source="docs/pgvector.md",
metadata={"chapter": 1, "topic": "overview"}
)
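Passing a Python list as the parameter works here because psycopg2 renders it as a SQL array and pgvector can cast arrays to vector. If you'd rather be explicit, you can serialize to pgvector's text input format yourself — `to_vector_literal` is a hypothetical helper, shown as a sketch; the official pgvector Python package also ships proper adapters (e.g. register_vector for psycopg2):

```python
# Hypothetical helper: serialize a list of floats into pgvector's text
# input format, e.g. [0.1,0.2,0.3], which can be passed as an ordinary
# string parameter and cast with %s::vector.
def to_vector_literal(values: list[float]) -> str:
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

# Usage: cur.execute("INSERT ... VALUES (%s::vector)", (to_vector_literal(emb),))
print(to_vector_literal([0.1, 0.2, 0.3]))  # [0.1,0.2,0.3]
```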
Semantic search
Find the 5 most relevant document chunks for a user query — using cosine distance (best suited for text embeddings):
def semantic_search(conn, query: str, top_k: int = 5) -> list[dict]:
query_embedding = get_embedding(query)
with conn.cursor() as cur:
cur.execute(
"""
SELECT
id,
content,
source,
metadata,
1 - (embedding <=> %s::vector) AS similarity
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s
""",
(query_embedding, query_embedding, top_k)
)
rows = cur.fetchall()
return [
{
"id": row[0],
"content": row[1],
"source": row[2],
"metadata": row[3],
"similarity": float(row[4])
}
for row in rows
]
results = semantic_search(conn, "How to install pgvector on Ubuntu")
for r in results:
print(f"[{r['similarity']:.3f}] {r['content'][:100]}")
pgvector provides 3 distance operators — choosing the wrong one directly impacts result quality:
- `<=>` — cosine distance (use for text embeddings)
- `<->` — Euclidean / L2 distance
- `<#>` — negative inner product (use when vectors are already normalized)
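To make the three metrics concrete, here is what they compute, sketched in plain Python (not pgvector's implementation, just the math). Note that for unit-length vectors, cosine distance and negative inner product rank results identically:

```python
import math

def l2(a, b):  # what <-> computes: Euclidean distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):  # <#> returns the NEGATIVE of this, so ORDER BY ASC works
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):  # what <=> computes: 1 - cosine similarity
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1 - inner_product(a, b) / (na * nb)

a, b = [1.0, 0.0], [0.0, 1.0]
print(cosine_distance(a, b))  # 1.0 — orthogonal vectors
print(l2(a, b))               # ~1.414
```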
Creating indexes to speed up search
Below 50,000 rows, sequential scans are still fast — latency is typically under 10ms. Beyond that threshold without an index, queries slow down noticeably, especially under concurrent load.
HNSW index (recommended for production — faster queries than IVFFlat, no training required upfront):
-- HNSW index with cosine distance
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Increase ef_search at query time to balance accuracy vs. speed
SET hnsw.ef_search = 100;
IVFFlat index (good when the dataset is stable with few new inserts):
-- Data must exist before creating an IVFFlat index
-- lists ~ sqrt(number of rows), minimum 100
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Increase probes at query time (default 1, higher = more accurate)
SET ivfflat.probes = 10;
I generally go with HNSW for production because you don’t need to know the data volume upfront, and accuracy stays consistent as the dataset grows. IVFFlat is better suited for cases where the dataset is nearly static and you need faster index build times.
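If you do pick IVFFlat, the tuning knobs can be derived from the row count. A small sketch following the rule of thumb from the comment above (lists ~ sqrt(rows), minimum 100) plus a common starting point of probes ~ sqrt(lists) — treat both as starting values to validate against your own recall measurements:

```python
import math

def ivfflat_params(row_count: int) -> dict:
    # lists ~ sqrt(rows), floored at 100, per the rule of thumb above;
    # probes ~ sqrt(lists) is a reasonable starting point for recall tuning.
    lists = max(100, int(math.sqrt(row_count)))
    probes = max(1, int(math.sqrt(lists)))
    return {"lists": lists, "probes": probes}

print(ivfflat_params(1_000_000))  # {'lists': 1000, 'probes': 31}
```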
Filtering with metadata
This is where pgvector really shines compared to a dedicated vector database — you can filter and JOIN with plain SQL, no new API to learn:
-- Search within documents belonging to chapter 1
SELECT content, 1 - (embedding <=> '[0.1, 0.2, ...]'::vector) AS sim
FROM documents
WHERE metadata->>'chapter' = '1'
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 5;
-- Search and join with another table
SELECT d.content, u.name AS author, sim
FROM (
SELECT id, content, 1 - (embedding <=> %s::vector) AS sim
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT 10
) d
JOIN document_authors u ON d.id = u.document_id
WHERE sim > 0.7;
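When the filters come from application code, it helps to assemble them as parameterized SQL rather than string-interpolating values. `build_filtered_search` is a hypothetical helper (not part of pgvector), assuming equality filters on top-level JSONB keys against the documents table above:

```python
def build_filtered_search(embedding: list[float], filters: dict, top_k: int = 5):
    # Build a parameterized semantic-search query with JSONB equality
    # filters. Returns (sql, params) ready for cursor.execute().
    clauses, params = [], [embedding]
    for key, value in filters.items():
        clauses.append("metadata->>%s = %s")
        params += [key, str(value)]
    where = ("WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = (
        "SELECT id, content, 1 - (embedding <=> %s::vector) AS sim "
        "FROM documents " + where +
        " ORDER BY embedding <=> %s::vector LIMIT %s"
    )
    return sql, params + [embedding, top_k]
```

Usage: `cur.execute(*build_filtered_search(query_embedding, {"chapter": "1"}))` — the keys and values travel as parameters, so user-supplied filters can't inject SQL.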
Inspection and monitoring
Verifying the index is being used
Reading the query plan with EXPLAIN ANALYZE is essential for confirming your index is actually being used — look for “Index Scan” instead of “Seq Scan”:
-- Check query plan — you should see "Index Scan" instead of "Seq Scan"
EXPLAIN (ANALYZE, BUFFERS)
SELECT content
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 5;
-- View index information
SELECT
indexname,
pg_size_pretty(pg_relation_size(indexname::regclass)) AS index_size
FROM pg_indexes
WHERE tablename = 'documents';
Performance monitoring
-- Table and index sizes
SELECT
pg_size_pretty(pg_table_size('documents')) AS table_size,
pg_size_pretty(pg_indexes_size('documents')) AS indexes_size,
pg_size_pretty(pg_total_relation_size('documents')) AS total_size;
-- Slow queries (requires pg_stat_statements)
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
WHERE query LIKE '%embedding%'
ORDER BY mean_exec_time DESC
LIMIT 5;
Routine maintenance
HNSW requires no rebuilding. IVFFlat is a different story — after many deletes and updates, accumulated dead tuples gradually degrade accuracy. Periodic REINDEX resolves this:
-- Rebuild index (no table lock — use CONCURRENTLY)
REINDEX INDEX CONCURRENTLY documents_embedding_idx;
-- Vacuum to reclaim dead tuples after heavy updates
VACUUM ANALYZE documents;
Quick benchmark
import time
import random
def benchmark_search(conn, n_queries=100):
# Fetch random existing embeddings to use as test queries
with conn.cursor() as cur:
cur.execute("SELECT embedding FROM documents ORDER BY random() LIMIT %s", (n_queries,))
sample_embeddings = [row[0] for row in cur.fetchall()]
start = time.time()
for emb in sample_embeddings:
with conn.cursor() as cur:
cur.execute(
"SELECT id FROM documents ORDER BY embedding <=> %s::vector LIMIT 5",
(emb,)
)
cur.fetchall()
elapsed = time.time() - start
print(f"{n_queries} queries in {elapsed:.2f}s")
print(f"Average: {elapsed/n_queries*1000:.1f}ms per query")
# HNSW on ~100k rows typically achieves 5-20ms/query depending on ef_search
benchmark_search(conn)
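Averages hide tail latency; for per-query timings it's worth reporting p95 as well. A small stdlib helper you could feed with per-query samples collected in the loop above (a sketch, not part of the benchmark as written):

```python
import statistics

def latency_report(samples_ms: list[float]) -> dict:
    # quantiles(n=20) yields 19 cut points; index 18 is the 95th percentile.
    return {
        "avg_ms": statistics.fmean(samples_ms),
        "p95_ms": statistics.quantiles(samples_ms, n=20)[18],
        "max_ms": max(samples_ms),
    }

print(latency_report([5.0, 6.0, 7.0, 8.0, 50.0]))
```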
Practical lessons learned
After running pgvector in production for several months, here’s what I wish I had known earlier:
- Choose the right dimension: Smaller embedding models (384–768 dims) are both faster and more storage-efficient — 100k docs at 384 dims only takes about 150MB. Only go to 1536+ when you genuinely need higher accuracy.
- HNSW vs IVFFlat: If your dataset is continuously growing, go with HNSW — no debate. IVFFlat only makes sense when the data is nearly static and you need faster index build times on first run.
- No backup headaches: pgvector data is included in a regular pg_dump — no extra tooling needed, no additional steps to add to your existing PostgreSQL backup workflow.
- Connection pooling: pgbouncer in front works perfectly fine with pgvector, just like any other PostgreSQL workload — nothing special to configure here.
- When to consider Qdrant or Weaviate: When your vector dataset exceeds tens of millions of records and you need horizontal sharding or specialized multi-tenancy. Under 5 million records, pgvector is more than sufficient — and far simpler to operate.
