Setting Up Milvus Vector Database with Docker: Building a Knowledge Store for RAG and AI – ITFROMZERO

Running Milvus in 5 Minutes with Docker Compose

Milvus isn’t like Redis or PostgreSQL. It was built from the ground up to do one thing: store and search vectors — the data type AI uses to “understand” semantics, which is fundamentally different from ordinary keyword matching. I tested several vector databases before settling on Milvus for a RAG project processing 500K technical documents, and the deciding factor was its ability to scale with large data volumes.

Create a docker-compose.yml file and run it right away:

version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ./volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-13T19-46-17Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    volumes:
      - ./volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.4.0
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ./volumes/milvus:/var/lib/milvus
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - etcd
      - minio

docker compose up -d

# Check status
docker compose ps

# Logs if there are errors
docker compose logs standalone --tail=50

Wait about 30 seconds for Milvus to fully start up (etcd and MinIO need to be ready first). Then install the Python SDK:

pip install pymilvus openai  # openai for generating embeddings
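
Instead of a fixed 30-second wait, a script can poll until Milvus reports healthy. The sketch below assumes the standalone container serves its health check at http://localhost:9091/healthz (9091 is the second port mapped in the compose file). The helper names `milvus_is_healthy` and `wait_until_ready` are made up for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# Assumption: default Milvus standalone health endpoint on the mapped port 9091
HEALTH_URL = "http://localhost:9091/healthz"


def milvus_is_healthy(url: str = HEALTH_URL) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx answer: not ready yet
        return False


def wait_until_ready(
    check: Callable[[], bool], timeout: float = 60.0, interval: float = 2.0
) -> bool:
    """Call check() every `interval` seconds until it returns True or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_ready(milvus_is_healthy)` before opening a connection turns the "wait about 30 seconds" step into an explicit check that returns False if the server never comes up.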

Understanding How Milvus Works

Milvus organizes data as a hierarchy: Database → Collection → Partition → Entity. In PostgreSQL terms, that maps roughly to Database → Table → table partition → Row. The core difference: every collection must define at least one vector field. That is what Milvus was built to handle, and it’s non-negotiable.

When searching, instead of WHERE content LIKE '%keyword%', Milvus computes the distance between vectors — cosine similarity or L2 distance — to find the entities semantically closest to the query.
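
To make the two metrics concrete, here is a minimal pure-Python sketch of the underlying math. The three-dimensional vectors are toy values chosen for illustration (real embeddings here have 1536 dimensions), and Milvus computes these measures internally with optimized index structures rather than like this.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between a and b: 0.0 means identical vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy vectors, not real embeddings
query = [1.0, 0.0, 1.0]
doc_close = [0.9, 0.1, 1.1]   # points in nearly the same direction as the query
doc_far = [-1.0, 1.0, 0.0]    # points in a very different direction

for name, doc in (("close", doc_close), ("far", doc_far)):
    print(
        f"{name}: cosine={cosine_similarity(query, doc):.3f} "
        f"l2={l2_distance(query, doc):.3f}"
    )
```

Note the opposite orientation of the two measures: a higher cosine similarity means a better match, while a lower L2 distance means a better match.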

Creating Your First Collection

from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType, utility

# Connect
connections.connect("default", host="localhost", port="19530")

# Define schema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="source", dtype=DataType.VARCHAR, max_length=500),
    FieldSchema(name="content", dtype=DataType.VARCHAR, max_length=5000),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1536),  # OpenAI text-embedding-3-small
]

schema = CollectionSchema(fields, description="Knowledge base for RAG")
collection = Collection("knowledge_base", schema)

# Create index for vector field (REQUIRED before searching)
index_params = {
    "metric_type": "COSINE",
    "index_type": "IVF_FLAT",
    "params": {"nlist": 1024},
}
collection.create_index("embedding", index_params)
print("Collection created successfully:", collection.name)

Inserting Documents into Milvus

from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in env

def get_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Prepare data
documents = [
    {"source": "linux_guide.md", "content": "The chmod command changes file permissions in Linux..."},
    {"source": "docker_tips.md", "content": "Docker volumes help persist data when containers restart..."},
    {"source": "git_workflow.md", "content": "Git rebase creates a cleaner commit history compared to merge..."},
]

# Create embeddings and insert (row-based: one dict per entity)
rows = []
for doc in documents:
    rows.append({
        "source": doc["source"],
        "content": doc["content"],
        "embedding": get_embedding(doc["content"]),
    })
collection.insert(rows)

collection.flush()  # Ensure data is written to disk
print(f"Inserted {collection.num_entities} entities")
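
The loop above sends one embedding request per document, which gets slow for thousands of chunks. A possible batching sketch follows. `embed_batch` is a hypothetical callable that takes a list of texts and returns one vector per text, in order; with the OpenAI client it could wrap a single `client.embeddings.create(...)` call, since the `input` parameter also accepts a list of strings. The batch size of 64 is an arbitrary assumption.

```python
from typing import Callable, Iterator


def batched(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_rows(
    documents: list[dict],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int = 64,
) -> list[dict]:
    """Embed documents batch by batch and return one row dict per document."""
    rows = []
    for batch in batched(documents, batch_size):
        vectors = embed_batch([doc["content"] for doc in batch])
        if len(vectors) != len(batch):
            raise ValueError("embed_batch must return one vector per input text")
        for doc, vector in zip(batch, vectors):
            rows.append({
                "source": doc["source"],
                "content": doc["content"],
                "embedding": vector,
            })
    return rows
```

The returned list of row dicts can then be inserted in a single call and flushed once at the end.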

Building a Complete RAG Pipeline

This is the pattern I’m currently using in production. One lesson from migrating 100GB from MySQL to PostgreSQL (3 days of planning, 1 day of execution) is that designing your schema carefully upfront saves a lot of headaches down the road. With Milvus this is even more true: pick the wrong dim and you have to drop the collection, recreate it, and re-insert everything, because the dimension of a vector field cannot be changed afterwards.
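
One cheap safeguard is to validate rows against the schema limits before they ever reach Milvus. The sketch below mirrors the values used in the schema earlier (dim 1536, max_length 500 and 5000). The constant and function names are illustrative, and it assumes VARCHAR `max_length` is counted in UTF-8 bytes, which is worth confirming for your Milvus version.

```python
EXPECTED_DIM = 1536        # must match dim= of the "embedding" field
MAX_SOURCE_BYTES = 500     # must match max_length= of the "source" field
MAX_CONTENT_BYTES = 5000   # must match max_length= of the "content" field


def validate_row(row: dict) -> list[str]:
    """Return a list of problems in one row; an empty list means it looks insertable."""
    problems = []
    dim = len(row["embedding"])
    if dim != EXPECTED_DIM:
        problems.append(f"embedding has {dim} dimensions, expected {EXPECTED_DIM}")
    # Assumption: the limit applies to the encoded byte length, not the character count
    source_bytes = len(row["source"].encode("utf-8"))
    if source_bytes > MAX_SOURCE_BYTES:
        problems.append(f"source is {source_bytes} bytes, limit is {MAX_SOURCE_BYTES}")
    content_bytes = len(row["content"].encode("utf-8"))
    if content_bytes > MAX_CONTENT_BYTES:
        problems.append(f"content is {content_bytes} bytes, limit is {MAX_CONTENT_BYTES}")
    return problems
```

Running this over the first embedded document before creating the collection catches a model/dim mismatch while fixing it is still a one-line change.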

Semantic Search Function

def semantic_search(query: str, top_k: int = 5) -> list[dict]:
    # Load collection into memory before searching
    collection.load()
    
    query_embedding = get_embedding(query)
    
    search_params = {
        "metric_type": "COSINE",
        "params": {"nprobe": 10},  # Increase for higher accuracy, decrease for faster speed
    }
    
    results = collection.search(
        data=[query_embedding],
        anns_field="embedding",
        param=search_params,
        limit=top_k,
        output_fields=["source", "content"],
    )
    
    hits = []
    for hit in results[0]:
        hits.append({
            "source": hit.entity.get("source"),
            "content": hit.entity.get("content"),
            "score": hit.score,
        })
    
    return hits

# Try it now
results = semantic_search("how to change file permissions in Linux")
for r in results:
    print(f"[{r['score']:.3f}] {r['source']}: {r['content'][:80]}...")
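
A top-k search always returns k hits, even when nothing in the collection is really relevant. A small post-filter on the score can drop weak matches before they reach the LLM. This sketch works on the list of dicts returned by `semantic_search` and relies on the COSINE metric, where a higher score means a closer match. The function name, the sample scores, and the 0.3 threshold are arbitrary assumptions to tune on your own data.

```python
def filter_hits(hits: list[dict], min_score: float = 0.3) -> list[dict]:
    """Keep only hits whose similarity score reaches min_score."""
    return [hit for hit in hits if hit["score"] >= min_score]


# Made-up sample in the same shape as the output of semantic_search
sample_hits = [
    {"source": "linux_guide.md", "content": "The chmod command...", "score": 0.71},
    {"source": "docker_tips.md", "content": "Docker volumes...", "score": 0.22},
    {"source": "git_workflow.md", "content": "Git rebase...", "score": 0.18},
]

for hit in filter_hits(sample_hits):
    print(f"[{hit['score']:.3f}] {hit['source']}")
```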

Connecting to an LLM to Answer Questions

def rag_answer(question: str) -> str:
    # Step 1: Find relevant context
    relevant_docs = semantic_search(question, top_k=3)
    context = "\n\n".join([d["content"] for d in relevant_docs])
    
    # Step 2: Send to LLM with context
    prompt = f"""Based on the following documents, answer the question:

Documents:
{context}

Question: {question}

Answer:"""
    
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.3,
    )
    
    return response.choices[0].message.content

# Test
answer = rag_answer("How do I change file permissions in Linux?")
print(answer)
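
The context string above is a plain join of chunk texts. One possible refinement, sketched below, labels each chunk with its source so the model can cite it and stops adding chunks once a size budget is reached. `build_context` is a hypothetical helper, and the 6000-character budget is an assumption standing in for a proper token count.

```python
def build_context(docs: list[dict], max_chars: int = 6000) -> str:
    """Join retrieved chunks into a numbered, source-labeled context string.

    Chunks are added in ranking order until the next one would exceed max_chars.
    """
    separator = "\n\n"
    parts = []
    used = 0
    for index, doc in enumerate(docs, start=1):
        block = f"[{index}] ({doc['source']})\n{doc['content']}"
        extra = len(separator) if parts else 0
        if used + extra + len(block) > max_chars:
            break
        parts.append(block)
        used += extra + len(block)
    return separator.join(parts)
```

Inside `rag_answer`, the line that builds `context` could call `build_context(relevant_docs)` instead of the join.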

Optimizing Milvus for Production

Choosing the Right Index Type

This is the point many people overlook and then complain that Milvus is slow. Each index type has different trade-offs:

IVF_FLAT: Balances speed and accuracy — good for most use cases, data under 10M vectors
HNSW: Faster at query time but uses more RAM — use when you need low latency and have sufficient RAM
IVF_SQ8: Compresses vectors 4x, saves RAM — use when data is large but RAM is limited, accepting ~2% accuracy loss
FLAT: Brute force, 100% accurate but slow — only for testing or small datasets under 1M vectors
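
The RAM side of these trade-offs is easy to estimate with back-of-the-envelope arithmetic. The sketch below only counts raw vector storage (float32 at 4 bytes per value, SQ8 at 1 byte per value) and ignores index overhead such as HNSW graph links or IVF centroids, so treat the numbers as a lower bound. The function name and the 10M-vector example are illustrative.

```python
def raw_vector_gib(num_vectors: int, dim: int, bytes_per_value: int = 4) -> float:
    """Approximate size of the raw vector data in GiB, without index overhead."""
    return num_vectors * dim * bytes_per_value / 1024**3


num_vectors = 10_000_000
dim = 1536  # text-embedding-3-small

float32_gib = raw_vector_gib(num_vectors, dim)                    # FLAT, IVF_FLAT, HNSW
sq8_gib = raw_vector_gib(num_vectors, dim, bytes_per_value=1)     # IVF_SQ8

print(f"float32 vectors: {float32_gib:.1f} GiB")
print(f"SQ8 vectors:     {sq8_gib:.1f} GiB")
```

For 10M vectors at 1536 dimensions this gives roughly 57 GiB for float32 and 14 GiB for SQ8, which is where the 4x figure comes from.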

# HNSW index — consider for production if you have sufficient RAM
hnsw_params = {
    "metric_type": "COSINE",
    "index_type": "HNSW",
    "params": {
        "M": 16,           # Connections per node, higher M = more accurate, more RAM usage
        "efConstruction": 200,  # Accuracy during index build
    },
}

# When searching with HNSW
hnsw_search_params = {
    "metric_type": "COSINE",
    "params": {"ef": 64},  # Higher ef = more accurate, slightly slower
}

Using Partitions to Organize Data

If your knowledge base has many different sources — documents, FAQs, code snippets — use partitions to speed up searches:

# Create partitions by document type
collection.create_partition("docs")
collection.create_partition("faq")
collection.create_partition("code")

# Insert into a specific partition (row-based: one dict per entity)
collection.insert(
    {"source": "manual.pdf", "content": "...", "embedding": [...]},  # [...] stands for a 1536-dim vector
    partition_name="docs"
)

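# Search only in the docs partition, which scans less data than a full collection search
# (query_embedding and search_params are built the same way as inside semantic_search)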
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param=search_params,
    limit=5,
    partition_names=["docs"],  # Filter by partition
    output_fields=["source", "content"],
)

Hybrid Search: Combining Vectors and Filters

# Semantic search + filter by source file
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param=search_params,
    limit=5,
    expr='source like "linux%"',  # Only fetch documents from linux_* files
    output_fields=["source", "content"],
)
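
When the filter value comes from user input, building the `expr` string by hand invites broken or unintended expressions. A conservative sketch: rather than guessing at escaping rules, it rejects any prefix containing a double quote, backslash, or percent sign. `source_prefix_filter` is a hypothetical helper, not part of pymilvus.

```python
def source_prefix_filter(prefix: str) -> str:
    """Build a filter expression matching sources that start with `prefix`.

    Raises ValueError for characters that could change the meaning of the expression.
    """
    if not prefix:
        raise ValueError("prefix must not be empty")
    if any(ch in prefix for ch in '"\\%'):
        raise ValueError("prefix must not contain double quotes, backslashes, or %")
    return f'source like "{prefix}%"'


print(source_prefix_filter("linux"))
```

The result can be passed as the `expr` argument of `collection.search`.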

Practical Tips for Using Milvus

Chunking Strategy Matters More Than You Think

I’ve tried many ways to chunk documents and arrived at one clear conclusion: chunks under 100 tokens lose context, while chunks over 1000 tokens make the vector embedding too “diluted” — everything matches a little, nothing really stands out. The sweet spot is 300–500 tokens, with 50–100 token overlap between chunks to avoid breaking the flow of ideas at boundaries.

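def chunk_text(text: str, chunk_size: int = 400, overlap: int = 80) -> list[str]:
    # Sizes are counted in whitespace-separated words, a rough stand-in for tokens
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = start + chunk_size
        chunks.append(" ".join(words[start:end]))
        if end >= len(words):
            break  # Last window reached the end; avoid a trailing overlap-only chunk
        start += chunk_size - overlap  # Overlap to preserve context at chunk boundaries
    return chunks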
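
To budget embedding calls and storage before ingesting a large corpus, the number of chunks per document can be estimated from its word count. This sketch assumes a sliding window that advances by `chunk_size - overlap` words and stops as soon as a window reaches the end of the text. `estimate_chunk_count` is an illustrative name.

```python
def estimate_chunk_count(num_words: int, chunk_size: int = 400, overlap: int = 80) -> int:
    """Number of chunks a sliding window produces for a text of num_words words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if num_words <= 0:
        return 0
    if num_words <= chunk_size:
        return 1
    step = chunk_size - overlap
    remaining = num_words - chunk_size
    # One full first window, then one more window per started step of remaining words
    return (remaining + step - 1) // step + 1


for words in (300, 400, 2_000, 50_000):
    print(words, "words ->", estimate_chunk_count(words), "chunks")
```

Multiplying the result by the number of documents gives a first estimate of how many vectors the collection will hold.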

Backing Up Milvus Data

# Backup all volumes (etcd + minio)
docker compose stop
tar -czf milvus-backup-$(date +%Y%m%d).tar.gz ./volumes/
docker compose start

# Or use the official Milvus Backup tool (a standalone binary, not a pip package)
# https://github.com/zilliztech/milvus-backup

Monitoring with the Milvus Web UI

Attu is Milvus’s official web UI — quite handy for viewing collection stats, running queries directly, and monitoring index status without typing commands:

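# Inside the Attu container, localhost points at Attu itself, so reach Milvus through the Docker host
docker run -d \
  --name attu \
  -p 3000:3000 \
  --add-host=host.docker.internal:host-gateway \
  -e MILVUS_URL=host.docker.internal:19530 \
  zilliz/attu:v2.4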

# Access: http://localhost:3000

When to Choose Milvus Over Other Solutions

This is a question I get asked often. The short answer: choose Milvus when your vector data exceeds 1 million records and you need distributed scaling, or when your team is already comfortable with the Kubernetes ecosystem. Smaller data, running single-node? There are lighter-weight options that fit better. Milvus truly shines at enterprise scale — billions of vectors, multi-tenancy, high availability.