Running Milvus in 5 Minutes with Docker Compose
Milvus isn’t like Redis or PostgreSQL. It was built from the ground up to do one thing: store and search vectors — the data type AI uses to “understand” semantics, which is fundamentally different from ordinary keyword matching. I tested several vector databases before settling on Milvus for a RAG project processing 500K technical documents, and the deciding factor was its ability to scale with large data volumes.
Create a docker-compose.yml file and run it right away:
version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ./volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-13T19-46-17Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    volumes:
      - ./volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.4.0
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ./volumes/milvus:/var/lib/milvus
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - etcd
      - minio
docker compose up -d
# Check status
docker compose ps
# Logs if there are errors
docker compose logs standalone --tail=50
Wait about 30 seconds for Milvus to fully start up (etcd and MinIO need to be ready first). Then install the Python SDK:
pip install pymilvus openai # openai for generating embeddings
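Instead of waiting a fixed 30 seconds, a script can poll until Milvus reports that it is healthy. The sketch below assumes the standalone container answers on http://localhost:9091/healthz (port 9091 is published in the compose file above; the /healthz path is an assumption based on the default Milvus setup). The helper names milvus_is_healthy and wait_until_ready are illustrative and not part of pymilvus.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def milvus_is_healthy(url: str = "http://localhost:9091/healthz") -> bool:
    """Return True if the (assumed) health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet
        return False


def wait_until_ready(
    probe: Callable[[], bool] = milvus_is_healthy,
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() repeatedly until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    print("Milvus ready:", wait_until_ready())
```

The probe and sleep parameters are injectable so the loop can be exercised without a running container.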
Understanding How Milvus Works
Milvus organizes data as Database → Collection → Partition → Entity. Mapped to PostgreSQL, that is roughly Database → Table → table partition (the closest analogue) → Row. The core difference: every collection must define at least one vector field. This is what Milvus was built to handle, and it’s non-negotiable.
When searching, instead of WHERE content LIKE '%keyword%', Milvus computes the distance between vectors — cosine similarity or L2 distance — to find the entities semantically closest to the query.
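To make the two metrics concrete, here is a minimal pure-Python sketch on toy 2-dimensional vectors. It only illustrates the math; Milvus computes these over indexed data internally, and the function names here are illustrative.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between a and b: 0.0 means identical vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
same_direction = [2.0, 0.0]   # longer, but points the same way
orthogonal = [0.0, 3.0]       # unrelated direction

print(cosine_similarity(query, same_direction))  # 1.0
print(cosine_similarity(query, orthogonal))      # 0.0
print(l2_distance(query, same_direction))        # 1.0
```

Cosine similarity ignores vector length, while L2 distance does not, which is why the same pair scores as a perfect match under one metric and as one unit apart under the other.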
Creating Your First Collection
from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType, utility
# Connect
connections.connect("default", host="localhost", port="19530")
# Define schema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="source", dtype=DataType.VARCHAR, max_length=500),
    FieldSchema(name="content", dtype=DataType.VARCHAR, max_length=5000),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1536),  # OpenAI text-embedding-3-small
]
schema = CollectionSchema(fields, description="Knowledge base for RAG")
collection = Collection("knowledge_base", schema)
# Create index for vector field (REQUIRED before searching)
index_params = {
    "metric_type": "COSINE",
    "index_type": "IVF_FLAT",
    "params": {"nlist": 1024},
}
collection.create_index("embedding", index_params)
print("Collection created successfully:", collection.name)
Inserting Documents into Milvus
from openai import OpenAI
client = OpenAI() # requires OPENAI_API_KEY in env
def get_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding
# Prepare data
documents = [
    {"source": "linux_guide.md", "content": "The chmod command changes file permissions in Linux..."},
    {"source": "docker_tips.md", "content": "Docker volumes help persist data when containers restart..."},
    {"source": "git_workflow.md", "content": "Git rebase creates a cleaner commit history compared to merge..."},
]
# Create embeddings and insert one row per document.
# A dict passed to insert() is treated as a single row, so each value is a scalar
# (or one vector), not a one-element list.
for doc in documents:
    embedding = get_embedding(doc["content"])
    collection.insert([{
        "source": doc["source"],
        "content": doc["content"],
        "embedding": embedding,
    }])
collection.flush()  # Ensure data is written to disk
print(f"Inserted {collection.num_entities} entities")
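One insert call per document is fine for three rows, but larger corpora are usually grouped into batches. The following is a minimal sketch of that grouping; batched_rows is a hypothetical helper, the batch size of 100 is an arbitrary assumption, and the embed argument stands in for any function that turns text into a vector.

```python
from typing import Callable, Iterable, Iterator


def batched_rows(
    documents: Iterable[dict],
    embed: Callable[[str], list[float]],
    batch_size: int = 100,
) -> Iterator[list[dict]]:
    """Yield lists of row dicts (source, content, embedding), batch_size rows at a time."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: list[dict] = []
    for doc in documents:
        batch.append({
            "source": doc["source"],
            "content": doc["content"],
            "embedding": embed(doc["content"]),
        })
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


if __name__ == "__main__":
    # Stand-in embedding function so the sketch runs without an API key
    fake_embed = lambda text: [float(len(text))]
    docs = [{"source": f"doc{i}.md", "content": "x" * (i + 1)} for i in range(3)]
    print([len(rows) for rows in batched_rows(docs, fake_embed, batch_size=2)])  # [2, 1]
```

With a real collection, each yielded list could be passed to collection.insert(rows), followed by a single flush at the end.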
Building a Complete RAG Pipeline
This is the pattern I’m currently using in production. A lesson from migrating 100GB from MySQL to PostgreSQL (3 days of planning, 1 day of execution) is that designing your schema carefully upfront saves a lot of headaches down the road. With Milvus this is even more true: pick the wrong dim and you have to drop the collection and start over, because a vector field’s dimension cannot be changed once the collection exists.
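Because the dimension is fixed at creation time, a cheap guard before inserting can catch a mismatched embedding model early. This is a minimal sketch; check_dim is a hypothetical helper, and 1536 is assumed from the schema defined earlier.

```python
def check_dim(embedding: list[float], expected_dim: int = 1536) -> list[float]:
    """Return the embedding unchanged, or raise if its length does not match the schema."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, collection expects {expected_dim}"
        )
    return embedding


if __name__ == "__main__":
    check_dim([0.0] * 1536)          # passes silently
    try:
        check_dim([0.0] * 768)       # e.g. output of a different embedding model
    except ValueError as err:
        print(err)
```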
Semantic Search Function
def semantic_search(query: str, top_k: int = 5) -> list[dict]:
    # Load collection into memory before searching
    collection.load()
    query_embedding = get_embedding(query)
    search_params = {
        "metric_type": "COSINE",
        "params": {"nprobe": 10},  # Increase for higher accuracy, decrease for faster speed
    }
    results = collection.search(
        data=[query_embedding],
        anns_field="embedding",
        param=search_params,
        limit=top_k,
        output_fields=["source", "content"],
    )
    hits = []
    for hit in results[0]:
        hits.append({
            "source": hit.entity.get("source"),
            "content": hit.entity.get("content"),
            "score": hit.score,
        })
    return hits
# Try it now
results = semantic_search("how to change file permissions in Linux")
for r in results:
    print(f"[{r['score']:.3f}] {r['source']}: {r['content'][:80]}...")
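The nprobe setting is easier to reason about next to nlist from the index definition: an IVF index groups vectors into nlist clusters, and a search inspects only the nprobe clusters closest to the query. The sketch below works out the share of clusters scanned for the values used in this article. It assumes clusters of roughly equal size, which real data will not match exactly, and scanned_fraction is an illustrative name.

```python
def scanned_fraction(nprobe: int, nlist: int) -> float:
    """Approximate share of the index scanned per query for an IVF index."""
    if not 1 <= nprobe <= nlist:
        raise ValueError("nprobe must be between 1 and nlist")
    return nprobe / nlist


print(f"{scanned_fraction(10, 1024):.2%}")   # 0.98%
print(f"{scanned_fraction(64, 1024):.2%}")   # 6.25%
```

Raising nprobe from 10 to 64 inspects about six times as many clusters, which is where the accuracy gain and the latency cost both come from.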
Connecting to an LLM to Answer Questions
def rag_answer(question: str) -> str:
    # Step 1: Find relevant context
    relevant_docs = semantic_search(question, top_k=3)
    context = "\n\n".join([d["content"] for d in relevant_docs])
    # Step 2: Send to LLM with context
    prompt = f"""Based on the following documents, answer the question:
Documents:
{context}
Question: {question}
Answer:"""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.3,
    )
    return response.choices[0].message.content
# Test
answer = rag_answer("How do I change file permissions in Linux?")
print(answer)
Optimizing Milvus for Production
Choosing the Right Index Type
This is the point many people overlook and then complain that Milvus is slow. Each index type has different trade-offs:
- IVF_FLAT: Balances speed and accuracy — good for most use cases, data under 10M vectors
- HNSW: Faster at query time but uses more RAM — use when you need low latency and have sufficient RAM
- IVF_SQ8: Compresses vectors 4x, saves RAM — use when data is large but RAM is limited, accepting ~2% accuracy loss
- FLAT: Brute force, 100% accurate but slow — only for testing or small datasets under 1M vectors
# HNSW index — consider for production if you have sufficient RAM
hnsw_params = {
    "metric_type": "COSINE",
    "index_type": "HNSW",
    "params": {
        "M": 16,  # Connections per node, higher M = more accurate, more RAM usage
        "efConstruction": 200,  # Accuracy during index build
    },
}
# When searching with HNSW
hnsw_search_params = {
    "metric_type": "COSINE",
    "params": {"ef": 64},  # Higher ef = more accurate, slightly slower
}
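The RAM trade-offs in the list above can be estimated with simple arithmetic. The sketch below computes raw vector storage only, for 500K vectors of 1536 dimensions (figures borrowed from earlier in this article as an illustration). It ignores index overhead such as IVF centroids or HNSW graph links, so treat the results as a lower bound; raw_vector_bytes is an illustrative name.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_component: int = 4) -> int:
    """Bytes needed to store the vectors themselves (float32 by default)."""
    return num_vectors * dim * bytes_per_component


n, dim = 500_000, 1536
float32_bytes = raw_vector_bytes(n, dim)                      # 4 bytes per component
sq8_bytes = raw_vector_bytes(n, dim, bytes_per_component=1)   # 1 byte per component

print(f"float32: {float32_bytes / 1e9:.2f} GB")  # float32: 3.07 GB
print(f"SQ8:     {sq8_bytes / 1e9:.2f} GB")      # SQ8:     0.77 GB
```

The 4x ratio is the compression that IVF_SQ8 trades for a small accuracy loss.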
Using Partitions to Organize Data
If your knowledge base has many different sources — documents, FAQs, code snippets — use partitions to speed up searches:
# Create partitions by document type
collection.create_partition("docs")
collection.create_partition("faq")
collection.create_partition("code")
# Insert into specific partition
collection.insert(
    # One row; replace the placeholders with real content and a real 1536-dim vector
    [{"source": "manual.pdf", "content": "...", "embedding": [...]}],
    partition_name="docs",
)
# Search only in docs partition, which scans less data than a full collection search
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param=search_params,
    limit=5,
    partition_names=["docs"],  # Restrict the search to this partition
    output_fields=["source", "content"],
)
Hybrid Search: Combining Vectors and Filters
# Semantic search + filter by source file
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param=search_params,
    limit=5,
    expr='source like "linux%"',  # Only fetch documents from linux_* files
    output_fields=["source", "content"],
)
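When the prefix comes from user input, building the expression by hand invites quoting mistakes. Here is a deliberately conservative sketch that refuses any prefix containing quote, backslash, or wildcard characters rather than trying to escape them. prefix_filter is a hypothetical helper, not a pymilvus function, and the rejected character set is an assumption chosen for safety.

```python
def prefix_filter(field: str, prefix: str) -> str:
    """Build a 'field like "prefix%"' filter expression for a prefix match."""
    if not field.isidentifier():
        raise ValueError(f"invalid field name: {field!r}")
    if any(ch in prefix for ch in ('"', "\\", "%")):
        raise ValueError(f"unsupported character in prefix: {prefix!r}")
    return f'{field} like "{prefix}%"'


print(prefix_filter("source", "linux"))  # source like "linux%"
```

The returned string can be passed as the expr argument in the search call above.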
Practical Tips for Using Milvus
Chunking Strategy Matters More Than You Think
I’ve tried many ways to chunk documents and arrived at one clear conclusion: chunks under 100 tokens lose context, while chunks over 1000 tokens make the vector embedding too “diluted” — everything matches a little, nothing really stands out. The sweet spot is 300–500 tokens, with 50–100 token overlap between chunks to avoid breaking the flow of ideas at boundaries.
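def chunk_text(text: str, chunk_size: int = 400, overlap: int = 80) -> list[str]:
    # Sizes are counted in whitespace-separated words, a rough stand-in for tokens
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = start + chunk_size
        chunks.append(" ".join(words[start:end]))
        if end >= len(words):
            break  # Last window reached; avoid a trailing chunk made only of overlap
        start += chunk_size - overlap  # Overlap to preserve context at chunk boundaries
    return chunks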
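The sliding-window arithmetic is worth spelling out: with a chunk size of 400 and an overlap of 80, each window starts 320 words after the previous one, so neighbouring chunks share exactly 80 words. The sketch below computes the word range covered by the k-th window under those defaults; window_bounds is an illustrative helper, and the final window in a real document is simply cut off at the end of the text.

```python
def window_bounds(k: int, chunk_size: int = 400, overlap: int = 80) -> tuple[int, int]:
    """Return the (start, end) word offsets of the k-th window, end exclusive."""
    stride = chunk_size - overlap  # how far each window advances
    start = k * stride
    return start, start + chunk_size


first = window_bounds(0)
second = window_bounds(1)
print(first)                  # (0, 400)
print(second)                 # (320, 720)
print(first[1] - second[0])   # 80 words shared between the two windows
```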
Backing Up Milvus Data
# Backup all volumes (etcd + minio)
docker compose stop
tar -czf milvus-backup-$(date +%Y%m%d).tar.gz ./volumes/
docker compose start
# Or use the official Milvus Backup tool. It is a separate program distributed from
# the repository below, not something installed by pymilvus or its bulk_writer extra:
# https://github.com/zilliztech/milvus-backup
Monitoring with the Milvus Web UI
Attu is Milvus’s official web UI — quite handy for viewing collection stats, running queries directly, and monitoring index status without typing commands:
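docker run -d \
  --name attu \
  -p 3000:3000 \
  --add-host=host.docker.internal:host-gateway \
  -e MILVUS_URL=host.docker.internal:19530 \
  zilliz/attu:v2.4
# Inside the Attu container, localhost is the container itself, so MILVUS_URL points
# at the Docker host, where port 19530 is published by the compose file.
# Access: http://localhost:3000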
When to Choose Milvus Over Other Solutions
This is a question I get asked often. The short answer: choose Milvus when your vector data exceeds 1 million records and you need distributed scaling, or when your team is already comfortable with the Kubernetes ecosystem. Smaller data, running single-node? There are lighter-weight options that fit better. Milvus truly shines at enterprise scale — billions of vectors, multi-tenancy, high availability.

