Installing the Milvus Vector Database with Docker: Building a Knowledge Store for RAG and AI – ITFROMZERO

Run Milvus in 5 minutes with Docker Compose

Milvus is not like Redis or PostgreSQL. It was built from the ground up to do exactly one thing: store and search vectors, the kind of data AI uses to “understand” semantics, which is entirely different from ordinary keyword matching. I tested several vector DBs before settling on Milvus for a RAG project that handles 500K technical documents, and the deciding factor was its ability to scale as the data grows.

Create a docker-compose.yml file and start it right away:

version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ./volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-13T19-46-17Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    volumes:
      - ./volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.4.0
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ./volumes/milvus:/var/lib/milvus
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - etcd
      - minio

docker compose up -d

# Check status
docker compose ps

# Logs in case something fails
docker compose logs standalone --tail=50

Wait about 30 seconds for Milvus to finish starting (etcd and MinIO need to be ready first). Then install the Python SDK:

pip install pymilvus openai  # openai is used to create embeddings
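
Instead of a fixed 30-second wait, a script can poll until Milvus reports healthy. The sketch below is a minimal example under assumptions: it assumes the standalone container answers health checks at /healthz on port 9091 (the port the compose file above publishes), and the helper names milvus_is_healthy and wait_until_ready are made up for illustration.

```python
import time
import urllib.error
import urllib.request


def milvus_is_healthy(url: str = "http://localhost:9091/healthz") -> bool:
    """Return True if the (assumed) health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: not ready yet
        return False


def wait_until_ready(probe=milvus_is_healthy, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Call probe() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling wait_until_ready() before connections.connect(...) would let a setup script fail fast with a clear message instead of a connection error.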

Understanding how Milvus works

Milvus organizes data as: Database → Collection → Partition → Entity. Mapped onto PostgreSQL, that is: Database → Table → (no direct equivalent) → Row. The core difference: every entity must have at least one vector field. That is what Milvus was built to handle, and it cannot be left out.

When searching, instead of WHERE content LIKE '%keyword%', Milvus computes the distance between vectors, using cosine similarity or L2 distance, to find the entities whose meaning is closest to the query.
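
To make the two metrics concrete, here is a small pure-Python sketch. The three-dimensional vectors are made-up numbers for illustration only (real embeddings have hundreds or thousands of dimensions), and Milvus computes these metrics internally, so this is not code you need in the pipeline.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: larger means closer."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (illustrative values, not real model output)
query = [1.0, 0.0, 1.0]
same_direction = [2.0, 0.0, 2.0]  # points the same way as the query, but longer
orthogonal = [0.0, 3.0, 0.0]      # shares no direction with the query

print(cosine_similarity(query, same_direction))  # close to 1: same direction
print(cosine_similarity(query, orthogonal))      # 0: unrelated direction
print(l2_distance(query, same_direction))        # non-zero: L2 also sees the length difference
```

Note the difference: cosine similarity ignores vector length and only compares direction, while L2 distance is affected by both.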

Creating your first Collection

from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType, utility

# Connect
connections.connect("default", host="localhost", port="19530")

# Define the schema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="source", dtype=DataType.VARCHAR, max_length=500),
    FieldSchema(name="content", dtype=DataType.VARCHAR, max_length=5000),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1536),  # OpenAI text-embedding-3-small
]

schema = CollectionSchema(fields, description="Knowledge base for RAG")
collection = Collection("knowledge_base", schema)

# Create an index on the vector field (REQUIRED before the collection can be loaded and searched)
index_params = {
    "metric_type": "COSINE",
    "index_type": "IVF_FLAT",
    "params": {"nlist": 1024},
}
collection.create_index("embedding", index_params)
print("Collection created:", collection.name)

Loading documents into Milvus

from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

def get_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Prepare the data
documents = [
    {"source": "linux_guide.md", "content": "The chmod command changes file access permissions in Linux..."},
    {"source": "docker_tips.md", "content": "Docker volumes persist data when a container restarts..."},
    {"source": "git_workflow.md", "content": "Git rebase keeps the commit history cleaner than merge..."},
]

# Create the embedding and insert each document as one row.
# pymilvus treats a dict as a single row (field name -> single value),
# so the values must not be wrapped in lists.
for doc in documents:
    embedding = get_embedding(doc["content"])
    collection.insert([{
        "source": doc["source"],
        "content": doc["content"],
        "embedding": embedding,
    }])

collection.flush()  # Make sure the data is persisted
print(f"Inserted {collection.num_entities} entities")
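
With three documents, one API call and one insert per document is fine. For a larger corpus it is usually worth grouping documents so that each embedding request and each insert carries many rows. The helper below is a minimal sketch of that grouping; the name batched and the batch size of 100 in the usage comment are my own assumptions, not values from Milvus or OpenAI.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


# Hypothetical usage with the objects defined above (not executed here).
# It assumes the embeddings endpoint accepts a list of inputs and returns
# the results in input order:
#
# for batch in batched(documents, 100):
#     resp = client.embeddings.create(
#         model="text-embedding-3-small",
#         input=[d["content"] for d in batch],
#     )
#     rows = [
#         {"source": d["source"], "content": d["content"], "embedding": e.embedding}
#         for d, e in zip(batch, resp.data)
#     ]
#     collection.insert(rows)
```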

Building a complete RAG pipeline

This is the pattern I am running in production. The lesson from the time I migrated 100GB from MySQL to PostgreSQL (3 days of planning, 1 day of execution) is that designing the schema carefully up front saves a lot of headaches later. That is even more true for Milvus: pick the wrong dim and you have to drop the collection and start over, because there is no migration path.
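
Since a wrong dim is so costly, one cheap safeguard is to check every embedding against the schema before it reaches insert. This is a minimal sketch; EXPECTED_DIM and validate_embedding are hypothetical names, and 1536 is simply the dim used in the schema in this article.

```python
EXPECTED_DIM = 1536  # must match dim= in the collection schema


def validate_embedding(vector, expected_dim: int = EXPECTED_DIM):
    """Return the vector unchanged, or raise if its length does not match the schema."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, schema expects {expected_dim}"
        )
    return vector
```

Wrapping the embedding call in validate_embedding(...) turns a silent model or configuration change into an immediate error on the first document rather than a surprise halfway through an ingest job.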

The semantic search function

def semantic_search(query: str, top_k: int = 5) -> list[dict]:
    # Load the collection into memory before searching
    collection.load()
    
    query_embedding = get_embedding(query)
    
    search_params = {
        "metric_type": "COSINE",
        "params": {"nprobe": 10},  # Raise for better accuracy, lower for more speed
    }
    
    results = collection.search(
        data=[query_embedding],
        anns_field="embedding",
        param=search_params,
        limit=top_k,
        output_fields=["source", "content"],
    )
    
    hits = []
    for hit in results[0]:
        hits.append({
            "source": hit.entity.get("source"),
            "content": hit.entity.get("content"),
            "score": hit.score,
        })
    
    return hits

# Try it out
results = semantic_search("how to change file permissions in Linux")
for r in results:
    print(f"[{r['score']:.3f}] {r['source']}: {r['content'][:80]}...")
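
The nprobe setting makes more sense once you see what an IVF index does: vectors are grouped into nlist clusters, and a search only scans the nprobe clusters whose centroids are closest to the query. The toy sketch below illustrates that idea in pure Python. It is not how Milvus implements it: the centroids are hand-picked rather than learned with k-means, and the function names are invented for this example.

```python
import math


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_ivf(vectors, centroids):
    """Assign each vector index to the bucket of its nearest centroid."""
    buckets = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        buckets[nearest].append(i)
    return buckets


def ivf_search(query, vectors, centroids, buckets, nprobe, top_k):
    """Scan only the nprobe buckets closest to the query, then rank the candidates."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in buckets[c]]
    candidates.sort(key=lambda i: l2(query, vectors[i]))
    return candidates[:top_k]


# Two hand-picked centroids and four vectors on a line (illustrative data)
centroids = [(0.0, 0.0), (10.0, 0.0)]
vectors = [(1.0, 0.0), (4.9, 0.0), (5.2, 0.0), (9.0, 0.0)]
buckets = build_ivf(vectors, centroids)
query = (4.8, 0.0)

print(ivf_search(query, vectors, centroids, buckets, nprobe=1, top_k=2))
print(ivf_search(query, vectors, centroids, buckets, nprobe=2, top_k=2))
```

With nprobe=1 the search only looks in the first bucket and misses vector 2, which sits just across the cluster boundary even though it is the second-closest match. With nprobe=2 both buckets are scanned and it is found. That is the accuracy versus speed trade-off behind the parameter: with nlist=1024 and nprobe=10, roughly 1% of the clusters are scanned per query.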

Connecting to an LLM to answer questions

def rag_answer(question: str) -> str:
    # Step 1: Find the relevant context
    relevant_docs = semantic_search(question, top_k=3)
    context = "\n\n".join([d["content"] for d in relevant_docs])
    
    # Step 2: Send it to the LLM together with the context
    prompt = f"""Based on the documents below, answer the question in Vietnamese:

Documents:
{context}

Question: {question}

Answer:"""
    
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.3,
    )
    
    return response.choices[0].message.content

# Test
answer = rag_answer("How do I change file access permissions in Linux?")
print(answer)
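
Joining the raw chunks works for a demo, but with longer chunks the prompt can grow quickly, and the model cannot tell which file a passage came from. One possible refinement is sketched below: label each chunk with its source and stop adding chunks once a character budget is reached. The function name build_context and the 4000-character default are assumptions for illustration, not tuned values.

```python
def build_context(docs: list[dict], max_chars: int = 4000) -> str:
    """Join retrieved chunks, labeled by source, without exceeding max_chars of chunk text."""
    parts = []
    used = 0
    for doc in docs:
        block = f"[{doc['source']}]\n{doc['content']}"
        if used + len(block) > max_chars:
            break  # docs are ordered by relevance, so drop the least relevant ones
        parts.append(block)
        used += len(block)
    return "\n\n".join(parts)
```

It takes the same list of dicts that semantic_search returns, so it could replace the "\n\n".join(...) line in rag_answer.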

Tuning Milvus for production

Choosing the right index type

This is the step many people skip and then complain that Milvus is slow. Each index type comes with its own trade-offs:

IVF_FLAT: A balance between speed and accuracy; suits most cases with fewer than 10M vectors
HNSW: Faster at query time but uses more RAM; use it when you need low latency and have enough RAM
IVF_SQ8: Compresses vectors 4x and saves RAM; use it when the data is large and RAM is limited, accepting a loss of about 2% accuracy
FLAT: Brute force, 100% accurate but slow; only for testing or small data sets under 1M vectors
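
A quick back-of-the-envelope calculation shows where the 4x figure for IVF_SQ8 comes from. The sketch below only counts the stored vector values: 4 bytes per dimension for float32, 1 byte per dimension after 8-bit quantization. It deliberately ignores index overhead such as centroids or HNSW graph links, so treat the numbers as a lower bound rather than a sizing guide.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed just for the vector values (no index overhead)."""
    return num_vectors * dim * bytes_per_value


def to_gib(num_bytes: int) -> float:
    return num_bytes / 1024**3


n, dim = 1_000_000, 1536  # 1M vectors at the dim used in this article

float32_bytes = raw_vector_bytes(n, dim)   # FLAT / IVF_FLAT keep float32 values
sq8_bytes = raw_vector_bytes(n, dim, 1)    # IVF_SQ8 stores one byte per dimension

print(f"float32: {to_gib(float32_bytes):.2f} GiB")
print(f"SQ8:     {to_gib(sq8_bytes):.2f} GiB")
print(f"ratio:   {float32_bytes // sq8_bytes}x")
```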

# HNSW index: worth considering for production if you have enough RAM
hnsw_params = {
    "metric_type": "COSINE",
    "index_type": "HNSW",
    "params": {
        "M": 16,           # Connections per node; higher M = more accurate, more RAM
        "efConstruction": 200,  # Accuracy while building the index (higher = slower build)
    },
}

# When searching with HNSW
hnsw_search_params = {
    "metric_type": "COSINE",
    "params": {"ef": 64},  # Higher ef = more accurate, slightly slower
}

Partitions to organize data

If the knowledge base draws on several kinds of sources (documents, FAQs, code snippets), use partitions to speed up search:

# Create one partition per document type
collection.create_partition("docs")
collection.create_partition("faq")
collection.create_partition("code")

# Insert into a specific partition (one dict per row; [...] stands in for the 1536-dim vector)
collection.insert(
    [{"source": "manual.pdf", "content": "...", "embedding": [...]}],
    partition_name="docs"
)

# Search only the docs partition, which is faster than searching everything
# (query_embedding and search_params are built the same way as in semantic_search)
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param=search_params,
    limit=5,
    partition_names=["docs"],  # Restrict to this partition
    output_fields=["source", "content"],
)

Hybrid search: combining vector search with a filter

# Semantic search plus a filter on the source file
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param=search_params,
    limit=5,
    expr='source like "linux%"',  # Only documents from files named linux*
    output_fields=["source", "content"],
)
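
If the prefix in that filter ever comes from user input, it should not be pasted into the expression as-is. The sketch below takes the conservative route: rather than trying to escape special characters, it rejects anything outside a small allow-list. The helper name source_prefix_filter and the allowed character set are my own assumptions.

```python
import re

# Letters, digits, and a few characters that are common in file names
_SAFE_PREFIX = re.compile(r"[A-Za-z0-9_./-]+")


def source_prefix_filter(prefix: str) -> str:
    """Build a prefix filter on the source field, refusing suspicious input."""
    if not _SAFE_PREFIX.fullmatch(prefix):
        raise ValueError(f"unsafe prefix for filter expression: {prefix!r}")
    return f'source like "{prefix}%"'
```

The result can be passed straight to the expr argument, for example expr=source_prefix_filter("linux").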

Practical tips for working with Milvus

Chunking strategy matters more than you think

I tried several ways of chunking documents and one thing became fairly clear: chunks under 100 tokens lose context, while above 1000 tokens the embedding becomes too “diluted”, matching a little of everything with nothing really standing out. In my tests the sweet spot was 300–500 tokens, with an overlap of 50–100 tokens between chunks so that ideas are not cut off at the boundaries.

def chunk_text(text: str, chunk_size: int = 400, overlap: int = 80) -> list[str]:
    # Splits on whitespace, so sizes are counted in words as a rough proxy for tokens
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = start + chunk_size
        chunks.append(" ".join(words[start:end]))
        if end >= len(words):
            break  # The last chunk already reaches the end of the text
        start += chunk_size - overlap  # Overlap keeps context across chunk boundaries
    return chunks
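
Before embedding a large corpus it helps to estimate how many chunks, and therefore how many vectors and embedding calls, a document will produce. The sketch below derives that count from the sliding-window arithmetic. It assumes a chunker that advances by chunk_size - overlap words and stops as soon as a chunk reaches the end of the text; estimate_chunks is a hypothetical helper name.

```python
import math


def estimate_chunks(num_words: int, chunk_size: int = 400, overlap: int = 80) -> int:
    """Number of sliding-window chunks for a text of num_words words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if num_words <= 0:
        return 0
    step = chunk_size - overlap
    # Chunk k starts at k * step; the last chunk is the first one whose
    # end (start + chunk_size) reaches num_words.
    return max(1, math.ceil((num_words - overlap) / step))


print(estimate_chunks(1000))  # 3 chunks: words 0-399, 320-719, 640-999
```

Summing this over all documents gives a rough vector count, which in turn feeds into the index and RAM decisions discussed earlier.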

Backing up Milvus data

# Back up all volumes (etcd, MinIO, and Milvus data)
docker compose stop
tar -czf milvus-backup-$(date +%Y%m%d).tar.gz ./volumes/
docker compose start

# Or use the official Milvus Backup tool (a standalone tool, separate from pymilvus)
# https://github.com/zilliztech/milvus-backup

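Monitoring with the Milvus web UI

Attu is the official web UI for Milvus. It is quite handy for viewing collection stats, running queries directly, and tracking index status without typing commands:

# Inside the Attu container, localhost points at the container itself,
# so reach Milvus through the Docker host instead
docker run -d \
  --name attu \
  -p 3000:3000 \
  --add-host=host.docker.internal:host-gateway \
  -e MILVUS_URL=host.docker.internal:19530 \
  zilliz/attu:v2.4

# Open: http://localhost:3000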

When to use Milvus instead of other solutions

I get asked this a lot. In short: choose Milvus when your vector data exceeds 1 million records and you need distributed scaling, or when the team is already comfortable with the Kubernetes ecosystem. Smaller data on a single node? There are lighter options that fit better. Milvus really shines at enterprise scale: billions of vectors, multi-tenancy, high availability.