Getting Started with ChromaDB: The Open-Source Vector Database for Building AI Agent Knowledge Systems – ITFROMZERO

Install ChromaDB and Run It in 5 Minutes

If you’re building an AI Agent or a RAG system, ChromaDB is something you need to know right now. Let’s dive straight into the code first, explanations after.

Install:

pip install chromadb
# Optional: the default embedding model already runs locally; add
# sentence-transformers only if you want to load other local models
pip install chromadb sentence-transformers

Initialize and add data right away:

import chromadb

# In-memory client — quick testing, no file needed
client = chromadb.Client()

# Create a collection (equivalent to a table in SQL)
collection = client.create_collection(name="my_knowledge_base")

# Add documents
collection.add(
    documents=[
        "Docker is an application containerization platform",
        "Kubernetes manages containers at scale",
        "Redis is a high-speed in-memory database"
    ],
    ids=["doc1", "doc2", "doc3"]
)

# Semantic search
results = collection.query(
    query_texts=["container orchestration tool"],
    n_results=2
)

print(results['documents'])
# Output: [['Kubernetes manages containers at scale', 'Docker is an application containerization platform']]

Done. That’s ChromaDB in its most basic form — semantic search without writing a single line of SQL.

What Is ChromaDB and Why Do You Need It?

ChromaDB is an open-source vector database originally built to serve AI applications. What sets it apart from traditional databases: instead of searching by exact value (exact match), it searches by semantic meaning (semantic similarity) — a query for “container orchestration” returns results about Kubernetes even though none of the words match exactly.

When I was building an internal tool to query our team’s documentation, I tried Elasticsearch full-text search — the results were pretty poor. When a user typed “container crashing” it couldn’t find the article about “pod restart loops in Kubernetes”. After switching to ChromaDB with embeddings, the results improved dramatically.

How Does ChromaDB Work?

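1. Input text → the embedding model converts it into a vector (an array of floating-point numbers).
2. The vector is stored in ChromaDB along with optional metadata.
3. When querying, the query text is embedded into a vector the same way.
4. ChromaDB computes the distance between the query vector and the stored vectors and returns the closest documents. The default metric is squared L2; cosine distance is used only if you configure the collection for it.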
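
The steps above can be sketched without any library. The `WORD_VECTORS` table below is a hypothetical, hand-written stand-in for a real embedding model, and the brute-force scan stands in for ChromaDB's HNSW index; only the shape of the pipeline is the point.

```python
import math

# Hypothetical word vectors on three made-up axes:
# (containers, orchestration, storage). A real model learns hundreds of axes.
WORD_VECTORS = {
    "docker": (1.0, 0.0, 0.0),
    "containerization": (1.0, 0.0, 0.0),
    "container": (1.0, 0.0, 0.0),
    "containers": (1.0, 0.0, 0.0),
    "kubernetes": (0.6, 0.8, 0.0),
    "manages": (0.0, 1.0, 0.0),
    "orchestration": (0.0, 1.0, 0.0),
    "scale": (0.0, 0.5, 0.0),
    "redis": (0.0, 0.0, 1.0),
    "database": (0.0, 0.0, 1.0),
    "memory": (0.0, 0.0, 0.5),
}

def embed(text):
    """Step 1: turn text into a vector by summing the vectors of known words."""
    vec = [0.0, 0.0, 0.0]
    for word in text.lower().replace("-", " ").split():
        for axis, value in enumerate(WORD_VECTORS.get(word, (0.0, 0.0, 0.0))):
            vec[axis] += value
    return vec

def cosine_distance(a, b):
    """1 - cosine similarity: 0 = same direction, 1 = orthogonal, 2 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm

def squared_l2(a, b):
    """Squared Euclidean distance; sensitive to vector length, not just direction."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Step 2: store each document next to its vector
documents = {
    "doc1": "Docker is an application containerization platform",
    "doc2": "Kubernetes manages containers at scale",
    "doc3": "Redis is a high-speed in-memory database",
}
store = {doc_id: embed(text) for doc_id, text in documents.items()}

def query(text, n_results=2, distance=cosine_distance):
    """Steps 3-4: embed the query, rank stored vectors by distance."""
    q = embed(text)
    ranked = sorted(store, key=lambda doc_id: distance(q, store[doc_id]))
    return ranked[:n_results]

print(query("container orchestration tool"))                       # ['doc2', 'doc1']
print(query("container orchestration tool", distance=squared_l2))  # ['doc1', 'doc2']
```

The two metrics rank the toy documents differently because these vectors are not normalized. For unit-length vectors, squared L2 equals 2 minus twice the cosine similarity, so both metrics produce the same ordering.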

By default, ChromaDB embeds text with all-MiniLM-L6-v2, a Sentence Transformers model that is small, fast, and good enough for most use cases. If you need higher accuracy, you can swap in OpenAI embeddings or a custom model by passing a different embedding function when creating the collection.

Persistent Storage: Saving Data to Disk

The in-memory client is convenient for quick testing, but a restart wipes all data. For production, use PersistentClient:

import chromadb

# Save to a local directory
client = chromadb.PersistentClient(path="./chroma_db")

collection = client.get_or_create_collection(
    name="devops_knowledge",
    metadata={"hnsw:space": "cosine"}  # use cosine distance instead of the default squared L2
)

Data is saved to ./chroma_db as soon as you add documents. Restart the process and your data is still there.

Building a Knowledge Base for Your AI Agent

RAG (Retrieval-Augmented Generation) is the use case I reach for ChromaDB most often. Instead of stuffing your entire documentation into the LLM’s context (a 50-page doc file already burns ~40k tokens), you store it in ChromaDB and retrieve only the 3–5 relevant passages when needed.
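
A back-of-the-envelope comparison makes the saving concrete. It uses the ~40k-token figure above and assumes chunks of 300 to 500 tokens; the chunk size is an assumption for illustration, and real counts depend on the tokenizer.

```python
FULL_DOC_TOKENS = 40_000     # the ~40k figure for a 50-page document
CHUNK_TOKENS = (300, 500)    # assumed chunk size range, in tokens
PASSAGES = (3, 5)            # passages retrieved per question

best = PASSAGES[0] * CHUNK_TOKENS[0]    # fewest, smallest passages
worst = PASSAGES[1] * CHUNK_TOKENS[1]   # most, largest passages

print(best, worst)               # 900 2500
print(FULL_DOC_TOKENS // worst)  # 16
```

Even in the worst case, the retrieved context is about 16 times smaller than the full document.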

Indexing Documents into ChromaDB

import chromadb
from pathlib import Path

client = chromadb.PersistentClient(path="./knowledge_db")
collection = client.get_or_create_collection(
    "docs",
    metadata={"hnsw:space": "cosine"}  # cosine distance; the default is squared L2
)

def index_markdown_files(docs_dir: str):
    docs_path = Path(docs_dir)

    for md_file in docs_path.glob("**/*.md"):
        content = md_file.read_text(encoding="utf-8")

        # Chunk by paragraph, skip chunks that are too short
        chunks = [c.strip() for c in content.split("\n\n") if len(c.strip()) > 50]

        if not chunks:
            continue

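        # Build IDs from the path relative to docs_dir so files with the same
        # name in different folders don't collide; upsert makes re-runs safe
        rel = md_file.relative_to(docs_path).as_posix()
        collection.upsert(
            documents=chunks,
            ids=[f"{rel}_{i}" for i in range(len(chunks))],
            metadatas=[{"source": str(md_file), "chunk": i} for i in range(len(chunks))]
        )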
        print(f"Indexed {md_file.name}: {len(chunks)} chunks")

index_markdown_files("./docs")

Querying and Retrieving Context

def ask_knowledge_base(question: str, n_results: int = 3) -> str:
    results = collection.query(
        query_texts=[question],
        n_results=n_results,
        include=["documents", "metadatas", "distances"]
    )

    contexts = results["documents"][0]
    sources = [m["source"] for m in results["metadatas"][0]]
    distances = results["distances"][0]

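    # Keep hits with distance < 1.5. For cosine distance: 0 = identical,
    # 1 = unrelated, 2 = opposite. This threshold assumes the collection uses
    # "hnsw:space": "cosine"; with the default squared L2 the scale differs.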
    relevant = [
        (ctx, src)
        for ctx, src, dist in zip(contexts, sources, distances)
        if dist < 1.5
    ]

    if not relevant:
        return "No relevant information found."

    return "\n---\n".join([ctx for ctx, _ in relevant])

context = ask_knowledge_base("How do I restart a service in systemd?")
print(context)
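
To close the RAG loop, the retrieved text is usually placed into the prompt sent to the LLM. Below is a minimal sketch, assuming a hypothetical `build_rag_prompt` helper with illustrative prompt wording. The LLM call itself is left out because it depends on your provider, and the context string is a sample standing in for real retrieved text.

```python
def build_rag_prompt(question: str, context: str, max_context_chars: int = 6000) -> str:
    """Assemble a grounded prompt from retrieved context (hypothetical helper)."""
    # Trim very long context so the prompt stays within a rough size budget
    trimmed = context[:max_context_chars]
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{trimmed}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Sample context standing in for the output of ask_knowledge_base()
sample_context = "Use `systemctl restart <service>` to restart a unit."
prompt = build_rag_prompt("How do I restart a service in systemd?", sample_context)
print(prompt)
```

Telling the model to admit when the context lacks the answer helps when the distance filter lets through passages that are only loosely related.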

Metadata Filtering: Conditional Search

Got a collection with many different types of documents? Metadata filters narrow the search scope, so results stay relevant to the part of the knowledge base you care about:

# Only search Linux docs (requires a "category" field in each document's metadata)
results = collection.query(
    query_texts=["how to mount a disk"],
    n_results=3,
    where={"category": "linux"}
)

# Combine multiple conditions
results = collection.query(
    query_texts=["backup database"],
    n_results=5,
    where={
        "$and": [
            {"category": {"$in": ["postgresql", "mysql"]}},
            {"language": "vi"}
        ]
    }
)
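
To make the filter semantics concrete, here is a small pure-Python sketch of how such a `where` clause can be read. The `matches` function is a hypothetical illustration covering only shorthand equality, `$eq`, `$in`, `$and`, and `$or`; it is not ChromaDB's implementation.

```python
def matches(metadata: dict, where: dict) -> bool:
    """Return True if metadata satisfies a simplified where clause."""
    for key, condition in where.items():
        if key == "$and":
            # Every sub-clause must match
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            # At least one sub-clause must match
            if not any(matches(metadata, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):
            # Operator form, e.g. {"category": {"$in": ["postgresql", "mysql"]}}
            for op, operand in condition.items():
                if op == "$in":
                    if metadata.get(key) not in operand:
                        return False
                elif op == "$eq":
                    if metadata.get(key) != operand:
                        return False
                else:
                    raise ValueError(f"operator not covered by this sketch: {op}")
        else:
            # Shorthand equality, e.g. {"category": "linux"}
            if metadata.get(key) != condition:
                return False
    return True

where = {
    "$and": [
        {"category": {"$in": ["postgresql", "mysql"]}},
        {"language": "vi"},
    ]
}

print(matches({"category": "mysql", "language": "vi"}, where))  # True
print(matches({"category": "mysql", "language": "en"}, where))  # False
print(matches({"category": "linux", "language": "vi"}, where))  # False
```

A document with no `category` field never matches the filter, which is why the fields have to be written at indexing time.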

Running ChromaDB as a Docker Server

When multiple services or multiple team members need shared access, running ChromaDB as a standalone Docker container is the cleanest approach:

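# Use an absolute host path; older Docker versions reject relative bind mounts.
# The container data path depends on the image version (1.x images use /data),
# so check the docs for the tag you run.
docker run -d \
  --name chromadb \
  -p 8000:8000 \
  -v "$(pwd)/chroma_data:/chroma/chroma" \
  chromadb/chroma:latest

# Check if the server is running
# (newer Chroma releases serve this at /api/v2/heartbeat instead)
curl http://localhost:8000/api/v1/heartbeat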

# Connect from Python
import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)
collection = client.get_or_create_collection("shared_knowledge")

Or use Docker Compose if you want to integrate it with other services:

version: "3.8"
services:
  chromadb:
    image: chromadb/chroma:latest
    ports:
      - "8000:8000"
    volumes:
      - chroma_data:/chroma/chroma
    environment:
      - ALLOW_RESET=true  # Enable only for development

volumes:
  chroma_data:

Practical Tips for Working with ChromaDB

Chunk Size Has a Big Impact on Result Quality

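From my testing, chunks of around 300–500 tokens tend to give the best results. Too small and you lose context; too large and the embedding can’t capture the main point. Keep the model's input limit in mind: the default all-MiniLM-L6-v2 truncates input beyond 256 word pieces, so larger chunks only pay off with an embedding model that accepts longer input. For long technical documents, use a sliding window with 20% overlap between adjacent chunks so information isn’t lost at boundaries.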
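
A minimal sketch of such a sliding window follows. The `sliding_window_chunks` helper is hypothetical, and it counts whitespace-separated words as a rough stand-in for tokens; a real tokenizer will give different counts.

```python
def sliding_window_chunks(text: str, chunk_size: int = 400, overlap_ratio: float = 0.2) -> list[str]:
    """Split text into chunks of chunk_size words that overlap by overlap_ratio."""
    if chunk_size <= 0 or not 0 <= overlap_ratio < 1:
        raise ValueError("chunk_size must be positive and overlap_ratio in [0, 1)")

    words = text.split()
    overlap = round(chunk_size * overlap_ratio)
    step = max(1, chunk_size - overlap)  # how far the window advances each time

    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the window has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks

sample = " ".join(f"w{i}" for i in range(25))
for chunk in sliding_window_chunks(sample, chunk_size=10, overlap_ratio=0.2):
    print(chunk)
```

With a chunk size of 10 and 20% overlap, each chunk starts with the last two words of the previous one, so a sentence cut at a boundary still appears whole in one of the two chunks.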

Always Store Complete Source Metadata

Retrieved information is useless if you don’t know where it came from — you can’t cite it, you can’t debug it. Store at minimum: source file, section title, last updated date. I once forgot to store the source and had to re-index all 500 files — 2 hours wasted just from skipping this step.
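
One way to collect those three fields while chunking a Markdown file is sketched below. Both helpers are hypothetical, and the field names `section` and `last_updated` are my own choice. The file's modification time stands in for "last updated", which is an assumption that may not hold if files are copied or checked out fresh.

```python
from datetime import datetime, timezone
from pathlib import Path

def paragraphs_with_sections(content: str):
    """Yield (section_title, paragraph) pairs, tracking the latest Markdown heading."""
    section = ""
    for block in content.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("#"):
            # A heading line: remember it, keep any text that follows it
            first_line, _, rest = block.partition("\n")
            section = first_line.lstrip("#").strip()
            block = rest.strip()
            if not block:
                continue
        yield section, block

def build_chunk_metadata(md_file: Path, chunk_index: int, section_title: str) -> dict:
    """Build a flat metadata dict (ChromaDB metadata values must be str, int, float, or bool)."""
    modified = datetime.fromtimestamp(md_file.stat().st_mtime, tz=timezone.utc)
    return {
        "source": md_file.as_posix(),
        "section": section_title,
        "chunk": chunk_index,
        "last_updated": modified.date().isoformat(),
    }

sample = "# Intro\n\nHello world.\n\n## Setup\nInstall it.\n\nMore."
print(list(paragraphs_with_sections(sample)))
# [('Intro', 'Hello world.'), ('Setup', 'Install it.'), ('Setup', 'More.')]
```

The resulting dicts can be passed as `metadatas` alongside the chunks, and the `section` value is what lets you cite "file X, section Y" next to an answer.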

Use Upsert to Update Documents

# upsert() inserts new IDs and overwrites existing ones in one call;
# update() also exists, but it only changes IDs that are already present
collection.upsert(
    documents=["Updated content"],
    ids=["doc1"]  # Existing ID → overwrites the document
)
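
Upsert only overwrites when IDs stay stable across runs. One way to get that, sketched here with hypothetical `chunk_id`, `content_hash`, and `plan_upsert` helpers, is to derive the ID from the source path and chunk position and to keep a content hash so unchanged chunks can be skipped. Where `known_hashes` comes from is up to you; stored metadata from a previous run is one option.

```python
import hashlib

def chunk_id(source: str, chunk_index: int) -> str:
    """Deterministic ID: the same file and position always map to the same ID."""
    return f"{source}#{chunk_index}"

def content_hash(text: str) -> str:
    """SHA-256 of the chunk text, used to detect changes between runs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_upsert(chunks: list[str], source: str, known_hashes: dict[str, str]):
    """Return (ids, documents, metadatas) for chunks that are new or changed."""
    ids, documents, metadatas = [], [], []
    for index, text in enumerate(chunks):
        cid = chunk_id(source, index)
        digest = content_hash(text)
        if known_hashes.get(cid) == digest:
            continue  # unchanged since the last run, nothing to re-embed
        ids.append(cid)
        documents.append(text)
        metadatas.append({"source": source, "chunk": index, "content_hash": digest})
    return ids, documents, metadatas

# Pretend the first chunk was indexed earlier and has not changed
known = {chunk_id("docs/a.md", 0): content_hash("alpha")}
ids, documents, metadatas = plan_upsert(["alpha", "beta"], "docs/a.md", known)
print(ids)  # ['docs/a.md#1']
```

The three lists map directly onto `collection.upsert(ids=..., documents=..., metadatas=...)`; skip the call when `ids` is empty. Skipping unchanged chunks avoids recomputing their embeddings.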

Backing Up ChromaDB Is Extremely Simple

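PersistentClient data is stored as SQLite + binary files in the specified directory, so a backup is far simpler than with a traditional database: just copy the entire directory. Do it while no process is writing to the database, otherwise the copy may be inconsistent: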

# Manual backup
cp -r ./chroma_db ./chroma_db_backup_$(date +%Y%m%d)

# Compress to save space
tar -czf chroma_backup_$(date +%Y%m%d).tar.gz ./chroma_db

ChromaDB vs. Qdrant — Which Should You Choose?

The blog already has a dedicated post on Qdrant, so I’ll keep this brief: ChromaDB is easier to install, more Python-friendly in its API, and a great fit when you need to prototype quickly or are working with a small team. Qdrant shines at large production scale, complex filtering, and multi-language client support. Working on a side project or internal tool? Start with ChromaDB. You can always migrate to Qdrant later when you need to scale.