Install ChromaDB and Run It in 5 Minutes
If you’re building an AI agent or a RAG system, ChromaDB is something you need to know right now. Code first, explanations after.
Install:
pip install chromadb
# Optional: only needed for SentenceTransformerEmbeddingFunction (the default model runs locally via ONNX)
pip install chromadb sentence-transformers
Initialize and add data right away:
import chromadb
# In-memory client — quick testing, no file needed
client = chromadb.Client()
# Create a collection (equivalent to a table in SQL)
collection = client.create_collection(name="my_knowledge_base")
# Add documents
collection.add(
    documents=[
        "Docker is an application containerization platform",
        "Kubernetes manages containers at scale",
        "Redis is a high-speed in-memory database"
    ],
    ids=["doc1", "doc2", "doc3"]
)
# Semantic search
results = collection.query(
    query_texts=["container orchestration tool"],
    n_results=2
)
print(results['documents'])
# Output: [['Kubernetes manages containers at scale', 'Docker is an application containerization platform']]
Done. That’s ChromaDB in its most basic form — semantic search without writing a single line of SQL.
What Is ChromaDB and Why Do You Need It?
ChromaDB is an open-source vector database built for AI applications. What sets it apart from traditional databases is how it searches: instead of looking for exact matches, it compares semantic meaning, so a query for “container orchestration” returns results about Kubernetes even though the wording is different.
When I was building an internal tool to query our team’s documentation, I tried Elasticsearch full-text search — the results were pretty poor. When a user typed “container crashing” it couldn’t find the article about “pod restart loops in Kubernetes”. After switching to ChromaDB with embeddings, the results improved dramatically.
How Does ChromaDB Work?
- Input text → Embedding model converts it into a vector (array of floating-point numbers)
- The vector is stored in ChromaDB along with optional metadata
- When querying, the query text is also embedded into a vector
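- ChromaDB finds the nearest stored vectors with an HNSW index, using the collection's distance metric (squared L2 by default, cosine if you configure it), and returns the closest documents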
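To make the last step concrete, here is a small pure-Python sketch of the two metrics: squared L2 (ChromaDB's default) and cosine distance. The 3-dimensional vectors are made up for illustration (real all-MiniLM-L6-v2 embeddings have 384 dimensions). The "docker" vector points in exactly the same direction as the query but is twice as long, which is where the two metrics disagree:

```python
import math


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance, ChromaDB's default "l2" space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 = same direction, 1 = orthogonal, 2 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank(query: list[float], docs: dict[str, list[float]], metric) -> list[str]:
    """Return document names sorted from closest to farthest under `metric`."""
    return sorted(docs, key=lambda name: metric(query, docs[name]))


# Hypothetical toy "embeddings", not real model output
docs = {
    "kubernetes": [0.9, 0.1, 0.0],
    "docker": [1.6, 0.4, 0.0],  # same direction as the query, twice as long
    "redis": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]

print(rank(query, docs, cosine_distance))  # → ['docker', 'kubernetes', 'redis']
print(rank(query, docs, squared_l2))       # → ['kubernetes', 'docker', 'redis']
```

Cosine ignores vector length; L2 doesn't. For unit-length vectors the squared L2 distance is exactly 2 × the cosine distance, so both give the same ranking. In that case the metric mostly changes how you should read the distance numbers (for example, when picking a relevance threshold).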
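By default, ChromaDB embeds text with all-MiniLM-L6-v2, a sentence-transformers model that ChromaDB runs locally through ONNX Runtime, so the default setup doesn't need the sentence-transformers package. It's small, fast, and good enough for most use cases. If you need higher accuracy, you can swap in OpenAI embeddings or a custom model by passing a different embedding_function when you create the collection.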
Persistent Storage: Saving Data to Disk
The in-memory client is convenient for quick testing, but a restart wipes all data. For production, use PersistentClient:
import chromadb
# Save to a local directory
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(
    name="devops_knowledge",
    metadata={"hnsw:space": "cosine"}  # cosine distance instead of the default squared L2
)
Data is saved to ./chroma_db as soon as you add documents. Restart the process and your data is still there.
Building a Knowledge Base for Your AI Agent
RAG (Retrieval-Augmented Generation) is the use case I reach for ChromaDB most often. Instead of stuffing your entire documentation into the LLM’s context (a 50-page doc file already burns ~40k tokens), you store it in ChromaDB and retrieve only the 3–5 relevant passages when needed.
Indexing Documents into ChromaDB
import chromadb
from pathlib import Path
client = chromadb.PersistentClient(path="./knowledge_db")
# Cosine space, so the distance threshold used when querying is a cosine distance
collection = client.get_or_create_collection("docs", metadata={"hnsw:space": "cosine"})
def index_markdown_files(docs_dir: str):
    docs_path = Path(docs_dir)
    for md_file in docs_path.glob("**/*.md"):
        content = md_file.read_text(encoding="utf-8")
        # Chunk by paragraph, skip chunks that are too short
        chunks = [c.strip() for c in content.split("\n\n") if len(c.strip()) > 50]
        if not chunks:
            continue
        # IDs built from the path relative to docs_dir, so same-named files in different
        # folders don't collide; upsert makes re-running the indexer safe
        rel_path = md_file.relative_to(docs_path).as_posix()
        collection.upsert(
            documents=chunks,
            ids=[f"{rel_path}_{i}" for i in range(len(chunks))],
            metadatas=[{"source": str(md_file), "chunk": i} for i in range(len(chunks))]
        )
        print(f"Indexed {md_file.name}: {len(chunks)} chunks")
index_markdown_files("./docs")
Querying and Retrieving Context
def ask_knowledge_base(question: str, n_results: int = 3, max_distance: float = 1.5) -> str:
    results = collection.query(
        query_texts=[question],
        n_results=n_results,
        include=["documents", "metadatas", "distances"]
    )
    contexts = results["documents"][0]
    sources = [m["source"] for m in results["metadatas"][0]]
    distances = results["distances"][0]
    # Cosine distance = 1 - cosine similarity: 0 = identical, 1 = unrelated, 2 = opposite.
    # Only valid if the collection uses "hnsw:space": "cosine" (the default is squared L2).
    # 1.5 is a loose cut-off; tune max_distance for your embedding model.
    relevant = [
        (ctx, src)
        for ctx, src, dist in zip(contexts, sources, distances)
        if dist < max_distance
    ]
    if not relevant:
        return "No relevant information found."
    return "\n---\n".join(ctx for ctx, _ in relevant)
context = ask_knowledge_base("How do I restart a service in systemd?")
print(context)
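Retrieval is only half of RAG. The other half is handing the context to the LLM. Below is a minimal sketch of that step, assuming a plain-text prompt. build_rag_prompt is a hypothetical helper, and the exact instructions you give the model are up to you:

```python
def build_rag_prompt(
    question: str,
    context: str,
    no_context_marker: str = "No relevant information found.",
) -> str:
    """Hypothetical helper: combine retrieved context and the question into one prompt."""
    if context == no_context_marker:
        # Nothing relevant was retrieved: ask the model not to guess
        return (
            "No internal documentation matched the question below. "
            "Say you don't know instead of guessing.\n\n"
            f"Question: {question}"
        )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Made-up retrieved chunk, standing in for the output of ask_knowledge_base()
example_context = "Run systemctl restart <unit> to restart a service."
print(build_rag_prompt("How do I restart a service in systemd?", example_context))
```

Because only the 3 to 5 retrieved passages go into the prompt, its size stays roughly constant no matter how large the knowledge base grows.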
Metadata Filtering: Conditional Search
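Got a collection with many different types of documents? Metadata filters narrow the search scope so only matching chunks are considered, which keeps results on-topic. The examples below assume each chunk was stored with category and language metadata. The indexer above only records source and chunk, so add those fields at indexing time if you want to filter on them: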
# Only search Linux docs
results = collection.query(
    query_texts=["how to mount a disk"],
    n_results=3,
    where={"category": "linux"}
)
# Combine multiple conditions
results = collection.query(
    query_texts=["backup database"],
    n_results=5,
    where={
        "$and": [
            {"category": {"$in": ["postgresql", "mysql"]}},
            {"language": "vi"}
        ]
    }
)
Running ChromaDB as a Docker Server
When multiple services or multiple team members need shared access, running ChromaDB as a standalone Docker container is the cleanest approach:
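docker run -d \
  --name chromadb \
  -p 8000:8000 \
  -v "$(pwd)/chroma_data:/chroma/chroma" \
  chromadb/chroma:latest
# Check if the server is running
curl http://localhost:8000/api/v1/heartbeat
# Note: /chroma/chroma and /api/v1 match the 0.x images. Chroma 1.x images store data
# under /data and serve /api/v2/heartbeat, so pin an image tag that matches your client.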
# Connect from Python
import chromadb
client = chromadb.HttpClient(host="localhost", port=8000)
collection = client.get_or_create_collection("shared_knowledge")
Or use Docker Compose if you want to integrate it with other services:
version: "3.8"
services:
  chromadb:
    image: chromadb/chroma:latest
    ports:
      - "8000:8000"
    volumes:
      - chroma_data:/chroma/chroma  # 0.x image path; Chroma 1.x images use /data
    environment:
      - ALLOW_RESET=true  # Enable only for development
volumes:
  chroma_data:
Practical Tips for Working with ChromaDB
Chunk Size Has a Big Impact on Result Quality
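From my testing, chunks of around 300–500 tokens tend to give the best results. Too small and you lose context; too large and the embedding can't capture the main point. Keep in mind that the default all-MiniLM-L6-v2 model truncates input longer than 256 word pieces, so with it, the tail of a 300–500 token chunk is never embedded. Switch to a model with a longer input limit if you want chunks that large. For long technical documents, use a sliding window with about 20% overlap between adjacent chunks so information isn't lost at the boundaries.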
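Here's a minimal sliding-window chunker. It uses whitespace-separated words as a rough stand-in for tokens, which is an assumption: real tokenizers split text differently, so treat the sizes as approximate. chunk_size and overlap_ratio are illustrative names, and the defaults (400 words, 20% overlap) roughly follow the numbers above:

```python
def sliding_window_chunks(text: str, chunk_size: int = 400, overlap_ratio: float = 0.2) -> list[str]:
    """Split text into chunks of up to chunk_size words; each chunk repeats the
    last int(chunk_size * overlap_ratio) words of the previous one."""
    if chunk_size <= 0 or not 0 <= overlap_ratio < 1:
        raise ValueError("chunk_size must be > 0 and 0 <= overlap_ratio < 1")
    words = text.split()
    overlap = int(chunk_size * overlap_ratio)  # words shared by adjacent chunks
    step = chunk_size - overlap                # how far the window moves (always >= 1)
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Tiny demo: 10 words, windows of 4 words with 50% overlap
print(sliding_window_chunks("w1 w2 w3 w4 w5 w6 w7 w8 w9 w10", chunk_size=4, overlap_ratio=0.5))
# → ['w1 w2 w3 w4', 'w3 w4 w5 w6', 'w5 w6 w7 w8', 'w7 w8 w9 w10']
```

The overlap means a sentence that straddles a boundary appears whole in at least one chunk, at the cost of storing some text twice.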
Always Store Complete Source Metadata
Retrieved information is useless if you don’t know where it came from — you can’t cite it, you can’t debug it. Store at minimum: source file, section title, last updated date. I once forgot to store the source and had to re-index all 500 files — 2 hours wasted just from skipping this step.
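Here's one way to capture all three fields. The sketch assumes your docs are Markdown files whose sections start with # headings, and it uses the file's modification time as a proxy for "last updated". chunk_markdown_with_metadata is a hypothetical helper; its output is shaped to feed straight into collection.upsert(ids=..., documents=..., metadatas=...):

```python
import re
from datetime import datetime, timezone
from pathlib import Path


def chunk_markdown_with_metadata(md_file: Path, min_len: int = 50):
    """Split a Markdown file into paragraph chunks tagged with source, section
    title, chunk index and last-updated date. Returns (ids, documents, metadatas)."""
    text = md_file.read_text(encoding="utf-8")
    # ChromaDB metadata values must be str, int, float or bool, so store the date as a string
    updated = datetime.fromtimestamp(md_file.stat().st_mtime, tz=timezone.utc).date().isoformat()
    ids, documents, metadatas = [], [], []
    section = ""  # heading of the current section; "" before the first heading
    for block in text.split("\n\n"):
        block = block.strip()
        heading = re.match(r"#{1,6}[ \t]+(.*)", block)
        if heading:
            section = heading.group(1).strip()
            block = block[heading.end():].strip()  # keep any text directly under the heading
        if len(block) <= min_len:
            continue  # skip short fragments, like the indexer above
        i = len(documents)
        ids.append(f"{md_file.as_posix()}_{i}")
        documents.append(block)
        metadatas.append({"source": str(md_file), "section": section, "chunk": i, "last_updated": updated})
    return ids, documents, metadatas
```

With the section title in the metadata, an answer can cite "guide.md › Restarting services" instead of just a file name, and a stale last_updated value tells you which chunks to re-index.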
Use Upsert to Update Documents
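# ChromaDB does have update(), but it only modifies IDs that already exist.
# upsert() updates existing IDs and inserts new ones, so it's the safer default.
collection.upsert(
    documents=["Updated content"],
    ids=["doc1"]  # Existing ID → overwrites the document; new ID → inserts it
)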
Backing Up ChromaDB Is Extremely Simple
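PersistentClient data is stored as a SQLite database plus binary index files in the specified directory, so backup is far simpler than with a traditional database server. Stop any process that writes to the directory (otherwise the copy may be inconsistent), then copy the entire directory: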
# Manual backup
cp -r ./chroma_db ./chroma_db_backup_$(date +%Y%m%d)
# Compress to save space
tar -czf chroma_backup_$(date +%Y%m%d).tar.gz ./chroma_db
ChromaDB vs. Qdrant — Which Should You Choose?
The blog already has a dedicated post on Qdrant, so I’ll keep this brief: ChromaDB is easier to install, more Python-friendly in its API, and a great fit when you need to prototype quickly or are working with a small team. Qdrant shines at large production scale, complex filtering, and multi-language client support. Working on a side project or internal tool? Start with ChromaDB. You can always migrate to Qdrant later when you need to scale.
