Qdrant Installation and Usage Guide: A Powerful Vector Database for AI and RAG Applications

Artificial Intelligence tutorial - IT technology blog

What is a vector database and why do we need it?

AI and Machine Learning are developing rapidly, driving demand for storing and semantically searching high-dimensional data. Traditional databases, whether SQL or NoSQL, face significant challenges when working with embeddings: the numerical vector representations of data (text, images, audio) produced by AI models. This is where vector databases demonstrate their strength.

Comparing old approaches and the role of Vector Databases

When I first started building AI applications that required semantic search, I experimented with many solutions. The initial thought was often to store vectors in familiar databases, for example:

  • Relational databases (SQL): Store vectors as arrays of real numbers. Nearest neighbor search would require complex distance calculations across the entire dataset, which is extremely slow and resource-intensive for large datasets.
  • NoSQL databases (e.g., MongoDB, Redis): Somewhat better at storing complex structures, but the fundamental problem persists: they are not optimized for similarity search using metrics such as cosine similarity or Euclidean distance.
  • In-memory storage with libraries like NumPy: This method is fast but only suitable for small datasets and lacks persistence or scalability for production environments.
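To see why these approaches scale poorly, here is a minimal pure-Python sketch of the brute-force search they would have to perform (the vectors are made-up toy data): every query must compute a distance against every stored vector, which is O(n · d) per query.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def brute_force_search(query, vectors, top_k=2):
    # Every query scans the entire dataset -- this is what makes
    # naive storage in SQL/NoSQL so slow at scale.
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 3-dimensional "embeddings" (real ones have hundreds of dimensions)
vectors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
query = [1.0, 0.05, 0.0]
print(brute_force_search(query, vectors))
```

Vector databases like Qdrant avoid this full scan by building Approximate Nearest Neighbor (ANN) indexes, trading a tiny amount of accuracy for orders-of-magnitude faster queries.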

However, these methods quickly revealed their limitations. Real-world RAG projects must process millions or even billions of vectors, and search needs to be not only accurate but also extremely fast. That’s why I turned to researching and implementing vector databases.

Qdrant: The optimal choice for RAG systems

After experimenting with many solutions, I decided to use Qdrant as the foundation for managing embeddings for itfromzero.com’s AI applications. Through practical experience, I found this to be a key skill for building scalable and high-performance AI systems.

Outstanding advantages of Qdrant

  • Superior Performance: Qdrant is built with Rust, a language known for its performance and memory safety. This allows Qdrant to handle millions of vector search queries in real-time, which is crucial for user experience in AI applications.
  • Powerful Filtering Capabilities: In addition to nearest neighbor search, Qdrant allows you to combine semantic search with powerful attribute filters (payload filtering). For example, you can find documents related to “AI” only within the “Technology” category or published after 2023. This is a key feature that other vector databases sometimes overlook or implement less effectively.
  • Flexible Scalability: Qdrant supports both standalone and distributed deployments, making it easy to scale according to data needs and traffic.
  • User-Friendly API: Provides a RESTful API and SDKs for multiple languages (Python, Go, JavaScript), making integration into existing applications easy.
  • Open Source and Community: As an active open-source project with a large supporting community, it ensures sustainable development and customization capabilities.
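As an illustration of the payload filtering mentioned above, the example of finding documents in the “Technology” category published after 2023 could be expressed as a filter like this in Qdrant’s REST API (the `category` and `year` payload fields here are hypothetical):

```json
{
  "filter": {
    "must": [
      { "key": "category", "match": { "value": "Technology" } },
      { "key": "year", "range": { "gt": 2023 } }
    ]
  }
}
```

Conditions inside `must` are combined with AND, so the semantic search is restricted to points whose payload satisfies every condition.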

Some challenges when using Qdrant

  • Resource Requirements: With very large datasets, Qdrant can consume a fair amount of RAM and CPU to maintain high search performance. Server configuration needs to be optimized.
  • Learning New Concepts: For newcomers, concepts related to embeddings, vector spaces, and Approximate Nearest Neighbor (ANN) search algorithms might seem a bit unfamiliar at first.

In summary, despite a few minor challenges, Qdrant’s superior performance, flexible filtering capabilities, and stability have made it a top choice. In particular, it is highly suitable for RAG and AI systems that require accurate and fast semantic search. Personally, I have deployed Qdrant in a production environment for over 6 months and found it to operate extremely stably, meeting all requirements well.

Qdrant Server Installation Guide

The easiest and fastest way to run Qdrant is using Docker. This ensures an isolated and easy-to-manage environment.

Installation with Docker

Make sure you have Docker and Docker Compose installed on your system. If not, refer to the Docker installation guide on the homepage.

Create a docker-compose.yml file:

version: '3.8'
services:
  qdrant:
    image: qdrant/qdrant
    container_name: qdrant_server
    ports:
      - "6333:6333"  # REST interface
      - "6334:6334"  # gRPC interface
    volumes:
      - ./qdrant_data:/qdrant/storage # To prevent data loss when the container stops
    restart: always

Save this file and run the following command in the same directory:

docker-compose up -d

This command will download the Qdrant image and run the server in the background. The Qdrant server will be available at http://localhost:6333 for the REST API and http://localhost:6334 for gRPC.

To check if Qdrant is running successfully, you can access http://localhost:6333/dashboard in your browser. You will see the Qdrant management interface.

Using Qdrant with Python SDK

Once the Qdrant server is running, we will interact with it via the Python SDK. Make sure you have installed the libraries:

pip install qdrant-client sentence-transformers

sentence-transformers will be used to create sample embeddings.

1. Initialize Qdrant Client

from qdrant_client import QdrantClient, models

# Initialize client to connect to Qdrant server
client = QdrantClient(host="localhost", port=6333) # For REST API
# Or client = QdrantClient(host="localhost", grpc_port=6334) # For gRPC

print("Connected to Qdrant server!")

2. Create Collection

A collection is where your embeddings are stored. When creating one, you define the vector dimension and the distance metric to be used. For example, the all-MiniLM-L6-v2 model from sentence-transformers produces 384-dimensional vectors.

collection_name = "my_first_collection"
vector_size = 384 # Vector size for all-MiniLM-L6-v2 model

# Distance.COSINE matches Sentence Transformers embeddings
client.recreate_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(size=vector_size, distance=models.Distance.COSINE),
)

print(f"Collection '{collection_name}' has been created or reinitialized.")

3. Create and Insert Data (Vector and Payload)

Each data point in Qdrant consists of an embedding and a payload (JSON metadata related to that vector). Payloads are very useful for filtering data later.

from sentence_transformers import SentenceTransformer

# Load model to create embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')

documents = [
    {
        "id": 1,
        "text": "Guide to installing Qdrant and using it for RAG.",
        "category": "AI",
        "tags": ["Qdrant", "RAG", "Vector Database"]
    },
    {
        "id": 2,
        "text": "Build AI applications with Python and Machine Learning libraries.",
        "category": "AI",
        "tags": ["Python", "Machine Learning", "AI App"]
    },
    {
        "id": 3,
        "text": "SEO optimization tips for personal blogs, enhancing visibility.",
        "category": "Marketing",
        "tags": ["SEO", "Blog", "Marketing"]
    },
    {
        "id": 4,
        "text": "Frontend web programming with React and modern JavaScript.",
        "category": "Development",
        "tags": ["React", "JavaScript", "Frontend"]
    }
]

# Create embeddings for the texts
vectors = model.encode([doc["text"] for doc in documents]).tolist()

# Prepare data to insert into Qdrant
points = [
    models.PointStruct(
        id=doc["id"],
        vector=vec,
        payload={
            "text": doc["text"],
            "category": doc["category"],
            "tags": doc["tags"]
        }
    )
    for doc, vec in zip(documents, vectors)
]

client.upsert(
    collection_name=collection_name,
    wait=True,
    points=points
)

print(f"Inserted {len(points)} data points into collection '{collection_name}'.")

4. Search for Nearest Vectors

Now, we can perform semantic search. Qdrant will find the vectors closest to your query vector.

query_text = "how to use vector database for AI"
query_vector = model.encode(query_text).tolist()

search_result = client.search(
    collection_name=collection_name,
    query_vector=query_vector,
    limit=2  # Get only the 2 nearest results
)

print(f"Search results for: '{query_text}'")
for hit in search_result:
    print(f"  ID: {hit.id}, Score: {hit.score}, Payload: {hit.payload['text']}")

5. Search with Filters (Payload Filtering)

This is an extremely powerful feature of Qdrant. You can combine semantic search with filtering conditions on the payload.

query_text_filtered = "AI development guide"
query_vector_filtered = model.encode(query_text_filtered).tolist()

search_result_filtered = client.search(
    collection_name=collection_name,
    query_vector=query_vector_filtered,
    query_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="category",
                match=models.MatchValue(value="AI") # Search only in the "AI" category
            )
        ]
    ),
    limit=1
)

print(f"\nFiltered search results for: '{query_text_filtered}' (AI category only)")
for hit in search_result_filtered:
    print(f"  ID: {hit.id}, Score: {hit.score}, Payload: {hit.payload['text']}")

Qdrant in a real-world RAG architecture

In a RAG system, Qdrant acts as a knowledge storage and retrieval hub. The process typically unfolds as follows:

  1. Data Preparation: Original documents (PDFs, webpages, articles, etc.) are broken down into chunks.
  2. Create Embeddings: Each chunk is transformed into an embedding vector using Large Language Models (LLMs) or specialized models like Sentence Transformers.
  3. Store in Qdrant: These embeddings, along with metadata (payload) like document ID, title, author, are stored in Qdrant.
  4. Retrieval: When a user asks a question, that question is also converted into an embedding vector. Qdrant will search for chunks with vectors closest to the query vector, while also applying filters if any.
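Step 1 above can be sketched with a simple sliding-window chunker (a minimal illustration; production pipelines usually split on sentence or token boundaries, with chunk size and overlap tuned to the embedding model):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    # Slide a fixed-size window over the text with some overlap,
    # so context at chunk boundaries is not lost.
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

document = "Qdrant is a vector database written in Rust. " * 20
chunks = chunk_text(document, chunk_size=120, overlap=30)
print(f"Split into {len(chunks)} chunks")
```

Each resulting chunk would then be embedded (step 2) and upserted into Qdrant with its metadata as the payload (step 3), exactly as shown in the Python SDK examples earlier.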

Qdrant makes the retrieval step extremely efficient, ensuring that the LLM receives the most relevant information, thereby significantly improving the quality and accuracy of the response. With Qdrant, your RAG system will respond much faster and more accurately.

Conclusion

Qdrant is not just a fast and powerful vector database, but also an indispensable tool when you want to build effective AI and RAG applications in real-world environments. With its flexible semantic search and data filtering capabilities, Qdrant empowers you to turn complex ideas into reliable AI solutions.

Hopefully, through this article, you have gained an overview and learned how to get started with Qdrant. Try it yourself and discover its potential in your own projects!
