PostgreSQL Full-Text Search: High-Performance Vietnamese Search Without Elasticsearch – ITFROMZERO

Table of Contents

Search Without Elasticsearch? It’s Entirely Possible!

It’s a common story: when developers need a search feature, they immediately think of Elasticsearch (ES) or Algolia. Admittedly, they are very powerful. However, if you’re running on a VPS with only 2GB of RAM, carrying the heavy Java-based server cluster of ES is a nightmare. Not to mention, you have to deal with the headache of synchronizing data between your main database and the search engine.

Practical experience shows that PostgreSQL is an incredibly effective “secret weapon.” With its built-in tsvector and tsquery toolset, you can build a professional-grade Vietnamese search engine. Everything is contained within a single database—no clutter, no extra RAM usage.

Quick Start: Try a Search in 5 Minutes

Instead of just reading theory, open your terminal or pgAdmin. Try executing the commands below to see how responsive it is.

-- 1. Create demo table
CREATE TABLE blog_posts (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT
);

-- 2. Insert sample data
INSERT INTO blog_posts (title, content) VALUES 
('Learn Python programming', 'Python is an extremely powerful language for AI'),
('PostgreSQL Basics', 'PostgreSQL supports full-text search very well'),
('Dev Roadmap', 'You need to learn SQL, data structures, and algorithms');

-- 3. Search query
SELECT * FROM blog_posts 
WHERE to_tsvector('simple', title || ' ' || content) @@ to_tsquery('simple', 'python & programming');

The @@ operator is the key. It matches the keywords “python” and “programming” within the vectorized data block. The results are returned almost instantly with small datasets.

Decoding tsvector and tsquery

To master Full-Text Search (FTS), you only need to understand two core concepts.

1. tsvector (Text Search Vector)

Think of tsvector as a list of words (lexemes) that have been cleaned and normalized. It removes meaningless words and records the exact position of each word.

Example: SELECT to_tsvector('simple', 'Hello itfromzero team');

Result: 'hello':1 'itfromzero':3 'team':2. Thanks to this positional list, Postgres doesn’t need to scan through every character when you perform a search.

2. tsquery (Text Search Query)

This is the query language. You can use flexible logic operators:

& (AND): Must contain both words.
| (OR): Containing either is sufficient.
! (NOT): Exclude results containing this word.
<-> (FOLLOWED BY): Find the exact phrase in order (crucial for Vietnamese compound words).

A Note on the ‘simple’ Dictionary

Why use ‘simple’ for Vietnamese? Unlike English, which has word variations (like run/running), Vietnamese is an isolating language. Using the default dictionary helps preserve the word structure, preventing Postgres from arbitrarily stripping suffixes and distorting the meaning.

Advanced: Optimizing Performance for Hundreds of Thousands of Records

Problems arise as your table grows. If you let Postgres perform a full table scan (Seq Scan), the database will soon become overloaded. This is where indexing comes to the rescue.

1. GIN Index – Superior Speed

GIN (Generalized Inverted Index) is a specialized index for full-text search. Instead of indexing by row, it indexes by individual words.

CREATE INDEX idx_fts_post ON blog_posts 
USING GIN (to_tsvector('simple', title || ' ' || content));

With a table of about 100,000 rows, a GIN Index can reduce query times from several seconds to under 10ms. A truly impressive number.

2. Handling Accentless Search

Vietnamese users often have the habit of typing without accents. To handle this, use the unaccent extension.

CREATE EXTENSION IF NOT EXISTS unaccent;

-- Professional approach: Create a synchronized function and index
CREATE OR REPLACE FUNCTION f_unaccent_vector(text) RETURNS tsvector AS $$
    SELECT to_tsvector('simple', unaccent($1));
$$ LANGUAGE SQL IMMUTABLE;

CREATE INDEX idx_fts_unaccent ON blog_posts USING GIN (f_unaccent_vector(title || ' ' || content));

3. Generated Columns – A Pro Tip from Postgres 12

Instead of recalculating the vector for every query, you should store it in a separate column. This significantly saves CPU cycles.

ALTER TABLE blog_posts 
ADD COLUMN search_vector tsvector 
GENERATED ALWAYS AS (to_tsvector('simple', unaccent(title) || ' ' || unaccent(content))) STORED;

CREATE INDEX idx_search_vector ON blog_posts USING GIN (search_vector);

Now, the search command is as simple as: SELECT * FROM blog_posts WHERE search_vector @@ to_tsquery('simple', 'python');

Practical Tips for Better UI/UX

After implementing this in many projects, I’ve gathered three tricks to make search features more professional:

Weighting: Prioritize results found in the Title (Weight A) over the Content (Weight B). Users will find what they need faster.
Ranking: Use the ts_rank function to bring articles with higher keyword density to the top.
Highlighting: Use the ts_headline function to automatically bold (using <b> tags) keywords in the returned text fragments.

Don’t rush to install complex tools before fully utilizing the power of your current database. The KISS (Keep It Simple, Stupid) philosophy always holds true in engineering. Using PostgreSQL’s FTS makes your system lightweight, easy to maintain, and extremely stable.

If you encounter difficulties with configuration or want deeper optimization for large datasets, leave a comment below. The itfromzero team is here to help you troubleshoot!