Background: Why I Chose Neo4j Among a “Jungle” of Databases
In my early days as a developer, I believed MySQL and PostgreSQL could solve everything in the world. But reality hit hard. When my project grew and required building friend recommendation features and user behavior analysis, I started to feel the pain.
With traditional SQL, finding “friends of friends of friends” (3rd-degree relationships) is a nightmare. You have to chain 5-7 JOINs, typically self-joining the friendship table once per degree of separation. The query isn’t just incredibly long; performance drops off a cliff. In my experience, with a dataset of about 1 million records, a complex query like this could take 20-30 seconds to respond, an unacceptable figure for a production environment.
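To make the JOIN chain concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `users` and `friendships` tables and the extra person 'Dung' are hypothetical, invented for this illustration rather than taken from my real project.

```python
import sqlite3

# Hypothetical schema: one row per directed friendship.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friendships (user_id INTEGER, friend_id INTEGER);
    INSERT INTO users VALUES (1, 'An'), (2, 'Binh'), (3, 'Chi'), (4, 'Dung');
    INSERT INTO friendships VALUES (1, 2), (2, 3), (3, 4);
""")

# Each extra degree of separation adds another self-join on friendships.
THIRD_DEGREE_SQL = """
    SELECT DISTINCT u3.name
    FROM users AS u0
    JOIN friendships AS f1 ON f1.user_id = u0.id
    JOIN friendships AS f2 ON f2.user_id = f1.friend_id
    JOIN friendships AS f3 ON f3.user_id = f2.friend_id
    JOIN users AS u3 ON u3.id = f3.friend_id
    WHERE u0.name = ?
"""

third_degree = [row[0] for row in conn.execute(THIRD_DEGREE_SQL, ("An",))]
print(third_degree)
```

Even this toy version already touches five table references, and it does not yet exclude people An is already friends with.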
Neo4j was the missing piece for this problem. Unlike the rigid row-column storage model, Neo4j treats Nodes and Relationships as first-class citizens. Relationships are physically stored on disk as direct links between nodes (an approach commonly called index-free adjacency), so following one is closer to a pointer hop than a lookup. For multi-hop queries this can be orders of magnitude faster than the equivalent SQL, because the database doesn’t have to search indexes across entire tables to find links.
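The difference can be sketched with a toy model in plain Python. This is only an illustration of the idea: real Neo4j storage and real SQL indexes are far more sophisticated, and the edge list and function names here are made up.

```python
# Toy data: each tuple is one directed relationship.
edges = [("An", "Binh"), ("Binh", "Chi"), ("Chi", "Dung"), ("Lan", "Mai")]


def neighbors_by_scan(node):
    """Table style without an index: inspect every edge row."""
    examined = 0
    found = []
    for src, dst in edges:
        examined += 1
        if src == node:
            found.append(dst)
    return found, examined


# Graph style: each node keeps direct references to its own relationships.
adjacency = {}
for src, dst in edges:
    adjacency.setdefault(src, []).append(dst)


def neighbors_by_pointer(node):
    """Follow only the relationships attached to this node."""
    found = adjacency.get(node, [])
    return found, len(found)


print(neighbors_by_scan("An"))
print(neighbors_by_pointer("An"))
```

The scan's cost grows with the total number of relationships, while the pointer-style lookup only grows with how many relationships that one node has.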
Installing Neo4j: Lightweight with Docker
To keep my machine clean and manageable, I always prioritize using Docker. A single command is all you need to set up a Neo4j environment for experimentation.
Run this command to pull the image and start the container immediately:
docker run -d \
    --name neo4j_itfromzero \
    -p 7474:7474 -p 7687:7687 \
    -v $HOME/neo4j/data:/data \
    -v $HOME/neo4j/logs:/logs \
    -e NEO4J_AUTH=neo4j/password123 \
    neo4j:latest
Here is a quick breakdown of what these ports do:
- 7474: The Neo4j Browser interface port (HTTP) for running visual queries.
- 7687: The Bolt protocol port. This is the “hotline” for your Python or Node.js apps to connect directly to the database.
Once finished, open http://localhost:7474, log in with username neo4j and password password123, and you’re ready to start “drawing” graphs.
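If the browser page doesn't load right away, the container may still be starting. A small standard-library sketch like the one below can tell you whether the two ports are accepting TCP connections yet. The helper name `port_is_open` is my own, and an open port only means something is listening, not that Neo4j is fully ready.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for label, port in (("Browser (HTTP)", 7474), ("Bolt", 7687)):
        status = "open" if port_is_open("localhost", port) else "not reachable yet"
        print(f"{label} port {port}: {status}")
```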
Hands-on with Cypher and Python
To work with Neo4j, you use the Cypher language. Instead of dry SELECT statements, Cypher uses symbols like () for Nodes and -[]-> for relationships. Looking at a query, it feels just like a mind map.
1. Create Sample Data in Seconds
Let’s try creating a simple friend network to see the visualization in action:
CREATE (an:Person {name: 'An', age: 25})
CREATE (binh:Person {name: 'Binh', age: 28})
CREATE (chi:Person {name: 'Chi', age: 22})
CREATE (an)-[:FRIEND]->(binh)
CREATE (binh)-[:FRIEND]->(chi)
In this example, Person is a label, which plays a role similar to a table name, while the properties inside {} are the object’s data.
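If you'd rather load the same sample data from Python, one possible sketch is below. It swaps CREATE for MERGE so that running it twice doesn't duplicate nodes, which is my own choice and not what the Cypher above does. The `seed_graph` helper and the `source`/`target` keys are hypothetical names; the function only assumes it is handed an object with a `run(query, **params)` method, such as a driver session.

```python
PEOPLE = [
    {"name": "An", "age": 25},
    {"name": "Binh", "age": 28},
    {"name": "Chi", "age": 22},
]
FRIENDSHIPS = [
    {"source": "An", "target": "Binh"},
    {"source": "Binh", "target": "Chi"},
]

# UNWIND turns a list parameter into rows, so one statement handles the whole batch.
CREATE_PEOPLE = """
UNWIND $people AS person
MERGE (p:Person {name: person.name})
SET p.age = person.age
"""

CREATE_FRIENDSHIPS = """
UNWIND $pairs AS pair
MATCH (a:Person {name: pair.source})
MATCH (b:Person {name: pair.target})
MERGE (a)-[:FRIEND]->(b)
"""


def seed_graph(session, people=PEOPLE, friendships=FRIENDSHIPS):
    """Send both statements through any object exposing run(query, **params)."""
    session.run(CREATE_PEOPLE, people=people)
    session.run(CREATE_FRIENDSHIPS, pairs=friendships)
```

With the official driver, you would call it inside a `with driver.session() as session:` block.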
2. Building Recommendation Logic with Python
Now for the most interesting part: using Python to find “people that An’s friends know, but An hasn’t friended yet.” This is the backbone of the “People You May Know” feature on Facebook.
Install the library first:
pip install neo4j
The code below has been streamlined so you can apply it to your backend immediately:
from neo4j import GraphDatabase


class Neo4jApp:
    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        # Release the driver's connections when the app shuts down
        self.driver.close()

    def recommend_friends(self, person_name):
        with self.driver.session() as session:
            # With just 3 lines of Cypher, you replace dozens of SQL JOIN lines
            query = """
            MATCH (p:Person {name: $name})-[:FRIEND]->(friend)-[:FRIEND]->(fof)
            WHERE NOT (p)-[:FRIEND]->(fof) AND p <> fof
            RETURN DISTINCT fof.name AS recommended_friend
            """
            result = session.run(query, name=person_name)
            return [record["recommended_friend"] for record in result]


app = Neo4jApp("bolt://localhost:7687", "neo4j", "password123")
print(f"Recommendations for An: {app.recommend_friends('An')}")
app.close()
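To see what the Cypher pattern actually matches, here is a rough in-memory equivalent in plain Python. It is only a mental model, not how Neo4j executes the query, and the `FRIENDS` dictionary and function name are made up for the illustration.

```python
# Same directed FRIEND edges as the sample data: An -> Binh -> Chi.
FRIENDS = {"An": {"Binh"}, "Binh": {"Chi"}, "Chi": set()}


def recommend_friends_in_memory(graph, person):
    """Mirror the pattern (p)-[:FRIEND]->(friend)-[:FRIEND]->(fof)."""
    direct = graph.get(person, set())
    recommended = set()
    for friend in direct:
        for fof in graph.get(friend, set()):
            # WHERE NOT (p)-[:FRIEND]->(fof) AND p <> fof
            if fof not in direct and fof != person:
                recommended.add(fof)  # the set plays the role of DISTINCT
    return sorted(recommended)


print(recommend_friends_in_memory(FRIENDS, "An"))
```

Because the pattern follows the arrows, Chi gets no recommendations with this data. If friendship should count in both directions, the Cypher pattern can drop the arrowheads and use -[:FRIEND]- instead.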
Hard-earned Lessons: Monitoring and Optimization
When taking Neo4j to production, RAM is the lifeblood. Neo4j tends to be “greedy” with memory to cache the entire graph, ensuring the fastest possible retrieval.
1. Proper Memory Configuration
Avoid running on the default configuration if you don’t want your server to hang early. You need to adjust two parameters in neo4j.conf (the names below are the Neo4j 4.x ones; Neo4j 5 and later use the server.memory.* prefix instead of dbms.memory.*):
- dbms.memory.heap.max_size: Limits RAM for query processing. Don’t set it too high to avoid overloading the Garbage Collector.
- dbms.memory.pagecache.size: Dedicated to caching data from the disk. For example, if a server has 16GB of RAM, I usually reserve 8GB for this to ensure optimal file reading speeds.
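Since the container from earlier is started with docker run, it is often easier to pass these settings as environment variables than to edit neo4j.conf. The official image maps a setting name to a variable by adding a NEO4J_ prefix, doubling underscores, and turning dots into underscores. The sketch below assumes Neo4j 5 setting names (server.memory.*); on 4.x you would pass the dbms.memory.* names instead. The 8G page cache follows the 16GB example above, while the 4G heap is purely an illustrative assumption.

```python
def to_docker_env(setting: str) -> str:
    """Convert a neo4j.conf setting name to the Docker image's env var form.

    Underscores are doubled first, then dots become single underscores.
    """
    return "NEO4J_" + setting.replace("_", "__").replace(".", "_")


# Illustrative values only; tune them for your own workload.
settings = {
    "server.memory.heap.max_size": "4G",
    "server.memory.pagecache.size": "8G",
}

docker_flags = [f"-e {to_docker_env(name)}={value}" for name, value in settings.items()]
print("\n".join(docker_flags))
```

The printed flags can be added to the docker run command next to NEO4J_AUTH.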
2. Don’t Forget INDEX and PROFILE
Just like in SQL, if you don’t create an INDEX, Neo4j has to scan every node carrying the label to find a match. Always prefix a query with PROFILE to check its db hits before finalizing it. A small action like CREATE INDEX FOR (p:Person) ON (p.name) can boost search speeds from seconds down to milliseconds.
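When comparing a query before and after adding an index, I find it handy to reduce the profile to a single number. The sketch below assumes the profiled plan arrives as a nested dictionary with `dbHits` and `children` keys, which is how I understand the Python driver to expose it via `result.consume().profile`; verify the key names against your driver version. The sample tree and its numbers are invented for illustration.

```python
def total_db_hits(plan: dict) -> int:
    """Sum dbHits for an operator and all of its children."""
    own = plan.get("dbHits", 0)
    return own + sum(total_db_hits(child) for child in plan.get("children", []))


# Hand-written stand-in for a profile tree, not real Neo4j output.
sample_profile = {
    "operatorType": "ProduceResults",
    "dbHits": 0,
    "children": [
        {
            "operatorType": "Filter",
            "dbHits": 6,
            "children": [
                {"operatorType": "NodeByLabelScan", "dbHits": 4, "children": []},
            ],
        },
    ],
}

print(total_db_hits(sample_profile))
```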
3. Signs Your System is Struggling
If the server suddenly slows down, check the logs immediately using the command docker logs -f neo4j_itfromzero. In my experience, around 90% of issues stem from page cache overflow or accidentally writing a Cypher query that creates a “cartesian product,” causing a temporary memory explosion.
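To get a feel for why a cartesian product hurts, consider a query such as MATCH (a:Person), (b:Person) where the two patterns are not connected: every a is paired with every b. The arithmetic can be sketched in plain Python; `estimated_rows` is a hypothetical helper for this illustration, not a Neo4j function.

```python
from itertools import product
from math import prod


def estimated_rows(*node_counts: int) -> int:
    """Rows produced when disconnected patterns are combined pairwise."""
    return prod(node_counts)


# Small enough to enumerate: 3 people paired with 3 people.
people = ["An", "Binh", "Chi"]
pairs = list(product(people, people))
print(len(pairs), estimated_rows(3, 3))

# The same shape of query over 1,000 Person nodes.
print(estimated_rows(1_000, 1_000))
```

A query that looks harmless on sample data can therefore produce a million intermediate rows on a modest production graph.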
Neo4j is a heavy-duty weapon, but don’t over-rely on it for simple flat data. If you only need to store order information or basic user profiles, PostgreSQL is still a safe choice. But if your problem is riddled with tangled relationships, let Neo4j shine.

