The Nightmare of Hierarchical Data in Legacy MySQL
Handling parent-child structures has always been a headache for developers working with MySQL 5.7 and earlier. You can easily find this type of data in features like multi-level product categories, nested comment systems, or flexible data structures like company organizational charts.
Before CTEs, the most common solution was using nested Self-joins. If the data had 3 levels, you joined 3 times; 10 levels, you joined 10 times. This approach turned SQL statements into a “tangled mess” hundreds of lines long, making them extremely difficult to read and maintain. Many chose to pull all data into the application code (PHP, Node.js) to handle recursion. However, this method often increased latency by 200-500ms as the dataset grew.
The release of MySQL 8.0 changed the game with Common Table Expressions (CTE). This is not just new syntax but a modern approach to handling tree structures in the cleanest way possible.
What is a CTE?
Imagine a CTE as a temporary variable that stores query results. It exists only during the execution of a single statement such as SELECT, INSERT, or UPDATE. Instead of writing complex subqueries, you separate them into named virtual tables for reuse, a method that works perfectly with MySQL stored procedures and functions.
The system divides CTEs into two main types:
- Non-Recursive CTE: A replacement for subqueries, making the code structure more coherent and easier to understand.
- Recursive CTE: A specialized tool for handling repetitive or hierarchical data, where a record points to another record within the same table.
Basic Syntax with the WITH Keyword
To start a CTE, you always use the WITH keyword. This style helps separate raw data retrieval from the final logic processing.
WITH category_summary AS (
-- Define temporary table
SELECT id, name FROM categories WHERE is_active = 1
)
SELECT * FROM category_summary; -- Query directly from the temporary table above
Giving a CTE a clear name helps other team members immediately understand the purpose of the code without needing to read all the internal logic.
Handling Recursion with Recursive CTE
This is the most important technique for processing data layers of unknown depth. A standard Recursive CTE statement always consists of three components:
- Anchor Member: Retrieves the starting point, such as the company’s top level.
- UNION ALL: Connects the anchor result with subsequent iterations.
- Recursive Member: A query that references the CTE itself to find direct child levels.
Application: Building an Organizational Chart
Suppose we have an employees table with a simple structure:
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
manager_id INT
);
INSERT INTO employees VALUES
(1, 'CEO - Nguyen Van A', NULL),
(2, 'IT Director - Tran Van B', 1),
(3, 'Sales Director - Le Thi C', 1),
(4, 'Dev Manager - Pham Van D', 2),
(5, 'Senior Dev - Hoang Van E', 4);
To display the hierarchy of each employee, we use a Recursive CTE as follows:
WITH RECURSIVE employee_hierarchy AS (
-- STEP 1: Get the CEO as the root (Level 1)
SELECT id, name, manager_id, 1 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- STEP 2: Find subordinates by joining the table with the CTE itself
SELECT e.id, e.name, e.manager_id, eh.level + 1
FROM employees e
INNER JOIN employee_hierarchy eh ON e.manager_id = eh.id
)
SELECT * FROM employee_hierarchy ORDER BY level;
Under-the-Hood Mechanism:
The system starts by identifying the CEO (ID 1). Then, it uses this ID to find employees with manager_id = 1 (IT and Sales Directors). This process repeats continuously until it reaches the lowest level and stops.
Important Performance Considerations
Applying CTEs in a production environment requires caution regarding system resources:
1. Indexing is Mandatory
Linking columns (such as manager_id) must be indexed. Optimizing MySQL query performance with Index is mandatory; for a table with 1 million records, the lack of an index will turn each iteration into a Full Table Scan. This spikes CPU to 100% and can freeze the database in seconds.
2. Controlling Recursion Depth
MySQL defaults to a limit of 1000 iterations via the cte_max_recursion_depth variable. If the data has an infinite loop error (A manages B, B manages A), this limit protects your server. You can increase this limit if your actual tree structure is deeper:
SET SESSION cte_max_recursion_depth = 2000;
3. Use it Wisely
CTEs consume additional memory to initialize temporary tables. For simple flat queries, a standard SELECT statement will be about 5-10% faster as it avoids the overhead of managing CTE memory space. You should always detect and optimize slow SQL queries to ensure you are using the most efficient approach for your dataset.
Summary
CTE in MySQL 8 is not merely a new feature but an essential skill for the modern Backend Developer. It thoroughly solves the hierarchical data problem, cleans up source code, and is much easier to debug than the traditional Self-join method.
If your project has features like multi-level menus or permission systems, try applying Recursive CTE immediately. You will see a clear improvement in maintaining and scaling the system later on.

