MySQL JSON Data Type: Store Flexible Data Without Changing Your Schema

MySQL tutorial - IT technology blog

This story starts with an e-commerce project I worked on a few years back. The system needed to store “attributes” for each product type: shirts had size, color, and material — shoes had size number, color, and sole type. Every product category was completely different.

The solution back then was EAV (Entity-Attribute-Value) — a product_attributes table with three columns: entity_id, attribute_name, and value. It worked, but every query required 5–6 JOINs, and the EXPLAIN output was painful to look at. Then MySQL 5.7.8 introduced a native JSON data type — something I wish I’d known about sooner.

What Is the JSON Data Type, and Why Not Just Use TEXT?

The first question everyone asks is: “Can’t I just store TEXT and parse it in the application layer?” — Sure, you can. But TEXT and JSON are not the same thing. MySQL’s JSON type offers three things TEXT simply can’t do:

  • Automatic validation: MySQL rejects INSERT statements if the value is not valid JSON. TEXT silently accepts garbage into your database.
  • Binary storage: Data is stored in an optimized binary format, making nested field access faster — no need to parse the entire string like TEXT.
  • Native functions: JSON_EXTRACT, JSON_SET, JSON_CONTAINS… usable directly in SQL. No need to push processing up to the application layer.

The maximum size of a JSON value is bounded by the max_allowed_packet system variable (64MB by default in MySQL 8.0; earlier versions ship with lower defaults, so check your server). In practice, most projects never come close to this limit.
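A quick way to check the effective limit on your own server before relying on it — a minimal sketch, assuming a stock MySQL 8.0 install:

```sql
-- Check the effective JSON size limit on this server
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Raise it if needed (value in bytes; requires SUPER or
-- SYSTEM_VARIABLES_ADMIN privilege, applies to new connections)
SET GLOBAL max_allowed_packet = 67108864;  -- 64MB
```

Note that the session value is read-only for clients, so changing the global value only affects connections opened afterwards.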

Creating Tables and Inserting Data

Create a products table with an attributes column of type JSON:

CREATE TABLE products (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    price DECIMAL(10, 2),
    attributes JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

The insert syntax is no different from storing a regular string — just make sure it’s valid JSON:

INSERT INTO products (name, price, attributes) VALUES
(
    'Basic T-Shirt',
    299000,
    '{"size": ["S", "M", "L", "XL"], "color": "white", "material": "cotton", "in_stock": true}'
),
(
    'Running Pro Shoes',
    1850000,
    '{"size": [39, 40, 41, 42, 43], "color": "black", "waterproof": false, "weight_g": 320}'
);

Try inserting malformed JSON — MySQL rejects it immediately, no waiting for the application layer:

-- This line will be rejected:
INSERT INTO products (name, attributes) VALUES ('Test', '{invalid json}');
-- ERROR 3140: Invalid JSON text at position 1...

Hands-On: Querying JSON Data

Reading Nested Fields with -> and ->>

These two operators are the ones I use most every day:

-- -> returns a JSON value (string with double quotes)
SELECT name, attributes->'$.color' AS color FROM products;
-- Result: "white"

-- ->> returns plain text (unquoted string)
SELECT name, attributes->>'$.color' AS color FROM products;
-- Result: white

-- Access array element (index starts at 0)
SELECT name, attributes->>'$.size[0]' AS smallest_size FROM products;

Filtering by JSON Conditions

-- Find products made from cotton
SELECT id, name, price
FROM products
WHERE attributes->>'$.material' = 'cotton';

-- JSON_CONTAINS: check if the array contains a value
SELECT id, name
FROM products
WHERE JSON_CONTAINS(attributes->'$.size', '"M"');

-- Combine multiple conditions
SELECT id, name, price
FROM products
WHERE JSON_EXTRACT(attributes, '$.in_stock') = true
  AND price < 500000;

Updating Values in JSON

-- JSON_SET: add or update a field
UPDATE products
SET attributes = JSON_SET(attributes, '$.discount_pct', 15)
WHERE id = 1;

-- JSON_REMOVE: remove a field
UPDATE products
SET attributes = JSON_REMOVE(attributes, '$.discount_pct')
WHERE id = 1;

-- JSON_MERGE_PATCH: merge objects, overwriting duplicate keys (MySQL 5.7.22+)
UPDATE products
SET attributes = JSON_MERGE_PATCH(attributes, '{"new_arrival": true, "season": "summer"}')
WHERE id = 2;

Performance and Indexes — The Most Common Pitfall

MySQL cannot index a JSON column directly. This is the classic trap: a table with hundreds of thousands of rows, a query filtering on a JSON field, and execution time jumping to 2–3 seconds with no obvious cause in the code. Only running EXPLAIN reveals the culprit — type: ALL, a full table scan. (MySQL 8.0.13+ also supports functional indexes over JSON expressions, but generated columns remain the most portable approach.)

The cleanest workaround is a Generated Column: create a virtual column that’s automatically derived from the JSON field, then index that column:

-- Add a virtual generated column from a JSON field
ALTER TABLE products
ADD COLUMN material VARCHAR(50)
    GENERATED ALWAYS AS (attributes->>'$.material') VIRTUAL;

-- Create an index on the generated column
CREATE INDEX idx_material ON products (material);

-- This query can now use the index instead of a full scan
SELECT id, name FROM products WHERE material = 'cotton';
-- EXPLAIN now shows type: ref with key: idx_material instead of type: ALL

Using STORED instead of VIRTUAL physically writes the value to disk — slightly faster reads at the cost of additional storage. For a boolean field frequently used in filters:

ALTER TABLE products
ADD COLUMN in_stock BOOLEAN
    GENERATED ALWAYS AS (JSON_EXTRACT(attributes, '$.in_stock')) STORED,
ADD INDEX idx_in_stock (in_stock);

JSON_TABLE — Flattening JSON Arrays into Multiple Rows (MySQL 8.0+)

Sometimes you need to “unfold” a JSON array into multiple rows for tabular processing. JSON_TABLE handles this directly in SQL, without pulling data up to the application layer and looping:

-- List each size of each product as a separate row
SELECT
    p.id,
    p.name,
    s.size_value
FROM products p,
JSON_TABLE(
    p.attributes,
    '$.size[*]'
    COLUMNS (size_value VARCHAR(10) PATH '$')
) AS s
WHERE p.id = 1;

Result:

+----+---------------+------------+
| id | name          | size_value |
+----+---------------+------------+
|  1 | Basic T-Shirt | S          |
|  1 | Basic T-Shirt | M          |
|  1 | Basic T-Shirt | L          |
|  1 | Basic T-Shirt | XL         |
+----+---------------+------------+

When to Use It, When Not To

After several production deployments, I’ve drawn a fairly clear line. JSON columns work well for:

  • Flexible, frequently-changing schemas: product attributes, metadata, per-user config, feature flags
  • Data typically read as a whole object: rarely needing to filter by individual nested fields
  • Fast prototyping: when the schema isn’t settled yet and you want to iterate without running migrations every time

Avoid it when:

  • You frequently run complex queries against multiple JSON fields — normalizing into separate tables is cleaner and the SQL is much clearer
  • Data requires foreign key constraints — JSON columns don’t support them
  • The team is unfamiliar with MySQL JSON functions — it’s easy to write queries that silently skip indexes

Conclusion

MySQL JSON type isn’t a magic bullet, but for the right use case it solves the flexible-schema problem cleanly — no messy EAV patterns, no extra extensions needed. My go-to pattern: normalize the important fields (frequently filtered or sorted) into dedicated indexed columns; keep supplementary metadata in the JSON column.
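As an illustrative sketch of that hybrid pattern (the layout and column choices here are my own example, not from a specific project): fields you filter or sort on constantly get real, indexed columns, and everything category-specific stays in the JSON blob:

```sql
-- Hybrid layout: hot fields as indexed columns, the long tail as JSON
CREATE TABLE products (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    name       VARCHAR(255) NOT NULL,
    price      DECIMAL(10, 2),   -- filtered and sorted constantly: real column
    category   VARCHAR(50),      -- filtered constantly: real column
    attributes JSON,             -- size, color, material, and whatever else
    INDEX idx_category_price (category, price)
);
```

The point of the composite index is that the common listing query ("cheapest shirts first") never touches the JSON column at all; the JSON is only decoded for rows that survive the indexed filter.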

The JSON type ships built in with MySQL 5.7.8 and every later release, no additional installation needed. Open a terminal, create a test table, and run through the examples above — it’s a completely different experience compared to parsing JSON in the application layer and guessing which fields exist.
