Free Your Hands with SchemaSpy: Automating ERD Diagrams and Database Documentation – ITFROMZERO

Table of Contents

Background: Why You Shouldn’t Document Databases Manually

Jumping into a “legacy” project with over 100 tables and zero documentation is a true nightmare. You’re probably all too familiar with manually inspecting every table in MySQL Workbench just to guess foreign key relationships. This approach is incredibly time-consuming and prone to errors as the system scales.

Drawing Entity Relationship Diagrams (ERDs) using Draw.io or Lucidchart is one solution, but databases evolve with every code change. Forget to update an index column or change a data type once, and your documentation immediately becomes obsolete “trash.” That’s why I rely on SchemaSpy to automate this entire process.

At its core, SchemaSpy is a Java-based tool that scans your database and generates a detailed HTML dashboard. It doesn’t just list tables; it draws relationship diagrams, identifies “orphan” tables (unlinked ones), and provides statistics on columns missing indexes. Instead of spending two days understanding the structure, you can now grasp the entire system in just 5 minutes of browsing.

A quick tip for data engineers: When you need to quickly convert CSV to JSON for schema testing, I often use the tool at toolcraft.app/en/tools/data/csv-to-json. It runs client-side, making it extremely secure with no risk of project data leaking to the server.

Environment Setup

To get SchemaSpy running smoothly, you’ll need a specific combo: Java Runtime (JRE), a JDBC Driver, and most importantly, Graphviz for rendering the diagrams.

1. Install Java and Graphviz

SchemaSpy requires Java 8 or higher. On Ubuntu or Debian, you can install everything with a single command:

sudo apt update && sudo apt install openjdk-11-jre graphviz -y

Don’t forget Graphviz! Without it, your reports will be text-only, losing 80% of the tool’s visual impact.

2. Download SchemaSpy and JDBC Drivers

Head over to the official GitHub to grab the latest schemaspy.jar. Then, download the JDBC Driver corresponding to your database type (such as MySQL, PostgreSQL, or SQL Server).

PostgreSQL: Use the postgresql-x.x.x.jar file from the official site.
MySQL: Use mysql-connector-java-x.x.x.jar.

I usually group everything into a single directory for easier management:

mkdir ~/schemaspy-tool && cd ~/schemaspy-tool
wget https://github.com/schemaspy/schemaspy/releases/download/v6.1.0/schemaspy-6.1.0.jar -O schemaspy.jar
# Example: downloading driver for Postgres 42.5.0
wget https://jdbc.postgresql.org/download/postgresql-42.5.0.jar

Running SchemaSpy via Configuration File

Instead of typing long, error-prone commands, I recommend using a schemaspy.properties file. This approach makes it easy to share configurations with your team via Git.

Sample schemaspy.properties file

schemaspy.t=pgsql
schemaspy.dp=./postgresql-42.5.0.jar
schemaspy.host=127.0.0.1
schemaspy.port=5432
schemaspy.db=my_project_db
schemaspy.u=db_user
schemaspy.p=secret_password
schemaspy.s=public
schemaspy.o=./output
schemaspy.vizjs=true

Once prepared, run the magic command:

java -jar schemaspy.jar -configFile schemaspy.properties

SchemaSpy will scan all metadata. Once finished, open output/index.html to enjoy the visual results.

Quick Deployment with Docker

If you prefer not to install Java locally, Docker is a lifesaver. This method is very clean, especially when running on CI/CD servers:

docker run --v $(pwd):/output --net=host schemaspy/schemaspy:6.1.0 \
-t pgsql -host localhost:5432 -db my_db -u user -p pass -o /output -dp /drivers/postgresql.jar

Extracting Insights from the Report

The SchemaSpy interface has 3 tabs that I always inspect closely to optimize the system:

Relationships: The heart of the documentation. The ERD diagram allows you to zoom in/out and click on individual tables to view parent-child relationships instantly.
Anomalies: The “diagnostic” tab for your DB. It lists tables missing Primary Keys, columns with the same name but different data types, or orphan tables. This is the ideal place to clean up technical debt.
Columns: The database search engine. Simply type a field name, and it will point directly to the table containing that column.

Integrating CI/CD to Keep Documentation Fresh

Documentation that isn’t updated is useless. The best approach is to integrate it into your GitLab CI or GitHub Actions workflow. Every time a new Migration is merged, the system can automatically run SchemaSpy and push the files to GitHub Pages.

This way, everyone from Frontend developers to BAs can check the latest schema via a single URL. DBA colleagues will also appreciate not being constantly pestered for every minor database change.

Security note: When running against Production, create a user with SELECT permissions only on information_schema. Never use the root user to ensure the safety of sensitive data.

Database documentation is often a tedious task, but SchemaSpy turns it into a professional automated experience. Hopefully, this tool will help make your projects cleaner and easier to maintain!