Have you ever gotten a 2 AM phone call about a slow API with no idea where to even start looking? I’ve been there — SSH-ing into servers one by one, running top, netstat, and still coming up empty. Since integrating prom-client + Grafana, everything changed. Now I just open the dashboard and instantly see: current request rate, which endpoints are slow, and what the error rate is.
The blog already has a post on setting up Prometheus + Grafana for server monitoring (CPU, RAM, disk). This post goes one layer deeper: monitoring at the application level — what your actual Node.js code is doing, which endpoints are bottlenecked, and whether your business logic is running correctly.
3 Ways to Monitor Node.js — Compare Before You Choose
prom-client isn’t always the right answer. Here are 3 common approaches — each exists for a reason.
Option 1: Logs Only + Manual Analysis
Log requests/responses to a file, then use grep, awk, or Graylog to analyze after the fact.
- Pros: No additional setup required, logs are already there, easy to debug specific issues
- Cons: No visibility into trends over time, no real-time alerting, manual analysis is time-consuming — especially when incidents happen at 3 AM
Option 2: Commercial APM (Datadog, New Relic, Dynatrace)
Install an agent, everything gets traced automatically, beautiful dashboards out of the box.
- Pros: Extremely easy to set up, includes distributed tracing and anomaly detection, no infrastructure to manage
- Cons: High cost (Datadog starts at $15/host/month, plus $0.10/GB data ingested), vendor lock-in, and custom metrics for your own business logic are possible but typically billed as an add-on, so costs climb as you add them
Option 3: prom-client + Prometheus + Grafana (self-hosted)
You expose metrics directly from your code, Prometheus scrapes them on a schedule, and Grafana visualizes and alerts.
- Pros: No license cost (you only pay for the servers it runs on), full control, define metrics exactly as you want, large community, excellent Kubernetes integration
- Cons: Requires learning PromQL for queries, you manage the Prometheus + Grafana infrastructure yourself
Why Choose prom-client?
If your project already has Prometheus (or you’re planning to add it), prom-client is the most natural choice. Commercial APMs suit large teams with big budgets that need complex distributed tracing. For startups, side projects, or when you want to track specific business metrics your own way, prom-client + Grafana is more than enough, and the only cost is the infrastructure you run it on.
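prom-client has 4 metric types. Counter only goes up (it resets to zero only when the process restarts); use it to count requests and errors. Gauge goes up and down: active connections, memory. Histogram counts observations into buckets, which makes it the right fit for request duration. Summary computes quantiles client-side, inside your Node.js process, so unlike histogram data its quantiles can’t be meaningfully aggregated across multiple instances. For a web API, Counter and Histogram are the two you’ll use most.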
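To make the Histogram type concrete, here is a toy model of what a histogram records: a cumulative count for each bucket upper bound `le` (plus an implicit `+Inf` bucket), a running `_sum`, and a `_count`. This is an illustration only, not prom-client’s actual internals; the `ToyHistogram` class is hypothetical.

```javascript
// Toy model of a Prometheus histogram (illustration only, not prom-client's code).
// Each observation increments every bucket whose upper bound `le` is >= the value,
// which is why bucket counts in /metrics output are cumulative.
class ToyHistogram {
  constructor(upperBounds) {
    // Sort bounds and append +Inf, which always counts every observation.
    this.bounds = [...upperBounds].sort((a, b) => a - b).concat(Infinity);
    this.counts = this.bounds.map(() => 0);
    this.sum = 0;   // exported as <name>_sum
    this.count = 0; // exported as <name>_count
  }

  observe(value) {
    this.bounds.forEach((le, i) => {
      if (value <= le) this.counts[i] += 1;
    });
    this.sum += value;
    this.count += 1;
  }

  // Returns [{ le, count }] in the same cumulative shape Prometheus scrapes.
  buckets() {
    return this.bounds.map((le, i) => ({ le, count: this.counts[i] }));
  }
}

const h = new ToyHistogram([0.05, 0.1, 0.5]);
[0.02, 0.07, 0.3, 1.2].forEach((seconds) => h.observe(seconds));
console.log(h.buckets());
// le=0.05 -> 1, le=0.1 -> 2, le=0.5 -> 3, le=Infinity -> 4
```

Because only bucket counts are stored, Prometheus can later add buckets from many instances together and estimate percentiles from the totals.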
Integrating prom-client into Express.js — Step by Step
Step 1: Install the Package
npm install prom-client
Step 2: Initialize Metrics in a Separate File
Separate monitoring logic into its own metrics.js file to keep it out of your business code:
// metrics.js
const client = require('prom-client');
const register = new client.Registry();
// Collect Node.js default metrics (memory heap, event loop lag, GC...)
client.collectDefaultMetrics({ register });
// Counter: track total HTTP requests
const httpRequestsTotal = new client.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
  registers: [register],
});
// Histogram: distribution of request processing time (latency)
const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
  registers: [register],
});
// Business metric: count orders created (example)
const ordersCreatedTotal = new client.Counter({
  name: 'orders_created_total',
  help: 'Total number of orders created',
  labelNames: ['status', 'payment_method'],
  registers: [register],
});
// Gauge: number of currently active users (can go up or down)
const activeUsers = new client.Gauge({
  name: 'active_users_current',
  help: 'Number of currently active users',
  registers: [register],
});
module.exports = { register, httpRequestsTotal, httpRequestDuration, ordersCreatedTotal, activeUsers };
Step 3: Middleware to Automatically Track Every HTTP Request
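// middleware/metricsMiddleware.js
const { httpRequestsTotal, httpRequestDuration } = require('../metrics');
function metricsMiddleware(req, res, next) {
  // High-resolution start time (Date.now() only has millisecond resolution)
  const start = process.hrtime.bigint();
  res.on('finish', () => {
    const duration = Number(process.hrtime.bigint() - start) / 1e9; // nanoseconds -> seconds
    // req.route.path is the matched pattern (e.g. /:id) relative to its router;
    // prefixing req.baseUrl gives /api/users/:id instead of /api/users/123.
    // Unmatched requests (404s, scanners) get a fixed label: falling back to the raw
    // req.path would create a new time series for every distinct URL.
    const route = req.route ? req.baseUrl + req.route.path : 'unmatched';
    const labels = { method: req.method, route, status_code: res.statusCode };
    httpRequestsTotal.inc(labels);
    httpRequestDuration.observe(labels, duration);
  });
  next();
}
module.exports = metricsMiddleware;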
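The `route` label is the one most likely to go wrong, so it can help to pull it into a small function you can unit-test. A sketch, assuming Express semantics where `req.baseUrl` is the mount path of the router that matched (`''` at app level) and `req.route.path` is the pattern relative to it; the `routeLabel` name and the `'unmatched'` fallback value are hypothetical choices, not something Express or prom-client defines.

```javascript
// Hypothetical helper: build a low-cardinality `route` label from an Express-like request.
// Assumes req.baseUrl is the router mount path ('' at app level) and req.route.path is
// the matched pattern relative to that mount point.
function routeLabel(req) {
  if (req.route && typeof req.route.path === 'string') {
    // Join mount path and pattern, e.g. '/api/users' + '/:id' -> '/api/users/:id'.
    const full = `${req.baseUrl || ''}${req.route.path}`;
    // A router-level '/' route mounted at '/api/orders' yields '/api/orders/'; trim it.
    return full.length > 1 && full.endsWith('/') ? full.slice(0, -1) : full;
  }
  // No route matched: never fall back to the raw URL,
  // which would create one time series per distinct path.
  return 'unmatched';
}

console.log(routeLabel({ baseUrl: '/api/users', route: { path: '/:id' } })); // /api/users/:id
console.log(routeLabel({ baseUrl: '', route: { path: '/api/orders' } }));    // /api/orders
console.log(routeLabel({ baseUrl: '/api/orders', route: { path: '/' } }));   // /api/orders
console.log(routeLabel({ path: '/wp-login.php' }));                          // unmatched
```

Inside the `finish` handler you would then build the labels with `route: routeLabel(req)`. Taking a plain object rather than a live request keeps the function testable without starting a server.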
Step 4: Register Middleware and Expose /metrics
// app.js
const express = require('express');
const { register } = require('./metrics');
const metricsMiddleware = require('./middleware/metricsMiddleware');
const app = express();
app.use(express.json());
app.use(metricsMiddleware);
// Prometheus scrape endpoint: do NOT expose publicly, see notes below
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});
app.get('/api/orders', (req, res) => {
  res.json({ orders: [] });
});
app.listen(3000, () => console.log('Server :3000 | Metrics: :3000/metrics'));
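One way to act on the "do NOT expose publicly" comment, if you keep /metrics on the main Express app, is to check the caller’s address before responding. A sketch, assuming Prometheus connects from a known set of IPs (the addresses below are placeholders) and that no reverse proxy sits in between: behind a proxy, `req.socket.remoteAddress` is the proxy’s address. The `isAllowedScraper` helper is hypothetical.

```javascript
// Hypothetical allowlist check for the /metrics endpoint.
// On dual-stack sockets Node reports IPv4 clients as IPv4-mapped IPv6
// addresses ('::ffff:10.0.0.5'), so strip that prefix before comparing.
const ALLOWED_SCRAPERS = new Set(['127.0.0.1', '::1', '10.0.0.5']); // placeholder IPs

function normalizeAddress(address) {
  return typeof address === 'string' && address.startsWith('::ffff:')
    ? address.slice('::ffff:'.length)
    : address;
}

function isAllowedScraper(remoteAddress, allowed = ALLOWED_SCRAPERS) {
  return allowed.has(normalizeAddress(remoteAddress));
}

// Possible use in app.js:
// app.get('/metrics', async (req, res) => {
//   if (!isAllowedScraper(req.socket.remoteAddress)) return res.sendStatus(403);
//   res.set('Content-Type', register.contentType);
//   res.end(await register.metrics());
// });

console.log(isAllowedScraper('::ffff:10.0.0.5')); // true
console.log(isAllowedScraper('203.0.113.7'));     // false
```

A separate metrics server bound to an internal port avoids the proxy caveat entirely; the allowlist is simply the smallest change to the existing app.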
Step 5: Track Custom Business Metrics in Route Handlers
This is where prom-client shines compared to generic monitoring — you can track your exact business logic:
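// routes/orders.js
const express = require('express');
const { ordersCreatedTotal, activeUsers } = require('../metrics');
const router = express.Router();
// createOrder() stands in for your own order logic; it is not defined here
router.post('/api/orders', async (req, res) => {
  try {
    const order = await createOrder(req.body);
    ordersCreatedTotal.inc({ status: 'success', payment_method: order.paymentMethod });
    res.json({ success: true, orderId: order.id });
  } catch (err) {
    ordersCreatedTotal.inc({ status: 'failed', payment_method: req.body.paymentMethod || 'unknown' });
    res.status(500).json({ error: err.message });
  }
});
router.post('/api/login', async (req, res) => {
  // ... auth logic
  // Note: sessions that expire without a logout call never decrement this gauge,
  // and each app instance keeps its own count
  activeUsers.inc();
  res.json({ token: '...' });
});
router.post('/api/logout', (req, res) => {
  activeUsers.dec();
  res.json({ success: true });
});
// Mount in app.js after the metrics middleware: app.use(require('./routes/orders'));
module.exports = router;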
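In the failure branch of the order handler, `payment_method` comes straight from `req.body`, so any client can invent new label values and grow the number of series without limit. A minimal guard, assuming your app accepts a fixed set of payment methods (the list here is purely illustrative), is to map anything unknown to a single `'other'` value before calling `inc()`. `paymentMethodLabel` is a hypothetical helper.

```javascript
// Hypothetical label sanitizer: only a fixed, known set of values may become label values.
const KNOWN_PAYMENT_METHODS = new Set(['card', 'bank_transfer', 'cod']); // illustrative list

function paymentMethodLabel(value) {
  if (typeof value !== 'string') return 'unknown';
  const normalized = value.trim().toLowerCase();
  return KNOWN_PAYMENT_METHODS.has(normalized) ? normalized : 'other';
}

// e.g. ordersCreatedTotal.inc({ status: 'failed', payment_method: paymentMethodLabel(req.body.paymentMethod) });
console.log(paymentMethodLabel('card'));          // card
console.log(paymentMethodLabel(' CARD '));        // card
console.log(paymentMethodLabel('bitcoin-xyz-1')); // other
console.log(paymentMethodLabel(undefined));       // unknown
```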
Configuring Prometheus to Scrape the Node.js App
Open prometheus.yml and add a new job alongside the existing node-exporter job:
scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']
  # New job for the Node.js application
  - job_name: 'nodejs-app'
    static_configs:
      - targets: ['localhost:3000']
    metrics_path: '/metrics'
    scrape_interval: 15s
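Reload the Prometheus config without a restart. The HTTP endpoint below only works if Prometheus was started with --web.enable-lifecycle; otherwise, send the process a SIGHUP instead (kill -HUP <prometheus-pid>):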
curl -X POST http://localhost:9090/-/reload
PromQL Queries for the Grafana Dashboard
With Prometheus scraping data, it’s time to build the dashboard. These 4 panels cover the core signals of API health: traffic, latency, errors, and one business outcome:
Request Rate (requests/second)
sum(rate(http_requests_total[5m])) by (route, method)
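`rate()` works on raw counter samples: it takes the total increase over the window, divided by the time span, and it treats any drop in the counter as a process restart rather than negative traffic. A simplified sketch of that reset handling, assuming samples arrive oldest first; real Prometheus also extrapolates to the edges of the window, which this version deliberately skips. The `simpleRate` name and sample shape are assumptions.

```javascript
// Simplified per-second rate over counter samples [{ t: seconds, value }], oldest first.
// Mirrors Prometheus' counter-reset handling but NOT its extrapolation to window edges.
function simpleRate(samples) {
  if (samples.length < 2) return NaN; // Prometheus returns no result with fewer than two samples
  let increase = 0;
  for (let i = 1; i < samples.length; i++) {
    const prev = samples[i - 1].value;
    const curr = samples[i].value;
    // A decrease means the counter restarted from 0 (e.g. the Node.js process restarted),
    // so the whole current value counts as new increase.
    increase += curr >= prev ? curr - prev : curr;
  }
  const elapsed = samples[samples.length - 1].t - samples[0].t;
  return increase / elapsed;
}

// 15s scrapes; the app restarts between t=30 and t=45.
const samples = [
  { t: 0, value: 100 },
  { t: 15, value: 130 },
  { t: 30, value: 160 },
  { t: 45, value: 20 },
  { t: 60, value: 50 },
];
console.log(simpleRate(samples)); // (30 + 30 + 20 + 30) / 60 ≈ 1.833 req/s
```

This is also why `rate()` goes inside `sum()` and not the other way round: reset detection needs each series’ own samples, and summing counters first would hide a restart on one instance.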
P95 Latency — The Most Important Metric
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, route))
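P95 means 95% of requests complete within time X. It’s far more informative than the average: when most requests are fast, a handful of very slow ones barely move the mean, so the average can look healthy while some users wait seconds. Keep in mind that histogram_quantile estimates X by interpolating within bucket boundaries, so its precision depends on the buckets you defined in metrics.js.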
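`histogram_quantile` never sees individual requests, only the cumulative bucket counts, so it locates the bucket where the 95th-percentile rank falls and interpolates linearly inside it. Below is a simplified re-implementation of that idea, written from how Prometheus documents the behavior rather than copied from its source; it assumes positive bucket bounds, at least one finite bucket, and buckets already sorted by `le` and ending with `+Inf`. The `histogramQuantile` name is hypothetical.

```javascript
// Simplified histogram_quantile: q in [0, 1], buckets = [{ le, count }] cumulative,
// sorted by le, with at least one finite bucket and a final le = Infinity bucket.
function histogramQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  // First finite bucket whose cumulative count reaches the rank.
  const b = buckets.findIndex((bucket, i) => i < buckets.length - 1 && bucket.count >= rank);
  if (b === -1) {
    // Rank falls in +Inf: the best estimate is the highest finite upper bound.
    return buckets[buckets.length - 2].le;
  }
  const lowerBound = b === 0 ? 0 : buckets[b - 1].le;
  const countBelow = b === 0 ? 0 : buckets[b - 1].count;
  const inBucket = buckets[b].count - countBelow;
  // Assume observations are spread evenly inside the bucket.
  return lowerBound + (buckets[b].le - lowerBound) * ((rank - countBelow) / inBucket);
}

const buckets = [
  { le: 0.1, count: 50 },
  { le: 0.25, count: 90 },
  { le: 0.5, count: 100 },
  { le: Infinity, count: 100 },
];
console.log(histogramQuantile(0.95, buckets)); // 0.375
console.log(histogramQuantile(0.5, buckets));  // 0.1
```

With the 0.25 and 0.5 buckets from metrics.js, a P95 of 0.375s really means "somewhere between 250ms and 500ms, interpolated", which is worth remembering before alerting on tight thresholds.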
Error Rate (% of 5xx errors)
sum(rate(http_requests_total{status_code=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100
Business Metric: Order Success Rate
sum(rate(orders_created_total{status="success"}[5m])) / sum(rate(orders_created_total[5m])) * 100
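Both ratios divide one rate by another, and PromQL matches the two sides series by series on their full label set. The sketch below uses plain arrays standing in for PromQL vectors (the rates are made-up numbers) to show why each side needs to be summed before dividing: without aggregation, every 5xx series is matched with itself in the denominator and the "error rate" reads 100%.

```javascript
// Per-second request rates by series (made-up numbers), identified by their labels.
const rates = [
  { route: '/api/orders', status_code: '200', value: 9.5 },
  { route: '/api/orders', status_code: '500', value: 0.5 },
  { route: '/api/users/:id', status_code: '200', value: 10 },
];

const key = (s) => `${s.route}|${s.status_code}`;
const is5xx = (s) => /^5\d\d$/.test(s.status_code);

// Unaggregated division: each 5xx series is matched with the identical series below the line.
function perSeriesRatio(series) {
  const denominator = new Map(series.map((s) => [key(s), s.value]));
  return series.filter(is5xx).map((s) => (s.value / denominator.get(key(s))) * 100);
}

// sum(...) / sum(...): collapse each side to a single number first.
function aggregatedRatio(series) {
  const sum = (xs) => xs.reduce((acc, s) => acc + s.value, 0);
  return (sum(series.filter(is5xx)) / sum(series)) * 100;
}

console.log(perSeriesRatio(rates));  // [ 100 ]
console.log(aggregatedRatio(rates)); // 2.5
```

If you want the error rate per route instead of overall, use `sum(...) by (route)` on both sides so the two vectors still match on `route`.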
Quick Verification Before Connecting Grafana
# Run the app
node app.js
# Send a few test requests
curl http://localhost:3000/api/orders
curl -X POST http://localhost:3000/api/orders \
-H "Content-Type: application/json" \
-d '{"item":"product-1","paymentMethod":"card"}'
# View raw metrics output
curl http://localhost:3000/metrics
If you see output like this (alongside the default process_* and nodejs_* metrics from collectDefaultMetrics), you’re good:
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",route="/api/orders",status_code="200"} 3
http_requests_total{method="POST",route="/api/orders",status_code="200"} 1
# HELP http_request_duration_seconds Duration of HTTP requests in seconds
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.005",...} 2
http_request_duration_seconds_bucket{le="0.01",...} 4
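Eyeballing the output works once; for a CI smoke test you may prefer to check it programmatically. A minimal sketch that parses only the simple cases of the Prometheus text format from a string (no escaped quotes inside label values, no trailing timestamps), so you can feed it the body returned by `curl http://localhost:3000/metrics`. `parseMetrics` is a hypothetical helper, not part of prom-client.

```javascript
// Minimal parser for simple lines of the Prometheus text exposition format.
// Handles `name{label="value",...} number` and `name number`; skips # HELP / # TYPE lines.
// Does not handle escaped quotes inside label values or trailing timestamps.
function parseMetrics(text) {
  const samples = [];
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    if (trimmed === '' || trimmed.startsWith('#')) continue;
    const match = trimmed.match(/^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)$/);
    if (!match) continue;
    const labels = {};
    for (const [, k, v] of (match[2] || '').matchAll(/([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"/g)) {
      labels[k] = v;
    }
    samples.push({ name: match[1], labels, value: Number(match[3]) });
  }
  return samples;
}

const body = `# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",route="/api/orders",status_code="200"} 3
http_requests_total{method="POST",route="/api/orders",status_code="200"} 1`;

const ordersGets = parseMetrics(body).find(
  (s) => s.name === 'http_requests_total' && s.labels.method === 'GET' && s.labels.route === '/api/orders'
);
console.log(ordersGets.value); // 3
```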
Real-World Notes
- Don’t use user_id as a label: Labels must have low cardinality. Using user_id or request_id as labels creates millions of time series, and Prometheus will run out of memory fast. Method, route, and status_code are safe choices.
- Protect the /metrics endpoint: Don’t expose it publicly. Use Basic Auth, whitelist internal IPs, or bind the metrics server to a separate port accessible only to internal Prometheus. This endpoint reveals quite a bit about your infrastructure.
- A scrape_interval of 15s is sufficient: Don’t set it to 5s or lower without a specific reason — it adds unnecessary load on both the app and Prometheus.
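- Test route normalization: With nested Express routers, req.route.path is relative to the router it was defined on, so a /:id route on a router mounted at /api/users reports just /:id; prefix it with req.baseUrl. Test thoroughly to make sure the label is always the pattern (/api/users/:id), never the concrete URL (/api/users/123).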
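The cardinality warning is easy to put numbers on. A metric’s worst-case series count is the product of the number of distinct values of each label, and a histogram multiplies that by its bucket count plus three (the `+Inf` bucket, `_sum`, and `_count`). A back-of-the-envelope sketch; the label counts below (5 methods, 50 routes, 10 status codes, 100,000 users) are illustrative assumptions, and real counts are lower because not every route returns every status.

```javascript
// Worst-case number of time series for one metric.
// labelValueCounts: distinct values per label; bucketCount: finite histogram buckets (0 for counters/gauges).
function worstCaseSeries(labelValueCounts, bucketCount = 0) {
  const labelCombinations = labelValueCounts.reduce((acc, n) => acc * n, 1);
  // Histogram: finite buckets + le="+Inf" bucket + _sum + _count per label combination.
  const seriesPerCombination = bucketCount > 0 ? bucketCount + 3 : 1;
  return labelCombinations * seriesPerCombination;
}

// Samples Prometheus ingests per day for that many series at a given scrape interval.
function samplesPerDay(series, scrapeIntervalSeconds) {
  return series * (86400 / scrapeIntervalSeconds);
}

// http_request_duration_seconds with the 10 buckets from metrics.js:
const base = worstCaseSeries([5, 50, 10], 10);               // 2,500 combinations x 13 = 32,500
const withUserId = worstCaseSeries([5, 50, 10, 100000], 10); // 3,250,000,000
console.log(base, withUserId);
console.log(samplesPerDay(base, 15), samplesPerDay(base, 5)); // 187200000 561600000
```

The same arithmetic backs the scrape-interval advice: dropping from 15s to 5s triples the samples ingested for every series you have.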
With metrics in Grafana, the next step is setting up alerts — for example, alerting when P95 latency exceeds 500ms, or when error rate stays above 5% for 5 consecutive minutes. Alertmanager has its own dedicated post on the blog; combine it with this dashboard and you have a complete monitoring loop.
