Turning Linux into a Database: Querying and Security Monitoring with osquery

Security tutorial - IT technology blog
Security tutorial - IT technology blog

The Nightmare of Managing Hundreds of Linux Servers

Imagine your boss asks you to immediately list the Nginx versions across 200 servers to check for the latest CVE vulnerability. If you use manual Bash scripts and ssh loops, you’ll spend all afternoon grepping and awking through a mess of unorganized data.

Every tool like ps, netstat, or lsof returns a different text format. Standardizing this data for reports is a real pain. Osquery solves this problem by treating the operating system as a relational database. Instead of battling shell scripts, you just write SQL.

Why is osquery Different?

To see the value of osquery, look at this comparison of common monitoring methods today:

  • Traditional CLI (top, netstat): Fast and available, but doesn’t scale. Gathering data from 500 nodes simultaneously is nearly impossible.
  • Log Aggregators (ELK, Splunk): Great storage, beautiful charts. However, resource costs are high, and data often has latency (post-analysis).
  • osquery: Real-time queries, standardized data, and extremely lightweight. You can ask the system anything and get results in milliseconds.

The Power of “Querying the Operating System”

Osquery (developed by Facebook) abstracts system components into over 200 data tables. Everything from running processes and kernel modules to network connections is neatly organized into SQL tables.

Instead of memorizing dozens of parameters for netstat -tulpn, you just run: SELECT * FROM listening_ports;. The result is a clean table format, easily exportable to JSON for integration with other systems.

Deploying osquery on Linux in Practice

Here is a quick installation guide for Ubuntu. For RHEL/CentOS systems, simply replace the commands with yum or dnf.

1. Installing the Agent

First, add the official repository to receive the latest security updates:

export OSQUERY_KEY=14841204C457537ED9F7D182915B1051F6F3CFC5
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys $OSQUERY_KEY
sudo add-apt-repository 'deb [arch=amd64] https://pkg.osquery.io/deb deb main'
sudo apt update && sudo apt install osquery -y

2. Interactive Querying with osqueryi

osqueryi is an excellent tool for quick troubleshooting. It works exactly like a MySQL or PostgreSQL shell.

osqueryi

Try checking the top 5 RAM-consuming processes on your server:

SELECT name, pid, (resident_size / 1024 / 1024) AS ram_mb 
FROM processes 
ORDER BY resident_size DESC LIMIT 5;

3. 3 Real-World Security Scenarios

These are commands I frequently use during Incident Response.

Detecting Users Without Passwords:
Attackers often create hidden users or remove passwords to maintain access. This query exposes them:

SELECT * FROM users WHERE password = '';

Searching for Reverse Shells:
This command helps find processes connecting to the internet that are not localhost:

SELECT p.name, pos.remote_address, pos.remote_port 
FROM process_open_sockets AS pos 
JOIN processes AS p ON pos.pid = p.pid 
WHERE pos.remote_address NOT IN ('127.0.0.1', '::1', '0.0.0.0', '');

Monitoring Configuration File Changes:
Check who modified files in /etc within the last 24 hours:

SELECT path, mtime, uid FROM file 
WHERE path LIKE '/etc/%' AND mtime > (strftime('%s', 'now') - 86400);

A small tip: When setting up admin accounts for data logging, stay away from weak passwords. I often use Password Generator to generate random 20-character strings. This tool runs locally in the browser, making it very safe for tech professionals.

Automated Monitoring with osqueryd

For continuous monitoring, you need osqueryd (the daemon). It runs scheduled queries and pushes logs to a central server.

By default, osquery features a “Watchdog”. If a query consumes more than 12% CPU or 200MB of RAM, it is automatically terminated to protect the server. This gives you peace of mind when deploying on critical production systems.

Sample configuration in /etc/osquery/osquery.conf to check for rootkits every hour:

{
  "schedule": {
    "rootkit_check": {
      "query": "SELECT * FROM kernel_modules WHERE status != 'Live';",
      "interval": 3600
    }
  }
}

Performance Optimization Tips

Even though osquery is highly optimized, a poorly written SQL query can still hang your system. Remember these 3 golden rules:

  1. Never SELECT * on the file table: If you scan the entire hard drive to calculate MD5 hashes, your server will “scream.” Only scan critical directories like /bin or /sbin.
  2. Use JOINs wisely: Always filter data in the primary table before JOINing with another table.
  3. Leverage Event-based tables: Use _events tables to receive notifications only when an event changes, rather than periodic scanning (polling).

Conclusion

Osquery is more than just a tool; it’s a modern management mindset. Bringing every metric into standard SQL makes it easy to integrate into CI/CD pipelines or SOC Dashboards. If you are managing large-scale server clusters, try installing osquery today. You’ll find that controlling infrastructure has never been this effortless.

Share: