Diagnosing and Troubleshooting Network Connectivity Issues on Linux: A Guide to `ip`, `route`, `dig`, `netcat`, and More

Network tutorial - IT technology blog

The Real-World Problem: “2 AM and the Server Can’t Connect Out!”

It’s two in the morning. You’re fast asleep when PagerDuty starts screaming. A critical application on your Linux server has suddenly lost Internet connectivity. Application logs are flooded with Connection timed out errors when calling third-party APIs. The server can’t even update software packages or synchronize NTP time. Customers are starting to complain, and the pressure is mounting.

You quickly SSH into the server. The first thing I always do is check the most basic connectivity:

ping google.com

You get the result ping: google.com: Name or service not known. Clearly, DNS is having issues, or worse, the server has completely lost Internet connectivity. Next, I try pinging a public IP address:

ping 8.8.8.8

This time, you see Destination Host Unreachable or connect: Network is unreachable. It’s clear the server is experiencing serious issues connecting externally. Are all connections blocked? I try pinging my internal gateway:

ping 192.168.1.1 # Replace with your actual gateway address

Fortunately, the ping command succeeds. This is a crucial clue: the server can still communicate within the local network, but it cannot reach the Internet. So, what’s the cause?
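The three pings above amount to a small decision table. Here is a minimal sketch of that triage, assuming a hypothetical helper called classify_reachability that takes three yes/no answers (gateway reachable, public IP reachable, name resolves) instead of running the pings itself:

```shell
# classify_reachability GATEWAY_OK IP_OK DNS_OK
# Each argument is "yes" or "no". The function only encodes the reasoning;
# it does not run any ping itself. The name is hypothetical.
classify_reachability() {
    gateway_ok=$1 ip_ok=$2 dns_ok=$3
    if [ "$gateway_ok" != yes ]; then
        echo "local network problem: check interface, IP address, cabling"
    elif [ "$ip_ok" != yes ]; then
        echo "no path to the Internet: check default gateway, routing, upstream firewall"
    elif [ "$dns_ok" != yes ]; then
        echo "DNS problem: check /etc/resolv.conf and the configured resolvers"
    else
        echo "basic connectivity looks fine: check ports and the application"
    fi
}

# Possible wiring on a live system (the gateway address is an example):
# yn() { "$@" >/dev/null 2>&1 && echo yes || echo no; }
# classify_reachability "$(yn ping -c 1 -W 2 192.168.1.1)" \
#                       "$(yn ping -c 1 -W 2 8.8.8.8)" \
#                       "$(yn getent hosts google.com)"
```

In the 2 AM scenario the inputs would be yes, no, no, which lands on the "no path to the Internet" branch.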

Root Cause Analysis: Starting from the Basics

Whenever I face network issues, I always adhere to the OSI model, moving from lower layers to higher layers. This approach helps me systematize the debugging process and ensures I don’t overlook any possibilities.

1. Network Layer (Layer 3): IP, Gateway, Routing

This is the starting point for a packet’s journey out of the server. If the server doesn’t know the path or has an incorrect IP address, everything will immediately get stuck.

  • Does the server have a valid IP address? Is the network card active (UP status)?
  • Does the server know where to send packets externally (default gateway)?
  • Does the routing table specify the correct path for packets?

2. Transport Layer (Layer 4): Firewall

Even if packets find their way, they can still be blocked by firewalls (like iptables, firewalld), whether on the server itself or on intermediary network devices. Firewalls block outbound or inbound connections based on ports and source/destination IP addresses.

3. Application Layer (Layer 7): Domain Name Resolution (DNS)

If you can ping a public IP address but cannot ping a domain name (e.g., google.com), then DNS is the primary culprit. In this case, the server cannot resolve the domain name to an IP address, leading to connectivity errors.

Solutions and In-Depth Diagnostic Tools

Now it’s time to dive into the tools for checking each layer.

1. Check IP Configuration and Interface Status with ip

The ip command is a modern and powerful tool for network management on Linux, gradually replacing the older ifconfig command. I’ll use `ip` to check IP addresses, network card status, and many other parameters.

ip a show

This command displays all network interfaces and their IP configurations. You need to check:

  • Whether the primary interface (e.g., eth0, ens33, enp0s3) has a valid IP address within the server’s network range.
  • Whether the status of that interface is UP. If it’s DOWN, you need to activate it with the command:
sudo ip link set dev eth0 up # Replace eth0 with your interface name

Sometimes, the IP address might be lost or misconfigured. If so, you can try reassigning the IP address (note: this is only temporary, until the network service restarts or the configuration file is updated):

sudo ip addr add 192.168.1.100/24 dev eth0
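When a server has many interfaces, scanning the full output by eye is error-prone. The sketch below filters the brief output of ip -br addr show for the two conditions listed above (interface not UP, or no IPv4 address). The function name iface_problems is hypothetical, and the parsing assumes the usual three-column brief layout (name, state, addresses):

```shell
# iface_problems: read `ip -br addr show` output on stdin and report
# interfaces that are not UP or that have no IPv4 address.
# The function name is hypothetical; the loopback interface is skipped.
iface_problems() {
    awk '
        $1 == "lo" { next }
        $2 != "UP" { print $1 ": state is " $2; next }
        {
            has_v4 = 0
            # Fields 3 and up are addresses; IPv4 ones look like a.b.c.d/len
            for (i = 3; i <= NF; i++)
                if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\//) has_v4 = 1
            if (!has_v4) print $1 ": UP but no IPv4 address"
        }
    '
}

# Usage on a live system:
# ip -br addr show | iface_problems
```

Some virtual interfaces (tunnels, for instance) legitimately report a state of UNKNOWN, so treat the output as a list of things to look at rather than a list of confirmed faults.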

2. Analyze the Routing Table with ip route

This is a critically important step to determine if the server knows how to reach the Internet. The ip route command (shortcut ip r) displays the system’s routing table.

ip r show

Look for the line starting with default via. This is the default gateway – where all packets without a specific route will be forwarded. For example:

default via 192.168.1.1 dev eth0 proto dhcp src 192.168.1.100 metric 100
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.100 metric 100

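If you don’t see a default via line, or it points to an incorrect address, that’s the cause: the server simply doesn’t know where to send packets destined for outside networks. To temporarily add a missing default route (if a wrong one is already present, use ip r replace with the same arguments, since add may fail with File exists):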

sudo ip r add default via 192.168.1.1 dev eth0 # Replace with your gateway and interface

After that, try ping 8.8.8.8 again. If successful, you’ve identified the problem. For a permanent setup, you’ll need to check your NetworkManager configuration or the appropriate interface configuration files.
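The "look for default via" check is easy to script, which helps when you want it in a health check or a runbook. A minimal sketch, assuming a hypothetical function named default_gateway that parses routing-table text from standard input:

```shell
# default_gateway: read `ip route show` output on stdin and print the
# gateway of the first default route. Exits non-zero if there is none.
# The function name is hypothetical.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; found = 1; exit }
         END { exit !found }'
}

# Usage on a live system (8.8.8.8 is just a well-known public address):
# ip route show | default_gateway || echo "no default route"
# ip route get 8.8.8.8    # asks the kernel which route it would actually use
```

The ip route get line is worth running alongside it: it shows the route the kernel would pick for one specific destination, which catches cases where a more specific route overrides the default.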

3. Check Domain Name Resolution with dig

If you can ping 8.8.8.8 but still can’t ping google.com, then DNS is the main culprit. The dig (Domain Information Groper) command is a powerful tool for diagnosing DNS issues.

dig google.com

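If you see ;; connection timed out; no servers could be reached, the server cannot reach its configured DNS servers at all. If the query returns but the ANSWER SECTION is missing, check the status field in the header: SERVFAIL or REFUSED points at the resolver, while NXDOMAIN means the name itself does not exist. Either way, first check the system’s DNS configuration file: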

cat /etc/resolv.conf

You will see nameserver lines. Ensure these IP addresses are valid and reachable. Try querying a public DNS server directly, like Google DNS (8.8.8.8):

dig @8.8.8.8 google.com

If this command succeeds, it means the DNS servers configured in /etc/resolv.conf are problematic (or unreachable from your server). You can edit /etc/resolv.conf to temporarily use public DNS servers:

echo 'nameserver 8.8.8.8' | sudo tee /etc/resolv.conf
echo 'nameserver 8.8.4.4' | sudo tee -a /etc/resolv.conf

Important Note: On systems using systemd-resolved or NetworkManager, the /etc/resolv.conf file is often a symlink or a generated file, so manual edits can be overwritten whenever the service regenerates it (for example, on reboot or when the network restarts). To make permanent changes, you need to configure DNS via systemd-resolved or NetworkManager.
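With more than one nameserver line, it helps to query each resolver individually so that a single dead one stands out. The sketch below does that with dig. Both function names (list_nameservers, test_nameservers) are hypothetical, and the check relies on dig returning a non-zero exit status when it gets no reply from the server:

```shell
# list_nameservers: print the resolver IPs from resolv.conf-style text on stdin.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }'
}

# test_nameservers [NAME]: query each configured resolver directly.
# Both function names are hypothetical.
test_nameservers() {
    name=${1:-google.com}
    list_nameservers < /etc/resolv.conf | while read -r ns; do
        # +time/+tries keep a dead resolver from stalling the loop.
        # dig exits non-zero when the server does not reply at all.
        if dig +short +time=2 +tries=1 "@$ns" "$name" >/dev/null 2>&1; then
            echo "$ns: responded"
        else
            echo "$ns: no reply"
        fi
    done
}

# Usage on a live system:
# test_nameservers google.com
```

A "responded" line only means the resolver answered the query; the answer itself may still be an error such as SERVFAIL, so follow up with a plain dig @<resolver> for any resolver you want to inspect more closely.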

4. Check TCP/UDP Connectivity with netcat (nc)

Even if IP, routing, and DNS are all fine, the application might still report connection errors to a specific service on a particular port (e.g., port 443 for HTTPS, port 80 for HTTP). At this point, netcat is a powerful tool to check if any firewall is blocking the connection.

To check if your server can connect to Google’s port 443:

nc -zv google.com 443

If you receive Connection to google.com 443 port [tcp/https] succeeded!, the TCP handshake from your server to Google on port 443 completed, so nothing on the path is blocking that port. Conversely, Connection timed out usually means packets are being silently dropped somewhere (typically by a firewall), while Connection refused means the destination was reached but nothing is listening on that port, or a firewall actively rejected the connection.
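When checking many host and port pairs, it can help to turn nc's message into a one-word verdict. This is a rough sketch only: explain_nc is a hypothetical name, and the matched strings assume the OpenBSD netcat wording (other variants such as Ncat phrase their errors differently):

```shell
# explain_nc: read the message printed by `nc -zv` on stdin and print the
# most likely interpretation. The function name is hypothetical and the
# matched strings assume OpenBSD netcat wording.
explain_nc() {
    msg=$(cat)
    case $msg in
        *succeeded*)   echo "open: TCP handshake completed" ;;
        *refused*)     echo "closed: host reachable, but nothing listening or a firewall rejected it" ;;
        *"timed out"*) echo "filtered: packets dropped on the way, likely a firewall or routing issue" ;;
        *)             echo "unknown: $msg" ;;
    esac
}

# Usage on a live system (-w 3 gives up after three seconds):
# nc -zv -w 3 google.com 443 2>&1 | explain_nc
```

The 2>&1 matters because nc writes its status messages to standard error.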

I also often use netcat to test two-way connectivity between two servers. For example, on Server A, I will listen on a port:

nc -lvp 12345

On Server B, I connect to Server A:

nc -v <IP_Server_A> 12345 # No -z here: -z only probes the port and exits, so you could not type anything

If the connection is successful, you can type messages from Server B and see them appear on Server A. This helps rule out firewall or transmission issues between the two servers.

5. Determine Packet Path with traceroute (or mtr)

If the above steps haven’t helped you find the problem, or you suspect an issue along the packet’s path (e.g., a faulty router), traceroute is an indispensable tool.

traceroute google.com

This command shows each hop (router) that a packet traverses to reach its destination. If packets stop at a hop or experience unusually high latency at a certain point, you can narrow down the problem.

I recall a time when a service would only occasionally fail to connect to an external API. It turned out to be due to intermittent packet loss on an intermediate router during peak hours. At that time, mtr (My Traceroute) was a lifesaver. It continuously sends packets and displays real-time packet loss statistics, making it easy to detect issues that only occurred during specific time frames.

mtr google.com
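For intermittent problems like the one described, mtr's report mode is handy because its output can be saved and filtered. The sketch below picks out hops whose loss exceeds a threshold. The function name lossy_hops is hypothetical, and the parsing assumes the usual report layout in which each hop line starts with "N.|--" followed by the host and the Loss% column:

```shell
# lossy_hops [THRESHOLD]: read `mtr --report` output on stdin and print hops
# whose Loss% exceeds THRESHOLD (default 0). The function name is
# hypothetical and the parsing assumes the layout "N.|-- host Loss% Snt ...".
lossy_hops() {
    awk -v limit="${1:-0}" '
        $1 ~ /^[0-9]+\.[|]--$/ {
            loss = $3
            sub(/%/, "", loss)          # "15.0%" becomes "15.0"
            if (loss + 0 > limit + 0) print $2, $3
        }
    '
}

# Usage on a live system (20 probes per hop, no reverse DNS lookups):
# mtr --report --report-cycles 20 --no-dns google.com | lossy_hops 5
```

Loss that shows up at one intermediate hop but not at the hops after it is often just that router rate-limiting its ICMP replies; loss that persists all the way to the final hop is the pattern worth chasing.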

6. Check Firewall on Linux

Finally, don’t overlook checking the firewall on your server itself. Modern Linux systems typically use firewalld or iptables.

With firewalld (on CentOS/RHEL/Fedora):

sudo firewall-cmd --list-all

With iptables (on Debian/Ubuntu or older systems):

sudo iptables -nvL

Look for rules that might block outbound connections (OUTPUT chain) or rules related to FORWARD if your server acts as a router. If you suspect the firewall is the cause, try temporarily disabling it (ONLY in a test environment!) to confirm:

sudo systemctl stop firewalld # Or iptables.service

After confirming, remember to re-enable it and configure the correct rules instead of leaving it off permanently!
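Before stopping the firewall entirely, it is usually less disruptive to look for the specific rules that could drop outbound traffic. A minimal sketch for iptables, assuming a hypothetical helper named blocking_output_rules that filters the output of iptables -S OUTPUT:

```shell
# blocking_output_rules: read `iptables -S OUTPUT` output on stdin and print
# the lines that can stop outbound traffic: a DROP policy, or a rule whose
# target is DROP or REJECT. The function name is hypothetical.
blocking_output_rules() {
    grep -E '^-P OUTPUT DROP$|^-A OUTPUT .*-j (DROP|REJECT)( |$)'
}

# Usage on a live system:
# sudo iptables -S OUTPUT | blocking_output_rules || echo "no blocking OUTPUT rules found"
```

This only inspects the OUTPUT chain itself. Rules that jump to custom chains are not followed, so an empty result narrows the search but does not prove the firewall is innocent.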

Best Practices for Approaching a Network Issue

Network debugging is both an art and a science. Here are some lessons I’ve learned after many “struggles” with network problems:

  1. Start with the absolute basics: Always begin checking from Layer 1 (Physical) up to Layer 7 (Application) of the OSI model. Never jump straight to checking DNS if you’re not sure about IP and routing.
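  2. Use the process of elimination: With each command run and each result received, try to rule out a possibility. For example, if ping 8.8.8.8 is successful, you can rule out basic Layer 3 issues such as addressing and routing. Ping uses ICMP, though, so it tells you nothing about whether specific TCP or UDP ports are blocked.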
  3. Check logs: Always examine system logs. journalctl -u NetworkManager or journalctl -u systemd-networkd can provide clues about configuration errors or interface status. dmesg is also very useful for issues related to network card drivers.
  4. Don’t be afraid to ask and search: If you’ve tried everything and are still stuck, don’t hesitate to ask colleagues, search on Stack Overflow, or specialized forums. It’s very likely someone has encountered a similar problem before.
  5. Document your steps: Documenting not only helps you track the debugging process but also serves as valuable reference for future incidents.
  6. Stay calm: Especially at 2 AM. Pressure can make you overlook the smallest details. Take a deep breath and approach the problem systematically.
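For the "document your steps" lesson, one low-effort option is to wrap each diagnostic command so that it is recorded automatically. The sketch below is one way to do that. The function name run_logged and the DEBUG_LOG variable are hypothetical conventions, not part of any standard tool:

```shell
# run_logged: run a command and append a timestamp, the command line, its
# output, and its exit status to a log file. The function name and the
# DEBUG_LOG variable are hypothetical conventions.
DEBUG_LOG=${DEBUG_LOG:-./netdebug.log}

run_logged() {
    printf '=== %s $ %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$DEBUG_LOG"
    # Capture stdout and stderr together, then remember the exit status
    output=$("$@" 2>&1)
    status=$?
    printf '%s\n' "$output"
    printf '%s\n[exit %s]\n' "$output" "$status" >> "$DEBUG_LOG"
    return "$status"
}

# Usage on a live system:
# run_logged ping -c 3 8.8.8.8
# run_logged ip route show
```

Because the output is captured before it is printed, this suits short, non-interactive commands such as ping -c or dig; interactive tools like plain mtr are better run directly.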

Diagnosing and troubleshooting network issues on Linux can be complex. However, with the right tools and approach, you can absolutely find and resolve the problem. Practice regularly so these commands become second nature to you!
