Steven Rosales

Linux Troubleshooting Commands for DevOps and Support

The goal is to document common Linux commands, troubleshooting workflows, and investigation patterns used to diagnose real server issues.


1. System Information

Use these commands to understand the server, OS version, hostname, uptime, kernel, and current user context.

hostname
hostnamectl
uname -a
uptime
whoami
id
date

Check OS release:

cat /etc/os-release

Check kernel version:

uname -r

Check system architecture:

uname -m

Useful when:


2. CPU Troubleshooting

Use these commands when the server is slow or CPU usage is high.

Real-time CPU usage

top

If installed:

htop

Check load average

uptime

Example output:

10:30:15 up 5 days, 2:10, 2 users, load average: 1.20, 2.10, 3.50

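The three load average values show system demand over the last 1, 5, and 15 minutes. In the example above, load is falling: 3.50 over the last 15 minutes, down to 1.20 over the last minute. Compare the values with the number of CPU cores (nproc): a load that stays above the core count means processes are waiting for CPU time or, on Linux, for uninterruptible I/O.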
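Load average is easier to judge relative to the core count. A minimal sketch, assuming awk is available; load_per_core is a hypothetical helper name, not a standard command:

```shell
# load_per_core LOAD CORES   (hypothetical helper)
# Prints LOAD divided by CORES, to two decimals.
# Values near or above 1.00 suggest the CPUs are saturated.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Example on a live Linux system (1-minute load from /proc/loadavg):
#   load_per_core "$(cut -d " " -f 1 /proc/loadavg)" "$(nproc)"
```

With the sample output above and an assumed 2-core server, the 15-minute value works out to 3.50 / 2 = 1.75 per core.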

Find top CPU-consuming processes

ps aux --sort=-%cpu | head

Show CPU information

lscpu

Troubleshooting logic


3. Memory Troubleshooting

Use these commands when the server is slow, applications are crashing, or the system may be running out of memory.

Check memory usage

free -h

Real-time memory usage

top

Find top memory-consuming processes

ps aux --sort=-%mem | head

Check virtual memory statistics

vmstat 1

Check swap usage

swapon --show
free -h

Check OOM killer events

sudo dmesg | grep -i "out of memory"
sudo dmesg | grep -i "killed process"

Or with journalctl:

journalctl -k | grep -i "out of memory"
journalctl -k | grep -i "killed process"
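To reduce OOM killer messages to the affected processes, the kernel log can be filtered further. A sketch, assuming the common "Killed process PID (name)" message format; oom_victims is a hypothetical helper name and the exact wording may differ between kernel versions:

```shell
# oom_victims   (hypothetical helper)
# Reads kernel log text on stdin and prints "PID NAME" for each process
# the OOM killer reports as killed. Assumes the message contains
# "Killed process <PID> (<name>)".
oom_victims() {
    sed -n 's/.*[Kk]illed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Example on a live system:
#   journalctl -k | oom_victims
```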

Troubleshooting logic


4. Disk Usage Troubleshooting

Use these commands when the disk is full or applications cannot write files.

Check filesystem usage

df -h

Check inode usage

df -i

A disk can fail because of:

Storage full
Inodes full
Read-only filesystem
Permission issues
Large log files
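The first two causes can be checked in one pass. A sketch, assuming the POSIX output format of df -P; fs_over_threshold is a hypothetical helper name, and mount points containing spaces are not handled:

```shell
# fs_over_threshold PERCENT   (hypothetical helper)
# Reads `df -P` output on stdin and prints "MOUNT USE%" for every
# filesystem whose usage is at or above PERCENT.
fs_over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                 # "95%" -> "95"
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Example on a live system:
#   df -P | fs_over_threshold 90
# On GNU df, `df -Pi` should give the same column layout for inodes.
```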

Check largest folders in current directory

du -sh * | sort -h

Check largest folders under root

sudo du -xh / | sort -h | tail -n 20

Find large files

sudo find / -type f -size +500M 2>/dev/null

Find recently modified large files

sudo find / -type f -mtime -1 -size +100M 2>/dev/null

Troubleshooting logic


5. Storage and Partitions

Use these commands to inspect disks, mounts, partitions, and block devices.

List block devices

lsblk

Show mounted filesystems

mount

More readable:

findmnt

Check disk partitions

sudo fdisk -l

Check filesystem type

df -Th

Check UUIDs

blkid

Check persistent mounts

cat /etc/fstab

Troubleshooting logic


6. Process Troubleshooting

Use these commands to inspect running processes.

Show all processes

ps aux

Find a process by name

ps aux | grep nginx

Better, because pgrep does not match the grep process itself:

pgrep -a nginx

Show process tree

pstree

If not installed:

ps -ef --forest

Kill a process

kill <PID>

Force kill:

kill -9 <PID>

Use kill -9 only when normal termination does not work.
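The escalation from a normal kill to kill -9 can be sketched as follows. stop_gracefully and its default 10-second timeout are hypothetical choices for illustration:

```shell
# stop_gracefully PID [TIMEOUT_SECONDS]   (hypothetical helper)
# Sends SIGTERM, waits up to TIMEOUT seconds for the process to exit,
# and sends SIGKILL only if it is still running.
# Returns 0 if the process exited without SIGKILL, 1 if SIGKILL was needed.
stop_gracefully() {
    pid=$1
    timeout=${2:-10}
    # If SIGTERM cannot be sent, the process is already gone or not ours.
    kill -TERM "$pid" 2>/dev/null || return 0
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example: stop_gracefully 12345 15
```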

Troubleshooting logic


7. Service Troubleshooting

Use these commands when a Linux service is down, unhealthy, or failing to start.

Check service status

systemctl status nginx

Start, stop, restart service

sudo systemctl start nginx
sudo systemctl stop nginx
sudo systemctl restart nginx

Enable service at boot

sudo systemctl enable nginx

Disable service at boot

sudo systemctl disable nginx

Check if service is active

systemctl is-active nginx

Check if service is enabled

systemctl is-enabled nginx

View service logs

journalctl -u nginx

Follow logs live:

journalctl -u nginx -f

View recent logs:

journalctl -u nginx --since "1 hour ago"
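When several services matter, the is-active check can be looped. A sketch, assuming systemd; report_units is a hypothetical helper name and the unit names in the example are placeholders:

```shell
# report_units UNIT...   (hypothetical helper)
# Prints one status line per unit and returns non-zero if any unit
# is not active.
report_units() {
    rc=0
    for unit in "$@"; do
        if systemctl is-active --quiet "$unit"; then
            echo "$unit: active"
        else
            echo "$unit: NOT active"
            rc=1
        fi
    done
    return "$rc"
}

# Example:
#   report_units nginx ssh || echo "at least one unit needs attention"
```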

Troubleshooting logic


8. Logs and Journalctl

Logs are one of the most important sources of evidence during troubleshooting.

Common log locations

/var/log/syslog
/var/log/messages
/var/log/auth.log
/var/log/secure
/var/log/nginx/
/var/log/httpd/
/var/log/audit/

View logs live

tail -f /var/log/syslog

For RHEL/CentOS:

tail -f /var/log/messages

Search errors

grep -i "error" /var/log/syslog

Search failed logins

Ubuntu/Debian:

grep -i "failed password" /var/log/auth.log

RHEL/CentOS:

grep -i "failed password" /var/log/secure
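Counting failures per source address shows whether the failed logins come from one host or many. A sketch, assuming the usual sshd message format ("Failed password for ... from ADDRESS port ..."); failed_logins_by_ip is a hypothetical helper name:

```shell
# failed_logins_by_ip   (hypothetical helper)
# Reads auth log text on stdin and prints "COUNT ADDRESS", highest first.
failed_logins_by_ip() {
    awk '/Failed password/ {
        # The source address is the word that follows "from".
        for (i = 1; i < NF; i++)
            if ($i == "from") { count[$(i + 1)]++; break }
    }
    END { for (ip in count) printf "%d %s\n", count[ip], ip }' | sort -rn
}

# Example (Ubuntu/Debian):
#   sudo cat /var/log/auth.log | failed_logins_by_ip | head
```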

Journal logs

journalctl

Follow logs live:

journalctl -f

Kernel logs:

journalctl -k

Logs since one hour ago:

journalctl --since "1 hour ago"

Logs for a service:

journalctl -u nginx

Troubleshooting logic


9. Network Interfaces

Use these commands to inspect network interfaces and routes.

Show IP addresses

ip a

Show routes

ip route
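A missing default route is a common reason a server cannot reach anything outside its own subnet. A sketch that extracts the gateway from ip route output; default_gateway is a hypothetical helper name:

```shell
# default_gateway   (hypothetical helper)
# Reads `ip route` output on stdin and prints the gateway address of the
# first default route. Prints nothing if there is no default route.
default_gateway() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++)
            if ($i == "via") { print $(i + 1); exit }
    }'
}

# Example:
#   gw=$(ip route | default_gateway)
#   [ -n "$gw" ] && ping -c 2 "$gw"
```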

Show interface statistics

ip -s link

Show DNS resolver config

cat /etc/resolv.conf

Troubleshooting logic


10. Connectivity Testing

Use these commands to test if the server can reach another host or service.

Ping test

ping -c 4 8.8.8.8

Test HTTP/HTTPS connectivity

curl -v http://example.com
curl -I https://example.com

Test a specific port with netcat

nc -vz example.com 443

If nc is not installed, use:

telnet example.com 443

Trace network path

traceroute example.com

If traceroute is not installed, tracepath is often available:

tracepath example.com
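The ping and port tests above can be combined to see which layer fails first. A sketch, assuming Linux ping (iputils) and an nc that supports -z and -w; check_layers is a hypothetical helper name:

```shell
# check_layers HOST PORT   (hypothetical helper)
# Reports ICMP reachability, then whether the TCP port accepts connections.
# Returns non-zero only when the TCP port cannot be reached, because many
# networks filter ICMP while the service itself is fine.
check_layers() {
    host=$1
    port=$2
    if ping -c 2 -W 2 "$host" >/dev/null 2>&1; then
        echo "icmp: reachable"
    else
        echo "icmp: no reply (host down, or ICMP filtered)"
    fi
    if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
        echo "tcp/$port: open"
    else
        echo "tcp/$port: closed or filtered"
        return 1
    fi
}

# Example: check_layers example.com 443
```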

Troubleshooting logic


11. DNS Troubleshooting

Use these commands when hostname resolution is failing.

Resolve a domain

dig example.com

Alternative:

nslookup example.com

Query a specific DNS server

dig @8.8.8.8 example.com

Check DNS config

cat /etc/resolv.conf

Check local hosts file

cat /etc/hosts
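dig queries DNS servers directly and ignores /etc/hosts, while applications usually resolve through the system resolver. Comparing the two can reveal a stale hosts entry. A sketch, assuming getent (glibc) and dig are installed; compare_resolution is a hypothetical helper name:

```shell
# compare_resolution NAME   (hypothetical helper)
# Prints the first address from the system resolver (which honours
# /etc/hosts) next to the first line of the DNS answer from dig.
# Note: the first dig line may be a CNAME rather than an address.
compare_resolution() {
    name=$1
    echo "system: $(getent hosts "$name" | awk 'NR == 1 { print $1 }')"
    echo "dns: $(dig +short "$name" | head -n 1)"
}

# Example: compare_resolution example.com
```

If the two lines differ, check /etc/hosts for an override before blaming the DNS server.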

Troubleshooting logic


12. Ports and Listening Services

Use these commands when an application is not reachable or a port conflict is suspected.

Show listening TCP/UDP ports

ss -tulpn

Show listening TCP ports

ss -tlpn

Find what is using a port

sudo lsof -i :8080

Alternative:

sudo ss -tulpn | grep ':8080'

Test local application port

curl -v http://localhost:8080
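A plain grep for a port number also matches longer numbers and PIDs. A sketch of a stricter filter, assuming the column layout of ss -tlpn (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port, Process); listeners_on is a hypothetical helper name:

```shell
# listeners_on PORT   (hypothetical helper)
# Reads `ss -tlpn` output on stdin and prints the local address and owning
# process of every socket listening on exactly PORT.
listeners_on() {
    awk -v port="$1" '$1 == "LISTEN" && $4 ~ (":" port "$") {
        print ($6 == "" ? $4 : $4 " " $6)
    }'
}

# Example (sudo is needed to see processes owned by other users):
#   sudo ss -tlpn | listeners_on 8080
```

A listener bound to 127.0.0.1 is reachable only from the server itself, which explains a service that answers on localhost but not from other hosts.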

Troubleshooting logic


13. Firewall Checks

Firewall commands depend on the Linux distribution.

UFW - Ubuntu

Check status:

sudo ufw status verbose

Allow a port:

sudo ufw allow 80/tcp

firewalld - RHEL/CentOS

Check status:

sudo firewall-cmd --state

List rules:

sudo firewall-cmd --list-all

Allow a port:

sudo firewall-cmd --add-port=80/tcp --permanent
sudo firewall-cmd --reload

iptables

List rules:

sudo iptables -L -n -v

Troubleshooting logic


14. Permissions and Ownership

Use these commands when you see "Permission denied" errors.

Show permissions

ls -l

Show hidden files too

ls -la

Change permissions

chmod 644 file.txt
chmod 755 script.sh
chmod +x script.sh

Change owner

sudo chown user:user file.txt

Change owner recursively

sudo chown -R user:user /path/to/directory
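A file can have correct permissions and still be unreachable because a parent directory lacks execute (search) permission. A sketch that lists every component of a path; path_perms is a hypothetical helper name:

```shell
# path_perms PATH   (hypothetical helper)
# Prints `ls -ld` for PATH and for each parent directory up to /.
path_perms() {
    p=$1
    while :; do
        ls -ld "$p"
        [ "$p" = "/" ] && break
        parent=$(dirname "$p")
        [ "$parent" = "$p" ] && break   # a relative path has reached "."
        p=$parent
    done
}

# Example: path_perms /var/www/html/index.html
```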

Troubleshooting logic


15. File Search and Disk Cleanup

Use these commands to find files, clean logs, and investigate disk usage.

Find files by name

find /var/log -name "*.log"

Find files by size

find / -type f -size +500M 2>/dev/null
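A fixed size threshold can miss the real offenders. A sketch that ranks files by disk usage instead; largest_files and its default of 10 results are hypothetical choices:

```shell
# largest_files DIR [COUNT]   (hypothetical helper)
# Prints the COUNT largest files under DIR as "SIZE_IN_KB PATH", largest
# first. -xdev keeps find on the same filesystem as DIR.
largest_files() {
    dir=$1
    n=${2:-10}
    find "$dir" -xdev -type f -exec du -k {} + 2>/dev/null | sort -rn | head -n "$n"
}

# Example: sudo sh -c '. ./helpers.sh; largest_files /var 20'
# (helpers.sh is a placeholder for wherever this function is saved)
```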

Find files modified in the last day

find . -type f -mtime -1

Find files by extension

find . -type f -name "*.yaml"

Clean package cache

Ubuntu/Debian:

sudo apt clean

RHEL/CentOS:

sudo yum clean all

Truncate a large log file

sudo truncate -s 0 /var/log/app.log

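Use with caution: truncation permanently discards the log contents. Unlike deleting the file, it frees the space immediately even when a running process still has the log open.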

Troubleshooting logic


16. Package Management

Package commands depend on the Linux distribution.

Ubuntu/Debian

Update package index:

sudo apt update

Install package:

sudo apt install nginx

Remove package:

sudo apt remove nginx

Search package:

apt search nginx

RHEL/CentOS

Install package:

sudo yum install nginx

Remove package:

sudo yum remove nginx

On newer systems (RHEL 8+, Fedora), use dnf:

sudo dnf install nginx
sudo dnf remove nginx

Troubleshooting logic


17. Users and Groups

Use these commands for access and permission troubleshooting.

Current user

whoami
id

List logged-in users

who
w

Add user

sudo useradd username

Change password

sudo passwd username

Add user to a group

sudo usermod -aG groupname username

Check groups

groups username
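Group membership can be tested in a script as well. A sketch using id; in_group is a hypothetical helper name. Note that a session started before usermod -aG was run does not pick up the new group until the user logs in again:

```shell
# in_group USER GROUP   (hypothetical helper)
# Succeeds (exit 0) if USER is a member of GROUP according to the
# user and group databases.
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$2"
}

# Example:
#   in_group deploy docker && echo "deploy can use docker"
```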

Troubleshooting logic


18. SSH Troubleshooting

Use these commands when you cannot connect to a server over SSH.

Basic SSH

ssh user@server

Verbose SSH debug

ssh -v user@server

More verbose:

ssh -vvv user@server

Check SSH service

Ubuntu/Debian:

systemctl status ssh

RHEL/CentOS:

systemctl status sshd

Check SSH port

sudo ss -tlpn | grep sshd

Check SSH logs

Ubuntu/Debian:

sudo tail -f /var/log/auth.log

RHEL/CentOS:

sudo tail -f /var/log/secure

Troubleshooting logic


19. Performance Quick Checks

Use this quick command set when a server is slow.

hostname
uptime
free -h
df -h
df -i
top
ps aux --sort=-%cpu | head
ps aux --sort=-%mem | head
ss -tulpn
journalctl --since "1 hour ago" | tail -n 100

Quick investigation logic:

1. Confirm the server and time.
2. Check CPU and load average.
3. Check memory and swap.
4. Check disk and inodes.
5. Check top processes.
6. Check listening ports.
7. Check recent system logs.
8. Check application logs.
9. Check recent changes.
10. Document findings.
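The first steps of this checklist can be captured to a file so the findings are easy to document. A sketch, limited to quick read-only commands; snapshot and the output path in the example are hypothetical:

```shell
# snapshot OUTFILE   (hypothetical helper)
# Runs a few quick read-only checks and writes their output to OUTFILE,
# one labelled section per command.
snapshot() {
    out=$1
    : > "$out"
    for cmd in "date" "hostname" "uptime" "free -h" "df -h" "df -i"; do
        echo "== $cmd ==" >> "$out"
        # ${cmd%% *} is the command name without its flags.
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            # $cmd is deliberately unquoted so "free -h" splits into
            # the command and its flag.
            $cmd >> "$out" 2>&1 || echo "(command failed)" >> "$out"
        else
            echo "(not installed)" >> "$out"
        fi
    done
}

# Example: snapshot "/tmp/snapshot-$(date +%Y%m%d-%H%M%S).txt"
```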

20. Real Troubleshooting Scenarios

Scenario 1: Server is slow

Commands:

uptime
top
free -h
df -h
ps aux --sort=-%cpu | head
ps aux --sort=-%mem | head

What to check:

Possible fixes:


Scenario 2: Disk is full

Commands:

df -h
df -i
du -sh * | sort -h
sudo find / -type f -size +500M 2>/dev/null

What to check:

Possible fixes:


Scenario 3: Service is down

Commands:

systemctl status nginx
journalctl -u nginx --since "1 hour ago"
sudo ss -tulpn | grep nginx

What to check:

Possible fixes:


Scenario 4: Application is not listening on expected port

Commands:

ss -tulpn
sudo lsof -i :8080
curl -v http://localhost:8080
systemctl status app-service

What to check:

Possible fixes:


Scenario 5: DNS resolution is failing

Commands:

dig example.com
nslookup example.com
cat /etc/resolv.conf
cat /etc/hosts

What to check:

Possible fixes:


Scenario 6: Cannot SSH into server

Commands:

ping server
nc -vz server 22
ssh -vvv user@server
systemctl status ssh
sudo tail -f /var/log/auth.log

What to check:

Possible fixes:


Scenario 7: High memory usage

Commands:

free -h
top
ps aux --sort=-%mem | head
swapon --show
sudo dmesg | grep -i "killed process"

What to check:

Possible fixes:


Scenario 8: Permission denied error

Commands:

ls -l
ls -la
whoami
id
namei -l /path/to/file

What to check:

Possible fixes: