IT Problems & Solutions
Step-by-step fixes for the most common IT problems. Real commands, real solutions — no fluff.
200+ Problems Solved
9 Topic Areas
500+ Commands & Examples
Networking Problems
28 problems
High
Cannot connect to the internet — all pings fail
Problem
No internet access.
ping 8.8.8.8 times out. Browser shows "No internet connection".
Diagnosis
ip addr show # check if interface has IP
ip route show # check default gateway exists
ping 192.168.1.1 # ping gateway — if fails, local issue
ping 8.8.8.8 # if gateway OK but this fails, upstream/ISP issue; if it works, suspect DNS
cat /etc/resolv.conf # check DNS servers
Solution — Step by Step
- Restart network interface: sudo ip link set eth0 down && sudo ip link set eth0 up
- Request new IP via DHCP: sudo dhclient -r && sudo dhclient eth0
- Add Google DNS if resolv.conf is empty: echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
- Add missing default route: sudo ip route add default via 192.168.1.1
- If still failing, restart networking service: sudo systemctl restart NetworkManager
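The diagnosis order above (interface, route, gateway, upstream, DNS) can be sketched as a small decision function. This is a minimal sketch, not a full tool: the function name classify_connectivity and the messages are illustrative, and the wiring in the comments assumes the eth0 interface and 192.168.1.1 gateway used in the commands above.

```shell
# classify_connectivity HAS_IP HAS_ROUTE GATEWAY_OK EXTERNAL_OK DNS_OK
# Each argument is 1 (check passed) or 0 (check failed).
# The first failed check names the layer to fix.
classify_connectivity() {
  if [ "$1" -ne 1 ]; then
    echo "no IP address: bring the interface up and renew DHCP"
  elif [ "$2" -ne 1 ]; then
    echo "no default route: add one via the gateway"
  elif [ "$3" -ne 1 ]; then
    echo "gateway unreachable: local network problem"
  elif [ "$4" -ne 1 ]; then
    echo "upstream unreachable: router or ISP problem"
  elif [ "$5" -ne 1 ]; then
    echo "DNS failure: check /etc/resolv.conf"
  else
    echo "connectivity looks fine"
  fi
}

# Example wiring (hypothetical; adjust interface and gateway for your host):
#   ip -4 addr show eth0 | grep -q 'inet ' && has_ip=1 || has_ip=0
#   ip route show default | grep -q . && has_route=1 || has_route=0
#   ping -c 1 -W 2 192.168.1.1 >/dev/null 2>&1 && gw=1 || gw=0
#   ping -c 1 -W 2 8.8.8.8 >/dev/null 2>&1 && ext=1 || ext=0
#   getent hosts example.com >/dev/null 2>&1 && dns=1 || dns=0
#   classify_connectivity "$has_ip" "$has_route" "$gw" "$ext" "$dns"
```

Checking in this order keeps you from editing DNS settings when the real fault is a missing route or a dead link.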
High
DNS resolution failing — domain not found
Problem
Websites fail with "server not found" but IP addresses work fine.
nslookup google.com returns SERVFAIL.
Diagnosis
nslookup google.com # test DNS resolution
nslookup google.com 8.8.8.8 # test with Google DNS directly
cat /etc/resolv.conf # see configured DNS servers
resolvectl status # check systemd-resolved (older releases: systemd-resolve --status)
Solution
- Set reliable DNS servers: sudo nano /etc/resolv.conf, then add nameserver 8.8.8.8 and nameserver 1.1.1.1
- Flush DNS cache: sudo resolvectl flush-caches (older releases: sudo systemd-resolve --flush-caches)
- On Windows: ipconfig /flushdns
- Restart DNS resolver: sudo systemctl restart systemd-resolved
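Before editing /etc/resolv.conf by hand, it helps to see which resolvers are actually configured. The sketch below lists them and flags the 127.0.0.53 stub address that systemd-resolved uses; when that stub is the configured nameserver, manual edits to the file tend to be overwritten. The function names are illustrative.

```shell
# list_nameservers [FILE]: print the configured DNS servers, one per line.
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# uses_local_stub [FILE]: succeed if the systemd-resolved stub is configured.
uses_local_stub() {
  list_nameservers "$1" | grep -qx '127\.0\.0\.53'
}

# Example:
#   list_nameservers
#   uses_local_stub && echo "managed by systemd-resolved: use resolvectl instead"
```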
Medium
High network latency and packet loss
Problem
Slow network, video calls drop, ping shows >200ms or packet loss.
Diagnosis
ping -c 50 8.8.8.8 # check packet loss %
mtr --report 8.8.8.8 # traceroute + ping combined
iperf3 -c iperf.he.net # test bandwidth
ss -tulpn # list listening sockets and the processes that own them
nethogs # per-process bandwidth monitor
Solution
- Identify bandwidth hogs: sudo nethogs eth0, then stop the greedy processes
- Check for duplex mismatch: ethtool eth0; if speed or duplex is wrong, set it with sudo ethtool -s eth0 speed 1000 duplex full
- Enable jumbo frames for LAN (only if every switch and host on the path supports them): sudo ip link set eth0 mtu 9000
- If Wi-Fi: switch to 5GHz band or move closer to AP
- If VPS: contact provider — could be noisy neighbor on shared host
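To track packet loss over time rather than eyeballing it, the loss figure can be pulled out of the ping summary line. A minimal sketch, assuming the Linux iputils summary format ("N% packet loss"); the function names and the 1% threshold in the example are illustrative.

```shell
# packet_loss_pct: read ping output on stdin, print the loss percentage.
packet_loss_pct() {
  sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p'
}

# loss_exceeds PCT THRESHOLD: succeed if PCT is greater than THRESHOLD.
loss_exceeds() {
  awk -v p="$1" -v t="$2" 'BEGIN { exit !(p + 0 > t + 0) }'
}

# Example:
#   loss=$(ping -c 50 8.8.8.8 | packet_loss_pct)
#   loss_exceeds "$loss" 1 && echo "packet loss ${loss}% is above 1%"
```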
Medium
SSH connection refused on port 22
Problem
ssh user@host returns "Connection refused" or times out.
Diagnosis
nc -zv host 22 # test if port 22 open
nmap -p 22 host # port scan
# on the server:
systemctl status sshd # is SSH daemon running? (the unit is named ssh on Debian/Ubuntu)
ss -tlnp | grep 22 # is it listening?
ufw status # firewall blocking?
Solution
- Start SSH daemon: sudo systemctl start sshd && sudo systemctl enable sshd
- Allow port in firewall: sudo ufw allow 22/tcp
- If port was changed, connect with: ssh -p 2222 user@host
- Check SSH config: sudo sshd -T | grep port
Medium
SSL certificate error in browser
Problem
"Your connection is not private" / NET::ERR_CERT_AUTHORITY_INVALID or certificate expired warning.
Diagnosis
openssl s_client -connect domain.com:443 2>/dev/null | openssl x509 -noout -dates
# Shows notBefore and notAfter dates
curl -vI https://domain.com 2>&1 | grep -E "expire|SSL|cert"
Solution
- Check expiry: if expired, renew with sudo certbot renew --force-renewal
- Auto-renew setup: verify that renewal works with sudo certbot renew --dry-run
- Add cron: 0 12 * * * certbot renew --quiet
- If self-signed cert: replace with Let's Encrypt free cert
- Check system clock, since a wrong date causes SSL failures: timedatectl status
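The notAfter date from the diagnosis step can be turned into a "days remaining" number for monitoring. A minimal sketch, assuming GNU date (the -d option); days_until is an illustrative name and domain.com is the placeholder host used above.

```shell
# days_until "DATE_STRING" [NOW_EPOCH]
# Prints whole days from now (or NOW_EPOCH) until DATE_STRING.
# Requires GNU date for -d parsing.
days_until() {
  expiry=$(date -u -d "$1" +%s) || return 1
  now=${2:-$(date -u +%s)}
  echo $(( (expiry - now) / 86400 ))
}

# Example:
#   not_after=$(openssl s_client -connect domain.com:443 -servername domain.com \
#     </dev/null 2>/dev/null | openssl x509 -noout -enddate | cut -d= -f2)
#   days_until "$not_after"
```

A negative result means the certificate has already expired.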
Low
VPN connected but no internet access
Problem
VPN shows connected but browsing fails. This often happens when the VPN routes all traffic through the tunnel ("tunnel all traffic") and the tunnel's routing or DNS is broken.
Solution
- Check routing table: ip route show, and look for a 0.0.0.0/0 (default) route via the VPN interface
- Enable split tunneling in VPN client settings to allow non-VPN traffic
- Add DNS bypass: set DNS to 8.8.8.8 in network settings while VPN is on
- On OpenVPN: remove redirect-gateway def1 from config if you don't want all traffic routed
High
Port 80/443 not accessible from outside
Problem
Web server runs locally but external users can't reach the site.
Diagnosis
ss -tlnp | grep -E "80|443" # is nginx/apache listening?
sudo ufw status # firewall rules
curl -I http://localhost # local test
# From outside:
curl -I http://YOUR_IP
Solution
- Open firewall: sudo ufw allow 80/tcp && sudo ufw allow 443/tcp
- Check server binds to 0.0.0.0 not 127.0.0.1 in nginx/apache config
- If cloud server (AWS/DO): add inbound rules in security group / firewall panel
- Restart web server: sudo systemctl restart nginx
Security Problems
30 problems
High
Server getting brute-forced via SSH
Problem
Hundreds of failed SSH login attempts in auth.log. Server is being scanned/attacked.
Diagnosis
sudo grep "Failed password" /var/log/auth.log | tail -20
sudo grep "Failed password" /var/log/auth.log | awk '{print $11}' | sort | uniq -c | sort -rn | head -10
Solution
- Install fail2ban: sudo apt install fail2ban && sudo systemctl enable fail2ban
- Disable password auth and use keys only: in /etc/ssh/sshd_config set PasswordAuthentication no
- Change SSH port: Port 2222 in sshd_config (reduces noise significantly)
- Allow only your IP: sudo ufw allow from YOUR_IP to any port 22
- Reload SSH: sudo systemctl reload sshd
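The awk field number in the diagnosis one-liner shifts when the log line says "invalid user", so a count based on a fixed field can mix IPs with other tokens. The sketch below looks for the word "from" instead and takes the field after it. The function name is illustrative, and it assumes the usual OpenSSH "Failed password for ... from IP port ..." wording.

```shell
# top_failed_ips FILE...: top 10 source IPs of failed SSH password logins.
# Takes the token after "from", so it handles both
# "Failed password for root from IP" and
# "Failed password for invalid user NAME from IP".
top_failed_ips() {
  awk '/Failed password/ {
         for (i = 1; i < NF; i++)
           if ($i == "from") { print $(i + 1); break }
       }' "$@" | sort | uniq -c | sort -rn | head -n 10
}

# Example:
#   sudo cat /var/log/auth.log | top_failed_ips
```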
High
Website defaced or injected with malware
Problem
Site shows unexpected content, redirects visitors, or Google marks it as dangerous.
Diagnosis
find /var/www -name "*.php" -newer /var/www/index.php -ls # recently changed files
grep -r "eval(base64_decode" /var/www/ # common malware pattern
grep -r "iframe" /var/www/ --include="*.html" # injected iframes
Solution
- Take site offline immediately to protect visitors
- Restore from last known clean backup
- Change all passwords: FTP, database, CMS admin, hosting panel
- Update CMS/plugins to latest versions — patch the entry point
- Add WAF: Cloudflare free plan blocks most attacks
- Request Google reconsideration after cleanup
High
Ransomware encrypted files on server
Problem
Files renamed with unknown extension (.locked, .crypt). Ransom note left on server.
Solution
- Immediately isolate the server — disconnect from network
- Do NOT pay the ransom — no guarantee of decryption
- Identify the ransomware family at nomoreransom.org — may have free decryptor
- Restore from offline backup (why you should always have air-gapped backups)
- Report to CISA (US) or your national cybercrime unit
- After restore: patch the entry vector, enable EDR, set up immutable backups
Medium
API keys accidentally pushed to GitHub
Problem
AWS keys, API tokens or passwords committed to a public or private repository.
Solution
- Revoke the exposed key IMMEDIATELY — do this before anything else
- Remove from Git history: git filter-repo --path secrets.env --invert-paths
- Force push: git push origin --force --all
- Add to .gitignore: printf '%s\n' .env secrets.env >> .gitignore
- Use environment variables or a secrets manager (AWS Secrets Manager, HashiCorp Vault)
- Enable GitHub secret scanning to catch this automatically in future
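A local check before committing can catch the most recognizable case. The sketch below only looks for long-term AWS access key IDs (the AKIA prefix followed by 16 uppercase letters or digits); it will not find secret keys, passwords, or other vendors' tokens, so treat it as a complement to secret scanning, not a replacement. The function name is illustrative.

```shell
# scan_for_aws_key_ids PATH...: print file:line:match for anything that looks
# like a long-term AWS access key ID. Exit status 0 means something was found.
scan_for_aws_key_ids() {
  grep -rnE 'AKIA[0-9A-Z]{16}' "$@"
}

# Example (e.g. from a pre-commit hook):
#   if scan_for_aws_key_ids . --exclude-dir=.git; then
#     echo "possible AWS key found, aborting commit" >&2
#     exit 1
#   fi
```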
Medium
Users logging in from suspicious locations
Problem
Account accessed from unknown country/IP. Possible credential compromise.
Solution
- Force password reset for affected account immediately
- Invalidate all existing sessions
- Enable MFA, so a leaked password alone is no longer enough to log in
- Check for other accounts using same password — change those too
- Review audit logs for what the attacker accessed or changed
- Set up geo-blocking or anomaly detection alerts
Low
Missing security headers on web application
Problem
securityheaders.com gives F grade. Missing CSP, HSTS, X-Frame-Options etc.
Solution — Add to Nginx config
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-XSS-Protection "1; mode=block" always; # legacy header, ignored by current browsers; CSP is the real protection
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline' https://www.googletagmanager.com; style-src 'self' 'unsafe-inline';" always;
Linux Problems
28 problems
High
Disk 100% full — server unresponsive
Problem
df -h shows 100% usage. Applications crash, logs stop writing, server may become unreachable.
Find the culprit
df -h # which partition is full
du -sh /* 2>/dev/null | sort -rh | head -10 # top directories by size
du -sh /var/log/* | sort -rh | head -10 # often logs are the issue
find / -name "*.log" -size +100M 2>/dev/null # big log files
Solution
- Clear old logs: sudo journalctl --vacuum-size=100M
- Remove old kernels: sudo apt autoremove --purge
- Clean apt cache: sudo apt clean
- Find and delete large temp files: sudo find /tmp -size +50M -delete
- Truncate large log file (safe): sudo truncate -s 0 /var/log/syslog
- Set log rotation: configure /etc/logrotate.conf
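To catch a filling disk before it reaches 100%, the df output can be checked against a threshold from cron. A minimal sketch: full_filesystems is an illustrative name, the 90% threshold in the example is an assumption, and mount points containing spaces are not handled.

```shell
# full_filesystems THRESHOLD: read `df -P` output on stdin and print
# "MOUNTPOINT USE%" for every filesystem at or above THRESHOLD percent.
full_filesystems() {
  awk -v t="$1" 'NR > 1 {
         use = $5
         sub(/%/, "", use)
         if (use + 0 >= t + 0) print $6, use "%"
       }'
}

# Example:
#   df -P | full_filesystems 90
```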
High
CPU at 100% — server crawling
Diagnosis
top # see what's eating CPU (press P to sort by CPU)
ps aux --sort=-%cpu | head -10 # top CPU processes
htop # visual alternative
iotop # if it's I/O wait, not CPU
Solution
- Stop runaway process: kill PID (sends SIGTERM; use kill -9 PID only if the process ignores it)
- Nice a process to lower priority: renice -n 10 -p PID
- Check for crypto miners: ps aux | grep -i "minerd\|xmrig\|cryptonight"
- If it's a web process: check for slow database queries or infinite loops in code
- Set CPU limits with cgroups or systemd: CPUQuota=50% in service file
High
Permission denied errors on files/directories
Problem
bash: ./script.sh: Permission denied or cannot write to a directory you should own.
Diagnosis
ls -la /path/to/file # check permissions and owner
whoami # current user
id # groups you're in
stat /path/to/file # full permission details
Solution
- Make script executable: chmod +x script.sh
- Change ownership: sudo chown user:group /path
- Fix web directory permissions: sudo chown -R www-data:www-data /var/www
- Set correct directory perms: chmod 755 /dir (dirs need execute bit)
- Add user to group: sudo usermod -aG groupname username (log out and back in)
Medium
Service fails to start after reboot
Diagnosis
systemctl status servicename # see error message
journalctl -u servicename -n 50 # last 50 log lines for service
journalctl -u servicename --since "1 hour ago"
Solution
- Enable auto-start: sudo systemctl enable servicename
- Fix config errors shown in journal output
- Check dependencies: systemctl list-dependencies servicename
- If port conflict: ss -tlnp | grep PORT, then stop the conflicting process
- Reload daemon after editing service file: sudo systemctl daemon-reload
Medium
Out of memory — OOM killer killing processes
Diagnosis
free -h # see available RAM
dmesg | grep -i "oom\|killed" # OOM killer log
ps aux --sort=-%mem | head -10 # top memory processes
Solution
- Add swap (quick fix): sudo fallocate -l 2G /swapfile && sudo chmod 600 /swapfile && sudo mkswap /swapfile && sudo swapon /swapfile
- Make swap permanent: add /swapfile none swap sw 0 0 to /etc/fstab
- Tune swappiness: sudo sysctl vm.swappiness=10
- Find memory leaks; as a stopgap until the leak is fixed, restart the leaking service nightly via cron
- Upgrade RAM or move to larger server
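To see which services the OOM killer keeps picking, the kernel log lines can be reduced to PID and process name. A minimal sketch, assuming the "Killed process PID (name)" wording used by recent kernels (the exact message varies between kernel versions); oom_victims is an illustrative name.

```shell
# oom_victims: read kernel log lines on stdin, print "PID NAME" for each
# process the OOM killer terminated.
oom_victims() {
  sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Example: count kills per process name
#   dmesg | oom_victims | awk '{ print $2 }' | sort | uniq -c | sort -rn
```

A process that shows up repeatedly is the first candidate for a memory limit or a leak investigation.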
Cloud Problems
24 problems
High
AWS bill unexpectedly high — cost spike
Problem
AWS/GCP/Azure bill 10x higher than expected. Often caused by forgotten resources or data transfer costs.
Solution
- Go to AWS Cost Explorer → identify the service causing the spike
- Common culprits: NAT Gateway data transfer, unused EC2 instances, S3 request spikes, Elastic IPs not attached
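- Set billing alarm (billing metrics live in us-east-1; THRESHOLD_USD is a placeholder, and adding --alarm-actions SNS_TOPIC_ARN makes it notify you): aws cloudwatch put-metric-alarm --region us-east-1 --alarm-name billing-alarm --namespace AWS/Billing --metric-name EstimatedCharges --dimensions Name=Currency,Value=USD --statistic Maximum --period 21600 --evaluation-periods 1 --threshold THRESHOLD_USD --comparison-operator GreaterThanThreshold
- Enable AWS Budgets to get an email before the threshold is reached
- Use Reserved Instances or Savings Plans for predictable workloads (saves 40-70%)
- Delete idle resources: use AWS Trusted Advisor to find them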
High
EC2 instance unreachable after security group change
Problem
Locked yourself out of EC2 — SSH and HTTP both blocked after mis-configuring security group.
Solution
- Go to AWS Console → EC2 → Security Groups
- Find the security group → Edit Inbound Rules
- Add rule: Type SSH, Port 22, Source: My IP (use 0.0.0.0/0 only as a short-lived last resort)
- For future: use AWS Systems Manager Session Manager for shell access without opening port 22
- Never remove all rules in one batch — always add new rule before removing old
Medium
S3 bucket public — data exposed
Problem
S3 bucket accidentally made public exposing files. Noticed via security scan or data breach.
Solution
- Block public access immediately: AWS Console → S3 → Bucket → Permissions → Block all public access → Enable
- Or via CLI: aws s3api put-public-access-block --bucket BUCKET --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
- Audit what was accessed: review S3 server access logs and CloudTrail data events if they were already enabled; if not, enable them now (they only record access from that point on)
- Enable S3 Block Public Access at account level to prevent future mistakes
Medium
Kubernetes pod stuck in CrashLoopBackOff
Diagnosis
kubectl get pods # see status
kubectl describe pod POD_NAME # events and error details
kubectl logs POD_NAME --previous # logs from crashed container
kubectl logs POD_NAME -c CONTAINER # specific container logs
Solution
- Read crash logs — usually reveals the root cause immediately
- Common causes: missing environment variables, wrong image, misconfigured liveness probe
- Check resource limits, since OOMKilled means the container exceeded its memory limit: kubectl describe pod POD_NAME | grep -A5 Limits
- Test image locally: docker run --rm IMAGE_NAME
- Fix config then: kubectl rollout restart deployment/DEPLOYMENT_NAME
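When several pods are failing, the steps above can be applied in one pass by filtering the pod list first. A minimal sketch, assuming the default kubectl get pods column order (NAME, READY, STATUS, RESTARTS, AGE); crashlooping_pods is an illustrative name.

```shell
# crashlooping_pods: read `kubectl get pods` output on stdin and print
# "NAME RESTARTS" for every pod in CrashLoopBackOff.
crashlooping_pods() {
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1, $4 }'
}

# Example: show the last lines of the previous (crashed) container for each
#   kubectl get pods | crashlooping_pods | while read -r pod restarts; do
#     echo "== $pod ($restarts restarts)"
#     kubectl logs "$pod" --previous --tail=20
#   done
```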
DevOps Problems
24 problems
High
Docker container exits immediately after start
Diagnosis
docker ps -a # see exited containers
docker logs CONTAINER_ID # see what it printed before dying
docker inspect CONTAINER_ID # full config and exit code
docker run -it IMAGE /bin/sh # run interactively to debug
Solution
- Exit code 1 = application error — check logs for crash message
- Exit code 137 = killed with SIGKILL, most often by the OOM killer: increase the --memory limit
- Missing CMD/ENTRYPOINT: add to Dockerfile: CMD ["python", "app.py"]
- Missing env vars: add -e VAR=value or use --env-file .env
- Permission issue on mounted volumes: check the :z flag on SELinux systems
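Exit codes follow a few conventions that make the list above easier to apply: 125 to 127 come from docker run or the shell, and anything above 128 is 128 plus the signal number (137 = 128 + 9, SIGKILL). A minimal sketch of that mapping; explain_exit_code and its messages are illustrative.

```shell
# explain_exit_code CODE: print a short hint for a container exit code.
explain_exit_code() {
  code=$1
  case $code in
    0)   echo "clean exit: the main process finished on its own" ;;
    125) echo "docker run itself failed (bad flag or daemon error)" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found in the image" ;;
    *)
      if [ "$code" -gt 128 ]; then
        echo "killed by signal $((code - 128))"
      else
        echo "application error: check docker logs"
      fi
      ;;
  esac
}

# Example: read the code and the OOM flag from the stopped container
#   docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' CONTAINER_ID
#   explain_exit_code 137
```

The OOMKilled field separates a real out-of-memory kill from a manual kill -9 or docker kill, which also produce 137.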
High
CI/CD pipeline failing on every push
Problem
GitHub Actions / Jenkins pipeline red on every commit. Tests pass locally but fail in CI.
Solution
- Read the full CI log — don't just see "failed", read WHY
- Common: missing secrets — add to GitHub Settings → Secrets and variables
- Environment mismatch: CI uses a different Node/Python version, so pin it in the workflow: node-version: '20'
- Missing test dependencies: add install step before test step
- Reproduce locally: the act tool runs GitHub Actions locally
- Port conflicts between parallel jobs: use dynamic ports or sequential jobs
Medium
Git merge conflicts blocking deployment
Solution
- See all conflicted files: git status | grep "both modified"
- Open each file and resolve between the <<<<<<< and >>>>>>> markers
- Use visual tool: git mergetool (opens vimdiff or configured tool)
- After resolving: git add . && git commit
- Prevent conflicts: merge main into feature branches frequently, keep PRs small
Medium
Nginx 502 Bad Gateway error
Diagnosis
sudo tail -f /var/log/nginx/error.log # see the actual error
systemctl status gunicorn # is backend running?
curl http://localhost:8000 # test backend directly
ss -tlnp | grep 8000 # is backend listening on expected port?
Solution
- 502 = nginx can't reach backend — start the backend service
- Check proxy_pass port matches where backend actually runs
- Increase timeouts if backend is slow (this usually shows up as 504 Gateway Timeout rather than 502): proxy_read_timeout 300; in nginx config
- Check unix socket permissions if using socket instead of port
- Reload after config fix: sudo nginx -t && sudo systemctl reload nginx
Database Problems
24 problems
High
MySQL queries extremely slow — full table scans
Diagnosis
EXPLAIN SELECT * FROM users WHERE email = 'user@example.com';
-- Look for "type: ALL" = full table scan, which needs an index
SHOW FULL PROCESSLIST; -- see running queries
SHOW VARIABLES LIKE 'slow_query_log%'; -- check whether the slow query log is enabled
Solution
- Add index on searched column: CREATE INDEX idx_email ON users(email);
- Enable slow query log: SET GLOBAL slow_query_log = 1; SET GLOBAL long_query_time = 1;
- Analyze with EXPLAIN: query plans on large tables should show index use, not ALL
- Use composite index for multi-column WHERE: CREATE INDEX idx_name ON table_name(col1, col2);
- Run ANALYZE TABLE to update statistics: ANALYZE TABLE users;
High
Database connection pool exhausted
Problem
"Too many connections" error. Application throws connection errors under load.
Diagnosis
SHOW STATUS LIKE 'Threads_connected';
SHOW VARIABLES LIKE 'max_connections';
SHOW FULL PROCESSLIST; -- see all open connections
Solution
- Increase max connections: SET GLOBAL max_connections = 500; (also set it in my.cnf, or it resets on restart)
- Add connection pooling: use PgBouncer (PostgreSQL) or ProxySQL (MySQL)
- Find connection leaks in code and ensure connections are closed after use
- Kill idle connections: KILL CONNECTION_ID;
- Set wait_timeout: SET GLOBAL wait_timeout = 60;
High
Accidentally deleted important data — no backup
Problem
DELETE FROM users; or DROP TABLE orders; run without a WHERE clause or a backup.
Solution
- Stop writes immediately to prevent overwriting deleted data
- MySQL: if binary logging was on, replay the binary logs up to just before the mistake: mysqlbinlog --stop-datetime="2026-07-03 08:00:00" /var/lib/mysql/mysql-bin.000001 | mysql -u root -p (this only rebuilds the data if the logs reach back to the table's creation or to an older dump you can load first)
- PostgreSQL: if WAL archiving was on, use Point-in-Time Recovery
- Check if any replica has the data, and promote the replica before it syncs the delete
- Lesson: always use transactions, always have backups, test restores regularly
Medium
Deadlock errors in database logs
Diagnosis
SHOW ENGINE INNODB STATUS\G -- MySQL deadlock info
-- Look for "LATEST DETECTED DEADLOCK" section
Solution
- Always access tables in the same order across all transactions
- Keep transactions short — don't hold locks while doing non-DB work
- Use lower isolation level if possible: READ COMMITTED instead of REPEATABLE READ
- Add retry logic in application for deadlock errors (error code 1213 in MySQL)
- Use SELECT ... FOR UPDATE only when you'll actually update
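The retry advice above can be sketched as a small wrapper for scripts that run SQL through the mysql client. It retries only when the output contains MySQL's deadlock error (ERROR 1213) and gives up on any other failure. The function name, the RETRY_DELAY variable, and the linear backoff are assumptions for illustration; application code would normally do the same thing in its database driver.

```shell
# retry_on_deadlock MAX_ATTEMPTS COMMAND [ARGS...]
# Reruns COMMAND while it fails with MySQL error 1213 (deadlock).
# Waits attempt * RETRY_DELAY seconds between tries (RETRY_DELAY defaults to 1).
retry_on_deadlock() {
  max_attempts=$1
  shift
  attempt=1
  while :; do
    if output=$("$@" 2>&1); then
      printf '%s\n' "$output"
      return 0
    fi
    case $output in
      *"ERROR 1213"*) ;;                          # deadlock: worth retrying
      *) printf '%s\n' "$output" >&2; return 1 ;; # any other error: give up
    esac
    if [ "$attempt" -ge "$max_attempts" ]; then
      printf '%s\n' "$output" >&2
      return 1
    fi
    sleep $(( attempt * ${RETRY_DELAY:-1} ))
    attempt=$((attempt + 1))
  done
}

# Example (hypothetical database and script names):
#   retry_on_deadlock 3 mysql -u app -p"$DB_PASS" shop -e "source transfer.sql"
```

The whole transaction must be rerun, not just the failed statement, because MySQL rolls the transaction back when it picks a deadlock victim.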
Hardware Problems
22 problems
High
Server overheating — thermal throttling
Diagnosis
sensors # CPU/GPU temps (install: apt install lm-sensors)
sudo sensors-detect # first-time setup
cat /sys/class/thermal/thermal_zone*/temp # raw temp in millidegrees Celsius (45000 = 45 °C)
dmesg | grep -i "thermal\|throttl" # kernel thermal events
Solution
- Clean dust from fans and heatsinks — compressed air every 6-12 months
- Replace thermal paste on CPU — degrades after 3-5 years
- Ensure proper airflow — hot exhaust not recirculating back as intake
- Check fan speeds: sensors (a fan reading 0 RPM has likely failed)
- For servers: check data center ambient temp and cooling unit function
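The raw thermal_zone values are in millidegrees Celsius, so a quick conversion makes them usable for alerting. A minimal sketch: hot_zones is an illustrative name and the 85 °C limit in the example is an assumption, so check your CPU's rated maximum.

```shell
# hot_zones LIMIT_C: read "ZONE MILLIDEGREES" lines on stdin and print
# "ZONE TEMP_C" for every zone at or above LIMIT_C degrees Celsius.
hot_zones() {
  awk -v limit="$1" '$2 / 1000 >= limit { printf "%s %.1f\n", $1, $2 / 1000 }'
}

# Example:
#   for z in /sys/class/thermal/thermal_zone*; do
#     echo "$(basename "$z") $(cat "$z/temp")"
#   done | hot_zones 85
```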
High
Hard drive failing — S.M.A.R.T. errors
Diagnosis
sudo smartctl -a /dev/sda # full SMART report
sudo smartctl -H /dev/sda # overall health assessment
dmesg | grep -i "error\|ata\|I/O" # kernel disk errors
Solution
- If SMART shows "FAILED": back up ALL data immediately, the drive will die soon
- Watch Reallocated_Sector_Ct: any non-zero raw value is a warning sign, and a rising one is serious
- Set up monitoring: sudo apt install smartmontools && sudo systemctl enable smartd
- Check disk health weekly: add to cron smartctl -H /dev/sda | mail -s "Disk Health" admin@example.com
- Replace drive before it fails. RAID is not a backup
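For a cron check that only speaks up when something is wrong, the two values that matter can be pulled out of the smartctl output. A minimal sketch, assuming the ATA output format of smartctl (SCSI/NVMe drives print different lines); the function names are illustrative.

```shell
# smart_health: read `smartctl -H` output on stdin, print PASSED or FAILED!.
smart_health() {
  awk -F': *' '/overall-health self-assessment test result/ { print $2 }'
}

# reallocated_sectors: read `smartctl -A` output on stdin, print the raw
# Reallocated_Sector_Ct value (10th column of the attribute table).
reallocated_sectors() {
  awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Example:
#   status=$(sudo smartctl -H /dev/sda | smart_health)
#   realloc=$(sudo smartctl -A /dev/sda | reallocated_sectors)
#   [ "$status" = "PASSED" ] && [ "${realloc:-0}" -eq 0 ] || echo "check /dev/sda: $status, $realloc reallocated"
```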
Medium
RAM causing random crashes and blue screens
Diagnosis
# Linux: run memtest86+ from the boot menu
# Or: sudo apt install memtester
sudo memtester 1G 1 # test 1GB of RAM, 1 pass
# Windows: mdsched.exe → Restart and check for problems
Solution
- Run memtest86+ overnight — any errors = faulty RAM
- If multiple sticks: remove one at a time to isolate the bad stick
- Reseat RAM sticks — remove and firmly push back in
- Try different RAM slots — could be motherboard slot fault
- Replace faulty DIMM — RAM is relatively inexpensive
Compliance Problems
20 problems
High
GDPR violation — user data processed without consent
Problem
Tracking users, storing personal data, or sending emails without proper GDPR consent mechanism.
Solution
- Add cookie consent banner before loading any tracking (GA4, Meta Pixel, etc.)
- Create and publish Privacy Policy and Cookie Policy pages
- Implement consent management platform (CMP): Cookiebot, OneTrust, or open-source Klaro
- Document your legal basis for processing: consent, legitimate interest, contract, etc.
- Add "Delete My Data" mechanism for users to exercise right to erasure
- Appoint DPO if processing sensitive data at scale
High
PCI DSS — storing plain-text card data
Problem
Card numbers, CVVs, or full track data stored in database in plain text — massive PCI violation.
Solution
- Delete all stored card data immediately
- Use a payment processor (Stripe, Braintree) — never touch raw card data
- Tokenize: processor gives you a token to charge later, card number never hits your server
- Scope reduction: if you don't store/transmit card data, PCI scope drops to SAQ A (simplest)
- Never store CVV under any circumstances — prohibited by PCI DSS
Medium
Audit log gaps — missing activity records
Problem
Security audit finds incomplete logs — who accessed what data and when is not tracked.
Solution
- Enable database audit logging: MySQL audit plugin or PostgreSQL pgaudit extension
- Log all admin actions: who changed what config, when, from which IP
- Ship logs to immutable storage: attacker can't delete what they can't reach
- Set retention: PCI DSS requires at least 1 year (with the most recent 3 months immediately available); SOC 2 sets no fixed period, but 1 year is a common choice
- Use structured logging (JSON) for easy querying and alerting
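As an illustration of the structured-logging point, admin scripts can emit one JSON object per action with who, what, and from where. A minimal sketch: the audit_log name, the field names, and the AUDIT_TS override are assumptions, and the escaping covers only backslashes and double quotes (not newlines or control characters).

```shell
# json_escape STRING: escape backslashes and double quotes for use in JSON.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# audit_log ACTOR ACTION SOURCE_IP: print one JSON audit record.
# AUDIT_TS overrides the timestamp (useful for tests); default is UTC now.
audit_log() {
  ts=${AUDIT_TS:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}
  printf '{"ts":"%s","actor":"%s","action":"%s","ip":"%s"}\n' \
    "$ts" "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}

# Example (hypothetical log path):
#   audit_log "$USER" "changed max_connections to 500" "203.0.113.10" >> /var/log/admin-audit.jsonl
```

One record per line keeps the file easy to ship to append-only storage and to query later.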