K8s Troubleshooting

Quick Diagnostic Commands

# Check pod status
kubectl get pods -n <namespace> -o wide

# Describe pod (events and conditions)
kubectl describe pod <pod-name> -n <namespace>

# View container logs (add --previous for the last terminated instance)
kubectl logs <pod-name> -c <container-name> -n <namespace>

# Interactive shell in running pod
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

# Check resource usage (requires metrics-server)
kubectl top pods -n <namespace>
kubectl top nodes

Common Issues & Solutions

CrashLoopBackOff

The container keeps crashing, and Kubernetes restarts it with an increasing back-off delay.

kubectl logs <pod> --previous    # Check last crash logs
kubectl describe pod <pod>       # Check exit code and events
# Exit code 137 = SIGKILL (128 + 9), usually OOMKilled
# Exit code 1 = application error
# Exit code 126 = command found but not executable
# Exit code 127 = command not found
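
The exit codes listed above can be folded into a small lookup helper. This is a minimal sketch: the function name `explain_exit_code` is hypothetical, and the mappings follow common shell and container conventions (128 + signal number), so treat the output as a hint rather than a diagnosis.

```shell
# Hypothetical helper: map a container exit code to a likely cause.
explain_exit_code() {
  code="$1"
  case "$code" in
    0)   echo "completed successfully" ;;
    1)   echo "application error" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found" ;;
    137) echo "killed by SIGKILL (often OOMKilled)" ;;
    143) echo "terminated by SIGTERM" ;;
    *)
      # Codes above 128 conventionally mean "killed by signal (code - 128)".
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "killed by signal $((code - 128))"
      else
        echo "application-defined exit code"
      fi
      ;;
  esac
}

# Usage (reads the last exit code of the pod's first container):
# explain_exit_code "$(kubectl get pod <pod> \
#   -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}')"
```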

ImagePullBackOff / ErrImagePull

Cannot pull container image.

# Causes: wrong image name, auth issue, registry unreachable
kubectl describe pod <pod> | grep -A 5 Events
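# Fix for auth issues: create an imagePullSecret in the pod's namespace,
# then reference it under spec.imagePullSecrets
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=user \
  --docker-password=pass \
  -n <namespace>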
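
Creating the secret does not by itself make pods use it. One option is to attach `regcred` to the service account the pods run under, so that newly created pods pick it up. The sketch below assumes the pods use the `default` service account; the function name `attach_pull_secret` is hypothetical.

```shell
# Hypothetical helper: attach an existing pull secret to a service account
# so that new pods using that account pull images with it.
attach_pull_secret() {
  namespace="$1"
  secret="${2:-regcred}"
  account="${3:-default}"   # assumption: pods run under the "default" service account
  kubectl patch serviceaccount "$account" -n "$namespace" \
    -p "{\"imagePullSecrets\": [{\"name\": \"$secret\"}]}"
}

# Usage: attach_pull_secret <namespace> regcred
```

Pods that already exist are not updated; delete or restart them so they are recreated with the secret.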

Pending Pod

The scheduler cannot place the pod on any node.

# Check events for scheduling failures
kubectl describe pod <pod> | grep -A 20 Events

# Common causes:
# 1. Insufficient CPU/memory on every node
kubectl describe nodes | grep -A 5 Allocatable
# 2. nodeSelector/affinity matches no node, or a node taint is not tolerated
kubectl get nodes --show-labels
kubectl describe nodes | grep -i taints
# 3. PVC not bound
kubectl get pvc -n <namespace>
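
When `kubectl describe` output is noisy, the scheduler's own events can be filtered directly. A minimal sketch, assuming the pod's failures are recorded with the reason `FailedScheduling`; the function name `scheduling_events` is hypothetical.

```shell
# Hypothetical helper: show only the scheduler's failure events for one pod.
scheduling_events() {
  pod="$1"
  namespace="${2:-default}"
  kubectl get events -n "$namespace" \
    --field-selector "involvedObject.name=$pod,reason=FailedScheduling" \
    --sort-by=.metadata.creationTimestamp
}

# Usage: scheduling_events <pod> <namespace>
```

The event message usually names the blocking condition, such as insufficient memory, an untolerated taint, or an unbound PersistentVolumeClaim.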

OOMKilled

The container exceeded its memory limit and the kernel killed it.

# Exit code 137 together with Reason: OOMKilled confirms an out-of-memory kill
kubectl describe pod <pod> | grep -i oom

# Fix: increase memory limit in deployment
resources:
  requests:
    memory: "128Mi"
  limits:
    memory: "512Mi"   # increase this

# Monitor: kubectl top pods --sort-by=memory
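
For a quick change without editing the manifest, the limit can also be raised from the command line with `kubectl set resources`. A minimal sketch; the function name `set_memory_limit` and the example value are hypothetical, and the right limit depends on the application's actual usage.

```shell
# Hypothetical helper: raise a deployment's memory limit without editing YAML.
set_memory_limit() {
  deployment="$1"
  limit="$2"                 # e.g. 1Gi
  namespace="${3:-default}"
  kubectl set resources deployment "$deployment" -n "$namespace" \
    --limits="memory=$limit"
}

# Usage: set_memory_limit <deployment> 1Gi <namespace>
```

This changes the pod template for every container in the deployment (use `-c <container-name>` to target one) and triggers a rollout. If the manifest lives in version control, update it there as well so the next apply does not revert the change.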