How to Troubleshoot a Kubernetes Cluster - CKA Exam Preparation Series
Sep 14, 2020
This is one of many posts by TechCommanders in a series on preparing for the Certified Kubernetes Administrator (CKA) Exam.
There are many situations in which your Kubernetes cluster stops working correctly, or stops working altogether. In this post, I will explain some of how Kubernetes works internally and share tips to help you track down issues.
Let’s start with the main reasons Pods fail:
- Errors in configuration resources such as Deployments and Services.
- Problems in your application code.
In the former case, the container often does not start at all; in the latter, the container starts but the application fails. We will discuss both cases in this article.
The only prerequisite for this article is the kubectl command-line tool, configured to interact with your Kubernetes cluster.
Now, let’s get to the tips.
- Observe Pods
Check the status of your Pods and verify that they are Running and Ready. The following command lists all Pods in the current namespace:
kubectl get pods
In our example output, one Pod has not started in the last 9 hours and its status is Pending. We will investigate it in the next tip.
Here are some other status codes to look for when a container fails to start:
- ImagePullBackOff: the image registry is not accessible, or the image name or tag specified in the Deployment is incorrect. Check that the image name is correct, and that the registry is reachable and you are authenticated to it (docker login…).
- RunContainerError or CreateContainerConfigError: a common cause is a ConfigMap or Secret that the container references but that does not exist.
- ContainerCreating (for a long time): something the container needs is not available yet, for example a persistent volume that cannot be attached or mounted.
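To spot these states quickly, the sketch below filters kubectl get pods output down to Pods that are not Completed and are either not Running or not fully ready. It is a minimal sketch, assuming bash, awk, and the default table layout (NAME, READY, STATUS, RESTARTS, AGE); the function name problem_pods is hypothetical. It reads the STATUS and READY columns rather than the Pod phase on purpose: a Pod in CrashLoopBackOff still has phase Running, so a filter such as --field-selector=status.phase!=Running would miss it.

```bash
# problem_pods (hypothetical helper): print NAME, READY and STATUS for every Pod
# that is not Completed and is either not Running or not fully ready.
# Assumes the default column order NAME READY STATUS RESTARTS AGE, so do not
# pass --all-namespaces (-A): that adds a NAMESPACE column and shifts the fields.
problem_pods() {
  kubectl get pods --no-headers "$@" |
    awk '{
      split($2, ready, "/")                  # READY column, e.g. "0/1"
      if ($3 == "Completed") next            # finished Job Pods are fine
      if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
    }'
}
```

Extra arguments are passed straight to kubectl, so for example problem_pods -n kube-system checks the system Pods.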
Before moving on to the other errors, let’s try to start a Pod from an incorrect image name:
# start a Pod from the misspelled image "ngin".
# 'web' can be any name; it is the name of the resulting Pod
# (since kubectl 1.18, kubectl run creates a single Pod, so --replicas is no longer needed)
kubectl run web --image=ngin
In the kubectl get pods output, the last line shows the new Pod with an ImagePullBackOff error. Let’s try again with the correct name:
kubectl run temp --image=nginx
kubectl get pods
Now every container is running. Next, let’s discuss a few errors that occur after the container has started:
- CrashLoopBackOff: the container keeps exiting after it starts, for example because its liveness probe fails or the Docker image is faulty (its CMD exits immediately). We will come back to this below. Note: the RESTARTS column of kubectl get pods shows the number of restarts. In this case, expect the count to grow, because Kubernetes keeps restarting the failing container and waits longer between attempts each time.
- If the Pod is Running but your app still does not work correctly, continue with the next steps.
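To see which container in a Pod keeps restarting, and why its previous instance exited, you can read restartCount and lastState.terminated.reason from the Pod status with a kubectl JSONPath template. This is a minimal sketch, assuming bash and awk; the helper name restarting_containers and its threshold argument are hypothetical.

```bash
# restarting_containers (hypothetical helper): list the containers of a Pod that
# have restarted at least MIN times (default 1), together with the reason the
# previous instance terminated (e.g. "Error", "OOMKilled"), or "-" if none.
# Usage: restarting_containers POD [MIN]
restarting_containers() {
  local pod="$1" min="${2:-1}"
  kubectl get pod "$pod" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\n"}{end}' |
    awk -F '\t' -v min="$min" '$2 + 0 >= min + 0 { print $1, $2, ($3 == "" ? "-" : $3) }'
}
```

A reason such as OOMKilled points at memory limits, while Error usually means the application itself exited with a non-zero code, which is where the logs come in.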
- Check Events Related to Pods: If you see an error code in the Pod status, you can learn more about it by searching for it online or, more directly, with the describe command, whose Events section lists what has happened to the Pod. This is also very useful when a container starts failing.
kubectl describe pod frontend-65c58c957d-f4cqn
In our example, the last line of the Events section shows that the Pod could not be scheduled because there were not enough CPU resources. To fix this, lower the Pod's CPU request or add CPU capacity to the cluster (more or larger nodes), then redeploy the application.
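When a Pod stays Pending because of insufficient CPU, it helps to compare what the Pod requests with what a node can offer. The sketch below sums the CPU requests of a Pod's containers in millicores. It assumes bash and awk; the helper name pod_cpu_request_m is hypothetical, containers without a CPU request are simply skipped, and init containers are not counted.

```bash
# pod_cpu_request_m (hypothetical helper): print the total CPU request of all
# (non-init) containers in a Pod, in millicores. Quantities such as "250m" are
# already millicores; plain values such as "1" or "0.5" are whole cores.
pod_cpu_request_m() {
  kubectl get pod "$1" -o jsonpath='{.spec.containers[*].resources.requests.cpu}' |
    awk '{
      for (i = 1; i <= NF; i++) {
        v = $i
        if (v ~ /m$/) { sub(/m$/, "", v); total += v }  # "250m" -> 250
        else          { total += v * 1000 }              # "0.5"  -> 500
      }
    }
    END { printf "%d\n", total }'                        # prints 0 if no requests
}
```

You can compare the result with a node's allocatable CPU, for example kubectl get node NODE -o jsonpath='{.status.allocatable.cpu}' (NODE is a placeholder), or with the "Allocated resources" section of kubectl describe node.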
- Check your Logs: If the container has started and is running, check its logs to see whether the application is working properly. For example, for the Pod frontend-65c58c957d-bzbg2, run:
kubectl logs --tail=10 frontend-65c58c957d-bzbg2
This prints the last 10 lines of the running container's logs. If the container has just been restarted, its current logs may be empty or unhelpful; in that case, look at the logs of the previous, terminated instance of the container:
kubectl logs frontend-65c58c957d-bzbg2 --previous
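The two log commands can be combined into one step: if the container has restarted, show the previous instance's logs (which usually contain the crash), otherwise the current ones. This is a minimal sketch, assuming bash and a single-container Pod; the helper name recent_logs is hypothetical, and for multi-container Pods you would add -c CONTAINER.

```bash
# recent_logs (hypothetical helper): show the last N lines (default 10) of a
# Pod's logs. If the first container has restarted, show the logs of its
# previous, terminated instance instead, which usually hold the crash reason.
recent_logs() {
  local pod="$1" lines="${2:-10}" restarts
  restarts="$(kubectl get pod "$pod" -o jsonpath='{.status.containerStatuses[0].restartCount}')"
  if [ "${restarts:-0}" -gt 0 ]; then
    kubectl logs "$pod" --previous --tail="$lines"
  else
    kubectl logs "$pod" --tail="$lines"
  fi
}
```

The trade-off is that once a restarted container has recovered, this still shows the old instance's logs; in that case, call kubectl logs directly.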
- Run “sh”, “bash”, or “ash” directly in the Pod: You can open a shell inside a running container and troubleshoot from there, as long as the image includes one. The following command starts an interactive shell; type exit to leave it when you are done.
kubectl exec -it frontend-65c58c957d-bzbg2 -- /bin/sh
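Which of sh, bash, or ash is available depends on the image. The sketch below probes for each one with a non-interactive kubectl exec and prints the first that works. It assumes bash; the helper name pod_shell is hypothetical.

```bash
# pod_shell (hypothetical helper): print the first shell found in a Pod's
# container, trying bash, then ash, then sh. Returns non-zero if none exists,
# which is typical for minimal ("distroless") images.
pod_shell() {
  local pod="$1" sh
  for sh in bash ash sh; do
    # Run a no-op with the candidate shell; success means it is installed.
    if kubectl exec "$pod" -- "$sh" -c 'exit 0' >/dev/null 2>&1; then
      echo "$sh"
      return 0
    fi
  done
  return 1
}
```

Usage: kubectl exec -it frontend-65c58c957d-bzbg2 -- "$(pod_shell frontend-65c58c957d-bzbg2)" opens the best available shell.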
- Show Cluster-Level Events: Kubernetes records events, of type Normal or Warning, for many state changes in the cluster. The following kubectl get events commands give an aggregated view of these events and help you understand what happened behind the scenes.
# all events sorted by time.
kubectl get events --sort-by=.metadata.creationTimestamp
# warnings only
kubectl get events --field-selector type=Warning
# events related to Nodes
kubectl get events --field-selector involvedObject.kind=Node
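When there are many warnings, a summary by reason can show the dominant problem at a glance. This is a minimal sketch, assuming bash and standard sort, uniq, and awk; the helper name warning_summary is hypothetical. It counts event objects, not each event's count field, so a repeated event that Kubernetes has aggregated into one object counts once.

```bash
# warning_summary (hypothetical helper): count Warning events by reason
# (e.g. BackOff, FailedScheduling), most frequent first. Extra arguments are
# passed to kubectl, so "-A" summarizes all namespaces.
warning_summary() {
  kubectl get events --field-selector type=Warning "$@" \
    -o jsonpath='{range .items[*]}{.reason}{"\n"}{end}' |
    sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```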
I hope these tips make you more comfortable working with your Kubernetes cluster. By following them, you can find and fix errors in both your Kubernetes resources and your application code.