How to fix the PLEG problem in Kubernetes

I ran into a kubelet error on our Kubernetes cluster. The message looked like this:

skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 3m6.527452257s ago; threshold is 3m0s]

If you use Rancher, you can also see the message in the Rancher UI.

Solution

According to the IBM document, this issue is caused by slow interaction between the kubelet and Docker. The suggested solution is to increase the housekeeping interval. Housekeeping is the periodic pass in which the kubelet evaluates its eviction thresholds; how often it runs is set by housekeeping-interval, which defaults to 10s.

Step

We use RKE to deploy Kubernetes, so all components run as containers. Use docker inspect kubelet to see the kubelet configuration. In our case the output (shown in the screenshot) had no housekeeping-interval argument.
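The check above can be scripted roughly as follows. This is only a sketch: it assumes the container is named kubelet (as it was on our RKE nodes), and kubelet_args and find_housekeeping_flag are helper names made up for this post.

```shell
# Hypothetical helper: print the kubelet container's arguments, one per line.
# The container name defaults to "kubelet" (an assumption; pass another name if needed).
kubelet_args() {
  docker inspect --format '{{range .Args}}{{println .}}{{end}}' "${1:-kubelet}"
}

# Hypothetical helper: read arguments on stdin and print the housekeeping flag,
# or a short notice when the flag is absent.
find_housekeeping_flag() {
  grep -e '--housekeeping-interval' || echo "housekeeping-interval is not set"
}
```

On a node, run kubelet_args | find_housekeeping_flag. Before the change it should report that the flag is not set.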

If your kubelet runs as a systemd service instead, you can add the argument in /etc/systemd/system/kubelet.service.

To update the kubelet, add the housekeeping-interval argument to the kubelet service in cluster.yml.
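For reference, here is a sketch of what that fragment might look like, printed from a small shell function so it can be pasted into cluster.yml by hand. RKE passes extra kubelet flags through services.kubelet.extra_args. The 30s value is only an example, not a recommendation, and kubelet_housekeeping_fragment is a name made up for this post.

```shell
# Hypothetical helper: print the cluster.yml fragment that sets the kubelet flag.
# Merge it into the existing "services" section rather than appending it blindly.
# The 30s value is an assumption for illustration; pick one that suits your nodes.
kubelet_housekeeping_fragment() {
  cat <<'EOF'
services:
  kubelet:
    extra_args:
      housekeeping-interval: "30s"
EOF
}

kubelet_housekeeping_fragment
```

If cluster.yml already has a services.kubelet.extra_args block, only the housekeeping-interval line needs to be added to it.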

To apply the change, run the command below:

rke up --config cluster.yml

Verify

Run the docker inspect command again to check. After the update, the housekeeping-interval argument is present.
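Besides checking the argument, it may be worth confirming that the error has stopped appearing in the kubelet log. A minimal sketch, again assuming the container is named kubelet; count_pleg_errors is a helper name made up for this post.

```shell
# Hypothetical helper: count log lines on stdin that report an unhealthy PLEG.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the helper quiet.
count_pleg_errors() {
  grep -c 'PLEG is not healthy' || true
}
```

On a node, run docker logs --since 10m kubelet 2>&1 | count_pleg_errors. A count of 0 over a reasonable window suggests the change helped, though it does not prove the root cause is gone.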

Conclusion

“PLEG is not healthy” can have various causes, and I believe there are many I have not run into yet. This post describes one way to fix it.