Yesterday, I ran into an annoying problem in my OpenShift cluster. I was deploying an application, and everything was fine until OpenShift tried to push the image generated at the end of the build to the OpenShift registry.
Below is how I described the problem to Red Hat support:
I just ran my Jenkins pipeline and everything succeeded until I hit this problem. OpenShift failed to push the created image to the docker registry 6 times, then stopped trying. When I checked the default namespace, I saw that pod creation for docker-registry was failing. It says “no nodes are available that match all of the following predicates:: CheckServiceAffinity (2), MatchNodeSelector (2)”. I checked my 2 compute nodes and there are no disk space problems. I checked my infra node too, no problems there either. This automated deployment pipeline was working about 3 weeks ago, there has been no change since then, and I don’t see why the registry pod is failing now. I am attaching some screenshots and the output of the df command on the infra node. Also, I don’t know why, but although the dc says 1 replica, this pod keeps trying to scale up to 2 and then scaling back down.
It turned out that the problem originated from “openvswitch.service”, which could not start. That in turn was caused by “ovs-vswitchd.service”, which kept timing out during startup. The journalctl output ended with the message “ovs-vswitchd.service start operation timed out. Terminating.” With Open vSwitch down, the infra node presumably could not join the cluster network, which would explain why the scheduler found no node matching the registry pod's node selector.
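To check for the same symptom on another node, a minimal sketch along these lines may help. The helper name find_start_timeouts is hypothetical and only filters log text for the timeout message quoted above. The systemctl and journalctl calls are standard systemd commands, left as comments because they have to run as root on the affected node.

```shell
# Print only the log lines that report a systemd start timeout.
# Reads log text on stdin, so it works on live or saved journal output.
find_start_timeouts() {
    grep -F 'start operation timed out'
}

# Typical use on the affected node (as root):
#   systemctl status openvswitch.service ovs-vswitchd.service
#   journalctl -u ovs-vswitchd.service -b --no-pager | find_start_timeouts
```

If the filter prints nothing, the unit is failing for some other reason and the full journalctl output is the place to look.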
I did some searching and digging around. The fix I came up with was to add a TimeoutSec value to both “/usr/lib/systemd/system/openvswitch.service” and “/usr/lib/systemd/system/ovs-vswitchd.service”. Restarting each of them with the “systemctl restart <service_name>” command finally brought those services to the active (running) state.
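As a variation on that fix, the same TimeoutSec setting can be placed in a systemd drop-in under /etc/systemd/system instead of in the packaged unit files, so a package update does not overwrite it. This is a sketch under assumptions, not what was done above: the function name write_timeout_dropin, the file name timeout.conf, and the value 300 are all placeholders, since the actual timeout value used is not stated here.

```shell
# Write a systemd drop-in that sets TimeoutSec for one unit.
# $1 = unit name
# $2 = timeout in seconds (placeholder; pick a value that suits your node)
# $3 = base directory, defaults to /etc/systemd/system
write_timeout_dropin() {
    unit=$1
    seconds=$2
    base=${3:-/etc/systemd/system}
    dir="$base/$unit.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nTimeoutSec=%s\n' "$seconds" > "$dir/timeout.conf"
}

# Usage on the node, as root (300 is an assumed value):
#   write_timeout_dropin ovs-vswitchd.service 300
#   write_timeout_dropin openvswitch.service 300
#   systemctl daemon-reload
#   systemctl restart openvswitch.service
```

Whichever way the unit is changed, systemd needs a “systemctl daemon-reload” before the restart to pick up the new setting.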
After rebooting the infra node, waiting a couple of minutes, and running “oc get nodes” on the master node, I got this:
tybsrhosinode01.defence.local Ready 189d v1.6.1+5115d708d7
tybsrhosmaster01.defence.local Ready,SchedulingDisabled 189d v1.6.1+5115d708d7
tybsrhosnode01.defence.local Ready 189d v1.6.1+5115d708d7
tybsrhosnode02.defence.local Ready 189d v1.6.1+5115d708d7
Every node reports Ready, which says it is all good now!
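For a cluster with more nodes, the same check can be scripted rather than read by eye. The sketch below assumes the column layout shown above (name first, status second); the function name not_ready_nodes is hypothetical.

```shell
# List the nodes whose STATUS column does not start with "Ready".
# Expects "oc get nodes --no-headers" style lines on stdin.
not_ready_nodes() {
    awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage on the master node:
#   oc get nodes --no-headers | not_ready_nodes
```

An empty result means every node is Ready. A master shown as “Ready,SchedulingDisabled” still counts as Ready here.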
For reference, here are excerpts from the two unit files:
[[email protected] ~]# cat /usr/lib/systemd/system/ovs-vswitchd.service
Description=Open vSwitch Forwarding Unit
After=ovsdb-server.service network-pre.target systemd-udev-settle.service
--no-ovsdb-server --no-monitor --system-id=random \
ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop
ExecReload=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server \
--no-monitor --system-id=random \
[[email protected] ~]# cat /usr/lib/systemd/system/openvswitch.service
After=network-pre.target ovsdb-server.service ovs-vswitchd.service
Hope this helps.