Recent Updates

  • Serdar Osman Onur 1:55 pm on November 13, 2018 Permalink | Reply

    error: build error: Failed to push image – OpenShift v3.6 

    We had the below problem while trying to deploy an application on OpenShift version 3.6. The build was successful but it failed trying to push the image to the registry:

    Copying Maven artifacts from /tmp/src/XX/XXX/XXXX/target to /deployments …

    Running: cp *-SNAPSHOT.jar /deployments

    … done

    Pushing image docker-registry.default.svc:5000/tybsdev/XXXX:latest …

    Registry server Address:

    Registry server User Name: serviceaccount

    Registry server Email: [email protected]

    Registry server Password: <<non-empty>>

    error: build error: Failed to push image: Get https://docker-registry.default.svc:5000/v1/_ping: dial tcp: lookup docker-registry.default.svc on 172.20.30.2:53: no such host

    I checked that the failed build POD was on node1. So I logged in to Node 1 and tried to log in to the registry:
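
    (The login command itself did not survive the copy; it was presumably the same registry login used later in these notes:)

    docker login -u admin -p $(oc whoami -t) docker-registry.default.svc:5000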

    And got the below message:

    Error response from daemon: Get https://docker-registry.default.svc:5000/v1/users/: dial tcp: lookup docker-registry.default.svc on 172.20.30.2:53: no such host

    Adding a line for the registry to /etc/hosts of Node 1 resolved the problem:
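
    The entry itself was attached as an image that did not survive the copy. It maps the registry service's cluster IP to the hostname the node fails to resolve, something like the following (the IP is hypothetical; look up the real one with oc get svc docker-registry -n default):

    172.30.163.92 docker-registry.default.svc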

     

     
  • Serdar Osman Onur 8:45 am on October 26, 2018 Permalink | Reply

    gave up on Build for BuildConfig tybsdev/basvuru-arayuz (0) due to fatal error: the LastVersion(1) on build config xxx does not match the build request LastVersion(0) 

    The OpenShift builder POD failed with no error message in the oc logs -f pod_name output.

    **oc get events on the master showed this:

    Type: Warning
    Reason:BuildConfigInstantiateFailed
    Source: buildconfig-controller
    Message: gave up on Build for BuildConfig tybsdev/basvuru-arayuz (0) due to fatal error: the LastVersion(1) on build config tybsdev/basvuru-arayuz does not match the build request LastVersion(0)

    **oc describe pod said this:
    Events:
    FirstSeen LastSeen Count From SubObjectPath Type Reason Message
    ——— ——– —– —- ————- ——– —— ——-
    25m 25m 1 default-scheduler Normal Scheduled Successfully assigned basvuru-arayuz-1-build to tybsrhosnode02.defence.local
    <invalid> <invalid> 1 kubelet, tybsrhosnode02.defence.local spec.containers{sti-build} Normal Pulled Container image "openshift3/ose-sti-builder:v3.6.173.0.21" already present on machine
    <invalid> <invalid> 1 kubelet, tybsrhosnode02.defence.local spec.containers{sti-build} Normal Created Created container
    <invalid> <invalid> 1 kubelet, tybsrhosnode02.defence.local spec.containers{sti-build} Normal Started Started container

     

    **oc get pods -o wide showed that the build pod was scheduled on node2

    node2 showed no problems:
    **# oc describe node tybsrhosnode02.defence.local
    Name: tybsrhosnode02.defence.local
    Role:
    Labels: beta.kubernetes.io/arch=amd64
    beta.kubernetes.io/os=linux
    kubernetes.io/hostname=tybsrhosnode02.defence.local
    logging-infra-fluentd=true
    region=primary
    Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true
    Taints: <none>
    CreationTimestamp: Wed, 13 Sep 2017 14:16:02 +0300
    Phase:
    Conditions:
    Type Status LastHeartbeatTime LastTransitionTime Reason Message
    —- —— —————– —————— —— ——-
    OutOfDisk False Fri, 26 Oct 2018 11:53:16 +0300 Wed, 10 Oct 2018 20:09:05 +0300 KubeletHasSufficientDisk kubelet has sufficient disk space available
    MemoryPressure False Fri, 26 Oct 2018 11:53:16 +0300 Wed, 10 Oct 2018 20:09:05 +0300 KubeletHasSufficientMemory kubelet has sufficient memory available
    DiskPressure False Fri, 26 Oct 2018 11:53:16 +0300 Wed, 10 Oct 2018 20:09:05 +0300 KubeletHasNoDiskPressure kubelet has no disk pressure
    Ready True Fri, 26 Oct 2018 11:53:16 +0300 Wed, 10 Oct 2018 20:08:54 +0300 KubeletReady kubelet is posting ready status
    Addresses: 172.20.30.224,172.20.30.224,tybsrhosnode02.defence.local
    Capacity:
    cpu: 8
    memory: 131865388Ki
    pods: 80
    Allocatable:
    cpu: 6
    memory: 125618988Ki
    pods: 80
    System Info:
    Machine ID: dfeb0732c1464538abc9eab4169868cf
    System UUID: 42184533-35FC-C47E-B84F-223AE30C8645
    Boot ID: e9e34a15-7d8e-4cfc-b995-3871f849f1d3
    Kernel Version: 3.10.0-693.2.1.el7.x86_64
    OS Image: OpenShift Enterprise
    Operating System: linux
    Architecture: amd64
    Container Runtime Version: docker://1.12.6
    Kubelet Version: v1.6.1+5115d708d7
    Kube-Proxy Version: v1.6.1+5115d708d7
    ExternalID: tybsrhosnode02.defence.local
    Non-terminated Pods: (11 in total)
    Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
    ——— —- ———— ———- ————— ————-
    amq broker-drainer-2-19w50 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    bpm-test proj1-1-h2kbs 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    logging logging-fluentd-tznxt 100m (1%) 100m (1%) 512Mi (0%) 512Mi (0%)
    process-server a1501-bpm-app-postgresql-4-bsrpv 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    sso sso-postgresql-1-gqxfr 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    tybsdev bilgi-edinme-yonetimi-1-km0dh 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    tybsdev infra-test-1-c3p9q 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    tybsdev komite-yonetimi-arayuz-1-pb11l 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    tybsdev panel-yonetimi-arayuz-1-393cr 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    tybsdev program-cagri-yonetimi-arayuz-1-x7q16 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    tybsdev teydeb-1-jt82x 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Allocated resources:
    (Total limits may be over 100 percent, i.e., overcommitted.)
    CPU Requests CPU Limits Memory Requests Memory Limits
    ———— ———- ————— ————-
    100m (1%) 100m (1%) 512Mi (0%) 512Mi (0%)
    Events: <none>

     

    SSHed into node2
    Tried to fetch the builder image manually from node2
    **docker pull docker-registry.default.svc:5000/openshift3/ose-sti-builder:v3.6.173.0.21

    it said:
    Trying to pull repository docker-registry.default.svc:5000/openshift3/ose-sti-builder …
    unable to retrieve auth token: 401 unauthorized

     

    Tried to pull the application’s image
    **docker pull docker-registry.default.svc:5000/tybsdev/basvuru-arayuz

    it said:
    Using default tag: latest
    Trying to pull repository docker-registry.default.svc:5000/tybsdev/basvuru-arayuz …
    unable to retrieve auth token: 401 unauthorized

    I logged in to the registry:
    **docker login -u admin -p $(oc whoami -t) docker-registry.default.svc:5000

    Tried to pull the image again:
    **docker pull docker-registry.default.svc:5000/tybsdev/basvuru-arayuz

    It said:
    Using default tag: latest
    Trying to pull repository docker-registry.default.svc:5000/tybsdev/basvuru-arayuz …
    manifest unknown: manifest unknown

    Did the same for builder image:

    ** docker pull docker-registry.default.svc:5000/openshift3/ose-sti-builder:v3.6.173.0.21

    it said:
    Trying to pull repository docker-registry.default.svc:5000/openshift3/ose-sti-builder …
    manifest unknown: manifest unknown

     

     

    Deleted the failed pod and re-ran the parameterized Jenkins pipeline for deploying this application.
    Failed again.
    ** oc get events still displays the original problem.

     

    Exported build configuration. Contents are below:

    ** oc export bc basvuru-arayuz
    apiVersion: v1
    kind: BuildConfig
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"v1","kind":"BuildConfig","metadata":{"annotations":{"openshift.io/generated-by":"OpenShiftNewApp"},"creationTimestamp":null,"labels":{"app":"basvuru-arayuz","template":"tybs-s2i-newapp-template"},"name":"basvuru-arayuz","namespace":"tybsdev"},"spec":{"nodeSelector":null,"output":{"to":{"kind":"ImageStreamTag","name":"basvuru-arayuz:latest"}},"postCommit":{},"resources":{},"source":{"contextDir":"dev","git":{"ref":"develop","uri":"http://serdar.onur:[email protected]:7990/scm/tybs/tybs_code.git"},"type":"Git"},"strategy":{"sourceStrategy":{"env":[{"name":"ARTIFACT_COPY_ARGS","value":"*-SNAPSHOT.jar"},{"name":"ARTIFACT_DIR","value":"basvuru/basvuru-arayuz/target/"},{"name":"MAVEN_ARGS","value":"package -Dfabric8.skip=true -Ddb=postgres -DskipTests=true -pl basvuru/basvuru-arayuz --also-make"},{"name":"MAVEN_MIRROR_URL","value":"http://192.168.63.121:8081/repository/maven-public/"}],"from":{"kind":"ImageStreamTag","name":"fis-java-openshift:latest","namespace":"openshift"}},"type":"Source"},"triggers":[{"github":{"secret":"wjj3wH0ppJ9nNc4lQC6_"},"type":"GitHub"},{"generic":{"secret":"jZEi04f0yjPpkDYbaxR4"},"type":"Generic"},{"type":"ConfigChange"},{"imageChange":{},"type":"ImageChange"}]},"status":{"lastVersion":0}}
        openshift.io/generated-by: OpenShiftNewApp
      creationTimestamp: null
      labels:
        app: basvuru-arayuz
        template: tybs-s2i-newapp-template
      name: basvuru-arayuz
    spec:
      nodeSelector: null
      output:
        to:
          kind: ImageStreamTag
          name: basvuru-arayuz:latest
      postCommit: {}
      resources: {}
      runPolicy: Serial
      source:
        contextDir: dev
        git:
          ref: develop
          uri: http://serdar.onur:[email protected]:7990/scm/tybs/tybs_code.git
        type: Git
      strategy:
        sourceStrategy:
          env:
          - name: ARTIFACT_COPY_ARGS
            value: '*-SNAPSHOT.jar'
          - name: ARTIFACT_DIR
            value: basvuru/basvuru-arayuz/target/
          - name: MAVEN_ARGS
            value: package -Dfabric8.skip=true -Ddb=postgres -DskipTests=true -pl basvuru/basvuru-arayuz
              --also-make
          - name: MAVEN_MIRROR_URL
            value: http://192.168.63.121:8081/repository/maven-public/
          from:
            kind: ImageStreamTag
            name: fis-java-openshift:latest
            namespace: openshift
        type: Source
      triggers:
      - github:
          secret: wjj3wH0ppJ9nNc4lQC6_
        type: GitHub
      - generic:
          secret: jZEi04f0yjPpkDYbaxR4
        type: Generic
      - type: ConfigChange
      - imageChange: {}
        type: ImageChange
    status:
      lastVersion: 0

     

     
    • Serdar Osman Onur 7:30 am on October 30, 2018 Permalink | Reply

      Red Hat Response

      Login the registry with the cluster-admin user.

      docker -D login -u $(oc whoami) -p $(oc whoami -t) docker-registry.default.svc:5000

      After successful login

      docker pull openshift3/ose-sti-builder

      > I think we need to solve the “manifest unknown: manifest unknown”. I researched but could not find a useful post on the net.
      This error message means the image is not available in the registry or the tag is missing.

      Verify by using docker search:
      # docker search registry.access.redhat.com/openshift3/ose-sti-builder

      For build issue please increase the build log level and capture the builder pod logs

      oc start-build --build-loglevel=5
      oc logs -f
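
      Applied to this build config, the suggestion would look something like the following (names filled in from this post; the build name is hypothetical, not the literal commands from the ticket):

      oc start-build basvuru-arayuz --build-loglevel=5
      oc logs -f build/basvuru-arayuz-2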

  • Serdar Osman Onur 11:44 am on August 13, 2018 Permalink | Reply

    Taking Config Files Outside of a POD – OpenShift

    There are configuration files that affect the way your application works and behaves. These files get deployed together with your application. So, when you deploy your application (in this case Red Hat SSO) these config files will also be deployed inside a POD. If you want to edit your configuration you will need to rsh into your pod and make changes to these configuration files. How about your PODs being destroyed and re-created on another node? What happens to the changes in your configuration files? They are gone!

    There are a couple of alternative approaches you can follow here. If you use a configmap or mount a PV, in both cases they become a part of the “DC” and when a pod is destroyed & re-created it will keep using the configmap or refer to the mounted PV. You get to keep any modifications you have made to your config files when a pod gets destroyed and re-created.

    You can use configmaps/secrets

    Using config maps is “like” mounting a volume to your POD.

    oc create configmap my-conf --from-file=. --dry-run -o yaml
    oc set volume dc/my-dc --configmap-name my-conf --mount-path /test --add=true

    You can mount secrets in a similar way:
    oc create secret generic my-secret --from-file=. --dry-run -o yaml
    oc set volume dc/my-dc --secret-name my-secret --mount-path /test

    In this case you will need to use the "oc edit" command to make changes to your configmaps, but the problem is that, for these changes to be reflected in your running application, you will need to re-deploy it (this is what Red Hat support wrote back to me…).
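
    As a concrete sketch using the names above (note that --dry-run only prints the YAML, so it is piped to oc create here):

    # create the configmap from the files in the current directory
    oc create configmap my-conf --from-file=. --dry-run -o yaml | oc create -f -
    # mount it into the deployment config
    oc set volume dc/my-dc --configmap-name my-conf --mount-path /test --add=true
    # later: edit the configmap, then redeploy so running pods pick up the change
    oc edit configmap my-conf
    oc rollout latest dc/my-dc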

    You can use PersistentVolumes

    In this scenario, you need to create a PersistentVolume, create a PersistentVolumeClaim and bind the POD to the PV using this claim.

    Your PV needs to include the config files that you want to use. A way to go about this could be:

    a) Copy all the files in your config directory to the PV
    b) Mount the PV to your config directory (inside your POD)

    Be careful! You need to do a) before b); otherwise you will lose all the files and folders inside the config directory of your POD. The good thing about PersistentVolume usage is that you don’t need to re-deploy your PODs to your OpenShift cluster.
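
    A rough sketch of that order of operations (the pod, claim, and path names are hypothetical, and the PV is assumed to be backed by an NFS export reachable from where you run oc):

    # a) copy the pod's config directory out to the PV's backing store first
    oc rsync my-pod-1-abcde:/opt/app/configuration/ /exports/my-conf/
    # b) only then mount the claim over the config directory
    oc set volume dc/my-dc --add --type=persistentVolumeClaim --claim-name=my-conf-pvc --mount-path=/opt/app/configuration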

     
  • Serdar Osman Onur 7:52 am on August 2, 2018 Permalink | Reply

    Red Hat SSO 7.1 Persistent Application Template Deployment on OpenShift Failed

    I was having a problem deploying the persistent (PostgreSQL) Red Hat SSO 7.1 application on OpenShift. For some reason, my postgresql pod was stuck in the ContainerCreating state. I saw the below message when I described the sso-postgresql pod:

    FailedMount Unable to mount volumes for pod “sso-postgresql-1-3gjgf_tybsdev(b652abc6-9002-11e8-a82a-0050569897ab)”: timeout expired waiting for volumes to attach/mount for pod “tybsdev”/”sso-postgresql-1-3gjgf”. list of unattached/unmounted volumes=[sso-postgresql-pvol]

    “mount.nfs: Connection refused ”

    I thought the problem was with my PV/PVC configurations. I checked them and they seemed alright. I tried changing the accessMode of the related PV from ReadWriteOnce to ReadWriteMany just to see, and it didn’t work.

    Then I checked the NFS service on the NFS server with “systemctl status nfs”.
    The NFS service was stopped!

    I started the NFS service, changed the accessMode of the PV back to ReadWriteOnce, and re-started the installation process. It worked!
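
    For reference, the check and fix on the NFS server amounted to the following (the service can be named nfs-server on some systems):

    systemctl status nfs
    systemctl start nfs
    # optionally verify the exports are being served
    showmount -e localhost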

     
  • Serdar Osman Onur 7:19 am on June 21, 2018 Permalink | Reply

    OpenShift POD in CrashLoopBackOff State

    *OpenShift V3.6
    From time to time, PODs in an OCP cluster can get stuck in the CrashLoopBackOff state. There are various reasons for this. Here I will talk about one exceptional way of ending up in this CrashLoopBackOff state.

    I opened a support ticket about this and I had a remote session to solve the problem together with a Red Hat support personnel.

    The thing was, somehow, at some point, for an unknown reason (possibilities include network issues, proxy issues etc.), this exceptional state was created and the node that the pod was being scheduled to had not received the COMPLETE IMAGE to be used for this deployment. There was a missing layer! Once that missing layer was manually pulled on the failing NODE, the problem was gone and the POD was up & running again.

    There are 2 things to be done after SSHing to the target NODE.
    1- Log in to the DOCKER REGISTRY
    docker login -u admin -p $(oc whoami -t) docker-registry.default.svc:5000

    2- Manually pull the image
    docker pull docker-registry.default.svc:5000/tybsdev/yazi-sablon-arayuz

    In step 2 you will see the missing layer being pulled from the registry.
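
    To find which node the failing pod was scheduled on before SSHing in, the usual check is (application name taken from step 2):

    oc get pods -o wide | grep yazi-sablon-arayuz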

     
  • Serdar Osman Onur 12:17 pm on June 6, 2018 Permalink | Reply

    OpenShift – Basic Deployment Operations 

    Starting a deployment: (start a deployment manually)

    Viewing a deployment: (get basic information about all the available revisions of your application)

    Canceling a deployment: (cancel a running or stuck deployment process)

    Retrying a deployment: (retry a failed deployment)

    Rolling back a deployment: (if no revision is specified with --to-revision, then the last successfully deployed revision is used)
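
    The commands themselves did not survive the copy; from the OpenShift 3.6 documentation linked below, they are roughly:

    oc rollout latest dc/<name>                  # start a deployment manually
    oc rollout history dc/<name>                 # view a deployment's revisions
    oc rollout cancel dc/<name>                  # cancel a running or stuck deployment
    oc rollout retry dc/<name>                   # retry a failed deployment
    oc rollout undo dc/<name> --to-revision=<n>  # roll back (defaults to the last successful revision)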

     

    https://docs.openshift.com/container-platform/3.6/dev_guide/deployments/basic_deployment_operations.html

     
  • Serdar Osman Onur 6:33 am on June 1, 2018 Permalink | Reply

    OpenShift – All Compute Nodes are in NotReady State 

    I am having a problem with my cluster. I have 2 compute nodes and none of them are working.
    When I do “oc describe node node_name” I get the attached outputs for the 2 nodes.

    In the Events part the following caught my attention:

    1)
    NODE1:
    Type: Warning
    Reason: ContainerGCFailed
    Message: rpc error: code = 4 desc = context deadline exceeded

    2)
    NODE2:
    Reason: SystemOOM
    Message: System OOM Encountered

    Reason: ImageGCFailed
    Message: unable to find data for container

    Below are the “describe” outputs from the master for both compute nodes.

    describe-node1-28.05.2018

    describe-node2-28.05.2018

    I also attached the “sos reports” for both nodes. Below is the answer from Red Hat support.

    After some investigation, I figured it was a docker service problem caused by limited RAM resources. The problem was fixed by increasing the RAM on both compute nodes.

     
    • Serdar Osman Onur 6:34 am on June 1, 2018 Permalink | Reply

      Red Hat Support:

      Thank you for contacting Red Hat Support.

      I can see below messages in the logs :

      May 28 00:31:22 tybsrhosnode02.defence.local atomic-openshift-node[101303]: E0528 00:31:22.071759 101303 kuberuntime_manager.go:619] createPodSandbox for pod "fikri-hak-yonetimi-1-mv312_tybsdev(34e7c705-61f5-11e8-a82a-0050569897ab)" failed: rpc error: code = 4 desc = context deadline exceeded
      May 28 00:31:22 tybsrhosnode02.defence.local atomic-openshift-node[101303]: E0528 00:31:22.071794 101303 pod_workers.go:182] Error syncing pod 34e7c705-61f5-11e8-a82a-0050569897ab ("fikri-hak-yonetimi-1-mv312_tybsdev(34e7c705-61f5-11e8-a82a-0050569897ab)"), skipping: failed to "CreatePodSandbox" for "fikri-hak-yonetimi-1-mv312_tybsdev(34e7c705-61f5-11e8-a82a-0050569897ab)" with CreatePodSandboxError: "CreatePodSandbox for pod \"fikri-hak-yonetimi-1-mv312_tybsdev(34e7c705-61f5-11e8-a82a-0050569897ab)\" failed: rpc error: code = 4 desc = context deadline exceeded"
      May 28 00:31:22 tybsrhosnode02.defence.local atomic-openshift-node[101303]: E0528 00:31:22.128803 101303 remote_runtime.go:86] RunPodSandbox from runtime service failed: rpc error: code = 4 desc = context deadline exceeded
      May 28 00:31:22 tybsrhosnode02.defence.local atomic-openshift-node[101303]: E0528 00:31:22.128875 101303 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "dosya-yonetimi-arayuz-1-zwjbz_tybsdev(331ac283-61f5-11e8-a82a-0050569897ab)" failed: rpc error: code = 4 desc = context deadline exceeded
      May 28 00:31:22 tybsrhosnode02.defence.local atomic-openshift-node[101303]: E0528 00:31:22.128893 101303 kuberuntime_manager.go:619] createPodSandbox for pod "dosya-yonetimi-arayuz-1-zwjbz_tybsdev(331ac283-61f5-11e8-a82a-0050569897ab)" failed: rpc error: code = 4 desc = context deadline exceeded
      May 28 00:31:22 tybsrhosnode02.defence.local atomic-openshift-node[101303]: E0528 00:31:22.128929 101303 pod_workers.go:182] Error syncing pod 331ac283-61f5-11e8-a82a-0050569897ab ("dosya-yonetimi-arayuz-1-zwjbz_tybsdev(331ac283-61f5-11e8-a82a-0050569897ab)"), skipping: failed to "CreatePodSandbox" for "dosya-yonetimi-arayuz-1-zwjbz_tybsdev(331ac283-61f5-11e8-a82a-0050569897ab)" with CreatePodSandboxError: "CreatePodSandbox for pod \"dosya-yonetimi-arayuz-1-zwjbz_tybsdev(331ac283-61f5-11e8-a82a-0050569897ab)\" failed: rpc error: code = 4 desc = context deadline exceeded"
      May 28 00:31:22 tybsrhosnode02.defence.local atomic-openshift-node[101303]: I0528 00:31:22.833265 101303 kubelet_node_status.go:410] Recording NodeNotReady event message for node tybsrhosnode02.defence.local

      May 28 00:31:22 tybsrhosnode02.defence.local atomic-openshift-node[101303]: I0528 00:31:22.833303 101303 kubelet_node_status.go:717] Node became not ready: {Type:Ready Status:False LastHeartbeatTime:2018-05-28 00:31:22.833243851 +0200 EET LastTransitionTime:2018-05-28 00:31:22.833243851 +0200 EET Reason:KubeletNotReady Message:PLEG is not healthy: pleg was last seen active 3m9.360485732s ago; threshold is 3m0s}

      It looks like your docker is either using more resources or is in a hung state when this issue is observed.

      My recommendation would be to upgrade to the latest docker 1.12 version.

      Also, when you see the issue again check if you are able to perform docker ps or any docker commands on the system.

      Provide the output of the below command in the meantime, to check if the docker socket returns data:

      curl --unix-socket /var/run/docker.sock http:/containers/json | python -mjson.tool

      If the json tool is not installed, then:

      curl --unix-socket /var/run/docker.sock http:/containers/json

      # gcore -o /tmp/dockerd_core $(pidof dockerd-current)
      # gcore -o /tmp/containerd_core $(pidof docker-containerd-current)

      • Serdar Osman Onur 6:10 pm on June 11, 2018 Permalink | Reply

        In the end, the problems went away after I increased the RAM and CPU dedicated to my compute nodes. Then I asked this:

        Is there a guideline to handle such cases where a node becomes NotReady?

        Is there a list of first steps to take in such situations for a quick diagnosis?

        Another question: I feel like my nodes are consuming RAM aggressively, and RAM usage sometimes results in nodes being NotReady. How can I check if something is wrong with the RAM consumption of my nodes/pods?

        Red Hat Support
        —–
        I feel like my nodes are consuming RAM aggressively and RAM usage sometimes result in nodes being NotReady.
        ———–

        — Yes, this could be one of the reasons. Perhaps you can increase the RAM of the node according to your need.

        —-
        Is there a list of first steps to take in such situations for a quick diagnosis?
        ———-
        >> 1. The docker, atomic-openshift-node, and dnsmasq daemons must be running. If any of these services fails, the node can turn NotReady (a quick check is sketched below).
        >> 2. Also, the DNS configuration should be correct and in place. For this, I am attaching one article[1].
        >> 3. Ensure the disk and memory pressure are within limits.

        — You can also limit the number of pods scheduled on a node. Also, configuring limit ranges[2] would help you manage resource utilization efficiently.
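
        A quick way to run the first check on a node (OCP 3.6 service names):

        systemctl status docker atomic-openshift-node dnsmasq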

        —-
        How can I check if something is wrong with the RAM consumption of my nodes/pods?
        ———-
        — There is no concrete method, but if we configure the limit ranges properly then the pods will never try to reach or exceed their limits. Also, set the replica count for pods only as needed.

        [1] https://access.redhat.com/solutions/3343511
        [2] https://docs.openshift.com/container-platform/3.6/admin_guide/limits.html#overview

  • Serdar Osman Onur 2:41 pm on April 26, 2018 Permalink | Reply
    Tags: docker

    Could not transfer artifact "x" from/to mirror. Failed to transfer file. Return code is: 500. 

    Could not transfer artifact … from/to mirror.default (…/repository/maven-public/): Failed to transfer file: … Return code is: 500.

    Error occurred while executing a write operation to database ‘component’ due to limited free space on the disk (219 MB). The database is now working in read-only mode. Please close the database (or stop OrientDB), make room on your hard drive and then reopen the database. The minimal required space is 256 MB.

    This error caused our Jenkins pipelines and OpenShift builds to fail. Apparently, the VM that hosted the docker container for Nexus had run out of disk space. We increased the disk space and the problem is gone now.
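
    The quick check on such a host is free disk space on the volume backing the container's data (the mount point here is hypothetical):

    df -h /var/lib/docker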

     
  • Serdar Osman Onur 6:36 am on April 20, 2018 Permalink | Reply

    What are OpenShift Node Selectors and Pod Selectors

    Node selectors are parameters you can use in the OpenShift CLI to target a specific set of nodes.

    Pod selectors are parameters you can use in the OpenShift CLI to target a specific set of pods.

    Not all CLI actions/commands require these selectors, but some operations/commands may require both selectors to be used.

    Consider the command below. This can be used to select specific pods on specific nodes:

    $ oc adm manage-node --selector=<node-selector> --list-pods [--pod-selector=<pod-selector>] [-o json|yaml]

    An example would be something like below:

    oc adm manage-node --selector=region=primary --list-pods --pod-selector=app=basvuru-arayuz

    In this example region=primary is a label that I used on my cluster’s schedulable nodes.
    “app” is a label that I use for my deployments. Each bc/dc/service/route/pod will have this “app” label. In the example above “basvuru-arayuz” is an application name for one of my deployments.

    This example command will list all the pods related to “basvuru-arayuz” application deployed on schedulable nodes.
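
    To see which node labels are available to use as node selectors:

    oc get nodes --show-labels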

     
  • Serdar Osman Onur 10:28 am on April 18, 2018 Permalink | Reply

    oc-describe-node1 oc-describe-node2 oc-describe-pod-proj1-1-build

    No nodes are available that match all of the following predicates:: CheckServiceAffinity (1), Insufficient pods (1), MatchNodeSelector (1).

    Hi,

    I am trying to deploy a new BPM application (proj1) on OpenShift. The thing is, the build pod is stuck in the Pending state, and in the Events section it says this:

    “No nodes are available that match all of the following predicates:: CheckServiceAffinity (1), Insufficient pods (1), MatchNodeSelector (1).”

    I have 2 compute nodes in my cluster labeled as “region=primary”.

    oc get nodes prints this:

    NAME STATUS AGE VERSION
    tybsrhosinode01.defence.local Ready 215d v1.6.1+5115d708d7
    tybsrhosmaster01.defence.local Ready,SchedulingDisabled 215d v1.6.1+5115d708d7
    tybsrhosnode01.defence.local Ready 215d v1.6.1+5115d708d7
    tybsrhosnode02.defence.local Ready 215d v1.6.1+5115d708d7

    oc get pods prints this:

    NAME READY STATUS RESTARTS AGE
    basvuru-arayuz-1-ddwbs 1/1 Running 0 3d
    hakem-yonetimi-arayuz-1-0x379 1/1 Running 0 3d
    infra-test-6-h02nb 0/1 Pending 0 22h
    program-cagri-yonetimi-arayuz-1-cmcbb 1/1 Running 0 3d
    proj1-1-build 0/1 Pending 0 1h
    proj1-postgresql-1-deploy 0/1 Pending 0 1h

    I am attaching the outputs for oc describe node/node1, oc describe node/node2 and oc describe pod/proj1-1-build.

    I would like to solve this problem asap since I have a demo coming up.

    Thanks

    Where are you experiencing the behavior? What environment?

    This is our development environment.

    When does the behavior occur? Frequently? Repeatedly? At certain times?

    Never happened before.

    What information can you provide around timeframes and the business impact?

    I have a demo coming up so I would like to get this fixed asap.

    ************* ******************* ***************** ****************

    More info:

    “No nodes are available that match all of the following predicates:: CheckServiceAffinity (1), Insufficient pods (1), MatchNodeSelector (1).”

    I would expect it to say:

    “No nodes are available that match all of the following predicates:: CheckServiceAffinity (2), Insufficient pods (2), MatchNodeSelector (2).” since I have 2 compute nodes. I feel like one of the nodes is not considered at all.

    Another thing, 1 of the compute nodes (node1) seems to be out of disk:

    Filesystem 1M-blocks Used Available Use% Mounted on
    /dev/mapper/rhel-root 46058 2081 43978 5% /
    devtmpfs 3901 0 3901 0% /dev
    tmpfs 3912 0 3912 0% /dev/shm
    tmpfs 3912 9 3903 1% /run
    tmpfs 3912 0 3912 0% /sys/fs/cgroup
    /dev/sda1 1014 185 830 19% /boot
    /dev/mapper/rhel-tmp 1014 33 982 4% /tmp
    /dev/mapper/rhel-var 15350 15337 14 100% /var
    /dev/mapper/rhel-usr_local_bin 1014 33 982 4% /usr/local/bin
    tmpfs 783 0 783 0% /run/user/0

    ****************** ******************* ***************** ***************

    Further info:

    Further info:

    I deleted all the other apps in the target namespace and this time the build pod was successfully scheduled.

    1- Could this be a port issue?
    How should I manage the pods of my applications deployed on a namespace? Is it possible to have this kind of clash?

    2- After finalizing the clarification on “1”, we should still think about the (1) in the error log instead of (2). I have 2 nodes, so it should have been (2).

    *************** ***************** ****************** ************** **************

    Red Hat Response

    From the attachments provided the following can be seen from the “tybsrhosnode01.defence.local” node

    —>

    Conditions:
    Type Status LastHeartbeatTime LastTransitionTime Reason Message
    —- —— —————– —————— —— ——-
    OutOfDisk True Mon, 16 Apr 2018 15:03:15 +0300 Mon, 16 Apr 2018 11:45:56 +0300 KubeletOutOfDisk out of disk space
    MemoryPressure False Mon, 16 Apr 2018 15:03:15 +0300 Mon, 16 Apr 2018 11:45:56 +0300 KubeletHasSufficientMemory kubelet has sufficient memory available

    —>

    Here the node clearly seems out of disk space.

    Hence, deleting other applications that were scheduled on that node made disk space available for the build pod “proj1-1-build”.

    When we run:

    $ oc get pods -o wide

    we can see which pod is scheduled on which node.

    I will answer your questions one by one:

    1- Could this be a port issue?
    How should I manage the pods of my applications deployed on a namespace? Is it possible to have this kind of clashes?

    No, this is not a port issue; it is due to the unavailability of resources on the node to schedule a pod. Pods in a project can be scheduled on a node using the parameters in the scheduler.json file.

    $ vi /etc/origin/master/scheduler.json

    2- After finalizing the clarification on “1”, we should still think about the (1) in the error log instead of (2). I have 2 nodes, it should have been (2).

    The below error message means:

    ——— ——– —– —- ————- ——– —— ——-
    1h 2m 208 default-scheduler Warning FailedScheduling No nodes are available that match all of the following predicates:: CheckServiceAffinity (1), Insufficient pods (1), MatchNodeSelector (1).

    –>

    No nodes are available that match all of the following predicates:: CheckServiceAffinity (1 node failed), Insufficient pods (1 node failed), MatchNodeSelector (1 node failed).

    This is because of the node “tybsrhosnode01.defence.local” which does not meet the requirement to schedule a pod.

    My response

    Yeah, while checking the output of the describe command I realized the same thing. Obviously, there is a disk space problem with “node 1”.
    Deleting existing applications is out of the question, so I think I need to add more disk space to this VM.

    What I don’t understand is: I have “2” compute nodes, node1 and node2. I see node1 is out of space, but what about node2? Why is it not used?
    The error message should have said “CheckServiceAffinity (2), Insufficient pods (2), MatchNodeSelector (2).” since it cannot schedule the pod on either of the 2 nodes.

    ***** *************** ************* ************

    Solution and Conclusion

    We saw that the build was failing with the following error:

    No nodes are available that match all of the following predicates:: CheckServiceAffinity (1), Insufficient pods (1), MatchNodeSelector (1).

    We checked both the nodes, node1 and node2.

    node1 – its disk was full and no pods were getting deployed on it. As we checked further, the /var/log folder was filled up with the messages file and the rotated messages logs.

    All of the above led to no pods being deployed on node1.

    node2 – why pods could not be deployed on node2 was now a question for us. We checked the node2 description, and the following was the reason:

    Capacity:
    cpu: 1
    memory: 8010972Ki
    pods: 10
    Allocatable:
    cpu: 1
    memory: 7908572Ki
    pods: 10

    node2 already had 10 pods allocated, so no more pods could be scheduled on that node.

    Clearing your doubt about the error message:

    No nodes are available that match all of the following predicates:: CheckServiceAffinity (1), Insufficient pods (1), MatchNodeSelector (1)

    The (1) in the brackets relates solely to node1; node2 is not considered here at all because it has already allocated 10 pods.

    As mentioned earlier, (1) means 1 node failed the predicates; here, that node is node1.
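
    For what it’s worth, the “pods: 10” capacity on node2 comes from the kubelet’s max pods setting; on OpenShift 3.6 it can be raised in the node config and the node service restarted (the value below is illustrative):

    # /etc/origin/node/node-config.yaml
    kubeletArguments:
      max-pods:
      - "80"

    # then: systemctl restart atomic-openshift-node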

     