Recent Updates

• Serdar Osman Onur 11:44 am on August 13, 2018

    Taking Config Files Outside of a POD – OpenShift

There are configuration files that affect the way your application works and behaves. These files get deployed together with your application. So, when you deploy your application (in this case Red Hat SSO), these config files are also deployed inside a POD. If you want to edit your configuration, you will need to rsh into your pod and change these configuration files. But what happens when your POD is destroyed and re-created on another node? What happens to the changes in your configuration files? They are gone!

There are a couple of alternative approaches you can follow here. Whether you use a configmap or mount a PV, it becomes part of the “DC”, so when a pod is destroyed and re-created it will keep using the configmap or the mounted PV. You get to keep any modifications you have made to your config files when a pod gets destroyed and re-created.

    You can use configmaps/secrets

    Using config maps is “like” mounting a volume to your POD.

oc create configmap my-conf --from-file=. --dry-run -o yaml
oc set volume dc/my-dc --configmap-name my-conf --mount-path /test --add=true
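Note that --dry-run -o yaml only prints the object without creating it. A minimal way to actually create the configmap is to pipe that output back into oc create (run from the directory holding your config files):

oc create configmap my-conf --from-file=. --dry-run -o yaml | oc create -f -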

    You can mount secrets in a similar way:
oc create secret generic my-secret --from-file=. --dry-run -o yaml
oc set volume dc/my-dc --secret-name my-secret --mount-path /test

In this case you will need to use the “oc edit” command to make changes to your configmaps. The problem is that, for the changes to be reflected in your running application, you will need to re-deploy it (this is what Red Hat support wrote back to me…).
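As a sketch, using the hypothetical names from above, the edit-then-redeploy cycle looks like this:

# Edit the configmap contents in place
oc edit configmap my-conf

# Trigger a new deployment so the running pods pick up the change
oc rollout latest dc/my-dc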

    You can use PersistentVolumes

    In this scenario, you need to create a PersistentVolume, create a PersistentVolumeClaim and bind the POD to the PV using this claim.

Your PV needs to include the config files that you want to use. A way to go about this could be:

    a) Copy all the files in your config directory to the PV
    b) Mount the PV to your config directory (inside your POD)

Be Careful! You need to do a) before b); otherwise you will lose all the files and folders inside the config directory of your POD. The good thing about the PersistentVolume approach is that you don’t need to re-deploy your PODs for configuration changes to take effect.
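A minimal sketch of the copy-then-mount sequence, assuming a hypothetical claim name my-claim and an EAP-style config directory (/opt/eap/standalone/configuration — check the actual path in your image). Note that each oc set volume call triggers a new deployment:

# a) Stage the PV at a temporary path and copy the existing config onto it
oc set volume dc/my-dc --add --name=config-vol --type=persistentVolumeClaim \
  --claim-name=my-claim --mount-path=/staging
oc rsh <pod-name> cp -a /opt/eap/standalone/configuration/. /staging/

# b) Re-mount the same claim over the real config directory
oc set volume dc/my-dc --remove --name=config-vol
oc set volume dc/my-dc --add --name=config-vol --type=persistentVolumeClaim \
  --claim-name=my-claim --mount-path=/opt/eap/standalone/configuration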

     
• Serdar Osman Onur 7:52 am on August 2, 2018

    Red Hat SSO 7.1 Persistent Application Template Deployment on OpenShift Failed

I was having a problem deploying the persistent (PostgreSQL) Red Hat SSO 7.1 application on OpenShift. For some reason, my postgresql pod was stuck in the ContainerCreating state. I saw the message below when I described the sso-postgresql pod:

    FailedMount Unable to mount volumes for pod “sso-postgresql-1-3gjgf_tybsdev(b652abc6-9002-11e8-a82a-0050569897ab)”: timeout expired waiting for volumes to attach/mount for pod “tybsdev”/”sso-postgresql-1-3gjgf”. list of unattached/unmounted volumes=[sso-postgresql-pvol]

    “mount.nfs: Connection refused ”

I thought the problem was with my PV/PVC configurations. I checked them and they seemed alright. I tried changing the accessMode of the related PV from ReadWriteOnce to ReadWriteMany just to see, and it didn’t work.

Then I checked the NFS service on the NFS server (“systemctl status nfs”).
The NFS service was stopped!

I started the NFS service, changed the accessMode of the PV back to ReadWriteOnce, and re-started the installation process. It worked!
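For reference, a quick check sequence (the exact service name may vary by OS; on RHEL 7, “nfs” aliases nfs-server):

# On the NFS server
systemctl status nfs
systemctl start nfs
systemctl enable nfs            # optional: survive reboots

# From an OpenShift node: verify the exports are reachable
showmount -e <nfs-server-ip>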

     
• Serdar Osman Onur 7:19 am on June 21, 2018

    OpenShift POD in CrashLoopBackOff State

*OpenShift V3.6
From time to time, PODs in an OCP cluster can get stuck in the CrashLoopBackOff state. There are various reasons for this. Here I will talk about one exceptional case of being stuck in the CrashLoopBackOff state.

I opened a support ticket about this and had a remote session to solve the problem together with Red Hat support personnel.

The thing was, somehow, at some point, for an unknown reason (possibilities include network issues, proxy issues, etc.), this exceptional state was created and the node that the pod was being scheduled to did not get the COMPLETE IMAGE used for this deployment. There was a missing layer! Once that missing layer was manually pulled on the failing NODE, the problem was gone and the POD was up and running again.

There are 2 things to do after SSHing into the target NODE.
1- Log in to the DOCKER REGISTRY:
docker login -u admin -p $(oc whoami -t) docker-registry.default.svc:5000

2- Manually pull the image:
docker pull docker-registry.default.svc:5000/tybsdev/yazi-sablon-arayuz

    In step 2 you will see the missing layer being pulled from the registry.
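If you first want to confirm exactly which image the failing pod needs before pulling, a hedged sketch (the jsonpath query lists the container images from the pod spec):

# On the master: get the exact image reference from the pod spec
oc get pod <failing-pod> -o jsonpath='{.spec.containers[*].image}'

# On the node: check what is already present locally
docker images | grep yazi-sablon-arayuz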

     
• Serdar Osman Onur 12:17 pm on June 6, 2018

    OpenShift – Basic Deployment Operations 

Starting a deployment: start a deployment manually.

Viewing a deployment: get basic information about all the available revisions of your application.

Canceling a deployment: cancel a running or stuck deployment process.

Retrying a deployment: retry a failed deployment.

Rolling back a deployment: if no revision is specified with --to-revision, the last successfully deployed revision is used.
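The corresponding oc commands, per the documentation linked below (substitute your own DeploymentConfig name):

# Start a deployment manually
oc rollout latest dc/<name>

# View basic information about all the available revisions
oc rollout history dc/<name>

# Cancel a running or stuck deployment process
oc rollout cancel dc/<name>

# Retry a failed deployment
oc rollout retry dc/<name>

# Roll back; without --to-revision the last successful revision is used
oc rollout undo dc/<name> --to-revision=<revision>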

     

    https://docs.openshift.com/container-platform/3.6/dev_guide/deployments/basic_deployment_operations.html

     
• Serdar Osman Onur 6:33 am on June 1, 2018

    OpenShift – All Compute Nodes are in NotReady State 

I am having a problem with my cluster. I have 2 compute nodes and neither of them is working.
When I do “oc describe node node_name” I get the attached outputs for the 2 nodes.

In the Events section, the following caught my attention:

    1)
    NODE1:
    Type: Warning
Reason: ContainerGCFailed
    Message: rpc error: code = 4 desc = context deadline exceeded

    2)
    NODE2:
    Reason: SystemOOM
    Message: System OOM Encountered

    Reason: ImageGCFailed
    Message: unable to find data for container

    Below are the “describe” outputs from the master for both compute nodes.

    describe-node1-28.05.2018

    describe-node2-28.05.2018

I also attached the “sos reports” for both nodes. The answer from Red Hat support is in the comments below.

After some investigation, I figured out it was a docker service problem caused by limited RAM resources. The problem was fixed by increasing the RAM on both compute nodes.
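Based on this experience, a quick first-diagnosis sketch for a NotReady node (commands are standard; the node service names are for OCP 3.x):

# On the affected node
systemctl status docker atomic-openshift-node dnsmasq
free -m            # memory headroom
df -h /var         # disk pressure where docker images and logs live

# From the master
oc get nodes
oc describe node <node-name>    # check Conditions and Events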

 
• Serdar Osman Onur 6:34 am on June 1, 2018

      Red Hat Support:

      Thank you for contacting Red Hat Support.

I can see the below messages in the logs:

      May 28 00:31:22 tybsrhosnode02.defence.local atomic-openshift-node[101303]: E0528 00:31:22.071759 101303 kuberuntime_manager.go:619] createPodSandbox for pod “fikri-hak-yonetimi-1-mv312_tybsdev(34e7c705-61f5-11e8-a82a-0050569897ab)” failed: rpc error: code = 4 desc = context deadline exceeded
      May 28 00:31:22 tybsrhosnode02.defence.local atomic-openshift-node[101303]: E0528 00:31:22.071794 101303 pod_workers.go:182] Error syncing pod 34e7c705-61f5-11e8-a82a-0050569897ab (“fikri-hak-yonetimi-1-mv312_tybsdev(34e7c705-61f5-11e8-a82a-0050569897ab)”), skipping: failed to “CreatePodSandbox” for “fikri-hak-yonetimi-1-mv312_tybsdev(34e7c705-61f5-11e8-a82a-0050569897ab)” with CreatePodSandboxError: “CreatePodSandbox for pod \”fikri-hak-yonetimi-1-mv312_tybsdev(34e7c705-61f5-11e8-a82a-0050569897ab)\” failed: rpc error: code = 4 desc = context deadline exceeded”
      May 28 00:31:22 tybsrhosnode02.defence.local atomic-openshift-node[101303]: E0528 00:31:22.128803 101303 remote_runtime.go:86] RunPodSandbox from runtime service failed: rpc error: code = 4 desc = context deadline exceeded
      May 28 00:31:22 tybsrhosnode02.defence.local atomic-openshift-node[101303]: E0528 00:31:22.128875 101303 kuberuntime_sandbox.go:54] CreatePodSandbox for pod “dosya-yonetimi-arayuz-1-zwjbz_tybsdev(331ac283-61f5-11e8-a82a-0050569897ab)” failed: rpc error: code = 4 desc = context deadline exceeded
      May 28 00:31:22 tybsrhosnode02.defence.local atomic-openshift-node[101303]: E0528 00:31:22.128893 101303 kuberuntime_manager.go:619] createPodSandbox for pod “dosya-yonetimi-arayuz-1-zwjbz_tybsdev(331ac283-61f5-11e8-a82a-0050569897ab)” failed: rpc error: code = 4 desc = context deadline exceeded
      May 28 00:31:22 tybsrhosnode02.defence.local atomic-openshift-node[101303]: E0528 00:31:22.128929 101303 pod_workers.go:182] Error syncing pod 331ac283-61f5-11e8-a82a-0050569897ab (“dosya-yonetimi-arayuz-1-zwjbz_tybsdev(331ac283-61f5-11e8-a82a-0050569897ab)”), skipping: failed to “CreatePodSandbox” for “dosya-yonetimi-arayuz-1-zwjbz_tybsdev(331ac283-61f5-11e8-a82a-0050569897ab)” with CreatePodSandboxError: “CreatePodSandbox for pod \”dosya-yonetimi-arayuz-1-zwjbz_tybsdev(331ac283-61f5-11e8-a82a-0050569897ab)\” failed: rpc error: code = 4 desc = context deadline exceeded”
      May 28 00:31:22 tybsrhosnode02.defence.local atomic-openshift-node[101303]: I0528 00:31:22.833265 101303 kubelet_node_status.go:410] Recording NodeNotReady event message for node tybsrhosnode02.defence.local

      May 28 00:31:22 tybsrhosnode02.defence.local atomic-openshift-node[101303]: I0528 00:31:22.833303 101303 kubelet_node_status.go:717] Node became not ready: {Type:Ready Status:False LastHeartbeatTime:2018-05-28 00:31:22.833243851 +0200 EET LastTransitionTime:2018-05-28 00:31:22.833243851 +0200 EET Reason:KubeletNotReady Message:PLEG is not healthy: pleg was last seen active 3m9.360485732s ago; threshold is 3m0s}

It looks like your docker is either using too many resources or is in a hung state when this issue is observed.

My recommendation would be to upgrade to the latest docker 1.12 version.

Also, when you see the issue again, check whether you are able to perform docker ps or any other docker commands on the system.

In the meantime, provide the output of the command below to check whether the docker socket returns data:

curl --unix-socket /var/run/docker.sock http:/containers/json | python -mjson.tool

If python’s json.tool is not available, then:

curl --unix-socket /var/run/docker.sock http:/containers/json

# gcore -o /tmp/dockerd_core $(pidof dockerd-current)
# gcore -o /tmp/containerd_core $(pidof docker-containerd-current)

• Serdar Osman Onur 6:10 pm on June 11, 2018

In the end, the problems went away after I increased the RAM and CPU dedicated to my compute nodes. Then I asked this:

        Is there a guideline to handle such cases where a node becomes NotReady?

        Is there a list of first steps to take in such situations for a quick diagnosis?

Another question: I feel like my nodes are consuming RAM aggressively, and RAM usage sometimes results in nodes being NotReady. How can I check whether something is wrong with the RAM consumption of my nodes/pods?

Red Hat Support
—–
I feel like my nodes are consuming RAM aggressively and RAM usage sometimes results in nodes being NotReady.
———–

— Yes, this could be one of the reasons. Perhaps you can increase the RAM of the nodes according to your needs.

—-
Is there a list of first steps to take in such situations for a quick diagnosis?
———-
>> 1. The docker, atomic-openshift-node, and dnsmasq daemons must be running. If any of these services fails, the node can turn NotReady.
>> 2. The DNS configuration should also be correct. For this, I am attaching an article[1].
>> 3. Ensure disk and memory pressure are within limits.

— You can also limit the number of pods scheduled on the node. Configuring limit ranges[2] will also help you manage resource utilization efficiently.

—-
How can I check if something is wrong with the RAM consumption of my nodes/pods?
———-
— There is no concrete method, but if you configure limit ranges properly, pods will never try to reach or exceed their limits. Also, set the replica count for pods only as high as needed.

        [1] https://access.redhat.com/solutions/3343511
        [2] https://docs.openshift.com/container-platform/3.6/admin_guide/limits.html#overview
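For illustration, a minimal LimitRange along the lines of [2] (the values are arbitrary examples; apply it with oc create -f limits.yaml -n <project>):

apiVersion: "v1"
kind: "LimitRange"
metadata:
  name: "resource-limits"
spec:
  limits:
    - type: "Container"
      max:
        cpu: "1"
        memory: "1Gi"
      default:              # limit used when a container specifies none
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:       # request used when a container specifies none
        cpu: "100m"
        memory: "256Mi"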

• Serdar Osman Onur 2:41 pm on April 26, 2018
    Tags: docker

    Could not transfer artifact "x" from/to mirror. Failed to transfer file. Return code is: 500. 

    Could not transfer artifact … from/to mirror.default (…/repository/maven-public/): Failed to transfer file: … Return code is: 500.

    Error occurred while executing a write operation to database ‘component’ due to limited free space on the disk (219 MB). The database is now working in read-only mode. Please close the database (or stop OrientDB), make room on your hard drive and then reopen the database. The minimal required space is 256 MB.

This error caused our Jenkins pipelines and OpenShift builds to fail. Apparently, the VM that hosted the docker container for Nexus was out of disk space. We increased the disk space and the problem is gone now.

     
• Serdar Osman Onur 6:36 am on April 20, 2018

    What are OpenShift Node Selectors and Pod Selectors

Node selectors are parameters you can use in the OpenShift CLI to target specific nodes.

Pod selectors are parameters you can use in the OpenShift CLI to target specific pods.

Not all CLI actions/commands require these selectors, but some require both.

    Consider the command below. This can be used to select specific pods on specific nodes:

$ oc adm manage-node --selector= --list-pods [--pod-selector=] [-o json|yaml]

    An example would be something like below:

oc adm manage-node --selector=region=primary --list-pods --pod-selector=app=basvuru-arayuz

In this example, region=primary is a label that I used on my cluster’s schedulable nodes.
“app” is a label that I use for my deployments; each bc/dc/service/route/pod has this “app” label. In the example above, “basvuru-arayuz” is the application name for one of my deployments.

This example command will list all the pods of the “basvuru-arayuz” application deployed on schedulable nodes.
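For completeness, such labels are applied with oc label (a sketch; the node name placeholder and the region=primary label are from the example above):

# Label a schedulable node with region=primary
oc label node <node-name> region=primary

# Verify which labels each node carries
oc get nodes --show-labels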

     
• Serdar Osman Onur 10:28 am on April 18, 2018


    No nodes are available that match all of the following predicates:: CheckServiceAffinity (1), Insufficient pods (1), MatchNodeSelector (1).

    Hi,

I am trying to deploy a new BPM application (proj1) on OpenShift. The thing is, the build pod is stuck in the Pending state, and the Events section says this:

    “No nodes are available that match all of the following predicates:: CheckServiceAffinity (1), Insufficient pods (1), MatchNodeSelector (1).”

    I have 2 compute nodes in my cluster labeled as “region=primary”.

    oc get nodes prints this:

    NAME STATUS AGE VERSION
    tybsrhosinode01.defence.local Ready 215d v1.6.1+5115d708d7
    tybsrhosmaster01.defence.local Ready,SchedulingDisabled 215d v1.6.1+5115d708d7
    tybsrhosnode01.defence.local Ready 215d v1.6.1+5115d708d7
    tybsrhosnode02.defence.local Ready 215d v1.6.1+5115d708d7

    oc get pods prints this:

    NAME READY STATUS RESTARTS AGE
    basvuru-arayuz-1-ddwbs 1/1 Running 0 3d
    hakem-yonetimi-arayuz-1-0x379 1/1 Running 0 3d
    infra-test-6-h02nb 0/1 Pending 0 22h
    program-cagri-yonetimi-arayuz-1-cmcbb 1/1 Running 0 3d
    proj1-1-build 0/1 Pending 0 1h
    proj1-postgresql-1-deploy 0/1 Pending 0 1h

    I am attaching the outputs for oc describe node/node1, oc describe node/node2 and oc describe pod/proj1-1-build.

    I would like to solve this problem asap since I have a demo coming up.

    Thanks

    Where are you experiencing the behavior? What environment?

    This is our development environment.

    When does the behavior occur? Frequently? Repeatedly? At certain times?

    Never happened before.

    What information can you provide around timeframes and the business impact?

    I have a demo coming up so I would like to get this fixed asap.

    ************* ******************* ***************** ****************

    More info:

    “No nodes are available that match all of the following predicates:: CheckServiceAffinity (1), Insufficient pods (1), MatchNodeSelector (1).”

    I would expect it to say:

    “No nodes are available that match all of the following predicates:: CheckServiceAffinity (2), Insufficient pods (2), MatchNodeSelector (2).” since I have 2 compute nodes. I feel like one of the nodes is not considered at all.

Another thing: one of the compute nodes (node1) seems to be out of disk space:

    Filesystem 1M-blocks Used Available Use% Mounted on
    /dev/mapper/rhel-root 46058 2081 43978 5% /
    devtmpfs 3901 0 3901 0% /dev
    tmpfs 3912 0 3912 0% /dev/shm
    tmpfs 3912 9 3903 1% /run
    tmpfs 3912 0 3912 0% /sys/fs/cgroup
    /dev/sda1 1014 185 830 19% /boot
    /dev/mapper/rhel-tmp 1014 33 982 4% /tmp
    /dev/mapper/rhel-var 15350 15337 14 100% /var
    /dev/mapper/rhel-usr_local_bin 1014 33 982 4% /usr/local/bin
    tmpfs 783 0 783 0% /run/user/0

    ****************** ******************* ***************** ***************

Further info:

I deleted all the other apps in the target namespace, and this time the build pod was successfully scheduled.

1- Could this be a port issue?
How should I manage the pods of my applications deployed in a namespace? Is it possible to have this kind of clash?

2- After clarifying “1”, we should still think about why the error log says (1) instead of (2). I have 2 nodes; it should have been (2).

    *************** ***************** ****************** ************** **************

    Red Hat Response

From the attachments provided, the following can be seen for the “tybsrhosnode01.defence.local” node:

    —>

    Conditions:
    Type Status LastHeartbeatTime LastTransitionTime Reason Message
    —- —— —————– —————— —— ——-
    OutOfDisk True Mon, 16 Apr 2018 15:03:15 +0300 Mon, 16 Apr 2018 11:45:56 +0300 KubeletOutOfDisk out of disk space
    MemoryPressure False Mon, 16 Apr 2018 15:03:15 +0300 Mon, 16 Apr 2018 11:45:56 +0300 KubeletHasSufficientMemory kubelet has sufficient memory available

    —>

Here the node clearly seems to be out of disk space.

Hence, deleting other applications that were scheduled on that node made disk space for the build pod “proj1-1-build”.

    When we run:

    $ oc get pods -o wide

    we can see which pod is scheduled on which node.

I will answer your questions one by one:

    1- Could this be a port issue?
    How should I manage the pods of my applications deployed on a namespace? Is it possible to have this kind of clashes?

No, this is not a port issue; it is due to the unavailability of resources on the node to schedule a pod. Pods in a project can be scheduled on a node using the parameters in the scheduler.json file.

    $ vi /etc/origin/master/scheduler.json

2- After clarifying “1”, we should still think about why the error log says (1) instead of (2). I have 2 nodes; it should have been (2).

    The below error message means:

    ——— ——– —– —- ————- ——– —— ——-
    1h 2m 208 default-scheduler Warning FailedScheduling No nodes are available that match all of the following predicates:: CheckServiceAffinity (1), Insufficient pods (1), MatchNodeSelector (1).

    –>

    No nodes are available that match all of the following predicates:: CheckServiceAffinity (1 node failed), Insufficient pods (1 node failed), MatchNodeSelector (1 node failed).

    This is because of the node “tybsrhosnode01.defence.local” which does not meet the requirement to schedule a pod.

    My response

Yeah, while checking the output of the describe command I realized the same thing. Obviously, there is a disk space problem with “node1”.
Deleting existing applications is out of the question, so I think I need to add more disk space to this VM.

What I don’t understand is: I have 2 compute nodes, node1 and node2. I see node1 is out of space, but what about node2? Why is it not used?
The error message should have said “CheckServiceAffinity (2), Insufficient pods (2), MatchNodeSelector (2).” since it cannot schedule the pod on either of the 2 nodes.

    ***** *************** ************* ************

    Solution and Conclusion

    We saw that the build was failing with the following error:

    No nodes are available that match all of the following predicates:: CheckServiceAffinity (1), Insufficient pods (1), MatchNodeSelector (1).

We checked both nodes, node1 and node2.

node1 – had a full disk and no pods were deployed on it. As we checked further, the /var/log folder was filled with the messages file and the rotated message logs.

All of the above led to no pods being deployed on node1.

node2 – Why pods could not be deployed on node2 was now the question for us. We checked the node2 description, and the following was the reason:

    Capacity:
    cpu: 1
    memory: 8010972Ki
    pods: 10
    Allocatable:
    cpu: 1
    memory: 7908572Ki
    pods: 10

node2 already had 10 pods allocated, so no more pods could be scheduled on that node.

To clear up your doubt about the error message:

No nodes are available that match all of the following predicates:: CheckServiceAffinity (1), Insufficient pods (1), MatchNodeSelector (1)

The 1 in the brackets relates solely to node1; node2 is not considered here because it has already allocated its maximum of 10 pods.

As mentioned earlier, 1 means 1 node failed. That node is node1.
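As a side note, if a 10-pod cap is itself too low, in OCP 3.x it can be raised in the node configuration. A hedged sketch (values are examples only; edit /etc/origin/node/node-config.yaml on the node):

kubeletArguments:
  max-pods:
    - "40"
  pods-per-core:
    - "10"

Then restart the node service with systemctl restart atomic-openshift-node for the new limits to take effect.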

     
• Serdar Osman Onur 9:10 am on April 18, 2018

    Deploying BPM Processes on OpenShift

    When you create a Business Process in Red Hat BPM Suite, it will be living in a hierarchical structure similar to below:

    • Organizational Unit
      • Repository
        • Project
          • Business Process

    So, how do you deploy a Business Process on OpenShift? What is the unit of deployment?

How do you deploy a Business Process on Red Hat OpenShift?

You can use OpenShift’s S2I (source-to-image) process.
1- You can deploy your projects using the quickstart templates that already exist in the OpenShift catalog.
2- You can create your own templates and use them instead to create your “oc objects” inside your OpenShift cluster.

    What is the unit of deployment in OpenShift?

The unit of deployment is the “project”. You deploy BPM “projects” on OpenShift, not individual business processes.
Therefore, if your project has multiple business processes, they will all scale together since they all reside in a single POD. If you want maximum modularity and scalability, you could consider building a single business process per project.

This post is based on answers to a Red Hat support case.

     
• Serdar Osman Onur 8:42 am on April 18, 2018

    Updating Sub Processes – Red Hat JBoss BPM Suite

Using a process as a sub-process in another one introduces dependencies between the 2 BPM processes.

If the child process (sub-process) is to be updated, the following should be applied (see the pom.xml sketch after the list):

• Implement the changes in your child process
• Increase the Maven version of the child project
• Build
• Update the parent’s pom.xml so it matches the new child Maven version
• Build & deploy
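A sketch of the version bump, with hypothetical Maven coordinates (com.example.bpm:child-process):

<!-- child project's pom.xml: bump the version -->
<groupId>com.example.bpm</groupId>
<artifactId>child-process</artifactId>
<version>1.1.0</version>   <!-- was 1.0.0 -->

<!-- parent project's pom.xml: point the dependency at the new version -->
<dependency>
  <groupId>com.example.bpm</groupId>
  <artifactId>child-process</artifactId>
  <version>1.1.0</version>
</dependency>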

This post is based on answers to a Red Hat support case.

     