OpenShift Cluster Lab with the .local Domain Suffix

Most of us use the .local suffix for the domain name in our labs; my own lab domain has always been cloudz.local 🙂. This can be a serious problem when deploying a Kubernetes-based cluster such as Red Hat OpenShift or a VMware Tanzu TKC cluster. I am writing this blog post because I faced this problem myself, and it took a lot of time and effort to troubleshoot before I found that the root cause keeping the cluster from becoming ready was the .local domain suffix.

I will not go into the details of the OpenShift cluster installation (Red Hat's Kubernetes distribution) in this post. There are a lot of blog posts and official documentation from Red Hat, but I suggest starting by reproducing the same lab from a personal blog and following it step by step, because the official documentation is hard for a beginner to follow and can lead to misunderstanding some topics, although it is mandatory if you want to become a guru. For the installation itself, I suggest this blog for a bare-metal or VMware cluster.

OpenShift Cluster OCP

I created my cluster using only three nodes, with the control plane nodes also acting as worker nodes. This has been possible since version 4.10 and can be useful for PoC and lab deployments. Together with a DNS server and an HAProxy load balancer, my cluster looks like the figure below and is ready to use. To be honest, it took me months to get to a ready cluster because of internet, network, and DNS misconfigurations. (Be aware that the cluster installation must finish within 24 hours of creating the Ignition configuration files, because the certificates they contain expire after that.)

                 My three-node OpenShift cluster

My surprise came when I tried to open the cluster console web address and found that it was not ready to serve requests, so I started troubleshooting the status of the cluster operators.

I found that most of the operators were ready except three that had errors: Console, Ingress, and Authentication. I started diagnosing the errors and noticed some DNS-related ones ("lookup" failures and "no such host"), and found that the OpenShift pods could not resolve names in my infrastructure outside the cluster. Stranger still, they could resolve Google's DNS names out on the internet! The errors looked like this:

Get “https://oauth-openshift.apps……….l/healthz”: dial tcp: lookup oauth-

console 4.14.7 False False False 23h RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps……..): Get “https://console-openshift-console.apps.c….l”: dial tcp: lookup console-openshift-console.apps…..local: no such host
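The operator check described above can be narrowed down to the failing operators only. This is a minimal sketch, assuming the usual column layout of `oc get clusteroperators` (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE, MESSAGE); the function name `unavailable_operators` is my own and not part of `oc`:

```shell
# Print the names of cluster operators whose AVAILABLE column is not "True".
# Reads the output of `oc get clusteroperators` on stdin.
# (unavailable_operators is a hypothetical helper, not an oc subcommand.)
unavailable_operators() {
  # NR > 1 skips the header row; column 3 is AVAILABLE.
  awk 'NR > 1 && $3 != "True" { print $1 }'
}

# Usage against a live cluster (needs a logged-in oc session):
#   oc get clusteroperators | unavailable_operators
```

In a situation like mine, this would list the operators that are still waiting on DNS, which makes it easy to re-run the check after each change.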

That led me to troubleshoot deep inside the cluster pods and containers to understand the root cause of the issue. Of course I did a lot of Google searching, but I did not find any direct answer, even after I posted on the Red Hat community. I think no one else had hit this mysterious situation, since everything was configured correctly. Then I came across this blog post, a DNS deep dive in OpenShift, and the official Kubernetes website, which explains DNS customization and DNS forwarding in K8s.

In fact, the CoreDNS architecture has a local DNS zone that answers the pods' local DNS requests and forwards non-local DNS requests to the DNS server configured in /etc/resolv.conf. My cluster did not forward the .local requests outside OpenShift's CoreDNS, because it has a local zone named cluster.local and my infrastructure uses the cloudz.local domain. Hence, in CoreDNS, the *.local requests were neither forwarded to the external DNS nor answered from the local zone unless a forward rule was explicitly added, and I always got the following NXDOMAIN error message:

[root@oauth-openshift-54fdb77854-dpjhj /]# host api.cloudone.cloudz.local

Host api.cloudone.cloudz.local not found: 3(NXDOMAIN)

I customized the DNS operator configuration to force CoreDNS to forward cloudz.local domain names to my DNS server:

apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
  name: default
spec:
  servers:
  - name: cloudz.dns
    zones:
    - cloudz.local
    forwardPlugin:
      upstreams:
      - 192.168.253.99
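One way to apply a forwarding rule like this without opening an editor is a merge patch on the DNS operator resource. The following is only a sketch: the helper name `dns_forward_patch` is mine, and it assumes a single server entry with one zone and one upstream, as in my lab. Note that a merge patch replaces the whole `servers` list, so any existing entries would have to be included as well.

```shell
# Build a JSON merge patch that forwards one zone to one upstream resolver.
# Usage: dns_forward_patch <server-name> <zone> <upstream-ip>
# (dns_forward_patch is a hypothetical helper written for this post.)
dns_forward_patch() {
  printf '{"spec":{"servers":[{"name":"%s","zones":["%s"],"forwardPlugin":{"upstreams":["%s"]}}]}}\n' \
    "$1" "$2" "$3"
}

# Apply it to the cluster-wide DNS operator config (needs cluster-admin rights).
# Caution: this replaces any servers that are already configured.
#   oc patch dns.operator/default --type=merge \
#     -p "$(dns_forward_patch cloudz.dns cloudz.local 192.168.253.99)"
```

Editing the resource interactively with `oc edit dns.operator/default` achieves the same result and is safer when other forwarding entries already exist.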

The DNS change is then pushed to all the cluster pods, and the final DNS config map looks like this (see the cloudz.local block):

[root@rhv ~]# oc get configmap/dns-default -n openshift-dns -o yaml

apiVersion: v1
data:
  Corefile: |
    # opentlc-dns
    cloudz.local:5353 {
        prometheus 127.0.0.1:9153
        forward . 192.168.253.99 {
            policy random
        }
        errors
        log . {
            class error
        }
        bufsize 1232
        cache 900 {
            denial 9984 30
        }
    }
    .:5353 {
        bufsize 1232
        errors
        log . {
            class error
        }
        health {
            lameduck 20s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus 127.0.0.1:9153
        forward . /etc/resolv.conf {
            policy sequential
        }
        cache 900 {
            denial 9984 30
        }
        reload
    }
    hostname.bind:5353 {
        chaos
    }
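To see why the extra block makes a difference: as I understand the CoreDNS documentation, each query is handled by the server block whose zone is the most specific (longest) match for the query name. The sketch below only models that selection rule in plain shell; it is not CoreDNS code, and the function name `pick_zone` is made up for illustration.

```shell
# Print the most specific zone from the list that the query name falls under.
# Usage: pick_zone <query-name> <zone>...
# (pick_zone is a hypothetical model of the selection rule, not a CoreDNS tool.)
pick_zone() {
  qname="${1%.}"               # drop a trailing dot, if any
  shift
  best=""
  best_len=-1
  for zone in "$@"; do
    z="${zone%.}"              # the root zone "." becomes the empty string
    case "$qname" in
      "$z" | *".$z") matched=1 ;;   # exact match or a subdomain of the zone
      *) matched=0 ;;
    esac
    [ -z "$z" ] && matched=1   # the root zone matches every name
    if [ "$matched" -eq 1 ] && [ "${#z}" -gt "$best_len" ]; then
      best="$zone"
      best_len="${#z}"
    fi
  done
  printf '%s\n' "$best"
}

# Zones taken from the Corefile above:
#   pick_zone api.cloudone.cloudz.local cloudz.local . hostname.bind
#   pick_zone www.google.com cloudz.local . hostname.bind
```

With the zones from my Corefile, a cloudz.local name is matched by the dedicated block and sent to 192.168.253.99, while everything else, including cluster.local service names, still lands in the default `.` block.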

And voilà, all the operators are Available. This took days and weeks. OpenShift is a really amazing product from Red Hat, but like any Kubernetes distribution it is like an iceberg: most of it sits below the surface.

Good Luck
