Infrastructure and Cloud for Enthusiasts

[blog 018]# git commit

Cloud Director – Tenant Kubernetes Troubleshooting

Recently in my professional life I have been working through the product development process to allow customers to deploy Kubernetes Container Clusters within their Cloud Director tenancies leveraging the Tanzu Container Service Extension.

As a quick recap the Container Service Extension automates the instantiation of the ephemeral vApp in the customers tenancy which in turn deploys the Kubernetes control plane and worker nodes, and ingress and nat services from NSX ALB.

The requirements for deploying container workloads within a tenancy are the following,

  • A customer tenancy with a Provider Gateway, NSX Edge Gateway and Overlay Networks,
  • NSX ALB with a shared or dedicated Service Engine for the customer tenancy,
  • The T1 gateway must have load balancing services enabled,
  • Customer overlay networks must be able to route and resolve the Cloud Director endpoint.

The following diagram illustrates the high level networking requirements for Container Clusters.

Figure 1 – Example Customer Tenancy High Level Networking

In this blog I would like to cover trouble shooting that maybe required within the customers tenancy if a Kubernetes Container Cluster won’t deploy.

Trouble Shooting with the Ephemeral vApp

NSX DFW and Datacenter Groups

The purpose of the Ephemeral vApp is essentially deploy the control plane and worker nodes for the container cluster, and the automated process is similar to deploying TKGm cluster. The main difference is that instead of having a VM with all the required deployment packages already installed, the Ephemeral vApp downloads items such as capvcd, clusterd, kubectl and docker from Github repositories.

If you have a customer tenancy that has datacenter groups, DFW enable and NSX 4.x there is chance that the “DefaultMacliciousIpGroup” in NSX will block access to sites such as Github and Microsoft Services.

On the Ephemeral vApp if you tail /var/log/cloud-final.err you will see errors where the vApp cannot download the YAML components from Github and connectivity to “raw.githubuserconent.com” is not reachable. This is due to NSX blocking the URL as being a malicious IP. Allow the IP within NSX as an exception.

Figure 2 – Example of URL blocking by NSX
Figure 3 – Example Malicious IP Blocked.
Figure 4 – Adding Exception to IP.
Checking for failed deployment of Container Clusters

If you are not seeing any errors within the /var/log/cloud-final.err logs check the deployment control plane for any errors by checking the availability of the bootstrap cluster.

This can be done using the following command on the ephemeral vApp. Note to get the root password check the guest customization of the vApp VM.

root@EPHEMERAL-TEMP-VM:/.kube# kubectl get po,deploy,cluster,kubeadmcontrolplane,machine,machinedeployment -A –kubeconfig /.kube/config

If there are any failed pods you can check the logs of the bootstrap pods for errors.

For example , kubectl –namespace kube-system logs etcd-kind-control-plane –kubeconfig /.kube/config

Figure 5 – Checking Bootstrap Cluster.

If the bootstap cluster is ok check the following,

  • The Ephemeral VM can reach and resolve the public VIP of the Cloud Director cluster.
  • Check NSX ALB for the following
    • The tenancy has a service engine assigned and load balancing is enabled.
    • Check that the transit network was created in NSX with dhcp enabled.
    • Check in NSX ALB that a NSX VRF context has been created for the customer tenancy T1 gateway and the associated transit network has dhcp enabled.
    • There is enough licensing resource to deploy service engines.

While this is not an exhaustive list of trouble shooting, I have found that this is typically the reason why container clusters will not deploy. To be honest it took time to trouble shoot each of these scenarios and I will add to this blog over time if I come across anymore issue.

This will be the last blog for 2023 so stay safe and I will see you all again in 2024.

Cheers

Tony Williamson


Add Your Comment

* Indicates Required Field

Your email address will not be published.

*