We have an EKS cluster that currently works, and we deploy to it with the kubectl script step. However, there is a recurring problem that isn't a big issue right now, but will be once we want to deploy automatically.
Connectivity to the EKS cluster is quite fragile and has to be retried manually several times. Here is an example of what happens today (and happens every time):
10:46: try to connect, unhealthy
10:52: try, healthy
10:54: deploying doesn’t work. Unhealthy again.
10:55: try to connect, unhealthy.
10:57: try, healthy.
10:59: deploying works. healthy.
11:21: deploy works.
11:25: deploy fails. environment still marked as healthy.
11:26: try, unhealthy.
11:27: try, healthy.
11:29: deployment successful.
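In case it is useful context, the manual retry dance above is essentially this loop done by hand. A rough sketch of automating it (`retry_cmd` is just an illustrative name, not anything Octopus provides):

```shell
# retry_cmd MAX DELAY CMD...: run CMD up to MAX times, sleeping DELAY
# seconds between attempts; succeed as soon as CMD succeeds.
retry_cmd() {
  max="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i failed; retrying in ${delay}s" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage: wrap the same connectivity check the deployment runs, e.g.
#   retry_cmd 5 30 kubectl version
```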
I am not sure whether this is 'normal' behaviour from Octopus or from EKS, or whether the API is just very 'moody'. I have to admit it is a bit annoying that it doesn't just deploy the way it does for our other environments/infrastructure.
It is also a bit odd that the first time the target reports healthy, it isn't always actually healthy, so you have to double-check.
Here is what we have as an error:
Creating kubectl context to https://NUMBER.yl4.REGION.eks.amazonaws.com (namespace default) using EKS cluster name NAME
kubectl version to test connectivity
Client Version: v1.12.7
Unable to connect to the server: x509: certificate signed by unknown authority
Fatal: The remote script failed with exit code 1
What we tried, and what works, is skipping TLS certificate verification, so something odd is clearly going on with the certificate.
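For reference, here is what we could try instead of skipping verification: regenerating the kubeconfig so it carries the CA certificate the cluster actually serves. `NAME` and `REGION` are placeholders for our cluster name and region:

```shell
# Rewrite the local kubeconfig entry with the endpoint and CA data
# pulled from the EKS API:
aws eks update-kubeconfig --name NAME --region REGION

# Inspect the CA the cluster currently serves, to compare against the
# one the deployment target is configured with:
aws eks describe-cluster --name NAME --region REGION \
  --query 'cluster.certificateAuthority.data' --output text | base64 -d
```

If the CA returned here differs from the one stored on the Octopus target, that would explain the x509 "unknown authority" error.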
We currently have one environment per namespace (though we started out with only one cluster), and the deploying user is authorized to deploy to the cluster.