EKS: fragile connectivity


We have an EKS cluster that is currently working, and we deploy to it with the kubectl script step. However, there is a recurring problem that isn't too much of an issue right now, but will be once we want to deploy automatically.

Connectivity to the EKS cluster is quite fragile, and connecting has to be retried manually multiple times. Here is an example of what happened today (and happens every time):
10:46: try to connect, unhealthy.
10:52: try, healthy.
10:54: deploying doesn't work, unhealthy again.
10:55: try to connect, unhealthy.
10:57: try, healthy.
10:59: deploying works, healthy.
11:21: deploy works.
11:25: deploy fails, environment still marked as healthy.
11:26: try, unhealthy.
11:27: try, healthy.
11:29: deployment successful.

I am not sure whether this is a 'normal' thing from Octopus or from EKS, or whether the API is just very 'moody'. I have to admit it is a bit annoying that it doesn't just deploy the way it does for our other environments/infrastructure.
It is also a bit peculiar that the first time the target passes as healthy, it isn't always actually healthy, so you have to double-check.

Here is the error we get:

Creating kubectl context to https://NUMBER.yl4.REGION.eks.amazonaws.com (namespace default) using EKS cluster name NAME
kubectl version to test connectivity
Client Version: v1.12.7
Unable to connect to the server: x509: certificate signed by unknown authority
The remote script failed with exit code 1
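For context, the connectivity test that fails above can be reproduced by hand roughly like this (the cluster name and region below are placeholders, not our real values):

```shell
# Placeholders -- substitute your own cluster name and region.
CLUSTER=my-eks-cluster
REGION=eu-west-1

# Write the cluster endpoint and CA into the local kubeconfig, then run
# the same connectivity test ("kubectl version" contacts the API server,
# which is where the x509 error surfaces).
aws eks update-kubeconfig --name "$CLUSTER" --region "$REGION"
kubectl version
```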

What we tried, and what does work, is skipping TLS verification, so something odd is going on with the certificate.
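One thing that might help narrow it down: the CA that kubectl validates the server against is the base64-encoded certificate-authority-data in the kubeconfig, which should match what EKS itself reports for the cluster. A rough sanity check (cluster name and region are placeholders again) is:

```shell
# Placeholders -- substitute your own cluster name and region.
CLUSTER=my-eks-cluster
REGION=eu-west-1

# The CA bundle EKS reports is base64-encoded PEM, so the first decoded
# line should be the standard PEM header.
aws eks describe-cluster --name "$CLUSTER" --region "$REGION" \
  --query 'cluster.certificateAuthority.data' --output text \
  | base64 -d | head -n 1

# The workaround we use today, which bypasses that check entirely:
kubectl --insecure-skip-tls-verify=true version
```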

We currently have one environment per namespace (although we had only one cluster at the very beginning), and the deploying user is authorized to deploy to the cluster.


(Michael Richardson) #4


We haven’t seen this before. I can certainly see why it is a problem.

Just to confirm, this is happening even when you have Skip TLS Verification set on the target?
Do you have anything configured in the cluster certificate authority selection?