Good afternoon,
Earlier today, all of our Kubernetes deployment targets started reporting as unhealthy.
Upon inspection, it seems that kubectl is no longer available when running health checks directly on a worker.
For us the fix is easy: we can run the health check in a container using a custom Docker image. However, this change (or regression) was unexpected. It looks like the options required to run health checks in a container were only added in the current version, 2021.2, so I thought it worth bringing to your attention.
For anyone who is interested, the script we use is below, and the logs of the failed health check are below that.
Thanks,
David
Script to create deployment target
param(
    ### The name of the EKS cluster to create a deployment target for.
    [Parameter(Mandatory = $true)]
    [string]$ClusterName,

    ### The name of the AWS account to use to connect to the EKS cluster.
    [Parameter(Mandatory = $true)]
    [string]$AwsAccountName
)
### Describe the EKS cluster that the deployment target will be for.
Write-Highlight "Describing EKS cluster..."
$cluster = aws eks describe-cluster --name $ClusterName --query "cluster" | ConvertFrom-Json
### Create a hash table containing arguments to pass to the function that will create the deployment target.
$newTargetArgs = [ordered]@{
    name                                  = $cluster.name.ToLower()
    octopusRoles                          = "aws-eks, aws-eks-{0}" -f $cluster.name.ToLower()
    clusterUrl                            = $cluster.endpoint
    octopusAccountIdOrName                = $AwsAccountName
    clusterName                           = $cluster.name
    namespace                             = "default"
    updateIfExisting                      = $true
    skipTlsVerification                   = $true
    healthCheckContainerImageFeedIdOrName = "Feeds-1482"
    healthCheckContainerImage             = "ldx-analytics/deployment:latest"
}
### Create the new deployment target.
Write-Highlight "Creating deployment target..."
New-OctopusKubernetesTarget @newTargetArgs
Health check logs
Task ID: ServerTasks-1057521
Related IDs: Machines-3657, Spaces-182
Task status: Failed
Task queued: Wednesday, 15 September 2021 3:31:40 PM +00:00
Task started: Wednesday, 15 September 2021 3:31:57 PM +00:00
Task completed: Wednesday, 15 September 2021 3:32:01 PM +00:00
Task duration: 4 seconds
Server version: 2021.2.7428+Branch.release-2021.2.Sha.d771d6437f879f789be3f86d8c7d4ffa53eb3867
Server node: octopus-i009472-85f99c4fb-wjsmm
| == Failed: Check my-cluster health ==
15:31:57 Info | Starting health check for a limited set of machines.
15:31:57 Verbose | Health check was requested for 1 machine
15:31:57 Verbose | Found 1 matching machine
15:31:57 Info | Performing health check on 1 machine.
15:32:01 Verbose | Checking for Tentacles to update
15:32:01 Fatal | The health check failed. One or more machines were not available.
|
| == Failed: Check deployment target: my-cluster ==
15:31:57 Verbose | Performing health check on machine
15:31:57 Verbose | Leased worker octopus-worker from pool AWS ECS Linux (lease WorkerTaskLeases-45323).
15:31:57 Verbose | Script isolation level: NoIsolation
15:31:58 Verbose | Executable directory is /bin
15:31:58 Verbose | Executable name or full path: /bin/bash
15:31:58 Verbose | No user context provided. Running as current user.
15:31:58 Verbose | Starting /bin/bash in working directory '/etc/octopus/Work/20210915153157-1057521-54' using 'Unicode (UTF-8)' encoding running as 'root' with the same environment variables as the launching process
15:31:58 Verbose | Process /bin/bash in /etc/octopus/Work/20210915153157-1057521-54 exited with code 0
15:31:58 Verbose | Using Calamari.linux-x64 19.4.8
15:31:58 Verbose | Script isolation level: NoIsolation
15:31:59 Verbose | Executable directory is /bin
15:31:59 Verbose | Executable name or full path: /bin/bash
15:31:59 Verbose | No user context provided. Running as current user.
15:31:59 Verbose | Starting /bin/bash in working directory '/etc/octopus/Work/20210915153159-1057521-55' using 'Unicode (UTF-8)' encoding running as 'root' with the same environment variables as the launching process
15:32:00 Verbose | Calamari Version: 19.4.8
15:32:00 Verbose | Environment Information:
15:32:00 Verbose | OperatingSystem: Unix 4.14.243.185
15:32:00 Verbose | OsBitVersion: x64
15:32:00 Verbose | Is64BitProcess: True
15:32:00 Verbose | Running on Mono: False
15:32:00 Verbose | CurrentUser: root
15:32:00 Verbose | MachineName: 862e32519743
15:32:00 Verbose | ProcessorCount: 2
15:32:00 Verbose | CurrentDirectory: /etc/octopus/Work/20210915153159-1057521-55
15:32:00 Verbose | TempDirectory: /tmp/
15:32:00 Verbose | HostProcess: Calamari (3177)
15:32:00 Verbose | Performing variable substitution on '/etc/octopus/Work/20210915153159-1057521-55/Script.sh'
15:32:00 Verbose | Executing '/etc/octopus/Work/20210915153159-1057521-55/Script.sh'
15:32:00 Verbose | Setting Proxy Environment Variables
15:32:00 Verbose | "chmod" u=rw,g=,o= "/etc/octopus/Work/20210915153159-1057521-55/kubectl-octo.yml"
15:32:00 Verbose | Temporary kubectl config set to /etc/octopus/Work/20210915153159-1057521-55/kubectl-octo.yml
15:32:00 Error | Could not find kubectl. Make sure kubectl is on the PATH. See https://g.octopushq.com/KubernetesTarget for more information.
15:32:00 Verbose | Process /bin/bash in /etc/octopus/Work/20210915153159-1057521-55 exited with code 1
15:32:01 Verbose | Released worker octopus-worker from lease WorkerTaskLeases-45323
15:32:01 Verbose | Exit code: 1
15:32:01 Fatal | The remote script failed with exit code 1
15:32:01 Verbose | The remote script failed with exit code 1
| Octopus.Server.Orchestration.Targets.Tasks.ActionHandlerFailedException: The remote script failed with exit code 1
| at Octopus.Server.Orchestration.ServerTasks.Deploy.ActionDispatch.SuccessArbitrator.ThrowIfNotSuccessful(IActionHandlerResult result) in ./source/Octopus.Server/Orchestration/ServerTasks/Deploy/ActionDispatch/SuccessArbitrator.cs:line 22
| at Octopus.Server.Orchestration.ServerTasks.Deploy.ActionDispatch.AdHocActionDispatcher.Dispatch(Machine machine, ActionHandlerInvocation actionHandler, ITaskLog taskLog, VariableCollection variables) in ./source/Octopus.Server/Orchestration/ServerTasks/Deploy/ActionDispatch/AdHocActionDispatcher.cs:line 56
| at Octopus.Server.Orchestration.ServerTasks.HealthCheck.Controllers.VirtualTargetHealthController.CheckHealth(Machine machine, ITaskLog taskLog) in ./source/Octopus.Server/Orchestration/ServerTasks/HealthCheck/Controllers/VirtualTargetHealthController.cs:line 93
| at Octopus.Server.Orchestration.ServerTasks.HealthCheck.HealthCheckService.PerformHealthCheck(Machine machine, ITaskLog taskLogForMachine, CancellationToken cancellationToken, IHealthResultCollator healthResultCollator, ExceptionHandling exceptionHandling, Action`2 customAction) in ./source/Octopus.Server/Orchestration/ServerTasks/HealthCheck/HealthCheckService.cs:line 92
| Octopus.Server version 2021.2.7428 (2021.2.7428+Branch.release-2021.2.Sha.d771d6437f879f789be3f86d8c7d4ffa53eb3867)
15:32:01 Verbose | Recording health check results
|
| == Failed: Summary ==
15:32:01 Info | Unhealthy:
15:32:01 Info | - [my-cluster](~/app#/Spaces-182/infrastructure/machines/Machines-3657/settings) at https://BBED18E2E9B9474BBCF7EB88117FA686.gr7.eu-west-2.eks.amazonaws.com, error: The remote script failed with exit code 1
15:32:01 Fatal | One or more machines were not available. Please see the output Log for details.
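
For anyone who wants to confirm the same symptom on their own worker: the "Could not find kubectl" error in the log above suggests that kubectl cannot be resolved on the PATH of the user running the script. Below is a minimal sketch of a check that could be run as an ad-hoc script on the worker, assuming a POSIX-compatible shell. The function name check_tool is purely illustrative and is not part of Octopus or Calamari.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): report whether a tool resolves on the PATH.
# Returns 0 if the tool is found, 1 otherwise.
check_tool() {
  tool="$1"
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool found at $(command -v "$tool")"
    return 0
  fi
  echo "$tool not found on PATH: $PATH" >&2
  return 1
}

# Check for kubectl without failing the whole script if it is missing.
check_tool kubectl || echo "kubectl is missing on this worker"
```

If kubectl is reported missing, the options are either to install it on the worker or, as we did, to set a health check container image on the target so the check runs inside a container that already includes it.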