Correct ALB health check URL to use for containerized Octopus Server

peter_m_mcevoy · 21 December 2022 16:19

I’m trying to run the Octopus Server Linux container in ECS. I can successfully get an ECS task running; however, as soon as I wire up the ECS service to an Application Load Balancer, the service goes into a restart loop with the following error in ECS:

Task failed ELB health checks

In the ALB target group, I see the following:

Health checks failed with these codes: [503]

I’m pretty sure this is because I put my Windows Service based install into maintenance mode before I shut down and attempted the migration to Container based server (as per advice here)

I have declared the following Target Group health check as per advice here

	health_check {
		path                = "/api/octopusservernodes/ping"
		matcher             = "200"
		port                = 8080
		timeout             = 5
		interval            = 30
		healthy_threshold   = 10
		unhealthy_threshold = 2
	}

(note that I cannot specify 503 in the matcher, as the maximum possible in an ALB is 499 - although NLBs can go higher). I’ve also tried 200,418, but the response code I am seeing is 503.

Is the 503 because the server is in Maintenance mode?
If so, is there a better ALB health check path I could use? I can’t lose access to the server in the future if I need to enter maintenance mode.

At the moment, I’m only exposing 8080 - but I would like to expose the Tentacle port, as I will need to change my Tentacles to polling Tentacles. Am I fighting a losing battle with ALBs? If I move to a Network Load Balancer, will I get the same problems?

Please note - this is not production impacting: I’m experimenting in a test area with a backup DB, so do not feel I need an answer quickly. I’m logging this support ticket as one of the last things I’m doing before Xmas break, so it will be January before I pick it up anyway.

Merry Christmas!
Pete

peter_m_mcevoy · 21 December 2022 16:23

I should also add, I am seeing the following in the server logs:


...
IsInMaintenanceMode has been changed to True
Now listening on: "http://[::]:8080"
"HTTP" "GET" to "10.95.38.124:8080""/api/octopusservernodes/ping" completed with 503 in 00:00:00.0072211 (007ms) by "<anonymous>"
...

So I guess the “ping” endpoint is not a suitable health check for a load balancer. Is there a better one that you can suggest?

Justin_Walsh · 21 December 2022 20:34

Hi @peter_m_mcevoy

Thanks for reaching out, and sorry to hear that you’re having issues with your load balancer’s healthchecks.

In general, that endpoint is what we recommend and use internally. It should be returning 200, even when in maintenance mode. Octopus will return a 418 if the node is in drain mode. Since Maintenance mode is a cluster-wide condition, and not a node-specific one (unlike drain mode), it still returns a 200 usually, with the IsInMaintenanceMode field set to true.
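As a rough illustration of the status codes described above, here is a small Python sketch. The function name and the wording of the returned labels are made up for this example and are not part of Octopus; only the meanings of 200, 418 and 503 come from this thread.

```python
def describe_ping_status(status_code: int) -> str:
    """Interpret a status code returned by /api/octopusservernodes/ping.

    The meanings follow the explanation in this thread; the labels
    themselves are illustrative assumptions.
    """
    if status_code == 200:
        # Returned for a healthy node, including when the cluster is in
        # maintenance mode (a cluster-wide, not node-specific, condition).
        return "healthy"
    if status_code == 418:
        # Returned when this particular node is in drain mode.
        return "draining"
    if status_code == 503:
        # Returned while the node is not ready to serve requests.
        return "not ready"
    return "unexpected"
```

A matcher of "200" therefore keeps a node in service during maintenance mode and takes it out of rotation while it is draining.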

Justin_Walsh · 22 December 2022 02:48

With regard to the 503: when a node first starts up, there are a couple of seconds after the web listener starts where it will return a 503 while it’s ensuring all of the elements are online and ready. This should only persist for a very short time. If you’re finding that it persists across multiple health checks, please let me know, and we’ll grab some server logs from you and investigate further.
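To see how long the 503 actually persists after startup, something like the following Python sketch could be run from a machine that can reach the container. It is only a sketch under assumptions: the helper names `fetch_status` and `seconds_until_ready` are hypothetical, and a return value of 0 from `fetch_status` is simply this example's convention for "no HTTP response".

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET to url, or 0 if no response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the code is still useful.
        return err.code
    except OSError:
        # Connection refused, DNS failure, or timeout: nothing answered.
        return 0


def seconds_until_ready(
    get_status: Callable[[], int],
    interval: float = 1.0,
    max_checks: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until get_status() returns 200.

    Returns the approximate seconds waited (checks before success times
    interval), or None if 200 was never seen within max_checks checks.
    """
    for attempt in range(max_checks):
        if get_status() == 200:
            return attempt * interval
        sleep(interval)
    return None
```

For example, assuming the container is reachable on localhost port 8080, `seconds_until_ready(lambda: fetch_status("http://localhost:8080/api/octopusservernodes/ping"))` gives a rough figure. If it returns None, or a value much larger than a few seconds, the 503 is not just the normal startup window.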

peter_m_mcevoy · 3 January 2023 11:28

Hi Justin,
Thanks for replying to me, and apologies for the late follow-up: I’m only back from Xmas break today.

I’m gonna review my setup again: I think that the constant 503 was actually related to incorrect permissions on the EFS share I had set up for persistence of the package feed, logs, etc. I’m gonna fix that and see whether I can get a correct startup and have the server exposed on the ALB.

I’ll update this thread later today.

Pete

system · 3 February 2023 11:28

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.