We have Octopus configured in Linux containers in ECS behind an NLB. The setup works properly with one node; however, when we spin up a second one, polling tentacles start to fail during health checks. Most of the time, the health checks fail because they time out.
Could you let us know if you have a workaround for this issue, or if there’s a plan to fix it in the near future?
Thank you for contacting Octopus Support and sorry to hear you are having issues with your polling tentacles when spinning up a second Octopus Node.
I know you mentioned that the polling tentacles mostly fail with connection timeouts, but it would be good to get some tentacle logs from when a tentacle fails its health check, and also some logs from your Octopus nodes if possible (see picture below for acquiring those). You can send them to a secure link I have created for you here.
Do the polling tentacles all start failing when you spin up the node, or is it a select few? Do they then work after you take that node offline?
Is it just your polling tentacles that are having this issue or do you have listening tentacles that fail health checks?
I have not heard of this happening to other users and there are no known issues regarding this, so I am keen to try and figure out why this is happening.
We have some great documentation on Tentacle Troubleshooting here if you have not seen it yet which outlines a few different troubleshooting methods you can use to ascertain what is going on here. Of particular note is the tentacle ping which I recommend you run. This will show you if the tentacle can communicate with the Octopus Server nodes from a networking point of view.
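As a rough illustration of the networking check described above, here is a minimal Python sketch that tries to open a plain TCP connection to each Octopus Server node. It is not the tentacle ping tool itself, and a successful TCP connection only shows that the network path is open, not that the tentacle protocol works. The host names are hypothetical placeholders; 10943 is the default port that polling tentacles connect to, so adjust it if your setup uses a different one.

```python
import socket

# Hypothetical node addresses; replace with your own Octopus Server nodes.
# 10943 is the default port that polling tentacles connect to.
NODES = [
    ("octopus-node1.example.com", 10943),
    ("octopus-node2.example.com", 10943),
]


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def check_nodes(nodes):
    """Map each "host:port" string to whether it accepted a TCP connection."""
    return {f"{host}:{port}": can_connect(host, port) for host, port in nodes}
```

Calling `check_nodes(NODES)` from the machine that hosts the tentacle returns a dictionary showing which nodes it can reach directly.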
Let me know if anything in the troubleshooting guide stands out and let us know once those logs have been uploaded as I can take a look at them whilst you go through the troubleshooting guide.
Logs: it will take me some time to set up a test environment, so I will update this ASAP.
Do the polling tentacles all start failing when you spin up the node, or is it a select few? Do they then work after you take that node offline?
A:/ All of them fail randomly, and they fail most of the time. They all work once I have only one node.
Is it just your polling tentacles that are having this issue or do you have listening tentacles that fail health checks?
A:/ Only polling ones fail. SSH and listening ones work perfectly.
I have followed the troubleshooting docs, but I have found nothing helpful in this matter.
Can you give me specific details about how the polling health check works? For example, when you run a health check, I assume Octopus leaves a task for the tentacle to perform, but where is this task placed? Is it saved in the database, or does Octopus manage it with queues or some other mechanism in each node?
Thank you for that response. A colleague has just sent me an article from our forums to show you, which may help. I don’t know if you have seen this, but it looks to be pretty close to what you are trying to achieve.
We also have this page (which I should have linked in my initial response), which details how to make sure your tentacles are polling all of your nodes, not just one.
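To show why polling every node can matter, here is a toy Python model. It rests on an assumption that is not confirmed in this thread: that each node keeps the work waiting for a polling tentacle in its own memory, so a tentacle only receives a health check if it is connected to the node that queued it. The `Node` class and both functions are hypothetical names made up for this sketch, not part of Octopus.

```python
import random
from collections import deque


class Node:
    """A server node with its own in-memory queue of pending requests (assumed)."""

    def __init__(self, name):
        self.name = name
        self.pending = deque()

    def enqueue(self, request):
        self.pending.append(request)

    def poll(self):
        """Hand one pending request to a connected tentacle, or None if empty."""
        return self.pending.popleft() if self.pending else None


def health_check_succeeds(owner, connected_nodes):
    """Queue a health check on `owner`; succeed only if the tentacle polls it."""
    owner.enqueue("health-check")
    for node in connected_nodes:
        if node.poll() is not None:
            return True
    owner.pending.clear()  # nobody collected the request: models the timeout
    return False


def simulate(node_count, poll_all_nodes, runs=1000, seed=0):
    """Return the fraction of health checks that succeed in this toy model."""
    rng = random.Random(seed)
    nodes = [Node(f"node{i + 1}") for i in range(node_count)]
    # Through a load balancer, the long-lived polling connection lands on one node.
    connected = nodes if poll_all_nodes else [rng.choice(nodes)]
    successes = sum(
        health_check_succeeds(rng.choice(nodes), connected) for _ in range(runs)
    )
    return successes / runs
```

In this model, `simulate(1, False)` and `simulate(2, True)` always succeed, while `simulate(2, False)` fails for roughly half of the runs, which resembles the random failures reported above once a second node is added.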
Are you able to have a look and make sure you have not missed anything from a configuration point of view that could cause the issues you are experiencing? If you have seen the articles already let me know and I will look out for the logs.
As for the polling tentacles, we have some documentation on how they communicate with Octopus here, which explains it a lot better than I can in a forum post.
Basically, the Octopus Server runs the health check against the tentacle via a script (similar to this one, which checks disk space; that check is also part of the full health check, but you get the gist). Octopus runs automated health checks against workers via scheduled tasks, and it also runs them on deployments to that target, when a user manually requests one, and on initial tentacle registration.
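To give a feel for the kind of disk space check mentioned above, here is a minimal Python sketch. It is not the script Octopus actually runs; the function name and the 5 GiB threshold are assumptions chosen only for illustration.

```python
import shutil


def check_free_disk_space(path="/", min_free_bytes=5 * 1024**3):
    """Return (ok, free_bytes) for the disk that holds `path`.

    The 5 GiB default threshold is an arbitrary value for illustration.
    """
    usage = shutil.disk_usage(path)
    return usage.free >= min_free_bytes, usage.free
```

A health check built this way would report the target as unhealthy, or raise a warning, whenever `ok` comes back as `False`.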
I think the top article will get you to a resolution and might explain why your health checks are failing when running a new Octopus node.
Let me know if that answers your health check query for polling tentacles and also how you get on with that article.
Thank you so much for your reply. Unfortunately, this is not doable for our ECS solution, and due to the number of listening tentacles that we have in place, even if we were using EC2 instances it would take a huge effort to achieve.
Thank you for getting back to me. Did you manage to get a test instance set up so we can see some logs? That will be our next step in diagnosing this issue for you.
The secure link I posted is still active so let me know once the logs are uploaded and I can have a look at them for you.