We are frequently seeing an issue where our deployments are failing.
We cant find a pattern of the issue happening, happens at different times and not on the same servers.
Most of the time when we re trigger the deployment it passes.
Snapshot of the error:
Send me a link so I can upload the log.
Thanks for sending through your issue.
I see you had this issue previously - just wondering if you got any resolution back then or if the issue has persisted from that time?
I have created a secure upload link for you to send the logs to:
I will say in the number of times we have seen this type of error appear, they typically are environmental issues such as network dropouts or interference from firewalls, proxies or such. We have seen these internally on our own cloud network and our engineers have traced it to network issues with our provider.
I would try some of the suggestions Clare made on your previous ticket and let us know if you got any further with them:
I am wondering, in the logs for your worker are you seeing similar messages? The worker must be passing health checks as, when you re-run the runbook, the task succeeds. You mentioned you are seeing a few fails in your runbooks, does that indicate multiple runbooks that are having this issue and if so are they all using the same worker?
If it is the same worker it might be worth running a constant tentacle ping on that worker to see if there are any connection dropouts at the time you are running the runbooks. This would help narrow down the issue.
Breaking down the error you are seeing shows an interruption in the data stream between server and target. These are usually some external factor acting on the connection. The fact that a retry gets you a success is also a strong indicator of this.
We have also seen occasions of multiple tentacles installed on a machine causing lock issues - not sure if this applies. Another cause may be A/V or WAF rules.
We would definitely like to take a look at your logs for the failure and see if there is an Octopus issue we can nail down and help with. Let us know when you upload as we don’t get individual notifications when customers upload their logs.
Hoping to get to the bottom of this for you.
I didn’t get to the bottom of it, as it was only failing a few of my runbooks and passed on a rerun I didn’t investigate any further.
But now this issue is affecting some important deployments I said I would raise it again.
I have uploaded the tentacle log.
There were fails in multiple runbooks before and on different workers.
I will look into it further and get back to you.
We have seen that our AKS cluster nodes are spiking high CPU so we are going to upgrade our AKS cluster nodes to see if this helps.
Please let us know how it goes!