Unable to connect using SSH

Hi.

Our health checks and deployments to certain target servers have started failing today, and I have been unable to determine a cause. It is specifically the servers to which Octopus should connect using SSH - all of them.

I’ve checked that we can connect to those servers ourselves, and I’ve checked the IP address whitelists to ensure that Octopus should be granted access, and yet all attempts to have Octopus connect to these servers have failed.

The failures seem to have started between about 5 and 21 hours ago - sometime overnight, our time.

Any advice on how we can more precisely determine the cause of the failure? Is there anything going on at the Octopus end that we should know about, that might be a factor?

I can pull the call stack of the failure out of the logs, but it doesn’t seem to say anything very interesting. We already know that the SSH connection simply cannot find the target.
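
One way to narrow down a failure like this is to check whether the SSH port times out, actively refuses the connection, or answers with an SSH banner. The following is only a rough sketch, not anything Octopus itself runs: the function name `probe_ssh` and the host name in the usage note are hypothetical, and the result is only meaningful when run from a machine subject to the same firewall and whitelist rules as the client that is failing.

```python
import socket


def probe_ssh(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify the outcome of a plain TCP connection to an SSH port.

    Hypothetical helper for illustration. It does not authenticate; it
    only reads the identification line the server sends on connect.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            # SSH servers send an identification line starting with "SSH-".
            banner = sock.recv(256).decode("ascii", errors="replace").strip()
    except socket.gaierror:
        return "dns-failure"   # the host name did not resolve
    except socket.timeout:
        return "timeout"       # no answer in time (connect or banner read)
    except ConnectionRefusedError:
        return "refused"       # host reachable, nothing accepting on the port
    except OSError as exc:
        return f"error: {exc}"
    if banner.startswith("SSH-"):
        return f"ssh-banner: {banner}"
    return "connected-but-no-ssh-banner"
```

For example, `print(probe_ssh("target.example.com"))` (a placeholder host) prints one of the labels above. A `timeout` often points to packets being dropped by a firewall or security group, while `refused` suggests the host is reachable but nothing is listening on that port.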

Hi @inv-services,

Thanks for getting in touch!

Would you be able to share your instance name so I can check the server logs to see if they provide any additional detail?

Regards,
Paul

Hi Paul.

The instance name is investorist, hosted at investorist.octopus.app

Regards,
Travis

Hi Travis,

I’ve checked the instance and other than an update to 2020.3.3 three days ago, there haven’t been any other changes to the instance.

Scanning the Octopus server logs for the past day, the only issue I can see is that your storage limit was reached around 7-8 hours ago, causing a lot of failures with log files and any other process that needs to write to the storage.

It looks like the space being used may have decreased very slightly since then, but it is still hovering near the limit: 19.01 / 20.00 GB. You will need to review the retention policies in place within lifecycles and the package repository to work on reducing this.

To look into the issue any further, I’ll need to log in to your instance. Are you happy for me to do this?

Regards,
Paul

Hi Paul.

Yes, I’m happy for you to log in to the instance.

Regarding the storage space, yes, I’m aware that we hit the limit earlier today, and I deleted a number of unneeded packages to reduce the amount of space in use. Obviously this is only a stopgap measure.

Regards,
Travis

I’ve checked the instance logs and tested an SSH target connection from my own Cloud instance and can’t see any issues.

The only thing to check now would be local network infrastructure to see if you can see any connection attempts from the Cloud instance.
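
As a rough sketch of what that check could look like on a target server, the snippet below filters SSH daemon log lines for connection attempts coming from a given set of address ranges. It assumes OpenSSH-style log lines that contain `from <IPv4 address>`; the function name `attempts_from` is hypothetical, and the ranges you pass in would need to be the actual addresses your Cloud instance connects from, which are not shown here.

```python
import ipaddress
import re

# Matches the "from <IPv4>" fragment found in typical OpenSSH log lines,
# e.g. "Connection from 203.0.113.7 port 50000 on 10.0.0.5 port 22".
FROM_IP = re.compile(r"\bfrom (\d{1,3}(?:\.\d{1,3}){3})\b")


def attempts_from(log_lines, cidrs):
    """Return the log lines whose source address falls in any given CIDR range.

    Hypothetical helper for illustration; it only inspects text and makes
    no assumption about where the log file lives.
    """
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    hits = []
    for line in log_lines:
        match = FROM_IP.search(line)
        if not match:
            continue
        try:
            addr = ipaddress.ip_address(match.group(1))
        except ValueError:
            continue  # looked like an IP but was not valid (e.g. 999.1.1.1)
        if any(addr in net for net in networks):
            hits.append(line.rstrip("\n"))
    return hits
```

You could feed it the lines of the SSH daemon log (the location varies by distribution, for example `/var/log/auth.log` on Debian-based systems) together with the ranges to look for. If no lines come back for the period when the health checks were failing, the connection attempts most likely never reached the server.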

Well, just as mysteriously as the connections started failing one night, they started working again the next night.

Since the target servers are also cloud services, there’s only so much “local network infrastructure” to check, but I’ll trawl through logs on the other end to see if I spot anything. At the moment, though, it’s working, so all good.

Thank you for your attempts to identify a cause. I’ll post again if it fails again.

(Also, I think I’ve identified a way to significantly decrease the size of the packages we build, so that should help with the space issues a bit.)

