Sudden Tentacle Disconnect since 11 February 2020

jacov · 12 February 2020 09:02

One of our Tentacles suddenly stopped communicating yesterday, and nothing I do gets it to connect again.

I’ve restarted the Tentacle, reinstalled it, removed and re-added it, checked the firewall settings, and restarted the server twice.
Nothing has changed, and I have no idea why this Tentacle has suddenly stopped communicating.

If I open the Tentacle port in a browser, I get:
Octopus Tentacle configured successfully
If you can view this page, your Octopus Tentacle is configured and ready to accept deployment commands.

When I run a health check for the Tentacle from our Octopus Cloud server, I get:
An error occurred when sending a request to 'https://***.***.***.***:10933/', before the request could begin: The client was unable to establish the initial connection within the timeout 00:01:00.
The client was unable to establish the initial connection within the timeout 00:01:00.
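The browser page and the health check are testing different things. The health-check error says the failure happened "before the request could begin", which suggests the Octopus Server never completed the initial connection, whereas the browser page only shows that the listener answers from wherever the browser was run. As a rough way to separate the two stages from another machine (the Octopus Cloud server itself cannot run it), here is a minimal Python sketch using only the standard library. The function name `probe_listening_tentacle` is hypothetical, and the default port 10933 is taken from the error above.

```python
import socket
import ssl


def probe_listening_tentacle(host: str, port: int = 10933, timeout: float = 10.0) -> dict:
    """Hypothetical helper: check a listening Tentacle in two stages.

    Stage 1 is a plain TCP connect; stage 2 is a TLS handshake on that socket.
    Certificate verification is switched off because Tentacles present a
    self-signed certificate, so this tests reachability, not trust.
    """
    result = {"tcp": False, "tls": False, "tls_version": None, "error": None}
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        # Timed out, refused or unreachable: roughly the stage at which the
        # health check reports "unable to establish the initial connection".
        result["error"] = f"TCP connect failed: {exc!r}"
        return result
    result["tcp"] = True

    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.check_hostname = False       # must be disabled before CERT_NONE
    context.verify_mode = ssl.CERT_NONE  # accept the self-signed certificate
    try:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            result["tls"] = True
            result["tls_version"] = tls.version()
    except OSError as exc:  # ssl.SSLError and socket timeouts are OSError subclasses
        result["error"] = f"TLS handshake failed: {exc!r}"
    finally:
        raw.close()  # no-op if the TLS socket already took ownership of the descriptor
    return result
```

Calling it with the Tentacle's public address from a machine outside the hosting network approximates the path the Cloud server takes. `tcp: False` points at networking or firewalling; `tcp: True` with `tls: False` points at the SChannel/TLS side. Like the browser check, a connection without a client certificate may itself add a certificate error to the Tentacle log.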

paul.calvert · 12 February 2020 09:14

Hi @jacov,

Thanks for getting in touch!

We have a list of troubleshooting steps for Tentacles here: https://octopus.com/docs/infrastructure/deployment-targets/windows-targets/troubleshooting-tentacles

The first place to start, though, is the OctopusTentacle.txt log file on the Tentacle machine; the default location is C:\Octopus\Logs\. If you can attach the most recent log file, I can take a look for any issues.

When you checked the tentacle port in the browser, was this from a browser on the Octopus Server machine? If not, give that a try too.

Regards,
Paul

jacov · 12 February 2020 09:27

System.Security.Authentication.AuthenticationException: A call to SSPI failed, see inner exception. ---> System.ComponentModel.Win32Exception: An unknown error occurred while processing the certificate

I should’ve checked the logs, thanks. This appears to be the error in the logs at the moment.

Our Octopus Server is an Octopus Cloud instance, so I’m unable to check from that machine.

jacov · 12 February 2020 09:31

I’ve checked TLS 1.1 and 1.2; both are enabled.
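IISCrypto and the registry show what the machine is configured to allow; a complementary check is which protocol versions the Tentacle listener actually accepts over the network. The sketch below assumes Python 3.7+ (for `SSLContext.minimum_version` and `maximum_version`), and `accepted_tls_versions` is a hypothetical name. One caveat: many current OpenSSL builds refuse TLS 1.1 on the client side regardless of the server, so a TLS 1.1 failure from this script is not conclusive.

```python
import socket
import ssl

# Versions to try, pinned one at a time (hypothetical selection for this check).
VERSIONS = {
    "TLSv1.2": ssl.TLSVersion.TLSv1_2,
    "TLSv1.1": ssl.TLSVersion.TLSv1_1,
}


def accepted_tls_versions(host: str, port: int = 10933, timeout: float = 10.0) -> dict:
    """Try one handshake per pinned TLS version and report what happened."""
    results = {}
    for name, version in VERSIONS.items():
        context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE  # Tentacle certificates are self-signed
        try:
            # Pin both ends so the handshake can only use this one version.
            context.minimum_version = version
            context.maximum_version = version
        except (ValueError, ssl.SSLError) as exc:
            results[name] = f"not supported by this client: {exc}"
            continue
        try:
            with socket.create_connection((host, port), timeout=timeout) as raw:
                with context.wrap_socket(raw, server_hostname=host) as tls:
                    results[name] = f"accepted ({tls.version()})"
        except OSError as exc:  # covers ssl.SSLError, refusals and timeouts
            results[name] = f"failed: {exc!r}"
    return results
```

A result where TLS 1.2 fails from outside while IISCrypto shows it enabled would suggest looking beyond the local SChannel settings.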

paul.calvert · 12 February 2020 09:44

It does point to an issue with SChannel/TLS. There are some further steps to check here: https://octopus.com/docs/administration/security/octopus-tentacle-communication/troubleshooting-schannel-and-tls

You mentioned that this is only affecting one of your Tentacles?
If so, it would be worth running IISCrypto on one of the working Tentacle machines and on this machine to see if there are any differences in the configuration.

jacov · 12 February 2020 09:53

Hi Paul,

I’ve run IISCrypto, and the only difference is that the RC4 40, 56 and 128 ciphers are disabled on the problem server.

paul.calvert · 12 February 2020 09:58

Can you try enabling them and restarting the machine to see if it makes a difference?

jacov · 12 February 2020 10:02

Running the health check again now.

jacov · 12 February 2020 10:09

Hey Paul,
That did not resolve the problem.
I’ve since run “tentacle new-certificate”, copied the new thumbprint into the Octopus Server instance, and rerun the health check.
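Because the thumbprint is what ties the Tentacle to the Octopus Server, one quick cross-check after regenerating the certificate is to compute the thumbprint of the certificate the Tentacle actually presents and compare it with the value entered on the server. The sketch below assumes the usual Windows convention (SHA-1 over the DER-encoded certificate, written as hex) and normalizes pasted values, since thumbprints copied from dialogs can carry spaces or invisible characters. All function names are hypothetical.

```python
import hashlib
import socket
import ssl


def thumbprint_from_der(der_cert: bytes) -> str:
    """Windows-style thumbprint: SHA-1 of the DER-encoded certificate, upper-case hex."""
    return hashlib.sha1(der_cert).hexdigest().upper()


def normalize_thumbprint(text: str) -> str:
    """Drop spaces and any other non-hex characters picked up when copying a thumbprint."""
    return "".join(ch for ch in text.upper() if ch in "0123456789ABCDEF")


def fetch_presented_thumbprint(host: str, port: int = 10933, timeout: float = 10.0) -> str:
    """Thumbprint of the certificate the listening Tentacle presents in the TLS handshake."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE  # self-signed: fetch it, don't validate it
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            # binary_form=True returns the DER bytes even when verification is off.
            der = tls.getpeercert(binary_form=True)
    return thumbprint_from_der(der)


def matches(configured: str, presented: str) -> bool:
    """Compare a thumbprint copied from the server UI with the presented one."""
    return normalize_thumbprint(configured) == normalize_thumbprint(presented)
```

If `matches(...)` is false for the value configured on the Octopus Server, the server is still expecting a different certificate than the one the Tentacle is serving.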

jacov · 12 February 2020 10:15

That also did nothing; I still get the same error.
I’ve checked the logs, and the latest entries are below (there appears to be no connection attempt):

2020-02-12 12:07:27.5100   4308      7  INFO  ==== RunAgentCommand ====
2020-02-12 12:07:27.5300   4308      7  INFO  CommandLine: C:\Program Files\Octopus Deploy\Tentacle\Tentacle.exe run --instance=Tentacle
2020-02-12 12:07:27.6830   4308      7  INFO  Agent will trust Octopus Servers with the thumbprint: *************
2020-02-12 12:07:27.7650   4308      7  INFO  listen://[::]:10933/              7  Listener started
2020-02-12 12:07:27.7781   4308      7  INFO  Agent listening on: [::]:10933
2020-02-12 12:07:27.7781   4308      7  INFO  The Windows Service has started
2020-02-12 12:07:27.8351   1908      1  INFO  Waiting for service to become Running. Current status: Running
2020-02-12 12:07:28.1486   1908      1  INFO  Service started
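To check whether anything at all reached the Tentacle after its most recent start, a small filter over OctopusTentacle.txt can keep only the entries after the last `==== RunAgentCommand ====` marker and flag warnings, errors and exception text. The line layout is inferred from the excerpt above (timestamp, two numeric columns that look like process and thread IDs, level, message) and may not cover every entry; lines that do not match, such as stack frames, are attached to the preceding entry. The names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Assumed layout, inferred from the excerpt: timestamp, two numeric columns
# (apparently process and thread IDs), level, then the message.
LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+(?P<pid>\d+)\s+"
    r"(?P<tid>\d+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)


@dataclass
class Entry:
    ts: str
    level: str
    message: str
    extra: List[str] = field(default_factory=list)  # continuation lines, e.g. stack frames


def parse(lines) -> List[Entry]:
    entries: List[Entry] = []
    for line in lines:
        text = line.rstrip("\n")
        m = LINE.match(text)
        if m:
            entries.append(Entry(m["ts"], m["level"], m["msg"]))
        elif entries:
            entries[-1].extra.append(text)
    return entries


def since_last_start(entries: List[Entry]) -> List[Entry]:
    """Entries from the most recent '==== RunAgentCommand ====' onwards."""
    starts = [i for i, e in enumerate(entries) if "RunAgentCommand" in e.message]
    return entries[starts[-1]:] if starts else entries


def problems(entries: List[Entry]) -> List[Entry]:
    """Entries at WARN/ERROR/FATAL, or that mention an exception anywhere."""
    return [
        e for e in entries
        if e.level in {"WARN", "ERROR", "FATAL"}
        or "Exception" in e.message
        or any("Exception" in x for x in e.extra)
    ]
```

Passing the lines of the log file (opened with an assumed UTF-8 encoding) through `parse` and then `problems(since_last_start(...))` right after a health check gives a quick view: an empty result is consistent with no connection attempt reaching the Tentacle.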

jacov · 12 February 2020 10:23

I’ve stopped the existing tentacle, installed a new one, copied the thumbprint into the instance on the Octopus Server and rerun the health check.

This has also not resolved the issue.

System.IO.IOException: The handshake failed due to an unexpected packet format.
   at System.Net.Security.SslState.InternalEndProcessAuthentication(LazyAsyncResult lazyResult)
   at System.Net.Security.SslState.EndProcessAuthentication(IAsyncResult result)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Halibut.Transport.SecureListener.<ExecuteRequest>d__24.MoveNext()

AND

System.Security.Authentication.AuthenticationException: A call to SSPI failed, see inner exception. ---> System.ComponentModel.Win32Exception: An unknown error occurred while processing the certificate
   --- End of inner exception stack trace ---
   at System.Net.Security.SslState.InternalEndProcessAuthentication(LazyAsyncResult lazyResult)
   at System.Net.Security.SslState.EndProcessAuthentication(IAsyncResult result)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Halibut.Transport.SecureListener.<ExecuteRequest>d__24.MoveNext()

jacov · 12 February 2020 10:31

Ah, I see now: the certificate error gets added to the log when I access the Tentacle through the browser.

jacov · 12 February 2020 10:46

It seems the problem now exists on our Production server as well: a health check against the Production server’s Tentacles shows the same error.

This could be caused by a change on the host’s side, as both our UAT and Production servers are hosted with the same ISP.

paul.calvert · 12 February 2020 10:59

You mentioned that you’re using Octopus Cloud; are you able to provide the instance name?
We are currently migrating instances to a new hosted platform, and I’m wondering whether that coincided with this issue in some way.