Sudden Tentacle Disconnect since 11 February 2020

resolved
cloud
reliability
external
(Jacov) #1

One of our tentacles suddenly stopped communicating yesterday, and nothing I do gets it to connect again.

I’ve restarted it, re-installed it, removed and re-added the tentacle, checked the firewall settings, and restarted the server twice.
Nothing has changed, and I have no idea why this tentacle has suddenly stopped communicating.

If I access the port in a browser, I get:
Octopus Tentacle configured successfully
If you can view this page, your Octopus Tentacle is configured and ready to accept deployment commands.

When I run a health check for the Tentacle on our Octopus Cloud Server, I get:
An error occurred when sending a request to 'https://***.***.***.***:10933/', before the request could begin: The client was unable to establish the initial connection within the timeout 00:01:00.
The client was unable to establish the initial connection within the timeout 00:01:00.
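For anyone hitting the same timeout: the error above means the server never got as far as opening a connection, so a useful first check is whether port 10933 is reachable from a machine outside the Tentacle's own network. Below is a minimal sketch of such a check, assuming Python is available on the external machine. The function name `can_connect` and the address in the comment are placeholders of mine, not anything from Octopus.

```python
import socket


def can_connect(host: str, port: int = 10933, timeout: float = 5.0) -> bool:
    """Return True if a plain TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call (203.0.113.10 is a documentation-only placeholder address):
# can_connect("203.0.113.10")
```

This only tests the TCP connection, not the TLS handshake, so a `True` result rules out a firewall or routing block but says nothing about certificates.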

(Paul Calvert) #3

Hi @jacov,

Thanks for getting in touch!

We have a lot of troubleshooting steps that can be taken with tentacles listed here: https://octopus.com/docs/infrastructure/deployment-targets/windows-targets/troubleshooting-tentacles

The first place to start, though, would be the OctopusTentacle.txt log file on the tentacle machine. Its default location is C:\Octopus\Logs\. If you can attach the most recent log file from there, I can take a look for any issues.

When you checked the tentacle port in the browser, was this from a browser on the Octopus Server machine? If not, give that a try too.

Regards,
Paul

(Jacov) #4

System.Security.Authentication.AuthenticationException: A call to SSPI failed, see inner exception. ---> System.ComponentModel.Win32Exception: An unknown error occurred while processing the certificate

I should’ve checked the logs, thanks. This appears to be the error in the logs at the moment.

The Octopus Server is a Cloud Server, so I’m unable to check from that location.

(Jacov) #5

I’ve checked TLS 1.1 and 1.2; both are enabled.

(Paul Calvert) #6

It does point to an issue with SChannel/TLS. There are some further steps to check here: https://octopus.com/docs/administration/security/octopus-tentacle-communication/troubleshooting-schannel-and-tls

You mention that this is only affecting one of your tentacles?
It would be worth running IISCrypto on this machine and on one of the working tentacle machines to see whether their configurations differ.

(Jacov) #7

Hi Paul,

I’ve run IISCrypto, and the only difference is that RCS 40, 56, and 128 are disabled on the problem server.

(Paul Calvert) #8

Can we try enabling them and giving the machine a restart to see if it makes a difference?

(Jacov) #9

Running Health Check again now

(Jacov) #10

Hey Paul,
That did not resolve the problem.
I’ve since used “tentacle new-certificate”, copied the new thumbprint onto the Octopus Server instance, and rerun the health check.
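When copying thumbprints by hand, it can help to confirm which certificate the Tentacle actually presents on the wire. The sketch below assumes the thumbprint is the uppercase hex SHA-1 of the DER-encoded certificate (the usual .NET convention), and that the Tentacle completes a handshake with a client that sends no certificate of its own. The function names are hypothetical helpers, not part of any Octopus tooling.

```python
import hashlib
import socket
import ssl


def thumbprint(der_cert: bytes) -> str:
    """Uppercase hex SHA-1 of a DER-encoded certificate (assumed thumbprint format)."""
    return hashlib.sha1(der_cert).hexdigest().upper()


def fetch_thumbprint(host: str, port: int = 10933, timeout: float = 10.0) -> str:
    """Connect over TLS and return the thumbprint of the certificate the peer presents."""
    # Tentacle certificates are self-signed, so chain and hostname validation
    # are switched off here. This is for inspection only, never for trust.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    if der is None:
        raise RuntimeError("the peer did not present a certificate")
    return thumbprint(der)
```

If the value returned by `fetch_thumbprint` differs from the thumbprint registered for the deployment target, the server would be expected to reject the Tentacle even when the network path is fine.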

(Jacov) #11

That also did nothing; I still get the same error.
I’ve checked the logs, and these are the latest entries (there appears to be no connection attempt):

2020-02-12 12:07:27.5100   4308      7  INFO  ==== RunAgentCommand ====
2020-02-12 12:07:27.5300   4308      7  INFO  CommandLine: C:\Program Files\Octopus Deploy\Tentacle\Tentacle.exe run --instance=Tentacle
2020-02-12 12:07:27.6830   4308      7  INFO  Agent will trust Octopus Servers with the thumbprint: *************
2020-02-12 12:07:27.7650   4308      7  INFO  listen://[::]:10933/              7  Listener started
2020-02-12 12:07:27.7781   4308      7  INFO  Agent listening on: [::]:10933
2020-02-12 12:07:27.7781   4308      7  INFO  The Windows Service has started
2020-02-12 12:07:27.8351   1908      1  INFO  Waiting for service to become Running. Current status: Running
2020-02-12 12:07:28.1486   1908      1  INFO  Service started
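To spot whether anything other than INFO entries shows up after the listener starts, the log can be filtered with a short script. This is a rough sketch that assumes the column layout seen in the lines above (timestamp, two numeric IDs that look like process and thread IDs, level, message). The level names in `PROBLEM_LEVELS` are my assumption, as are all the function and field names.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str
    process_id: int  # assumed meaning of the first numeric column
    thread_id: int   # assumed meaning of the second numeric column
    level: str
    message: str


LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+(\d+)\s+(\d+)\s+([A-Z]+)\s+(.*)$"
)

# Assumed names for levels worth a closer look.
PROBLEM_LEVELS = {"WARN", "ERROR", "FATAL"}


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None for lines that do not match (e.g. stack frames)."""
    match = LINE.match(line.rstrip())
    if match is None:
        return None
    ts, pid, tid, level, message = match.groups()
    return LogEntry(ts, int(pid), int(tid), level, message)


def problems(lines: Iterable[str]) -> List[LogEntry]:
    """Return only the entries whose level is in PROBLEM_LEVELS."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e.level in PROBLEM_LEVELS]
```

Run against the excerpt above, `problems` would return an empty list, which matches the observation that no connection attempt was logged.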

(Jacov) #12

I’ve stopped the existing tentacle, installed a new one, copied the thumbprint into the instance on the Octopus Server and rerun the health check.

This has also not resolved the issue.

System.IO.IOException: The handshake failed due to an unexpected packet format.
   at System.Net.Security.SslState.InternalEndProcessAuthentication(LazyAsyncResult lazyResult)
   at System.Net.Security.SslState.EndProcessAuthentication(IAsyncResult result)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Halibut.Transport.SecureListener.<ExecuteRequest>d__24.MoveNext()

AND

System.Security.Authentication.AuthenticationException: A call to SSPI failed, see inner exception. ---> System.ComponentModel.Win32Exception: An unknown error occurred while processing the certificate
   --- End of inner exception stack trace ---
   at System.Net.Security.SslState.InternalEndProcessAuthentication(LazyAsyncResult lazyResult)
   at System.Net.Security.SslState.EndProcessAuthentication(IAsyncResult result)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Halibut.Transport.SecureListener.<ExecuteRequest>d__24.MoveNext()

(Jacov) #13

Ah, I see now.
The certificate error gets added to the log when I access the tentacle through the browser.

(Jacov) #14

It seems the problem exists on our Production Server as well now.
A health check on the Production Server’s Tentacles shows the same problem.

This could be a change that occurred on the host, as both our UAT and Production Servers are hosted with the same ISP.

(Paul Calvert) #15

You mentioned that you’re using our Cloud instance. Are you able to provide the instance name for me?
We are currently in the process of migrating instances to a new hosted platform and I’m wondering if this could have coincided with this issue in some way.