AWS Polling Tentacles: Pipefish.PipefishException: The remote host aborted the connection

We have automated Dev environments in AWS that are spun up and down during the day and shut down overnight. They sit behind a NAT host, so we are using Polling for the Tentacle connectivity. These environments work fine for a period of days, then at random a single environment fails to connect. When I look into the issue I find that:

a) on the Octopus server there is no connectivity and they are listed as Offline
b) on the Tentacle I see this:
2015-04-09 09:39:35.7895 10 ERROR Error posting to: https://octopus:10943/mx/v1
Pipefish.PipefishException: The remote host aborted the connection. This can happen when the remote server does not trust the certificate that we provided. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
   at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   --- End of inner exception stack trace ---

I then have to reinstall the Tentacle and reconfigure it from scratch to make it work. If I reinstall the service (from the Tentacle Manager), delete the Tentacle (from the server), try to reset the connection (from the server), or do all three, it still will not connect and the error logs report the above message.

If I run TentaclePing.exe there are no issues. It’s only after a full reinstall that it will reconnect.
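
For anyone reproducing this, a rough reachability check of the polling port can help separate network problems from trust problems. The sketch below only opens a TCP connection; it does not perform a TLS handshake or any certificate exchange, so it says nothing about whether the server trusts the Tentacle's thumbprint. The function name is illustrative, and 10943 is simply the port shown in the log above.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port can be opened.

    This checks network reachability only (routing, NAT, firewall).
    It does NOT validate certificates or thumbprints.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

Calling `can_connect("octopus", 10943)` from the Tentacle machine (host name assumed) and getting True while the Tentacle still logs the error above would point away from the network path and towards certificate trust.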

This is also running on 2.6.4.951, and I had this issue (a bit worse) on 2.6.0.???

This issue seems very similar to http://help.octopusdeploy.com/discussions/problems/29735-od-26-tentacle-connection-issues but in our case it is not due to uploading to the Tentacle.

Is there a troubleshooting guide for Polling similar to (http://docs.octopusdeploy.com/display/OD/Troubleshoot+Listening+Tentacles) ?

The cert was also created using this guide: http://docs.octopusdeploy.com/display/OD/Export+and+import+Tentacle+certificates+without+a+profile since we use custom PowerShell to install (soon to be a fork of your DSC).

Thanks

James

Hi James,

Sorry for not getting back to you sooner. I’ve been investigating this issue (as well as working on new features for 3.0) and, as you probably realize, the issue you have encountered is unfortunately difficult to replicate.

I have a question for you: you guys don’t disable any of the machines that are in their development environments, do you?

It seems like the Octopus server has removed the Tentacle’s thumbprint from its list of trusted Tentacle thumbprints by the time the environment has come back up again. This happens if a machine is disabled (or if its SQUID, its unique identifier, has been set to null).

I will keep digging to see if I can figure out why this happens every now and then for you.

Thank you and kind regards,

Henrik

Hi Henrik

No probs on the delay. Our environments are created and destroyed as needed and are shut down outside of work hours.

For example, the environment above was created on a Monday and was working until a deployment was attempted on Thursday. It was shut down each night and brought back up the following day.

Is there any way to view what Octopus sees as its trusted thumbprints? And is it possible to manually add one back in, say for testing, when this occurs again?

thanks

James

Hi James,

Unfortunately there’s no way to list which thumbprints the Octopus server has as trusted. But you could try the following command on the Tentacle:

Usage: Tentacle register-with [<options>]

Where [<options>] is any of:

      --instance=VALUE       Name of the instance to use
      --server=VALUE         The Octopus server - e.g., 'http://octopus'
      --apiKey=VALUE         Your API key; you can get this from the Octopus
                               web portal
  -u, --username=VALUE       If not using API keys, your username
  -p, --password=VALUE       In not using API keys, your password
      --env, --environment=VALUE
                             The environment name to add the machine to - e.-
                               g., 'Production'
  -r, --role=VALUE           The machine role that the machine will assume -
                               e.g., 'web-server'; specify this argument
                               multiple times to add multiple roles
      --name=VALUE           Name of the machine when registered - will
                               default to the hostname
  -h, --publicHostName=VALUE An Octopus-accessible DNS name for this machine
  -f, --force                Allow overwriting of existing machines
      --comms-style=VALUE    The communication style to use - either
                               TentacleActive or TentaclePassive; the default
                               is TentaclePassive
      --server-comms-port=VALUE
                             When using active communication, the comms port
                               on the Octopus server; the default is 10943

That might allow you to get it connecting again without having to reinstall everything.

Hope that helps!
Henrik

The only way to get it to reconnect is usually:

  1. delete it from Octopus
  2. uninstall from the tentacle
  3. remove the config folder
  4. reinstall and allow it to connect

Things like reinstalling the service or the above steps never seem to work

thanks

Hi Henrik

Had another environment do this, so I was playing around to see what I could do to get it going. At the step in the setup screen where it connects to the Octopus server with credentials, it spat out this error:

Error: Unable to connect to the Octopus Deploy server. See the inner exception for details.
System.Exception: Unable to connect to the Octopus Deploy server. See the inner exception for details. ---> System.Net.WebException: The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel. ---> System.Security.Authentication.AuthenticationException: The remote certificate is invalid according to the validation procedure.
   at System.Net.Security.SslState.StartSendAuthResetSignal(ProtocolToken message, AsyncProtocolRequest asyncRequest, Exception exception)
   at System.Net.Security.SslState.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.ForceAuthentication(Boolean receiveFirst, Byte[] buffer, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.ProcessAuthentication(LazyAsyncResult lazyResult)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Net.TlsStream.ProcessAuthentication(LazyAsyncResult result)
   at System.Net.TlsStream.Write(Byte[] buffer, Int32 offset, Int32 size)
   at System.Net.PooledStream.Write(Byte[] buffer, Int32 offset, Int32 size)
   at System.Net.ConnectStream.WriteHeaders(Boolean async)
   --- End of inner exception stack trace ---
   at System.Net.HttpWebRequest.GetResponse()
   at Octopus.Client.OctopusClient.DispatchRequest[TResponseResource](OctopusRequest request, Boolean readResponse) in y:\work\refs\heads\release\source\Octopus.Client\OctopusClient.cs:line 445
   at Octopus.Client.OctopusClient.EstablishSession() in y:\work\refs\heads\release\source\Octopus.Client\OctopusClient.cs:line 286
   --- End of inner exception stack trace ---
   at Octopus.Client.OctopusClient.EstablishSession() in y:\work\refs\heads\release\source\Octopus.Client\OctopusClient.cs:line 308
   at System.Lazy`1.CreateValue()
   at System.Lazy`1.LazyInitValue()
   at Octopus.Client.OctopusClient.get_RootDocument() in y:\work\refs\heads\release\source\Octopus.Client\OctopusClient.cs:line 58
   at Octopus.Tools.TentacleConfiguration.SetupWizard.TentacleSetupWizardModel.VerifyCredentials(ILog logger) in y:\work\refs\heads\release\source\Octopus.Tools\TentacleConfiguration\SetupWizard\TentacleSetupWizardModel.cs:line 266

Dunno if this helps anymore

thanks

Some more logs, these are from the Octopus Server:

2015-05-05 09:21:21.4914    906  WARN  Rejecting connection: the client at 172.21.103.242:49375 provided a certificate with thumbprint xxx, which is associated with  {snip...lots of machines} , but not configured for distribution

I also see this:

Machine AWS-XXX uses the same endpoint (physical Tentacle with SQUID ) as other machines.
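
A rough way to see how widely one certificate is shared is to group the server's rejection warnings by thumbprint. The sketch below is built only from the single WARN line quoted above; the regular expression and the function name are assumptions, and other Octopus versions may word the message differently.

```python
import re
from collections import defaultdict

# Pattern derived from the one WARN line quoted in this thread (assumption:
# the real log format may vary between Octopus versions).
REJECT_RE = re.compile(
    r"Rejecting connection: the client at (?P<ip>[\d.]+):(?P<port>\d+) "
    r"provided a certificate with thumbprint (?P<thumbprint>\S+?),"
)


def rejected_by_thumbprint(lines):
    """Map each rejected thumbprint to the set of client IPs that presented it."""
    seen = defaultdict(set)
    for line in lines:
        match = REJECT_RE.search(line)
        if match:
            seen[match.group("thumbprint")].add(match.group("ip"))
    return dict(seen)
```

A thumbprint that maps to several client IPs would be consistent with many machines having been installed from the same exported certificate.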

Thanks

Using a cert per node seems to work, but disconnects still happen (even during deployments):

2015-05-08 15:12:06.2601      9 ERROR  Error posting to: https://octopus.:10943/mx/v1
Pipefish.PipefishException: The request failed: BadRequest
The incoming request was on a communication link (subscription) that is no longer valid. Reset connectivity to perform a new handshake and reestablish communication.
   at Pipefish.Transport.SecureTcp.MessageExchange.Client.ClientWorker.<>c__DisplayClassf.<PerformExchange>b__a(SecureTcpResponse response) in y:\work\3cbe05672d69a231\source\Pipefish.Transport.SecureTcp\MessageExchange\Client\ClientWorker.cs:line 345
   at Pipefish.Transport.SecureTcp.Client.SecureTcpClient.Send(SecureTcpRequest request, Action`1 response) in y:\work\3cbe05672d69a231\source\Pipefish.Transport.SecureTcp\Client\SecureTcpClient.cs:line 88
   at Pipefish.Transport.SecureTcp.MessageExchange.Client.ClientWorker.PerformExchange() in y:\work\3cbe05672d69a231\source\Pipefish.Transport.SecureTcp\MessageExchange\Client\ClientWorker.cs:line 353
   at Pipefish.Transport.SecureTcp.MessageExchange.Client.ClientWorker.Run() in y:\work\3cbe05672d69a231\source\Pipefish.Transport.SecureTcp\MessageExchange\Client\ClientWorker.cs:line 187
2015-05-08 15:12:16.7747      9 ERROR  Error posting to: https://octopus:10943/mx/v1
Pipefish.PipefishException: The remote host aborted the connection. This can happen when the remote server does not trust the certificate that we provided. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
   at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   --- End of inner exception stack trace ---
   at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   at System.Net.FixedSizeReader.ReadPacket(Byte[] buffer, Int32 offset, Int32 count)
   at System.Net.Security._SslStream.StartFrameBody(Int32 readBytes, Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security._SslStream.StartFrameHeader(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security._SslStream.StartReading(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security._SslStream.ProcessRead(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.Read(Byte[] buffer, Int32 offset, Int32 count)
   at System.IO.Stream.ReadByte()
   at Pipefish.Transport.SecureTcp.ProtocolParser.ReadPrelude(Stream clientStream) in y:\work\3cbe05672d69a231\source\Pipefish.Transport.SecureTcp\ProtocolParser.cs:line 106
   at Pipefish.Transport.SecureTcp.ProtocolParser.ParseResponse(SslStream responseStream, StatusCode& statusCode, String& statusText, ResponseHeaders& headers, String& protocol) in y:\work\3cbe05672d69a231\source\Pipefish.Transport.SecureTcp\ProtocolParser.cs:line 165
   at Pipefish.Transport.SecureTcp.Client.SecureTcpClient.Send(SecureTcpRequest request, Action`1 response) in y:\work\3cbe05672d69a231\source\Pipefish.Transport.SecureTcp\Client\SecureTcpClient.cs:line 85
   --- End of inner exception stack trace ---
   at Pipefish.Transport.SecureTcp.Client.SecureTcpClient.Send(SecureTcpRequest request, Action`1 response) in y:\work\3cbe05672d69a231\source\Pipefish.Transport.SecureTcp\Client\SecureTcpClient.cs:line 105
   at Pipefish.Transport.SecureTcp.MessageExchange.Client.ClientWorker.PerformExchange() in y:\work\3cbe05672d69a231\source\Pipefish.Transport.SecureTcp\MessageExchange\Client\ClientWorker.cs:line 353
   at Pipefish.Transport.SecureTcp.MessageExchange.Client.ClientWorker.Run() in y:\work\3cbe05672d69a231\source\Pipefish.Transport.SecureTcp\MessageExchange\Client\ClientWorker.cs:line 187

So my question is this:

Does Octopus work behind NAT at all?

All signs point to no for me; when not behind NAT, Octopus works perfectly.

Is this possibly the problem?

Hi James,

Thanks for all the extra information you’ve sent through. I think you may be right about there being an issue with running Tentacles behind a NAT host. We do know of issues running Tentacles behind proxies, so this issue may also apply to NAT.

Unfortunately, I don’t think the GitHub issue you’ve referenced is the source of your issues. I’ll create a new GitHub issue with all the information you’ve provided and we will investigate it.

Thank you and kind regards,

Henrik

No probs. FWIW, we have re-engineered our AWS environments to use TentaclePassive to work around this issue.

thanks

If anyone else comes to this post without having upgraded to version 3, this made things work again for me (Octopus Deploy 2.6.5.1010):

Tentacle register-with --server=https://octopus.example.com --apiKey=API-1337FOOBAR1337FOOBAR58008 --env="Foo Bar" -r=foo -r=bar -r=example -f --comms-style=TentacleActive
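
For scripted installs, the same re-registration could be assembled programmatically. This is a minimal sketch that uses only options listed in the register-with usage text earlier in the thread; the function name, its parameters, and the bare "Tentacle" executable name are assumptions, and the values passed in would be your own.

```python
def register_with_args(server, api_key, environment, roles, polling=True, force=True):
    """Build the argument list for `Tentacle register-with`.

    Only options shown in the usage output in this thread are used.
    Passing a list (rather than one string) avoids quoting problems with
    values such as an environment name that contains spaces.
    """
    args = [
        "Tentacle",  # assumed to be on PATH; adjust to the real install path
        "register-with",
        f"--server={server}",
        f"--apiKey={api_key}",
        f"--env={environment}",
    ]
    # The usage text says to repeat the role option once per role.
    args += [f"--role={role}" for role in roles]
    if force:
        args.append("--force")  # allow overwriting of existing machines
    style = "TentacleActive" if polling else "TentaclePassive"
    args.append(f"--comms-style={style}")
    return args
```

The resulting list can be handed to `subprocess.run(args, check=True)` on the Tentacle machine, which raises if the command exits with a non-zero code.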