We have automated Dev environments in AWS which are spun up and down during the day (and shut down overnight). They sit behind a NAT host, so we are using Polling for the Tentacle connectivity. These environments work fine for a period of days, then randomly a single environment fails to connect. When I look into the issue I find that:
a) on the Octopus server there is no connectivity and they are listed as Offline
b) on the Tentacle I see this:
2015-04-09 09:39:35.7895 10 ERROR Error posting to: https://octopus:10943/mx/v1
Pipefish.PipefishException: The remote host aborted the connection. This can happen when the remote server does not trust the certificate that we provided. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
--- End of inner exception stack trace ---
I then have to reinstall the Tentacle and reconfigure it from scratch to make it work. If I reinstall the service (from the Tentacle Manager), delete the Tentacle (from the server), try to reset the connection (from the server), or do all three, it still will not connect and the error logs report the above message.
If I run TentaclePing.exe there are no issues. It’s only after a full reinstall that it will reconnect.
This is running on 2.6.4.951, and I had this issue (a bit worse) on 2.6.0.???
Sorry for not getting back to you sooner. I’ve been investigating this issue (as well as working on new features for 3.0) and, as you probably realize, the issue you have encountered is difficult to replicate.
I have a question for you: you don’t disable any of the machines in your development environments, do you?
It seems like the Octopus server has removed the Tentacle’s thumbprint from its list of trusted Tentacle thumbprints by the time the environment has come back up again. This happens if a machine is disabled (or if its SQUID, its unique identifier, has been set to null).
I will keep digging to see if I can figure out why this happens every now and then for you.
No probs on the delay. Our environments are created and destroyed as needed and are shut down outside of work hours.
For example, the environment above was created on a Monday and was working until a deployment was attempted on Thursday. It was shut down each night and brought back up the following day.
Is there any way to view what Octopus sees as its trusted thumbprints? And is it possible to manually add one back in, say for testing, when this occurs again?
Unfortunately there’s no way to list which thumbprints the Octopus server has as trusted. But you could try the following command on the Tentacle:
Usage: Tentacle register-with [<options>]

Where [<options>] is any of:

      --instance=VALUE           Name of the instance to use
      --server=VALUE             The Octopus server - e.g., 'http://octopus'
      --apiKey=VALUE             Your API key; you can get this from the Octopus
                                 web portal
  -u, --username=VALUE           If not using API keys, your username
  -p, --password=VALUE           In not using API keys, your password
      --env, --environment=VALUE The environment name to add the machine to -
                                 e.g., 'Production'
  -r, --role=VALUE               The machine role that the machine will assume -
                                 e.g., 'web-server'; specify this argument
                                 multiple times to add multiple roles
      --name=VALUE               Name of the machine when registered - will
                                 default to the hostname
  -h, --publicHostName=VALUE     An Octopus-accessible DNS name for this machine
  -f, --force                    Allow overwriting of existing machines
      --comms-style=VALUE        The communication style to use - either
                                 TentacleActive or TentaclePassive; the default
                                 is TentaclePassive
      --server-comms-port=VALUE  When using active communication, the comms port
                                 on the Octopus server; the default is 10943
That might allow you to get it connecting again without having to reinstall everything.
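As a rough illustration of how those options fit together for a polling Tentacle, here is a minimal Python sketch that only assembles the command line from the flags listed above. The helper name `build_register_args`, the server URL, API key, environment, role and machine name are all placeholders made up for this example, and it assumes `TentacleActive` is the comms style that corresponds to Polling; treat it as a starting point rather than a verified recipe.

```python
def build_register_args(server, api_key, environment, roles,
                        name=None, instance=None,
                        tentacle_exe="Tentacle.exe"):
    """Assemble a `Tentacle register-with` command line (hypothetical helper).

    Only flags shown in the help output above are used. The values passed in
    are placeholders; nothing here talks to an Octopus server.
    """
    args = [tentacle_exe, "register-with"]
    if instance:
        args.append(f"--instance={instance}")
    args += [
        f"--server={server}",
        f"--apiKey={api_key}",
        f"--environment={environment}",
        # Assumption: TentacleActive is the style used by polling Tentacles.
        "--comms-style=TentacleActive",
        # Default comms port according to the help text.
        "--server-comms-port=10943",
        # Allow overwriting the existing (broken) machine registration.
        "--force",
    ]
    # --role may be repeated to assign several roles.
    args += [f"--role={role}" for role in roles]
    if name:
        args.append(f"--name={name}")
    return args


if __name__ == "__main__":
    cmd = build_register_args(
        server="http://octopus",      # placeholder URL
        api_key="API-XXXXXXXX",       # placeholder key
        environment="Dev",            # placeholder environment
        roles=["web-server"],         # placeholder role
        name="AWS-XXX",               # placeholder machine name
    )
    # Print the command; on the Tentacle machine the list could instead be
    # handed to subprocess.run(cmd, check=True).
    print(" ".join(cmd))
```

Building the arguments as a list keeps quoting problems out of the picture if the command is later run from a provisioning script.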
Had another environment do this, so I was playing around to see what I could do to get it going. At the step in the setup screen that connects to the Octopus server with credentials, it spat out this error:
Error: Unable to connect to the Octopus Deploy server. See the inner exception for details.
System.Exception: Unable to connect to the Octopus Deploy server. See the inner exception for details. ---> System.Net.WebException: The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel. ---> System.Security.Authentication.AuthenticationException: The remote certificate is invalid according to the validation procedure.
at System.Net.Security.SslState.StartSendAuthResetSignal(ProtocolToken message, AsyncProtocolRequest asyncRequest, Exception exception)
at System.Net.Security.SslState.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.ForceAuthentication(Boolean receiveFirst, Byte[] buffer, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.ProcessAuthentication(LazyAsyncResult lazyResult)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Net.TlsStream.ProcessAuthentication(LazyAsyncResult result)
at System.Net.TlsStream.Write(Byte[] buffer, Int32 offset, Int32 size)
at System.Net.PooledStream.Write(Byte[] buffer, Int32 offset, Int32 size)
at System.Net.ConnectStream.WriteHeaders(Boolean async)
--- End of inner exception stack trace ---
at System.Net.HttpWebRequest.GetResponse()
at Octopus.Client.OctopusClient.DispatchRequest[TResponseResource](OctopusRequest request, Boolean readResponse) in y:\work\refs\heads\release\source\Octopus.Client\OctopusClient.cs:line 445
at Octopus.Client.OctopusClient.EstablishSession() in y:\work\refs\heads\release\source\Octopus.Client\OctopusClient.cs:line 286
--- End of inner exception stack trace ---
at Octopus.Client.OctopusClient.EstablishSession() in y:\work\refs\heads\release\source\Octopus.Client\OctopusClient.cs:line 308
at System.Lazy`1.CreateValue()
at System.Lazy`1.LazyInitValue()
at Octopus.Client.OctopusClient.get_RootDocument() in y:\work\refs\heads\release\source\Octopus.Client\OctopusClient.cs:line 58
at Octopus.Tools.TentacleConfiguration.SetupWizard.TentacleSetupWizardModel.VerifyCredentials(ILog logger) in y:\work\refs\heads\release\source\Octopus.Tools\TentacleConfiguration\SetupWizard\TentacleSetupWizardModel.cs:line 266
Some more logs, these are from the Octopus Server:
2015-05-05 09:21:21.4914 906 WARN Rejecting connection: the client at 172.21.103.242:49375 provided a certificate with thumbprint xxx, which is associated with {snip...lots of machines} , but not configured for distribution
I also see this:
Machine AWS-XXX uses the same endpoint (physical Tentacle with SQUID ) as other machines.
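Both messages suggest that several registered machines are presenting the same certificate. To make that condition concrete, here is a minimal Python sketch that groups machines by thumbprint and reports any thumbprint used by more than one machine. The machine names and thumbprints are invented for the example, and how the mapping is collected (for instance by hand from the machine settings pages) is left open; this is not something Octopus provides in this form.

```python
from collections import defaultdict


def find_shared_thumbprints(machines):
    """Return {thumbprint: [machine names]} for thumbprints used more than once.

    `machines` maps a machine name to its certificate thumbprint.
    Thumbprints are compared case-insensitively.
    """
    by_thumbprint = defaultdict(list)
    for name, thumbprint in machines.items():
        by_thumbprint[thumbprint.upper()].append(name)
    return {
        thumbprint: sorted(names)
        for thumbprint, names in by_thumbprint.items()
        if len(names) > 1
    }


if __name__ == "__main__":
    # Hypothetical inventory: two machines share one certificate.
    inventory = {
        "AWS-001": "aa11bb22",
        "AWS-002": "AA11BB22",
        "AWS-003": "cc33dd44",
    }
    for thumbprint, names in find_shared_thumbprints(inventory).items():
        print(f"{thumbprint} is shared by: {', '.join(names)}")
```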
Using a cert per node seems to work, but disconnects still happen (even during deployments):
2015-05-08 15:12:06.2601 9 ERROR Error posting to: https://octopus.:10943/mx/v1
Pipefish.PipefishException: The request failed: BadRequest
The incoming request was on a communication link (subscription) that is no longer valid. Reset connectivity to perform a new handshake and reestablish communication.
at Pipefish.Transport.SecureTcp.MessageExchange.Client.ClientWorker.<>c__DisplayClassf.<PerformExchange>b__a(SecureTcpResponse response) in y:\work\3cbe05672d69a231\source\Pipefish.Transport.SecureTcp\MessageExchange\Client\ClientWorker.cs:line 345
at Pipefish.Transport.SecureTcp.Client.SecureTcpClient.Send(SecureTcpRequest request, Action`1 response) in y:\work\3cbe05672d69a231\source\Pipefish.Transport.SecureTcp\Client\SecureTcpClient.cs:line 88
at Pipefish.Transport.SecureTcp.MessageExchange.Client.ClientWorker.PerformExchange() in y:\work\3cbe05672d69a231\source\Pipefish.Transport.SecureTcp\MessageExchange\Client\ClientWorker.cs:line 353
at Pipefish.Transport.SecureTcp.MessageExchange.Client.ClientWorker.Run() in y:\work\3cbe05672d69a231\source\Pipefish.Transport.SecureTcp\MessageExchange\Client\ClientWorker.cs:line 187
2015-05-08 15:12:16.7747 9 ERROR Error posting to: https://octopus:10943/mx/v1
Pipefish.PipefishException: The remote host aborted the connection. This can happen when the remote server does not trust the certificate that we provided. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
--- End of inner exception stack trace ---
at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
at System.Net.FixedSizeReader.ReadPacket(Byte[] buffer, Int32 offset, Int32 count)
at System.Net.Security._SslStream.StartFrameBody(Int32 readBytes, Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security._SslStream.StartFrameHeader(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security._SslStream.StartReading(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security._SslStream.ProcessRead(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslStream.Read(Byte[] buffer, Int32 offset, Int32 count)
at System.IO.Stream.ReadByte()
at Pipefish.Transport.SecureTcp.ProtocolParser.ReadPrelude(Stream clientStream) in y:\work\3cbe05672d69a231\source\Pipefish.Transport.SecureTcp\ProtocolParser.cs:line 106
at Pipefish.Transport.SecureTcp.ProtocolParser.ParseResponse(SslStream responseStream, StatusCode& statusCode, String& statusText, ResponseHeaders& headers, String& protocol) in y:\work\3cbe05672d69a231\source\Pipefish.Transport.SecureTcp\ProtocolParser.cs:line 165
at Pipefish.Transport.SecureTcp.Client.SecureTcpClient.Send(SecureTcpRequest request, Action`1 response) in y:\work\3cbe05672d69a231\source\Pipefish.Transport.SecureTcp\Client\SecureTcpClient.cs:line 85
--- End of inner exception stack trace ---
at Pipefish.Transport.SecureTcp.Client.SecureTcpClient.Send(SecureTcpRequest request, Action`1 response) in y:\work\3cbe05672d69a231\source\Pipefish.Transport.SecureTcp\Client\SecureTcpClient.cs:line 105
at Pipefish.Transport.SecureTcp.MessageExchange.Client.ClientWorker.PerformExchange() in y:\work\3cbe05672d69a231\source\Pipefish.Transport.SecureTcp\MessageExchange\Client\ClientWorker.cs:line 353
at Pipefish.Transport.SecureTcp.MessageExchange.Client.ClientWorker.Run() in y:\work\3cbe05672d69a231\source\Pipefish.Transport.SecureTcp\MessageExchange\Client\ClientWorker.cs:line 187
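To see how these two failures line up in time, a small script can pull out each ERROR entry and the exception message that follows it. This is only a sketch: it assumes the Tentacle log keeps the layout shown above (timestamp, thread id, level, message, with the `Pipefish.PipefishException:` line directly after), and the function name `summarize_errors` is made up for this example.

```python
import re
from datetime import datetime

# Matches entries such as:
# 2015-05-08 15:12:06.2601 9 ERROR Error posting to: https://...
ENTRY = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+\s+\d+\s+(\w+)\s+(.*)$"
)


def summarize_errors(log_text):
    """Return a list of (timestamp, reason) for each ERROR entry.

    The reason is the text of the Pipefish exception line that follows the
    entry, with any chained inner exceptions (after ' --->') dropped.
    """
    events = []
    pending = None  # timestamp of an ERROR entry still waiting for its reason
    for raw in log_text.splitlines():
        line = raw.strip()
        match = ENTRY.match(line)
        if match:
            stamp, level, _message = match.groups()
            if level == "ERROR":
                pending = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
            else:
                pending = None
        elif pending is not None and line.startswith("Pipefish.PipefishException:"):
            reason = line.split(":", 1)[1].strip()
            reason = reason.split(" --->")[0]
            events.append((pending, reason))
            pending = None
    return events


if __name__ == "__main__":
    import sys
    # Usage: python summarize.py path/to/tentacle-log.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        for when, why in summarize_errors(handle.read()):
            print(when.isoformat(sep=" "), "|", why)
```

Run against the excerpt above, this would list the BadRequest failure followed by the aborted connection, which makes it easier to compare against the server-side log timestamps.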
So my question is this:
Does Octopus work behind NAT at all?
All signs point to no for me. When not behind NAT, Octopus is perfect.
Thanks for all the extra information you’ve sent through. I think you may be right about there being an issue with running Tentacles behind a NAT host. We do know of issues running Tentacles behind proxies, so this issue may also apply to NAT.
Unfortunately, I don’t think the GitHub issue you’ve referenced is the source of your issues. I’ll create a new GitHub issue with all the information you’ve provided and we will investigate it.