All of our tentacles (10,000+) are connected remotely via VPN as polling tentacles. Normally this is fine.
Occasionally, the VPN tunnel will reset in the middle of a deployment. When this happens, the Tentacle connection to the Octopus Server is disconnected. The VPN tunnel re-establishes relatively quickly (seconds).
If this happens during a deployment, the deployment fails and we have to manually re-start it. In some cases that would be fine, but a lot of our deployments are scheduled in the middle of the night.
When this happens, the Tentacle logs the disconnect. The server seems to either fail immediately with an System IO error or it times out after 1 minute, saying the polling tentacle didn’t collect the request.
I can generally re-create this issue with the use of a Windows Firewall rule on the tentacle that blocks outbound traffic to the Octopus server on the Tentacle communication port.
In the cases where the command times out on the server, if the tentacle reconnects before the timeout (1 minute), the task resumes. However, this is hit or miss due to the timing of the retry logic on the Tentacle. It seems to take anywhere from 30 second to 120 seconds for the Tentacle to retry the first time. A significant part of that window is outside of the 1 minute server side timeout. If we could do something to get a quicker/more frequent retry, that, I think, would help.
For the server side System IO error, I would like input as to what options there might be to address it.
Thanks
Server Log when it fails after timeout
The step failed: Activity <StepName> on <TentacleName> failed with error 'A request was sent to a polling endpoint, but the polling endpoint did not collect the request within the allowed time (00:01:00), so the request timed out. Server exception: System.TimeoutException: A request was sent to a polling endpoint, but the polling endpoint did not collect the request within the allowed time (00:01:00), so the request timed out.'.
Server Log when it fails immediately
The step failed: Activity <StepName> on <TentacleName> failed with error 'Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.. Server exception: System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.. ---> System.Net.Sockets.SocketException (10054): An existing connection was forcibly closed by the remote host. at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size) --- End of inner exception stack trace --- at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size) at System.Net.Security.SslStream.FillBufferAsync[TReadAdapter](TReadAdapter adapter, Int32 minSize) at System.Net.Security.SslStream.ReadAsyncInternal[TReadAdapter](TReadAdapter adapter, Memory`1 buffer) at System.Net.Security.SslStream.Read(Byte[] buffer, Int32 offset, Int32 count) at System.IO.Compression.DeflateStream.ReadCore(Span`1 buffer) at System.IO.Compression.DeflateStream.Read(Byte[] array, Int32 offset, Int32 count) at System.IO.BinaryReader.InternalRead(Int32 numBytes) at System.IO.BinaryReader.ReadInt32() at Newtonsoft.Json.Bson.BsonDataReader.ReadNormal() at Newtonsoft.Json.Bson.BsonDataReader.Read() at Newtonsoft.Json.JsonReader.ReadForType(JsonContract contract, Boolean hasConverter) at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.Deserialize(JsonReader reader, Type objectType, Boolean checkAdditionalContent) at Newtonsoft.Json.JsonSerializer.DeserializeInternal(JsonReader reader, Type objectType) at Newtonsoft.Json.JsonSerializer.Deserialize[T](JsonReader reader) at Halibut.Transport.Protocol.MessageExchangeStream.ReadBsonMessage[T]() at Halibut.Transport.Protocol.MessageExchangeStream.Receive[T]() at Halibut.Transport.Protocol.MessageExchangeProtocol.ProcessReceiverInternalAsync(IPendingRequestQueue pendingRequests, RequestMessage nextRequest)'.
Tentacle Log in either scenario
2022-07-12 16:14:19.7333 4248 7 INFO The Windows Service has started
2022-07-12 16:14:19.8740 4248 8 INFO https://<servername>/ 8 Secure connection established. Server at [::ffff:<IPAddress>]:443 identified by thumbprint: AE207588254D18D02BFE082FA8A16F8CD2E087B2, using protocol Tls12
2022-07-12 16:14:43.8998 4248 8 ERROR https://<servername>/ 8 Unexpected exception executing transaction.
System.IO.IOException: Unable to read data from the transport connection: An established connection was aborted by the software in your host machine. ---> System.Net.Sockets.SocketException: An established connection was aborted by the software in your host machine
at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
--- End of inner exception stack trace ---
at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
at System.Net.FixedSizeReader.ReadPacket(Byte[] buffer, Int32 offset, Int32 count)
at System.Net.Security._SslStream.StartFrameHeader(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security._SslStream.StartReading(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security._SslStream.ProcessRead(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslStream.Read(Byte[] buffer, Int32 offset, Int32 count)
at System.IO.StreamReader.ReadBuffer()
at System.IO.StreamReader.ReadLine()
at Halibut.Transport.Protocol.MessageExchangeStream.ReadLine()
at Halibut.Transport.Protocol.MessageExchangeStream.ExpectProceeed()
at Halibut.Transport.Protocol.MessageExchangeProtocol.ExchangeAsSubscriber(Uri subscriptionId, Func`2 incomingRequestProcessor, Int32 maxAttempts)
at Halibut.Transport.SecureClient.ExecuteTransaction(ExchangeAction protocolHandler, CancellationToken cancellationToken)
2022-07-12 16:14:43.8998 4248 8 ERROR https://<servername>/ 8 Halibut client exception: The server https://<servername>/ aborted the connection before it was fully established. This usually means that the server rejected the certificate that we provided. We provided a certificate with a thumbprint of '79875A7EFF305B142274E19F6095E79749458469'. Retrying in 38.0 seconds
Halibut.HalibutClientException: The server https://<servername>/ aborted the connection before it was fully established. This usually means that the server rejected the certificate that we provided. We provided a certificate with a thumbprint of '79875A7EFF305B142274E19F6095E79749458469'.
at Halibut.Transport.SecureClient.HandleError(Exception lastError, Boolean retryAllowed)
at Halibut.Transport.SecureClient.ExecuteTransaction(ExchangeAction protocolHandler, CancellationToken cancellationToken)
at Halibut.Transport.PollingClient.ExecutePollingLoop(Object ignored)