Deployments started failing all of a sudden

Hi

We are seeing some weird behavior with Octopus: without any changes that we’re aware of, one of the tentacles started failing. The configuration is as follows: the server is version 2.4.5.46, and the failing tentacle, version 2.3.3.1369, is hosted on the same machine as the server. Other tentacles on the same 2.3.3.1369 version, hosted on different machines, work fine. The failing tentacle appears to just hang; the error logs are below. It looks like port 10933 is becoming blocked, yet I can browse to https://localhost:10933 (on the server/failing tentacle machine).

Here is the server log:

2014-05-21 08:57:54.1042 FATAL Upload of file C:\Octopus2\PackageCache\feeds-1\TravelRepublic.Users.SavedSearchesService.1.0.1057_09B2BF5D631C7B4DA3C9F7D21FFBDA56.nupkg with hash c8204cc5aa424812de37cb46cf183b19e4b3943f to SQ-POWELL-599D5B0A failed
Pipefish.Errors.PipefishCommunicationException: The actor FileReceiver-Pg-7NIBSIrD@SQ-POWELL-599D5B0A cannot handle failure Octopus.Shared.FileTransfer.SendNextChunkRequest
System.InvalidOperationException: The actor FileReceiver-Pg-7NIBSIrD@SQ-POWELL-599D5B0A cannot handle failure Octopus.Shared.FileTransfer.SendNextChunkRequest
at Pipefish.Actor.OnHandleFailedTyped[TBody](Message deliveryFailure, Message failedMessage, Error error) in c:\TeamCity\buildAgent\work\cf0b1f41263b24b9\source\Pipefish\Actor.cs:line 169
2014-05-21 08:57:54.1276 FATAL One or more items in the current operation failed.
2014-05-21 08:57:54.3416 FATAL Operation: Upload package to https://powell:10933/ failed with error: One or more items in the current operation failed.
2014-05-21 08:57:54.3553 FATAL One or more items in the current operation failed.
2014-05-21 08:57:54.6806 FATAL Upload of file C:\Octopus2\PackageCache\feeds-1\TravelRepublic.Bookings.1.0.1057_C798CEBCD93F9545A12C17CA71EE9A82.nupkg with hash c31c25ce2f584d6dee37792bff56b80b53850dcf to SQ-POWELL-599D5B0A failed
Pipefish.Errors.PipefishCommunicationException: The actor FileReceiver-RQ-7NIBlEgX@SQ-POWELL-599D5B0A cannot handle failure Octopus.Shared.FileTransfer.SendNextChunkRequest
System.InvalidOperationException: The actor FileReceiver-RQ-7NIBlEgX@SQ-POWELL-599D5B0A cannot handle failure Octopus.Shared.FileTransfer.SendNextChunkRequest
at Pipefish.Actor.OnHandleFailedTyped[TBody](Message deliveryFailure, Message failedMessage, Error error) in c:\TeamCity\buildAgent\work\cf0b1f41263b24b9\source\Pipefish\Actor.cs:line 169
2014-05-21 08:57:54.6806 FATAL One or more items in the current operation failed.
2014-05-21 08:57:54.7773 FATAL Operation: Upload package to https://powell:10933/ failed with error: One or more items in the current operation failed.
2014-05-21 09:03:49.0203 INFO Cancellation requested…
2014-05-21 09:05:19.2697 ERROR Cancellation of the task timed out
2014-05-21 09:15:46.1259 ERROR Error checking pending timeouts
System.PlatformNotSupportedException: The specified cryptographic algorithm is not supported on this platform.
at System.Security.Cryptography.AesCryptoServiceProvider..ctor()
at Octopus.Shared.Security.MasterKey.MasterKeyEncryption.WriteCiphertextTo(Byte[] masterKey, Stream stream)
at Pipefish.Persistence.Filesystem.ActorStateFile.Save(ActorStateDictionary state)
at Pipefish.PersistentActor`1.Save()
at Pipefish.WellKnown.Timing.Clock.Save()
at Pipefish.WellKnown.Timing.Clock.Check()

Here is the error in the tentacle:

2014-05-21 16:05:42.5572 ERROR Unhandled error when processing request from client
System.IO.IOException: Authentication failed because the remote party has closed the transport stream.
at System.Net.Security.SslState.StartReadFrame(Byte[] buffer, Int32 readBytes, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.ForceAuthentication(Boolean receiveFirst, Byte[] buffer, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.ProcessAuthentication(LazyAsyncResult lazyResult)
at Pipefish.Transport.SecureTcp.Server.SecureTcpServer.ExecuteRequest(TcpClient client) in c:\TeamCity\buildAgent\work\cf0b1f41263b24b9\source\Pipefish.Transport.SecureTcp\Server\SecureTcpServer.cs:line 111
2014-05-21 16:05:42.5572 ERROR Unhandled error when processing request from client
System.IO.IOException: Authentication failed because the remote party has closed the transport stream.
at System.Net.Security.SslState.StartReadFrame(Byte[] buffer, Int32 readBytes, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.ForceAuthentication(Boolean receiveFirst, Byte[] buffer, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.ProcessAuthentication(LazyAsyncResult lazyResult)
at Pipefish.Transport.SecureTcp.Server.SecureTcpServer.ExecuteRequest(TcpClient client) in c:\TeamCity\buildAgent\work\cf0b1f41263b24b9\source\Pipefish.Transport.SecureTcp\Server\SecureTcpServer.cs:line 111
2014-05-21 16:05:42.5865 ERROR Unhandled error when processing request from client
System.IO.IOException: Authentication failed because the remote party has closed the transport stream.
at System.Net.Security.SslState.StartReadFrame(Byte[] buffer, Int32 readBytes, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.ForceAuthentication(Boolean receiveFirst, Byte[] buffer, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.ProcessAuthentication(LazyAsyncResult lazyResult)
at Pipefish.Transport.SecureTcp.Server.SecureTcpServer.ExecuteRequest(TcpClient client) in c:\TeamCity\buildAgent\work\cf0b1f41263b24b9\source\Pipefish.Transport.SecureTcp\Server\SecureTcpServer.cs:line 111
2014-05-21 16:05:43.5017 ERROR Invalid request
System.Net.ProtocolViolationException: Request syntax was invalid
at Pipefish.Transport.SecureTcp.ProtocolParser.ReadPrelude(Stream clientStream) in c:\TeamCity\buildAgent\work\cf0b1f41263b24b9\source\Pipefish.Transport.SecureTcp\ProtocolParser.cs:line 112
at Pipefish.Transport.SecureTcp.ProtocolParser.ParseRequest(Stream clientStream, Method& method, Uri& uri, RequestHeaders& headers, String& protocol) in c:\TeamCity\buildAgent\work\cf0b1f41263b24b9\source\Pipefish.Transport.SecureTcp\ProtocolParser.cs:line 53
at Pipefish.Transport.SecureTcp.Server.SecureTcpServer.ApplyProtocol(AuthorizationResult authorizationResult, EndPoint clientEndPoint, String clientThumbprint, Stream clientStream) in c:\TeamCity\buildAgent\work\cf0b1f41263b24b9\source\Pipefish.Transport.SecureTcp\Server\SecureTcpServer.cs:line 140
2014-05-21 16:05:43.6160 ERROR Unhandled error when processing request from client
System.IO.IOException: Unable to write data to the transport connection: An established connection was aborted by the software in your host machine. ---> System.Net.Sockets.SocketException: An established connection was aborted by the software in your host machine
at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
--- End of inner exception stack trace ---
at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
at System.Net.Security._SslStream.StartWriting(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security._SslStream.ProcessWrite(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at Pipefish.Transport.SecureTcp.Server.SecureTcpServer.ApplyProtocol(AuthorizationResult authorizationResult, EndPoint clientEndPoint, String clientThumbprint, Stream clientStream) in c:\TeamCity\buildAgent\work\cf0b1f41263b24b9\source\Pipefish.Transport.SecureTcp\Server\SecureTcpServer.cs:line 199
at Pipefish.Transport.SecureTcp.Server.SecureTcpServer.ExecuteRequest(TcpClient client) in c:\TeamCity\buildAgent\work\cf0b1f41263b24b9\source\Pipefish.Transport.SecureTcp\Server\SecureTcpServer.cs:line 111
2014-05-21 16:05:45.9366 ERROR Invalid request
System.Net.ProtocolViolationException: Request syntax was invalid
at Pipefish.Transport.SecureTcp.ProtocolParser.ReadPrelude(Stream clientStream) in c:\TeamCity\buildAgent\work\cf0b1f41263b24b9\source\Pipefish.Transport.SecureTcp\ProtocolParser.cs:line 112
at Pipefish.Transport.SecureTcp.ProtocolParser.ParseRequest(Stream clientStream, Method& method, Uri& uri, RequestHeaders& headers, String& protocol) in c:\TeamCity\buildAgent\work\cf0b1f41263b24b9\source\Pipefish.Transport.SecureTcp\ProtocolParser.cs:line 53
at Pipefish.Transport.SecureTcp.Server.SecureTcpServer.ApplyProtocol(AuthorizationResult authorizationResult, EndPoint clientEndPoint, String clientThumbprint, Stream clientStream) in c:\TeamCity\buildAgent\work\cf0b1f41263b24b9\source\Pipefish.Transport.SecureTcp\Server\SecureTcpServer.cs:line 140

Any help will be greatly appreciated!

New information: this started working after I changed the type of the tentacle to polling, so it does look like it was a port issue; however, I was able to browse to https://localhost:10933. Still, if anyone has any ideas on what the root cause might be, please do reply.

Thank you.

Hi Christina,

Thanks for getting in touch! It sounds like the Tentacle was listening on port 10933 OK (which you verified by browsing to it), but the Octopus server wasn’t able to connect to it on that port. The next step would be to remote desktop to the Octopus server, open a web browser, and try to browse to https://your-tentacle-machine-name:10933, to see if that works.
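
If a browser isn't handy on the server, the same reachability check can be sketched in a few lines of Python. This is only an illustration: the function name `can_connect` is made up for this example, and it tests plain TCP reachability only, not the TLS handshake or certificate trust that the Tentacle also requires.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example usage, run from the Octopus server machine
# ("your-tentacle-machine-name" is a placeholder, as in the URL above):
#   print(can_connect("your-tentacle-machine-name", 10933))
```

A `True` result only shows that something accepted the connection on that port; it says nothing about which process or machine answered.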

Hope that helps!

Paul

Hi

Thank you for coming back.

I did try what you said and that has worked fine. Actually both https://localhost:10933/ and the https://tentacle-machine:10933/ were the same, as the machine hosting the server is hosting the tentacle too. I have tried the URL you mentioned from both the server/tentacle box itself and from an external one (my dev machine, same network).

What was even weirder, and what I did not mention in my first message, is that the problem started happening intermittently a few days ago: some deployments would hang mid-process (they deployed 3-4 packages, then hung), and that was usually corrected by retriggering the deployment. Then, after a while, every deployment started having this issue. It certainly does look like that port somehow became unresponsive.

At some point I did a netstat trace and got this (powell = 10.20.0.95 = tentacle-machine = tentacle server machine):

TCP 0.0.0.0:10933 POWELL:0 LISTENING
[Tentacle.exe]
TCP 0.0.0.0:10943 POWELL:0 LISTENING
[Octopus.Server.exe]

TCP 10.20.0.95:10933 POWELL:54906 TIME_WAIT
TCP 10.20.0.95:10933 POWELL:54908 TIME_WAIT
TCP 10.20.0.95:10933 POWELL:54911 TIME_WAIT
TCP 10.20.0.95:10933 POWELL:54912 TIME_WAIT
TCP 10.20.0.95:10933 POWELL:54914 TIME_WAIT
TCP 10.20.0.95:10933 POWELL:54915 TIME_WAIT
TCP 10.20.0.95:10933 POWELL:54916 TIME_WAIT
TCP 10.20.0.95:10933 POWELL:54917 TIME_WAIT
TCP 10.20.0.95:10933 POWELL:54920 TIME_WAIT
TCP 10.20.0.95:10933 POWELL:54921 TIME_WAIT
TCP 10.20.0.95:10933 POWELL:54922 TIME_WAIT
TCP 10.20.0.95:10933 POWELL:54924 TIME_WAIT
TCP 10.20.0.95:49682 trpnetmon:8194 ESTABLISHED
[RouterNT.exe]
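
For anyone who wants to summarise a capture like the one above, here is a rough Python sketch that counts connection states for one local port. It assumes four-column rows (protocol, local address, foreign address, state) as in this excerpt; the function name `count_states` is made up for this example, and lines in any other shape, such as the `[Tentacle.exe]` process names, are skipped.

```python
from collections import Counter


def count_states(netstat_text: str, local_port: int) -> Counter:
    """Count TCP states for rows whose local address ends in :local_port."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: Proto, Local Address, Foreign Address, State.
        if len(fields) != 4 or fields[0] != "TCP":
            continue
        local_address, state = fields[1], fields[3]
        if local_address.rsplit(":", 1)[-1] == str(local_port):
            counts[state] += 1
    return counts


sample = """\
TCP 0.0.0.0:10933 POWELL:0 LISTENING
[Tentacle.exe]
TCP 10.20.0.95:10933 POWELL:54906 TIME_WAIT
TCP 10.20.0.95:10933 POWELL:54908 TIME_WAIT
TCP 10.20.0.95:49682 trpnetmon:8194 ESTABLISHED
"""
print(count_states(sample, 10933))
```

Run against the full capture, this would show one LISTENING row and twelve TIME_WAIT rows for port 10933, i.e. connections that had been opened and were already closed by the time of the trace.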

RouterNT is a utility called Sophos Message Router, which is used by Sophos antivirus. So I thought I’d try with the antivirus turned off, but that did not help in any way (plus, the rest of the working tentacles are hosted on machines with Sophos on them as well).

Hi Christina,

The server will listen on port 10943, not 10933, so it may be possible that the polling tentacle is pointed at itself - may be something to check.

This error in the log:

System.PlatformNotSupportedException: The specified cryptographic algorithm is not supported on this platform.
at System.Security.Cryptography.AesCryptoServiceProvider..ctor() at Octopus.Shared.Security.MasterKey.MasterKeyEncryption.WriteCiphertextTo(Byte[] masterKey, Stream stream) at ...

…can indicate either insufficient permissions granted to the Tentacle process (is it running as LocalSystem?), Group Policy issues that prevent certain algorithms from being used (FIPS-compliant mode?) or issues in the O/S configuration.

There are also minor changes between Octopus 2.3 and 2.4, so I wouldn’t rule out a version compatibility issue.

To clear this up most quickly, I’d recommend working through these steps to reinstall and upgrade the Tentacle on the affected machine.

(Please make sure you have a backup of your Octopus server, and its Master Key, before working through any config changes :))

  1. Using Tentacle Manager, delete the Tentacle instance. This will be a red ‘X’ button at the very bottom of the Tentacle Manager page.
  2. From the Control Panel’s “Add/Remove Programs” uninstall the Tentacle.
  3. Use the new Octopus 2.4.x MSI installer from https://octopusdeploy.com/download to reinstall the Tentacle and work through the setup process.

If the Tentacle installation or configuration fails, copying the complete setup log and sending it through should help us pinpoint the issue.

Hope this helps,
Nick

Hi Nicholas

I have redone the tentacle installation a few times, tried different versions too. I tried changing the installation folder, always ensured it’s running under LocalSystem etc. At some point I brought in another colleague just to have an extra set of eyes on following the installation steps. None of this helped…
And unfortunately I can’t say what changed on that machine that broke things. As far as I know there were no new program installations and no Windows updates, just some new packages added to the Octopus deployment queue a few days before the problem appeared (but this did not affect the tentacles hosted on different machines). The only way I managed to bring this particular tentacle back to life was to install it in polling mode, which brings me to a question unrelated to the topic (I hope that’s alright): how do I add roles to a polling tentacle after installation? I looked this up in the documentation wiki but couldn’t find an answer.

Thank you

Hi Christina,

Glad you have a working configuration, sorry about the hassle.

To add roles to a polling tentacle, go to Environments and find the machine. When you click the machine you’ll see a page with a Settings tab; on it there is a box into which roles can be typed.

(Note that if you type in a new role name you may need to click the item that appears in the drop-down before it will be added to the selection.)

Hope this helps, please let me know if there’s anything else we can guide you on.

Regards,
Nick

Hi

Thank you for the reply.
About the roles for the tentacle, are you saying the only change has to be made on the server? I am still under the impression that this is not all that has to be done, so perhaps my question was unclear.
When I configured the tentacle in the tentacle wizard, I had to fill in the roles for it. I am talking about this: http://docs.octopusdeploy.com/display/OD/Polling+Tentacles , go to time 1:50. Now, when I add new roles in the dashboard, as per your email, how do I add these new ones in the tentacle, as it seems the roles edit box is only available when installing the tentacle? I’ve looked through some of the configuration files but could not find an enumeration of these roles. So do I have to reinstall the tentacle with the new roles every time I make a change in the roles defined on the server?

Thank you

Hi Christina,

Thanks for the update. During the polling Tentacle installation, the wizard asks you to select the roles just so that we can automatically register it in Octopus (to save you having to manually create the machine in Octopus). The roles aren’t actually stored on the Tentacle at all - only the Octopus server knows what roles a machine is in. So the process that Nick explained is all you will need to do to change the roles.

Paul

Great, I understand now, thank you for the explanation.

Hi

Just in the unlikely case that someone else ever encounters this baffling behavior: I managed to get to the root cause. Through a very weird sequence of events, we ended up with a live clone of the machine hosting the Octopus Server on our network. This machine and the main one were both competing for the tentacles, and sometimes the ‘ghost’ won. This is why the problem seemed intermittent, and why the tentacle port was fine whenever I checked it (the second machine had finished with it by then).