We recently upgraded our Octopus Server from 3.3.19 to 3.17.2, and we are running into an issue upgrading the tentacles. I am wondering whether anyone else has run into it.
After completing the server upgrade, we logged into the application and found (as expected) that the tentacles needed to be upgraded. We clicked the [Upgrade these 9 machines] button in our development environment, which immediately began the upgrade on 4 of the 9 machines in that environment. That process appeared to hang while waiting for the tentacles to restart. It was the end of the day, so we let it run overnight to see whether it would complete on its own given enough time.
When we came into the office in the morning, the machines were still waiting (the upgrade had been running for over 16 hours at that point). We logged into the machines in question and verified that the services were running, then restarted the server service to see whether it would pick up the fact that they had restarted. It immediately recognized them as restarted and began processing the next four, of which two more hung in the same state.
After some digging through the tentacle logs, we found what appears to be a communication error between the server and the tentacle:
2017-09-27 10:06:25.5753 3200 6 INFO The Windows Service has started
2017-09-27 10:06:25.6534 3200 3 INFO listen://[::]:99999/ 3 Accepted TCP client: [::ffff:172.1.1.1]:55160
2017-09-27 10:06:25.6690 3200 3 INFO listen://[::]:99999/ 3 Performing TLS server handshake
2017-09-27 10:06:47.2779 3200 3 INFO listen://[::]:99999/ 3 Unhandled error when handling request from client: [::ffff:172.1.1.1]:55160
System.IO.IOException: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult)
--- End of inner exception stack trace ---
at System.Net.Security.SslState.EndProcessAuthentication(IAsyncResult result)
at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Halibut.Transport.SecureListener.d__15.MoveNext()
(See the attached log for more details. Note: port and IP address details have been changed.)
We followed the same procedure as before (restarting the server service), and everything came back online.
Noting that the servers in our development environment are older machines, and hoping that might be the culprit, we went ahead with the tentacle updates in our test environment (which is all shiny new Windows Server 2016 machines). One of those hung the same way as well.
This appears to be a network communication issue, but I am curious whether anyone has seen anything like this before and, if so, what they found to fix or work around it.
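In case it helps anyone compare against their own logs, here is a rough sketch for measuring how long the TLS handshake sat before failing. It only assumes the line layout visible in the excerpt above (a leading `YYYY-MM-DD HH:MM:SS.ffff` timestamp and the two message strings shown). The function names are my own, and the simple pairing logic assumes one handshake at a time, so it would need extending for a busy log.

```python
from datetime import datetime

# Two lines copied from the tentacle log excerpt in this post.
SAMPLE_LOG = """\
2017-09-27 10:06:25.6690 3200 3 INFO listen://[::]:99999/ 3 Performing TLS server handshake
2017-09-27 10:06:47.2779 3200 3 INFO listen://[::]:99999/ 3 Unhandled error when handling request from client: [::ffff:172.1.1.1]:55160
"""


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None (e.g. stack trace lines)."""
    try:
        # The timestamp occupies the first 24 characters of each log entry.
        return datetime.strptime(line[:24], "%Y-%m-%d %H:%M:%S.%f")
    except ValueError:
        return None


def handshake_stalls(log_text):
    """Return seconds elapsed between each handshake start and the next unhandled error."""
    stalls = []
    started = None
    for line in log_text.splitlines():
        ts = parse_timestamp(line)
        if ts is None:
            continue  # not a timestamped entry
        if "Performing TLS server handshake" in line:
            started = ts
        elif "Unhandled error" in line and started is not None:
            stalls.append((ts - started).total_seconds())
            started = None
    return stalls


if __name__ == "__main__":
    for seconds in handshake_stalls(SAMPLE_LOG):
        print(f"handshake stalled for {seconds:.1f} s before failing")
```

For the excerpt above this comes out at a little under 22 seconds, which looks more like a timeout on a connection that went quiet than an outright refusal.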
Thanks!
TentacleLog.log (4 KB)