Test connectivity tasks hang after cancellation 3.8.5

Hello,

Could you please advise whether we can set a timeout for Tentacle connectivity checks? These tasks sometimes run for hours.

We had some long-running Tentacle health checks that were stuck on performing the TLS handshake:
2017-02-07 08:25:26.5442 24 INFO https://servername:10933/ 24 Opening a new connection
2017-02-07 08:25:27.6524 24 INFO https://servername:10933/ 24 Performing TLS handshake
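
One way to check independently whether the handshake on the Tentacle port completes, and how long it takes, is a small probe with an explicit timeout. This is only a rough diagnostic sketch, not part of Octopus: the function name `probe_tls_handshake` and the 5-second default are assumptions, and certificate validation is deliberately disabled because the probe only measures whether the handshake finishes.

```python
import socket
import ssl
import time


def probe_tls_handshake(host, port=10933, timeout=5.0):
    """Return (ok, detail, seconds) for one TCP connect plus TLS handshake."""
    # Certificate checks are turned off on purpose: the probe only answers
    # "does the handshake complete in time", not "is the peer trusted".
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # The socket timeout set above also bounds the handshake that
            # wrap_socket performs, so a silent peer cannot block forever.
            with context.wrap_socket(raw, server_hostname=host) as tls:
                detail = tls.version()
        ok = True
    except OSError as exc:  # timeouts, refused connections, ssl.SSLError
        ok, detail = False, f"{type(exc).__name__}: {exc}"
    return ok, detail, time.monotonic() - started
```

Calling `probe_tls_handshake("servername", 10933)` from the Octopus server while a health check is stuck would show whether a plain handshake against the same endpoint also stalls; a failure after roughly the timeout value points at the network path or the target rather than at the task scheduler.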

On the Tentacle details page I can see the following:
Current Version 0.0.0 (upgrade available)
so it was unable to establish the version.

After a cancellation attempt, the tasks just hang and do not disappear from the interface, as shown in the attached screenshot.

It is also worth noting that when we run a health check manually on the same Tentacle, it works and completes within seconds (and shows the correct version), whereas when it is initiated by the health check job it takes a long time.

I can also see errors in the Tentacle logs on the target server:

2017-02-07 09:33:49.9606 26 INFO listen://[::]:10933/ 26 Unhandled error when handling request from client: [::ffff:10.10.2.198]:61424
System.IO.IOException: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. —> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

I can also see the following:

7-02-07 07:56:48.1010 9 ERROR Unhandled exception from web server processing GET to http://octodeploy.wintech.eu/api/dashboard: The semaphore timeout period has expired
System.Net.HttpListenerException (0x80004005): The semaphore timeout period has expired
at System.Net.HttpResponseStream.Write(Byte[] buffer, Int32 offset, Int32 size)
at System.IO.Compression.DeflateStream.PurgeBuffers(Boolean disposing)
at System.IO.Compression.DeflateStream.Dispose(Boolean disposing)
at System.IO.Stream.Close()
at System.IO.Compression.GZipStream.Dispose(Boolean disposing)
at System.IO.Stream.Close()
at Octopus.Server.Web.Infrastructure.NancyCompression.<>c__DisplayClass3_0.b__0(Stream responseStream) in Z:\buildAgent\workDir\eec88466c176b607\source\Octopus.Server\Web\Infrastructure\NancyCompression.cs:line 100
at Octopus.Server.Web.OctopusNancyHost.OutputWithDefaultTransferEncoding(Response nancyResponse, HttpListenerResponse response) in Z:\buildAgent\workDir\eec88466c176b607\source\Octopus.Server\Web\OctopusNancyHost.cs:line 356
at Octopus.Server.Web.OctopusNancyHost.ConvertNancyResponseToResponse(NancyContext nancyRequest, Response nancyResponse, HttpListenerResponse response) in Z:\buildAgent\workDir\eec88466c176b607\source\Octopus.Server\Web\OctopusNancyHost.cs:line 341
at Nancy.NancyEngineExtensions.HandleRequest(INancyEngine nancyEngine, Request request, Func`2 preRequest, Action`1 onComplete, Action`1 onError, CancellationToken cancellationToken)
at Octopus.Server.Web.OctopusNancyHost.Process(HttpListenerContext ctx) in Z:\buildAgent\workDir\eec88466c176b607\source\Octopus.Server\Web\OctopusNancyHost.cs:line 0

2017-02-07 07:56:53.3064 16 WARN Outstanding health checks for the machine policy Default Machine Policy were not completed before the next task was due to be scheduled. If this error persists, check the Tasks tab for any running health check tasks, and cancel them manually.

ERROR Unhandled exception from web server processing GET to http://hostname/api/dashboard: The specified network name is no longer available

ERROR Unhandled exception from web server processing GET to http://servername/api/serverstatus: The handle is invalid

ERROR Unhandled exception from web server processing GET to http://servername/api/communityactiontemplates/CommunityActionTemplates-6/logo: Object reference not set to an instance of an object.

Thanks
Raf

not_cancelling.PNG

Hello,

I raised another issue that might be related to this cancellation problem we are having:

http://help.octopusdeploy.com/discussions/problems/51141-some-test-connectivity-tasks-hangs-after-adding-141-machines-in-a-row-v385

Hi Raf,

One of my colleagues is currently working on this issue which may resolve your problems or at least make it better.

We have recently made a change whereby a task waits for all sub tasks to finish before marking a task as cancelled. This has possibly caused this underlying issue to come to the surface.
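
To illustrate the behaviour described above (this is a toy model, not Octopus code): if a parent only reports "Cancelled" once every sub task has finished, a single sub task that is blocked and never checks the cancellation signal keeps the parent waiting indefinitely, unless the wait itself is bounded. The names `run_subtask`, `cancel_parent` and `join_timeout` are hypothetical.

```python
import threading


def run_subtask(cancel, stuck):
    """Simulated connectivity check; `stuck` models a call that never returns."""
    if stuck:
        threading.Event().wait()  # blocks forever and never looks at `cancel`
    cancel.wait()  # a healthy sub task stops once cancellation is signalled


def cancel_parent(stuck_flags, join_timeout=None):
    """Signal cancellation, wait for sub tasks, return the parent's final state."""
    cancel = threading.Event()
    workers = [
        threading.Thread(target=run_subtask, args=(cancel, stuck), daemon=True)
        for stuck in stuck_flags
    ]
    for worker in workers:
        worker.start()

    cancel.set()
    for worker in workers:
        # With join_timeout=None this waits indefinitely on a stuck sub task,
        # which is the "task never finishes cancelling" symptom.
        worker.join(join_timeout)

    still_running = sum(worker.is_alive() for worker in workers)
    if still_running == 0:
        return "Cancelled"
    return f"Abandoned {still_running} subtask(s)"
```

In this model, `cancel_parent([False, False])` finishes immediately, while `cancel_parent([False, True])` without a `join_timeout` would never return; passing a `join_timeout` lets the parent give up on the stuck sub task after a bounded wait.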

I will confer with my colleague to see if your situation will be covered, or what other action we need to take on this and your other request. I will get back to you soon.

Regards,

Robert W

Thanks Robert,

I will be waiting for more information. This is causing a lot of problems for us at the moment, as the test connectivity jobs are exhausting the node task pool and preventing new deployments from running.

It happens when we add new servers to Octopus.

Is there any way to just kill those tasks while they are executing?

Thanks
Raf

Hi Raf,

Are you able to upgrade to 3.8.8 (released today)? We fixed two issues that may be relevant, 3142 and 3156. Bundled in with those changes are better timeouts and cancellation of health checks (connectivity checks).

If the issue persists, could you please also send logs from one of the failing tentacles along with the task log (Step 6 and 7).

The "The semaphore timeout period has expired" error is due to an API request being terminated early, usually because a browser was closed. The Tentacles do not use the API, so it is unrelated.

Unfortunately there is no way to hard cancel those tasks.

Robert W

Thanks a lot, Robert. I want to upgrade at the beginning of next week, as soon as the testing is finished, so I will keep you updated.
If the problem persists, I will try to get the verbose task logs and send them.