Acquire packages errors - connection forcibly closed

Hello,

We are getting the following type of error more frequently:

A transient machine error has been detected
                    |       An error occurred when sending a request to 'https://ss-g-app01.sslocal.com:10933/', after the request began: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host..
                    |       Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host..
                    |       An existing connection was forcibly closed by the remote host.
18:36:08   Info     |       Guidance received: Retry

We’re concerned we’re going to have a serious issue as we add more tenants, which will increase the number of concurrent package downloads from the Octopus server. Could this error result from multiple deployment targets downloading the same package concurrently?

I have attached a full deployment log as an example.

Thank you!
Bob

ServerTasks-3511816.log.txt.zip (393.3 KB)

Hey Bob,

Thanks for reaching out. Sorry to hear you’re having issues with your deployment.

Normally this error stems from a firewall or antivirus closing the connection. Is there a firewall between the server and Tentacle (or a local firewall on the Tentacle), or antivirus software on the Tentacle, that could be closing the connection?

Please let me know.

Thanks,
Jeremy

Hi Jeremy, thanks for getting back to me. How could this be since the retry always works? Is there any other possibility?

Hey Bob,

The fact that it works on retry most likely rules out a firewall. Antivirus can be finicky, though, about when it decides to intervene, depending on the actual antivirus software. Is there anything in the Tentacle logs around the same time frame as the connection closing? Also, are you by chance using an HA setup?

Thanks,
Jeremy

Hi Jeremy, it just happened again and I grabbed the Tentacle log. I didn’t find any antivirus installed on the deployment target. What could cause the Tentacle log error?

Note: the deployment target’s date/time is Pacific, but the server runs on CDT, so the Tentacle log times are 2 hours earlier than the deploy log shows.

The deploy log error is:
A transient machine error has been detected Error: July 16th 2020 15:12:46
An error occurred when sending a request to 'https://10.0.26.207:10933/', after the request began: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host..
Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host..
An existing connection was forcibly closed by the remote host.

The tentacle log error is:
2020-07-16 13:12:10.8590 1976 5 INFO listen://[::]:10933/ 5 Client at [::ffff:10.0.89.157]:60111 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 13:17:55.7463 1976 5 ERROR listen://[::]:10933/ 5 Socket IO exception: [::ffff:10.0.89.157]:60111
System.Net.Sockets.SocketException (0x80004005): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
   at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
2020-07-16 13:32:54.2816 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:60530

The whole tentacle log on that machine for today is:
2020-07-16 00:29:22.4877 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:50120
2020-07-16 00:29:22.5206 1976 42 INFO listen://[::]:10933/ 42 Client at [::ffff:10.0.89.157]:50120 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 01:29:33.9292 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:50721
2020-07-16 01:29:33.9448 1976 42 INFO listen://[::]:10933/ 42 Client at [::ffff:10.0.89.157]:50721 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 02:00:33.9462 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:51023
2020-07-16 02:00:33.9774 1976 42 INFO listen://[::]:10933/ 42 Client at [::ffff:10.0.89.157]:51023 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 02:29:50.6436 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:51383
2020-07-16 02:29:50.6592 1976 42 INFO listen://[::]:10933/ 42 Client at [::ffff:10.0.89.157]:51383 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 03:30:01.9291 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:51970
2020-07-16 03:30:01.9603 1976 42 INFO listen://[::]:10933/ 42 Client at [::ffff:10.0.89.157]:51970 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 04:30:16.5333 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:52566
2020-07-16 04:30:16.5646 1976 42 INFO listen://[::]:10933/ 42 Client at [::ffff:10.0.89.157]:52566 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 05:30:30.0307 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:53166
2020-07-16 05:30:30.0620 1976 42 INFO listen://[::]:10933/ 42 Client at [::ffff:10.0.89.157]:53166 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 06:30:46.0607 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:53765
2020-07-16 06:30:46.0919 1976 42 INFO listen://[::]:10933/ 42 Client at [::ffff:10.0.89.157]:53765 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 07:31:00.7531 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:54426
2020-07-16 07:31:00.7687 1976 42 INFO listen://[::]:10933/ 42 Client at [::ffff:10.0.89.157]:54426 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 08:31:18.1932 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:55388
2020-07-16 08:31:18.2088 1976 42 INFO listen://[::]:10933/ 42 Client at [::ffff:10.0.89.157]:55388 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 09:19:18.1923 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:55956
2020-07-16 09:19:18.2079 1976 42 INFO listen://[::]:10933/ 42 Client at [::ffff:10.0.89.157]:55956 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 09:47:02.1954 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:56747
2020-07-16 09:47:02.2563 1976 42 INFO listen://[::]:10933/ 42 Client at [::ffff:10.0.89.157]:56747 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 10:31:59.6612 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:57661
2020-07-16 10:31:59.7080 1976 58 INFO listen://[::]:10933/ 58 Client at [::ffff:10.0.89.157]:57661 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 11:32:16.9528 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:58474
2020-07-16 11:32:16.9684 1976 58 INFO listen://[::]:10933/ 58 Client at [::ffff:10.0.89.157]:58474 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 11:35:05.9956 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:58548
2020-07-16 11:35:06.0580 1976 31 INFO listen://[::]:10933/ 31 Client at [::ffff:10.0.89.157]:58548 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 12:20:04.1598 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:59069
2020-07-16 12:20:04.1910 1976 58 INFO listen://[::]:10933/ 58 Client at [::ffff:10.0.89.157]:59069 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 12:32:32.5128 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:59501
2020-07-16 12:32:32.5440 1976 58 INFO listen://[::]:10933/ 58 Client at [::ffff:10.0.89.157]:59501 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 13:12:01.3274 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:60062
2020-07-16 13:12:01.3897 1976 58 INFO listen://[::]:10933/ 58 Client at [::ffff:10.0.89.157]:60062 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 13:12:10.8278 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:60111
2020-07-16 13:12:10.8590 1976 5 INFO listen://[::]:10933/ 5 Client at [::ffff:10.0.89.157]:60111 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 13:17:55.7463 1976 5 ERROR listen://[::]:10933/ 5 Socket IO exception: [::ffff:10.0.89.157]:60111
System.Net.Sockets.SocketException (0x80004005): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
2020-07-16 13:32:54.2816 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:60530
2020-07-16 13:32:54.3128 1976 58 INFO listen://[::]:10933/ 58 Client at [::ffff:10.0.89.157]:60530 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 13:46:21.2598 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:60770
2020-07-16 13:46:21.2754 1976 58 INFO listen://[::]:10933/ 58 Client at [::ffff:10.0.89.157]:60770 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F
2020-07-16 13:55:53.8619 1976 8 INFO listen://[::]:10933/ 8 Accepted TCP client: [::ffff:10.0.89.157]:60848
2020-07-16 13:55:53.9867 1976 15 INFO listen://[::]:10933/ 15 Client at [::ffff:10.0.89.157]:60848 authenticated as B10B577DA5D96DD9B2A003D1464C7B19F93BB50F

Hey Bob,

Yeah, from the logs it definitely looks like a network error. Would you be able to speak to your network team to see if there’s anything that could be prematurely terminating the network stream? Is it always the same Tentacle, or is it random?

Thanks,
Jeremy

It’s more random, but it could be some servers and not others. Could this be timeout related? Are there any timeout parameters that might come into play here?

Here’s our initial theory, what do you think?

“I *think* we’re going to have an issue as we add more projects/tenants that both send and download packages from the Octopus server. For example, if we’re deploying to, let’s say, webv3api across all tenants, each tenant is going to have one of its web servers start to acquire packages, so 12+ webs could be trying to pull a package. If another deploy (v2, webapi, etc.) comes along around the same time, that number basically doubles. I think this is part of why we’re seeing the occasional timeouts/errors during the acquire-packages phase.”

Hi Bob,

We do have a task cap that you can adjust to limit the number of concurrent tasks running on your server, if that’s a concern for you.

The documentation for that is here: https://octopus.com/docs/support/increase-the-octopus-server-task-cap

In addition to that, there is a setting that lets you lower or raise the number of targets that can acquire packages at once; the default is 10. You can set this value on a per-project basis to ensure faster speeds per transfer.

The documentation for that is here: https://octopus.com/docs/administration/managing-infrastructure/run-multiple-processes-on-a-tentacle-simultaneously
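For reference, that per-project knob is set as a project variable. A minimal sketch of what it could look like, assuming the variable name `Octopus.Acquire.MaxParallelism` from the Octopus documentation (the value of 5 is purely illustrative):

```
# Project > Variables — caps how many targets acquire packages at once
Octopus.Acquire.MaxParallelism = 5    # default is 10; lower it to reduce concurrent transfers
```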

With those two settings combined you can mitigate maxing out the network/server resources as your tenants and projects grow, though you will then potentially have longer waits for deployments.

As far as timeouts go, I’m not aware of one in Octopus that kills the connection when a download takes too long.

This seems like either the Octopus Server itself is bottlenecked on its hardware somewhere, or antivirus/a firewall is killing the connection. Can you monitor the Octopus server’s resources while deploying the same project to see if anything is getting maxed out? Can you speak to your network team to see if they see anything in their logs during that time that may have caused the issue? Are you running multiple nodes of Octopus, or just one?

Thanks,
Jeremy

Thanks Jeremy! That’s all very helpful information.

To provide some information: we’re not using HA and have one Octopus server (an AWS m4.2xlarge: 8 vCPUs, 32 GB memory, 1,000 Mbps, high network performance). I’ve attached some metrics for the past 3 weeks.

Our current task cap is 25. Is that too high? How do we know when we need to move to HA/multiple Octopus nodes?

Also, yesterday I found that our package library feed (we use the built-in one) had not been indexed for several weeks. When I indexed it, there were 48 package files on the server that were missing from the feed as listed in the Octopus UI. After removing those files, I reindexed again, this time with no errors. Any thoughts on that? Should we consider a third-party feed? Move the feed to a different server (is that possible)?

Bob

Hey Bob,

The specs on your server should be more than enough to handle 25 concurrent tasks. You would only need to consider moving to HA if you want redundancy, or if the server’s resources are getting maxed out and you can’t add any more to it. Your data usage spikes do look pretty large, though. What would you say is the average size of the packages you’re sending out to Tentacles/workers?

Which version of Octopus Server are you running?

As far as your feed issues go, I haven’t seen any built-in feed issues like you’re mentioning, but I can reach out and see what might have caused this and how we can prevent it in the future.

As far as your initial problem goes, CPU usage looks good, but network usage looks a little high. It could be that the network is getting maxed out, or there may be firewall rules causing this. Would you be able to speak to your networking team to check their logs around the times of the failures? You could also lower the number of Tentacles/workers that acquire packages at any given time to reduce network traffic and see if that alleviates the issue.

Let me know what you think and I will get back to you on the feed issue.

Thanks,
Jeremy

Thanks Jeremy, again, good help! 🙂 Our DevOps team, which I’m a member of, also handles all our networking, but I’m not the expert in that area. I’ll go over what you’ve covered with the team next week. Thanks again! Bob

P.S. We’re on v2020.2.9.
P.S. Also, the average package file size overall is 66 MB, but the projects that typically hit the acquisition error have packages of 175 MB, 102 MB, and 86 MB.
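Putting the thread’s numbers together (the “12+ webs” from the theory above, the 175 MB package, and the instance’s 1,000 Mbps allowance), a rough back-of-envelope sketch of what a burst of concurrent acquisitions looks like on that link — this is just my arithmetic, not an Octopus formula:

```python
# Back-of-envelope: how long concurrent package acquisitions take
# when they split a 1,000 Mbps link evenly (idealized fair sharing).

LINK_MBPS = 1_000        # m4.2xlarge network allowance from the thread
CONCURRENT = 12          # "12+ webs could be trying to pull a package"
PACKAGE_MB = 175         # largest package size mentioned

link_mb_per_s = LINK_MBPS / 8                      # 1,000 Mbit/s -> 125 MB/s
fair_share_mb_per_s = link_mb_per_s / CONCURRENT   # ~10.4 MB/s per target
transfer_s = PACKAGE_MB / fair_share_mb_per_s      # ~16.8 s per package

print(f"each of {CONCURRENT} targets gets ~{fair_share_mb_per_s:.1f} MB/s; "
      f"a {PACKAGE_MB} MB package takes ~{transfer_s:.0f} s")
```

Even doubling the concurrency for overlapping deploys only stretches each transfer, it doesn’t exhaust the link outright, which fits Jeremy’s read that the forced disconnects look more like a middlebox or antivirus than raw bandwidth.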

Hey Bob,

You’re very welcome. My apologies, I did some quick mental math thinking your “network out” was in Kb instead of bytes, so I thought it was on the order of TBs. 3 GB seems pretty reasonable for your setup.

It’s a long shot but worth checking: if there’s anti-malware/antivirus on the Octopus server, it might be worth checking its logs as well as ensuring everything Octopus-related is whitelisted.

Please let me know if you guys find anything in your networking logs that points to an issue. And feel free to let me know if you need us to try and dig in further.

Have a great weekend.

Thanks,
Jeremy