We are preparing to do an SSL certificate rotation for our Octopus Server, and we exclusively use polling Tentacles. (Expiring certificates and polling Tentacles are hard requirements from our customers.)
While preparing for the rotation on the production server, we have been testing the rotation on our development server. To maintain constant Tentacle connections, we are adding trust for the new SSL thumbprint to existing Tentacles while otherwise keeping the same connection properties. This approach is working for keeping constant connectivity; however, it has uncovered another issue.
As the number of Tentacles attempting to connect with the wrong thumbprint increases, the Octopus Server's responsiveness degrades. One of our development servers has approximately 300 Tentacles, and once most of them are updated with the second thumbprint the server UI becomes intermittently unusable: the active 'health' pings from the UI can take up to a minute and a half. There is no option to configure or extend the retry period for the polling Halibut connections, and our production server has approximately 4,000 Tentacles. I am afraid it will become entirely unusable with that many Tentacle connections bogging down the server.
If there is any additional information I can provide please let me know, Thanks!
Thanks for getting in touch! This is a great question, and I can see why this behavior would be annoying and concerning with that many Tentacles. It feels like Octopus should be better at handling this situation, but it will probably take some testing to uncover the potential bug or shortcoming that we should improve on.
Could you let us know which Octopus and Tentacle versions you're running? If applicable, does upgrading improve this? If you have any logs that provide more details, those would also be helpful. Lastly, to confirm: are you using the update-trust command in the Tentacle.exe command line for the thumbprint change?
I look forward to hearing back!
- Octopus Server
- Tentacle Version
- Not sure what logs would be helpful for you here. Looking at the Octopus Server logs, I don't see anything that strikes me as particularly useful. There are a LOT of Halibut log entries for the failed connections: about 6 hours of operation produced roughly 30 MB of Halibut logs. An example entry is pasted below; there are up to a dozen occurrences of this entry within the same millisecond.
- Our first few test approaches have been to add the new thumbprint as a new trust entry for the same Octopus Server and then duplicate the poll address. This was a 'configure --trust' command followed by a 'server-comms' command; the polling endpoint data was then aligned by editing tentacle.config directly. This left us with two entries for the same Octopus Server, with the same polling endpoint and only the thumbprints differing.
- We have attempted to use the health check to run an update-trust command, but this leaves us with Tentacles that no longer trust the Octopus Server, which is not ideal for us at all. With 4,000 Tentacles, it can be a challenge to make sure we have gotten them all up and active within a short time period. Even setting that aside, if we run update-trust and rotate the certificate soon after, every Tentacle that hasn't restarted to pick up the change slows down the UI; if we instead wait for the bulk of the Tentacles to restart first, those are the ones slowing down the UI. Either direction causes slowness.
- I am going through the process of updating the first of our 4 servers to 2020.2 this week; I may be able to run another certificate rotation by the end of the week on the test server with 300 machines.
2020-06-22 09:27:58.5755 1400 1531 ERROR listen://[::]:443/ 1531 Socket IO exception: [::ffff:192.168.220.88]:57086 System.Net.Sockets.SocketException (0x80004005): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
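For anyone following along, the dual-trust approach described in the bullets above can be sketched roughly as below. This is a dry-run sketch that only prints the commands rather than executing them; the instance name and thumbprint are placeholders, and the exact flags should be checked against the Tentacle.exe command-line documentation for your version.

```shell
#!/bin/sh
# Placeholders -- substitute your real instance name and new server certificate thumbprint.
INSTANCE="Tentacle"
NEW_THUMBPRINT="NEW_SERVER_CERT_THUMBPRINT"

# 1. Add a trust entry for the new server certificate alongside the existing one.
CMD_TRUST="Tentacle.exe configure --instance \"$INSTANCE\" --trust \"$NEW_THUMBPRINT\""

# 2. Register polling (active) communication for the new trust entry; the polling
#    endpoint details were then aligned by editing tentacle.config directly.
CMD_COMMS="Tentacle.exe server-comms --instance \"$INSTANCE\" --thumbprint \"$NEW_THUMBPRINT\" --style TentacleActive"

# 3. Restart the Tentacle service so the new configuration is picked up.
CMD_RESTART="Tentacle.exe service --instance \"$INSTANCE\" --stop --start"

# Dry run: print the commands instead of running them.
echo "$CMD_TRUST"
echo "$CMD_COMMS"
echo "$CMD_RESTART"
```

The end state matches what the poster describes: two trust entries for the same server, same polling endpoint, differing only in thumbprint.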
Thanks for following up and providing those details. The error you pasted seems to indicate some sort of network problem, but that line of thinking seems a bit strange given the behavior you're describing. At this point I think the more logs the better: the server and Tentacle logs, both set to Trace level logging (as shown in this doc page; remember to set it back to Info when done), and even the task log from the health check task.
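For reference, the log level is controlled by the NLog configuration file that ships alongside the executable (Octopus.Server.exe.nlog on the server, Tentacle.exe.nlog on the Tentacle). A minimal sketch of the change, assuming the default rule layout; the rule and target names can differ between versions, so compare against your own file:

```xml
<!-- Inside the <rules> element of Tentacle.exe.nlog or Octopus.Server.exe.nlog: -->
<!-- change minlevel from "Info" to "Trace" to capture verbose diagnostics ... -->
<logger name="*" minlevel="Trace" writeTo="octopus-log-file" />
<!-- ... and remember to set it back to "Info" when finished, as Trace logging
     grows the log files very quickly. -->
```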
I’m hoping those provide more information to work with to compare and investigate. The best place to supply all these logs would be on Octopus.com while logged in under the Support tab. Alternatively you can email us at firstname.lastname@example.org.
I look forward to hearing back and getting to the bottom of this one!
Thanks, I have emailed a zip of the logs covering roughly 10 minutes. I hope that provides enough information, as the Halibut retry attempts are fairly quick.
I did happen to notice afterwards, when reverting the logging away from Trace on one of the Tentacles, that I had in fact not set it to Trace initially (or forgot to save). So one of the Tentacle logs is not at Trace level. If I need to provide more logs, please let me know.
Thank you kindly for following up and letting me know. I can confirm I've received the email with the zip attachment. We can continue the conversation via email, and I'll let you know when we find something.
Kevin Orr | ScriptPro
Software Engineer | direct 913.403.5581 | fax 913.384.2180 | email@example.com
5828 Reeds Road, Mission KS 66202 | main 913.384.1008 | www.scriptpro.com
This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.