The engineers and one of our AUS support staff were looking into this last night and it seems they have not found a valid workaround for mass updating polling tentacles.
They did mention that they are raising our open issue about Tentacle certificate rotation, specifically how a Tentacle comes to trust a new Octopus Server certificate, in their engineering meeting, so talks are ongoing. The engineer has posted a comment on that issue. I imagine they will be running more tests tonight, so we will keep you updated on any progress.
I have mentioned that you have over 600 polling Tentacles. That is definitely too many to update or re-install manually, so hopefully we can get something sorted in that area.
I cannot really offer a timeframe for this. If we find a valid workaround I will let you know, but it looks like this may end up being a project for us to put something in place to bulk-trust Octopus Server certificates for Tentacles.
Just stepping in with an update: the engineers are looking into a script solution for rotating the Octopus Server certificate for Polling Tentacles and would like to confirm a few things about your environment.
Are these Tentacles all the same OS type or is there a mixture of Linux and Windows?
Are the Tentacles Deployment Targets or Workers or both?
Is the Tentacle service able to write to its own configuration file and restart its own service (e.g. does it run with admin permissions)?
Feel free to reach out with any questions of your own, looking forward to getting a solution for this!
Thanks for getting back to us with the answers to the engineers' questions. I have just passed those along, so hopefully they can start scripting a workaround for you tonight when they come online. As always, we will keep you updated with any news.
Essentially, those questions affect where the Tentacle config file resides. Ideally the script would cover all situations; however, we'll tailor the initial version to work for your setup to get you unblocked, which hopefully shouldn't take much longer, depending on the complexity.
Thanks for confirming that, and for your patience while we get the scripts ready!
I have finished testing and am confident with the process; however, I still recommend configuring a test instance to trial everything before actioning it in a production environment. Backups of the config files are created, but once the certificate is rotated on the Octopus Server those backups will no longer work.
I have attached 4 scripts to run through the script console. The 1st creates a certificate and can be skipped if you already have a certificate to use.
The 2nd script adds the specified certificate thumbprint to the config file and restarts the Tentacle. The task will be reported as failed because the service restarts while it is running, but since the restart only happens after the config file has been modified, the change itself will have succeeded. I would suggest running a health check afterwards to confirm there are no issues before proceeding to the next step.
The 3rd script performs the certificate rotation on the Octopus Server and stops the Octopus service, taking the Server offline. This will cause downtime and cannot be reverted unless you complete the same process with the previous certificate, so I suggest only running this step when you are confident that the config files contain both thumbprints. The Server will not start again by itself after shutting down and will need to be started manually once it goes offline.
Finally, the 4th script removes the old thumbprint and restarts the Tentacle again. Another health check should then succeed and use the new thumbprint!
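For anyone who wants to double-check the "both thumbprints" condition before running the 3rd script, here is a minimal sketch of that check in Python. It assumes you have already collected the text of each Tentacle config file by some means of your own. The function names and sample values are hypothetical and are not part of the attached scripts, and the check is only a plain text search, not a parser for the Tentacle config format.

```python
def normalise(thumbprint: str) -> str:
    """Strip whitespace and upper-case a thumbprint so comparisons are consistent."""
    return "".join(thumbprint.split()).upper()


def config_trusts_both(config_text: str, old_thumbprint: str, new_thumbprint: str) -> bool:
    """Return True only if both thumbprints appear somewhere in the config text.

    This is a plain substring search; it makes no assumption about how the
    config file is structured.
    """
    haystack = config_text.upper()
    return normalise(old_thumbprint) in haystack and normalise(new_thumbprint) in haystack


if __name__ == "__main__":
    # Hypothetical values purely for illustration.
    old_tp = "AA" * 20
    new_tp = "BB" * 20
    sample = f"some setting: {old_tp}\nanother setting: {new_tp}\n"
    print(config_trusts_both(sample, old_tp, new_tp))
```

Any config that fails this check would be worth looking at by hand before the Server certificate is rotated.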
To summarise the steps required:
1. (Optional) Script Console task running on Server - Create Certificate
   - Get cert thumbprint for variable in next step
2. Script Console task on Targets in Environment - Imports Thumbprint Into Tentacle Config
   - Add thumbprint value to variable certThumbprint
   - Complete health checks and check config file looks OK
3. Script Console task running on Server - Rotates The Certificates and Restarts OctopusServer
   - Need to start the Octopus Server service manually afterwards
4. Script Console task on Targets in Environment - Removes Old Thumbprint from Tentacle Config
   - Complete health checks and confirm thumbprint used
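If you need to work out the thumbprint outside of the scripts: the thumbprint shown in the Windows certificate store is the SHA-1 hash of the DER-encoded certificate. Below is a rough sketch, assuming you have the public certificate exported as PEM or DER. The function names are my own and are not taken from the attached scripts.

```python
import hashlib
import ssl


def thumbprint_from_der(der_bytes: bytes) -> str:
    """SHA-1 over the DER-encoded certificate, as upper-case hex."""
    return hashlib.sha1(der_bytes).hexdigest().upper()


def thumbprint_from_pem(pem_text: str) -> str:
    """Convert a single PEM certificate to DER, then hash it."""
    return thumbprint_from_der(ssl.PEM_cert_to_DER_cert(pem_text))
```

For example, `thumbprint_from_der(open("server.cer", "rb").read())` would give the value for a DER export, where `server.cer` is a placeholder file name.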
Please feel free to let me know if you have any questions or I can help with any part of the process at all!
I have run the 2nd script on a test Tentacle. The health check and config file look OK. The client logs the following exception, which I assume I can ignore?
2023-02-24 10:14:49.9170 5132 8 ERROR https://[OCTOPUS_SERVER_URL]/ 8 Halibut client exception: An error occurred when sending a request to 'https://[OCTOPUS_SERVER_URL]/', after the request began: The server at https://[OCTOPUS_SERVER_URL]/ presented an unexpected security certificate. We expected the server to present a certificate with the thumbprint '[NEW_CERT]'. Instead, it presented a certificate with a thumbprint of 'OLD_CERT' and subject 'CN=Octopus Portal'. This usually happens when the client has been configured to expect the server to have the wrong certificate, or when the certificate on the server has been regenerated and the client has not been updated. It may also happen if someone is performing a man-in-the-middle attack on the remote machine, or if a proxy server is intercepting requests. Please check the certificate used on the server, and verify that the client has been configured correctly. Retrying in 120,0 seconds
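With 600+ Tentacles it may help to tally which ones log this message and with which thumbprints. Here is a small sketch that pulls the expected and presented thumbprints out of a line like the one above. It assumes the wording stays the same as in this log line and that real thumbprints are 40 hex characters; the function name is hypothetical.

```python
import re

# The quote characters around the thumbprints vary, so \W+ skips whatever
# non-word characters sit between the phrase and the hex value.
EXPECTED = re.compile(r"with the thumbprint\W+([0-9A-Fa-f]{40})")
PRESENTED = re.compile(r"with a thumbprint of\W+([0-9A-Fa-f]{40})")


def thumbprints_in_error(line: str):
    """Return (expected, presented) thumbprints, or None if the line does not match."""
    expected = EXPECTED.search(line)
    presented = PRESENTED.search(line)
    if not expected or not presented:
        return None
    return expected.group(1).upper(), presented.group(1).upper()
```

Running this over each Tentacle log would show whether the mismatch is logged on every Tentacle or only on some.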
Just confirming what @sean.stanway mentioned. In my testing, my health checks still used the previous certificate; however, it looks like it's picking up the new one for you. I'd be curious whether this occurred on every Tentacle or just occasionally. It should be fine to proceed to step 3 as long as the config file contains the new certificate and looks OK!
Hi @finnian.dempsey. I just tried running step 3. The script completes without errors, but login to the Octopus portal hangs after the service restarts (Checking your credentials. Please wait...). Everything is fine when I roll back to the old cert.
The server logs the following exception:
2023-02-27 22:39:39.4247 7164 23 WARN An exception was thrown while trying to establish a principal for the current request
System.OperationCanceledException: The operation was canceled.
at System.Threading.CancellationToken.ThrowOperationCanceledException()
at System.Linq.AsyncIteratorBase1.GetAsyncEnumerator(CancellationToken cancellationToken) in /_/Ix.NET/Source/System.Linq.Async/System/Linq/AsyncIterator.cs:line 33
at System.Runtime.CompilerServices.ConfiguredCancelableAsyncEnumerable1.GetAsyncEnumerator()
at System.Threading.Tasks.AsyncEnumerableExt.GetConfiguredAsyncEnumerator[T](IAsyncEnumerable1 enumerable, CancellationToken cancellationToken, Boolean continueOnCapturedContext) in /_/Ix.NET/Source/System.Linq.Async/System/Threading/Tasks/AsyncEnumerableExt.cs:line 14
at System.Linq.AsyncEnumerable.TryGetFirst[TSource](IAsyncEnumerable1 source, Func2 predicate, CancellationToken cancellationToken)
at System.Linq.AsyncEnumerable.<FirstOrDefaultAsync>g__Core|92_0[TSource](IAsyncEnumerable1 source, Func2 predicate, CancellationToken cancellationToken) in /_/Ix.NET/Source/System.Linq.Async/System/Linq/Operators/FirstOrDefault.cs:line 56
at Octopus.Server.Web.Infrastructure.Authentication.UserAuthenticator.AuthenticateRequest(HttpContext context, CancellationToken cancellationToken) in ./source/Octopus.Server/Web/Infrastructure/Authentication/UserAuthenticator.cs:line 50
2023-02-27 22:39:39.6754 7164 23 WARN An exception was thrown while trying to establish a principal for the current request
System.OperationCanceledException: The operation was canceled.
at System.Threading.CancellationToken.ThrowOperationCanceledException()
at System.Linq.AsyncIteratorBase1.GetAsyncEnumerator(CancellationToken cancellationToken) in /_/Ix.NET/Source/System.Linq.Async/System/Linq/AsyncIterator.cs:line 33
at System.Runtime.CompilerServices.ConfiguredCancelableAsyncEnumerable1.GetAsyncEnumerator()
at System.Threading.Tasks.AsyncEnumerableExt.GetConfiguredAsyncEnumerator[T](IAsyncEnumerable1 enumerable, CancellationToken cancellationToken, Boolean continueOnCapturedContext) in /_/Ix.NET/Source/System.Linq.Async/System/Threading/Tasks/AsyncEnumerableExt.cs:line 14
at System.Linq.AsyncEnumerable.TryGetFirst[TSource](IAsyncEnumerable1 source, Func2 predicate, CancellationToken cancellationToken)
at System.Linq.AsyncEnumerable.<FirstOrDefaultAsync>g__Core|92_0[TSource](IAsyncEnumerable1 source, Func2 predicate, CancellationToken cancellationToken) in /_/Ix.NET/Source/System.Linq.Async/System/Linq/Operators/FirstOrDefault.cs:line 56
at Octopus.Server.Web.Infrastructure.Authentication.UserAuthenticator.AuthenticateRequest(HttpContext context, CancellationToken cancellationToken) in ./source/Octopus.Server/Web/Infrastructure/Authentication/UserAuthenticator.cs:line 50
2023-02-27 22:39:39.6872 7164 23 WARN An exception was thrown while trying to establish a principal for the current request
System.OperationCanceledException: The operation was canceled.
at System.Threading.CancellationToken.ThrowOperationCanceledException()
at System.Linq.AsyncIteratorBase1.GetAsyncEnumerator(CancellationToken cancellationToken) in /_/Ix.NET/Source/System.Linq.Async/System/Linq/AsyncIterator.cs:line 33
at System.Runtime.CompilerServices.ConfiguredCancelableAsyncEnumerable1.GetAsyncEnumerator()
at System.Threading.Tasks.AsyncEnumerableExt.GetConfiguredAsyncEnumerator[T](IAsyncEnumerable1 enumerable, CancellationToken cancellationToken, Boolean continueOnCapturedContext) in /_/Ix.NET/Source/System.Linq.Async/System/Threading/Tasks/AsyncEnumerableExt.cs:line 14
at System.Linq.AsyncEnumerable.TryGetFirst[TSource](IAsyncEnumerable1 source, Func2 predicate, CancellationToken cancellationToken)
at System.Linq.AsyncEnumerable.<FirstOrDefaultAsync>g__Core|92_0[TSource](IAsyncEnumerable1 source, Func2 predicate, CancellationToken cancellationToken) in /_/Ix.NET/Source/System.Linq.Async/System/Linq/Operators/FirstOrDefault.cs:line 56
at Octopus.Server.Web.Infrastructure.Authentication.UserAuthenticator.AuthenticateRequest(HttpContext context, CancellationToken cancellationToken) in ./source/Octopus.Server/Web/Infrastructure/Authentication/UserAuthenticator.cs:line 50
2023-02-27 22:39:39.6872 7164 23 WARN An exception was thrown while trying to establish a principal for the current request
System.OperationCanceledException: The operation was canceled.
at System.Threading.CancellationToken.ThrowOperationCanceledException()
at System.Linq.AsyncIteratorBase1.GetAsyncEnumerator(CancellationToken cancellationToken) in /_/Ix.NET/Source/System.Linq.Async/System/Linq/AsyncIterator.cs:line 33
at System.Runtime.CompilerServices.ConfiguredCancelableAsyncEnumerable1.GetAsyncEnumerator()
at System.Threading.Tasks.AsyncEnumerableExt.GetConfiguredAsyncEnumerator[T](IAsyncEnumerable1 enumerable, CancellationToken cancellationToken, Boolean continueOnCapturedContext) in /_/Ix.NET/Source/System.Linq.Async/System/Threading/Tasks/AsyncEnumerableExt.cs:line 14
at System.Linq.AsyncEnumerable.TryGetFirst[TSource](IAsyncEnumerable1 source, Func2 predicate, CancellationToken cancellationToken)
at System.Linq.AsyncEnumerable.<FirstOrDefaultAsync>g__Core|92_0[TSource](IAsyncEnumerable1 source, Func2 predicate, CancellationToken cancellationToken) in /_/Ix.NET/Source/System.Linq.Async/System/Linq/Operators/FirstOrDefault.cs:line 56
at Octopus.Server.Web.Infrastructure.Authentication.UserAuthenticator.AuthenticateRequest(HttpContext context, CancellationToken cancellationToken) in ./source/Octopus.Se
The X509 certificate CN=XXXX was loaded but the private key was not loaded.
Furthermore, the private key file could not be located: Unable to obtain private key file name
Could you please confirm how you are creating the certificate? Can the certificate also be imported into the local certificate store (cert:\LocalMachine\My) without problems?
This was fixed last time by adding -Provider "Microsoft Strong Cryptographic Provider" to the new certificate command.
Hi @finnian.dempsey. The cert was created using your script. I can import the certificate into the local store without problems. I will try to create a new cert this evening using your extended command.
Thanks for trying that and uploading the new logs, confirming I have received them ok!
I have to admit this is really quite strange and not something we've encountered when rotating the server certificate. The logs appear to suggest that your user is already authenticated. Does the portal still hang when using an incognito browser session or after clearing the site's cookies? Do any other users also have this issue?
Could you please send through a HAR file of the portal hanging? That should allow us to see exactly what's going on and which request is hanging.
I'll keep digging into this to see if I can figure out what's going on, and I'll keep you posted with any other suggestions!
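In case it is useful while capturing the HAR file: a HAR export is plain JSON, so a quick look at the slowest entries can already hint at which request is hanging. This is only a sketch, assuming the standard HAR layout (log.entries with time, request and response fields); the function name is made up for illustration.

```python
import json


def slowest_requests(har: dict, top: int = 5):
    """Return the `top` slowest entries as (time_ms, status, method, url) tuples."""
    rows = []
    for entry in har["log"]["entries"]:
        request = entry.get("request", {})
        response = entry.get("response", {})
        rows.append((
            entry.get("time", 0),      # total elapsed time in milliseconds
            response.get("status"),    # HTTP status recorded for the entry
            request.get("method"),
            request.get("url"),
        ))
    return sorted(rows, key=lambda row: row[0], reverse=True)[:top]


def load_har(path: str) -> dict:
    """Read a HAR file from disk (HAR files are UTF-8 JSON)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `slowest_requests(load_har("portal.har"))` would list the five longest requests, where `portal.har` is a placeholder file name.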