Octopus Server Trust Certificate Rotation

#1

Our Octopus servers use CA-issued certificates for web and tentacle trust. These have expiry dates and will require rotation (as opposed to the suggested self-signed 100-year certificates). It isn’t feasible, however, to manually update every tentacle with the new Octopus Server certificate when that time comes, so we have been attempting to automate this process.

We created a project to deploy to tentacles that runs tentacle.exe with the command line
‘configure --instance "Tentacle" --trust --console’
followed by a second tentacle.exe call with the command line
‘server-comms --instance "Tentacle" --thumbprint --style "TentacleActive" --host --port --console’

Those 2 commands will create 2 new entries in the Tentacle.config file, one to get the thumbprint in as trusted (from configure --trust) and another with full thumbprint/address/communication style. The issue is that the server-comms command also creates a new subscriptionId that the Octopus Server does not recognize.

Although Octopus recommends using the 100-year self-signed certificates, I was hoping there would be a straightforward way to add a new thumbprint to the tentacle trust for a server the tentacle is already connected to.

(Jim Burger) #4

Hi there @hardKOrr

Thanks for this question. You’re correct that we use self-signed certificates by default (I’ll link to this blog post to flesh out the reasoning there for others).

That said, I can totally understand wanting to practice good certificate hygiene, and rotating these certs on a regular basis would go a long way towards that.

You’re also not the first person to ask for this, so we recently built a command line feature for updating the trust of a tentacle.

I’ll make some updates to our documentation here to ensure that this is a more visible feature for others.

I do hope this helps you automate your certificate changeover - please do let us know if there are still issues here!

All the best,

#5

I had seen and considered the update-trust call from Tentacle, but that approach seems potentially very fragile. Tentacle update-trust replaces the current certificate thumbprint, which means it can only be run as part of a process that updates the Octopus Server certificate at the same moment. The goal I had was to be able to trust the old and the new certificate at the same time, and to remove the old certificate only after the new certificate was in place.

Using update-trust means that the deployment to machines has to reach 100% the first time: any machine left behind will not trust the Octopus Server once its certificate has changed… and if the certificate on the server did not rotate, any machine that did receive the deployment won’t trust the server. The automated goal is to get as many machines as possible onto the new certificate thumbprint while retaining trust for the older certificate, and then make the swap to the new certificate. This would reduce the manual load a great deal.
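To make the difference concrete, here is a minimal Python sketch of the two trust models described above. It is only an illustration: the function name and the placeholder thumbprints are hypothetical, and it is not how Tentacle actually validates certificates.

```python
from typing import Iterable


def is_trusted(presented: str, trusted: Iterable[str]) -> bool:
    """Return True if the presented thumbprint is in the trusted collection.

    Comparison ignores case and spaces, since thumbprints are often copied
    in slightly different formats.
    """
    def normalize(thumbprint: str) -> str:
        return thumbprint.replace(" ", "").upper()

    return normalize(presented) in {normalize(t) for t in trusted}


# Placeholder thumbprints (hypothetical values, not real certificates).
OLD = "aa11"
NEW = "bb22"

# Replace semantics (what update-trust is described as doing): only NEW is trusted.
replaced = [NEW]
# Overlap semantics (the goal in this thread): OLD and NEW are both trusted.
overlap = [OLD, NEW]

# Server has not rotated yet and still presents OLD.
print(is_trusted(OLD, replaced))  # False: an updated tentacle is cut off
print(is_trusted(OLD, overlap))   # True

# Server has rotated and presents NEW.
print(is_trusted(NEW, [OLD]))     # False: a tentacle left behind is cut off
print(is_trusted(NEW, overlap))   # True
```

With overlap semantics the tentacle update and the server rotation no longer need to happen at the same moment, which is the property asked for here.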

#6

Is there any additional information I can provide?
Generally speaking, I need to be able to just add a new thumbprint for an existing server AND retain its SubscriptionId… since neither the client/tentacle nor the server is changing anything but the certificate.

There currently isn’t a way I can use the tentacle to duplicate a trusted server connection and just update the thumbprint. I can either replace the thumbprint entirely or create a new trusted server connection with a SubscriptionId the server doesn’t recognize (causing a disconnected tentacle).

#7

So, is this issue just moot, and the answer is ‘just use Octopus self-signed certificates that last 100 years’?

(Michael Noonan) #8

Hi @hardKOrr,

I apologise. Our notifications disappeared on this thread. Are you still interested in doing something here?

Our current Tentacle configuration allows a Tentacle to be in communication with multiple Octopus Servers, primarily to enable Polling Tentacles to work with an Octopus HA Cluster. However, we do not have the configuration to allow a Tentacle to trust the same Octopus Server which may present one or more X.509 Certificates, during the overlap phase as you mentioned.

We started down the path of automating this process to allow people to upgrade their Octopus Server X.509 Certificate to SHA256, but then stopped work on that as it became increasingly complex for the value it would gain.

What you’ve suggested in your first post is the same process I would use, and where we started with the implementation. The blocker here is that Tentacle cannot be configured to trust multiple certificates, and I’m not convinced it would be the best use of our time at the moment.

Perhaps you can convince me to champion this ability on your behalf, but at the moment I don’t have enough context to understand why this matters to you more than other things we could do to improve Octopus on your behalf.

Hope that helps!
Mike

#11

I did manage to get some scripts/steps in place that allow for multiple trusted certificates for the same server. It works just fine, except that it throws out a TON of logs for the certificate that no longer (or does not yet) match what the server provides. These logs appear on the tentacle and the server, so the server would get overloaded awfully quickly with a lot of tentacles.

As for why this matters, it is mostly that we have Octopus servers for government applications. The requirement for them is that the government CAs must be used to create the certificates, which limits how long the certificates last. This means that as often as every year we may need to re-issue a certificate for the Octopus server… and without a way to automatically rotate the certificates we would need to manually update every machine connected to those government servers. This could easily reach thousands of tentacles.

As for how I got this to work, I have a script that does the following:
Calls tentacle.exe configure --trust <> for the new thumbprint, which is required before the server-comms command can be called
Calls tentacle.exe server-comms --thumbprint <> --style <> --host <> --port <> so that the Octopus server connection information is duplicated for the new thumbprint
Regex-replaces the new poll://<> information with the old poll://<> information, so that both the new and the old certificate thumbprints connect on the same polling channel
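The third step can be sketched roughly as follows. The script described here used a regex replacement in PowerShell. This sketch swaps in JSON parsing in Python instead, because the trusted-server list is described later in the thread as JSON stored inside the XML config. The field names (`Thumbprint`, `Address`, `SubscriptionId`, `CommunicationStyle`), the example address, and the function name are assumptions made for illustration and should be checked against a real Tentacle.config before use.

```python
import json


def share_subscription(trusted_json: str, old_thumbprint: str, new_thumbprint: str) -> str:
    """Point the new-thumbprint polling entry at the old entry's subscription.

    Only entries for new_thumbprint that already carry a SubscriptionId are
    changed, so trust-only entries and entries for other servers are untouched.
    """
    servers = json.loads(trusted_json)
    old_entry = next(
        (s for s in servers
         if s.get("Thumbprint") == old_thumbprint and s.get("SubscriptionId")),
        None,
    )
    if old_entry is None:
        raise ValueError("no polling entry found for the old thumbprint")
    for server in servers:
        if server.get("Thumbprint") == new_thumbprint and server.get("SubscriptionId"):
            server["SubscriptionId"] = old_entry["SubscriptionId"]
    return json.dumps(servers)


# Assumed shape of the list after the two tentacle.exe calls (hypothetical values).
example = json.dumps([
    {"Thumbprint": "OLD", "CommunicationStyle": "TentacleActive",
     "Address": "https://octopus.example:10943", "SubscriptionId": "poll://oldsub/"},
    {"Thumbprint": "NEW"},  # trust-only entry left behind by configure --trust
    {"Thumbprint": "NEW", "CommunicationStyle": "TentacleActive",
     "Address": "https://octopus.example:10943", "SubscriptionId": "poll://newsub/"},
])

updated = json.loads(share_subscription(example, "OLD", "NEW"))
for entry in updated:
    print(entry.get("Thumbprint"), entry.get("SubscriptionId"))
```

Working on the parsed list rather than on raw text avoids accidentally rewriting the poll:// value of a second, unrelated server, which is the risk a plain regex carries.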

This works 100%: either certificate that is presented is accepted by the tentacle. It does, however, create about half a gig of logs over a week’s time on the tentacle, all (or most) of which I believe is duplicated on the server (and for every tentacle), because an error is logged each time the wrong certificate is matched.

I will paste the service startup of a tentacle configured for both servers at the bottom. In it you will notice it has the same poll:// for both certificates (it has 2 new cert entries due to needing to trust the thumbprint before being able to server-comms it… leaving a trusted row in the tentacle.config with no information other than the cert thumbprint). It will try to connect to the server with the first/old certificate and throw an INFO level error for this connection every 5 seconds. However the last line shows a successful connection to the same server with the correct/updated certificate. I haven’t looked at the tentacle code for handling any of this, but since this does already work it makes me hopeful that the solution would be to have the tentacle aggregate its information in a way to accept either certificate on the same server & poll://<> connection.
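As a rough sanity check on those figures, the arithmetic can be sketched in Python. It assumes exactly one failed attempt every 5 seconds (the log below suggests the real interval is a little longer, so this overstates the count), reads "half a gig" as 0.5 GiB, and assumes the server receives a full copy of every tentacle's log, which is only a belief stated above. The fleet size of 1,000 is a hypothetical figure.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60
RETRY_INTERVAL_SECONDS = 5            # "sleeping for 5 seconds" in the log
WEEKLY_LOG_BYTES = 0.5 * 1024 ** 3    # "about half a gig" per tentacle per week

# Upper bound on failed connection attempts per tentacle per week.
failures_per_week = SECONDS_PER_WEEK // RETRY_INTERVAL_SECONDS

# Implied average log size of one failure (message plus stack trace).
bytes_per_failure = WEEKLY_LOG_BYTES / failures_per_week


def weekly_server_log_gib(tentacle_count: int) -> float:
    """Weekly server-side log volume if every tentacle's log is duplicated."""
    return tentacle_count * WEEKLY_LOG_BYTES / 1024 ** 3


print(failures_per_week)             # 120960
print(round(bytes_per_failure))      # a little over 4 KB per failure
print(weekly_server_log_gib(1000))   # 500.0
```

A few kilobytes per failure is in line with the size of the exception and stack trace pasted below, so the half-gig figure is plausible, and it scales linearly with the number of tentacles.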

If you’d like some more information, logs, details, ideas, etc., please reach out. I would be happy to assist.

2019-07-11 10:15:28.3171   4952      7  INFO  ==== RunAgentCommand ====
2019-07-11 10:15:28.3171   4952      7  INFO  CommandLine: C:\Program Files\Octopus Deploy\Tentacle\Tentacle.exe run --instance=Tentacle
2019-07-11 10:15:29.2952   4952      7  INFO  Agent will trust Octopus Servers with the thumbprint: <OLDCERTIFICATE>
2019-07-11 10:15:29.3372   4952      7  INFO  Agent will trust Octopus Servers with the thumbprint: <NEWCERTIFICATE>
2019-07-11 10:15:29.3372   4952      7  INFO  Agent will trust Octopus Servers with the thumbprint: <NEWCERTIFICATE>
2019-07-11 10:15:29.3372   4952      7  INFO  Agent will poll Octopus Server at <OCTOPUSSERVERURL> for subscription poll://k8n9nsk7ctjmkmvedm5k/ expecting thumbprint <OLDCERTIFICATE>
2019-07-11 10:15:29.6512   4952      7  INFO  Agent configured to use the system proxy, but no system proxy is configured for <OCTOPUSSERVERURL>
2019-07-11 10:15:29.7382   4952      7  INFO  Agent will poll Octopus Server at <OCTOPUSSERVERURL> for subscription poll://k8n9nsk7ctjmkmvedm5k/ expecting thumbprint <NEWCERTIFICATE>
2019-07-11 10:15:30.0153   4952      8  INFO  <OCTOPUSSERVERURL>    8  Opening a new connection
2019-07-11 10:15:30.1083   4952      7  INFO  Agent configured to use the system proxy, but no system proxy is configured for <OCTOPUSSERVERURL>
2019-07-11 10:15:30.1113   4952      9  INFO  <OCTOPUSSERVERURL>    9  Opening a new connection
2019-07-11 10:15:30.1113   4952      7  INFO  Agent will not listen on any TCP ports
2019-07-11 10:15:30.1113   4952      7  INFO  The Windows Service has started
2019-07-11 10:15:32.9746   4952      8  INFO  <OCTOPUSSERVERURL>    8  Authentication failed while setting up connection to <OCTOPUSSERVERURL>
The server at <OCTOPUSSERVERURL> presented an unexpected security certificate. We expected the server to present a certificate with the thumbprint '<OLDCERTIFICATE>'. Instead, it presented a certificate with a thumbprint of '<NEWCERTIFICATE>' and subject '<CERTDATA>'. This usually happens when the client has been configured to expect the server to have the wrong certificate, or when the certificate on the server has been regenerated and the client has not been updated. It may also happen if someone is performing a man-in-the-middle attack on the remote machine, or if a proxy server is intercepting requests. Please check the certificate used on the server, and verify that the client has been configured correctly.
2019-07-11 10:15:32.9826   4952      8  INFO  <OCTOPUSSERVERURL>    8  Exception in the polling loop, sleeping for 5 seconds. This may be cause by a network error and usually rectifies itself. Disregard this message unless you are having communication problems.
Halibut.HalibutClientException: An error occurred when sending a request to '<OCTOPUSSERVERURL>', after the request began: The server at <OCTOPUSSERVERURL> presented an unexpected security certificate. We expected the server to present a certificate with the thumbprint '<OLDCERTIFICATE>'. Instead, it presented a certificate with a thumbprint of '<NEWCERTIFICATE>' and subject '<CERTDATA>'. This usually happens when the client has been configured to expect the server to have the wrong certificate, or when the certificate on the server has been regenerated and the client has not been updated. It may also happen if someone is performing a man-in-the-middle attack on the remote machine, or if a proxy server is intercepting requests. Please check the certificate used on the server, and verify that the client has been configured correctly. ---> Halibut.Transport.UnexpectedCertificateException: The server at <OCTOPUSSERVERURL> presented an unexpected security certificate. We expected the server to present a certificate with the thumbprint '<OLDCERTIFICATE>'. Instead, it presented a certificate with a thumbprint of '<NEWCERTIFICATE>' and subject '<CERTDATA>'. This usually happens when the client has been configured to expect the server to have the wrong certificate, or when the certificate on the server has been regenerated and the client has not been updated. It may also happen if someone is performing a man-in-the-middle attack on the remote machine, or if a proxy server is intercepting requests. Please check the certificate used on the server, and verify that the client has been configured correctly.
   at Halibut.Transport.ClientCertificateValidator.Validate(Object sender, X509Certificate certificate, X509Chain chain, SslPolicyErrors sslpolicyerrors) in Z:\buildAgent\workDir\fe2b45bbd4978f75\source\Halibut\Transport\ClientCertificateValidator.cs:line 26
   at System.Net.Security.SecureChannel.VerifyRemoteCertificate(RemoteCertValidationCallback remoteCertValidationCallback, ProtocolToken& alertToken)
   at System.Net.Security.SslState.CompleteHandshake(ProtocolToken& alertToken)
   at System.Net.Security.SslState.CheckCompletionBeforeNextReceive(ProtocolToken message, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.CheckCompletionBeforeNextReceive(ProtocolToken message, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.CheckCompletionBeforeNextReceive(ProtocolToken message, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.CheckCompletionBeforeNextReceive(ProtocolToken message, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.ForceAuthentication(Boolean receiveFirst, Byte[] buffer, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.ProcessAuthentication(LazyAsyncResult lazyResult)
   at Halibut.Transport.SecureClient.EstablishNewConnection() in Z:\buildAgent\workDir\fe2b45bbd4978f75\source\Halibut\Transport\SecureClient.cs:line 152
   at Halibut.Transport.SecureClient.ExecuteTransaction(Action`1 protocolHandler) in Z:\buildAgent\workDir\fe2b45bbd4978f75\source\Halibut\Transport\SecureClient.cs:line 65
   --- End of inner exception stack trace ---
   at Halibut.Transport.SecureClient.HandleError(Exception lastError, Boolean retryAllowed) in Z:\buildAgent\workDir\fe2b45bbd4978f75\source\Halibut\Transport\SecureClient.cs:line 207
   at Halibut.Transport.PollingClient.ExecutePollingLoop(Object ignored) in Z:\buildAgent\workDir\fe2b45bbd4978f75\source\Halibut\Transport\PollingClient.cs:line 47
2019-07-11 10:15:33.0476   4952      9  INFO  <OCTOPUSSERVERURL>    9  Secure connection established. Server at [::ffff:172.16.11.54]:443 identified by thumbprint: <NEWCERTIFICATE>, using protocol Tls12

(Michael Noonan) #12

Hi @hardKOrr,

Thanks for getting back to me! I’m glad you found a workaround in the meantime.

So I can continue to add context to the problem: do you exclusively use Polling Tentacles? Or do you have a mixture of Polling and Listening Tentacles in your fleet?

If you had a set of purpose-built commands like these, would it have helped make your own scripting simpler and more discoverable?

tentacle.exe begin-trust-rotation --currentThumbprint <CURRENT> --newThumbprint <NEW>
--- Update Octopus Server Certificate
tentacle.exe complete-trust-rotation --currentThumbprint <CURRENT> --newThumbprint <NEW>
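One possible reading of what these two proposed commands would do to the tentacle's trusted-server list is sketched below in Python. The command names come from this post and are only a proposal. The semantics, the entry fields, and the example values are assumptions made for illustration.

```python
from typing import Dict, List

Entry = Dict[str, str]


def begin_trust_rotation(servers: List[Entry], current: str, new: str) -> List[Entry]:
    """Duplicate every entry for `current` under `new`, keeping its other fields."""
    result = list(servers)
    matched = False
    for entry in servers:
        if entry["Thumbprint"] == current:
            matched = True
            candidate = dict(entry, Thumbprint=new)
            if candidate not in result:  # running begin twice adds nothing
                result.append(candidate)
    if not matched:
        raise ValueError("current thumbprint is not trusted")
    return result


def complete_trust_rotation(servers: List[Entry], current: str, new: str) -> List[Entry]:
    """Drop entries for `current`, but only once `new` is already trusted."""
    if not any(entry["Thumbprint"] == new for entry in servers):
        raise ValueError("new thumbprint is not trusted yet; run begin first")
    return [entry for entry in servers if entry["Thumbprint"] != current]


# Hypothetical starting state: one polling connection to one server.
start = [{"Thumbprint": "OLD", "Address": "https://octopus.example:10943",
          "SubscriptionId": "poll://sub/"}]

during = begin_trust_rotation(start, "OLD", "NEW")
# ... the Octopus Server certificate is updated here ...
after = complete_trust_rotation(during, "OLD", "NEW")

print(len(during), len(after))
print(after[0]["Thumbprint"], after[0]["SubscriptionId"])
```

Under this reading the subscription is carried over unchanged, so the server keeps recognizing the tentacle throughout the rotation.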

Do you have other suggestions?

Thanks!
Mike

#13

We do exclusively use Polling Tentacles, which at least simplified the scripting process for our purpose.

Having explicit commands like those you have listed would have saved a good amount of effort on our end. We started with tentacle.exe commands and stayed with them as much as possible to accomplish the rotation, and only broke out into the regex replacement I mentioned when it was clear we couldn’t accomplish what we wanted with built-in commands. We used PowerShell to do the scripting and ran into several barriers there: ConvertTo-Json isn’t in PowerShell 2.0; its workarounds are inconsistent at best; the data is JSON within an XML file to begin with; and since we can’t ConvertTo-Json, we have to be extra careful with regex replacements so as not to knock out a potential second server connection on the tentacle. Then there was all the testing on top of that. The process was arduous at best and near ugly at worst, and we haven’t yet made the project to complete the rotation (which we will need before production server certificates rotate, as the large bandwidth from logging and the disk I/O from it appear to be very taxing).

Commands like the ones you listed above would have made the project a small fraction of the work, 2 tentacle calls + 1 regex w/ logic becomes a single tentacle command.

PS: I’m not always the best on naming conventions, but may I suggest not using “complete-trust-rotation”. Complete synonyms are things like “entire”, “execute”, “accomplish”, “fulfill”, which may be misread as the call being a 1 step process.

#14

As another passing note here, from looking at the SHA256 issue linked, it would definitely be preferable for the Octopus Server to handle certificate rotation “in the background”, as that linked issue seems to suggest. Even with tentacle commands to execute, we would need to have a project for each of the ‘begin’ and ‘complete’ calls.

However, that capability in the tentacle would make the process using octopus projects much less painful.

(Michael Noonan) #15

Hi @hardKOrr,

Thanks for all that information. I’m championing this through our backlog to see if we can turn it into a “small fraction of the work” like you suggested.

I’ll do my best to keep you in the loop.

Thanks!
Mike

#16

Thanks, I really appreciate it.

I had noticed on the Octopus Deploy Product Roadmap that there is an upcoming ‘Ops Processes’ feature, which seems like a very appropriate place for this exact type of problem and solution. It didn’t really make sense to have deployments control the certificate rotation, since the rotation is really a process and not a ‘product’ that would be delivered.