Check System Integrity failure for SSL certificate

I discovered an issue when trying to delete an unused environment. I needed to first delete an environment binding for a certificate, but this wasn’t allowed because of an unexpected value.

Digging a little deeper, i was able to find the integrity checks that were failing for the same reasons.

The RelatedDocumentId 'certificates' for Event Id 'Events-17215' is not in the expected format.
The RelatedDocumentId 'certificates' for Event Id 'Events-17216' is not in the expected format.
The RelatedDocumentId 'certificates' for Event Id 'Events-17218' is not in the expected format. 

Digging even deeper into the SQL database itself. I can see that Event.RelatedDocumentIds has a reference to a bunch of IDs, including one called certificates. I also looked in the EventRelatedDocument and see that there is an entry in there for “certificates” in the RelatedDocumentId column.

I had a look on the following Github issue, but the proposed SQL query fix doesn’t appear to be appropriate for my issue.

We’re currently running version v2020.6 (Build 4915).

Is there anything we can do to resolve this? As it’s stopping us tidying up some old applications, certificates and environments.

Hi @amadiere,

Thanks for reaching out.

Can we please take a look at those events with a SQL query?

SELECT * FROM dbo.Event WHERE Id = 'Events-17215' or Id = 'Events-17216' or Id = 'Events-17218'

Then save the results as a CSV and send them to me in a private message?

Please let me know if thats something you’re allowed to do. I can also give you a link to directly upload them to our support files site.

Best,
Jeremy

Thanks for the reply. I’ve messaged you privately with the .csv as requested. But in the interest of openness for anyone else that comes into this issue, this is a redacted version.

Note: JSONBlob and ChangeDetails are encrypted fields and both have values and I chose not to include them on this screenshot. If they turn out useful, I can decrypt and redact parts of them.

Hey Alex,

Thanks for sending that through.

In column 2 you can see where there are Environment IDs as well as Certificate IDs. For some reason, the certificates don’t have IDs like they should. I’m not sure how they got in this state, but I think that is the root of the issue. I believe we have two paths we could go down.

The better option would be you could find the correct certificate ID that should be associated with those events and inject and remediate that column for those event IDs.

Alternatively, if you can’t find which certificate it should be, you could delete “certificates” from the column for those rows.

In either case, I would take a database backup before making any changes and confirm everything works after the change. We only like to directly manipulate the SQL database when it’s absolutely necessary as it can have unintended consequences. This change should likely not cause any issues but I would test it just in case after the change you decide to make.

Please let me know how it goes and if you have any questions.

Best,
Jeremy

I’ll be able to delete the certificates (and one pipe) from that columns I think - that shouldn’t be too bad.

However, the reference appears to be correct.
There is a certificate called certificates in the Certificates table (see below, but pardon all the green redaction). Row 1 appears to be correct.

So given that, unless I misunderstand your points - I don’t think I can solve it the proper way?
Would you prefer I just remove it as your alternative proposal, or would you prefer another experimental change?

I went ahead to try and update this to prove the fix was as expected. I ran the following query:

UPDATE dbo.Event SET RelatedDocumentIds = '|Environments-1|Environments-21|Environments-101|Environments-22|' 
WHERE Id = 'Events-17215' or Id = 'Events-17216' or Id = 'Events-17218'

However, this still has the same error. I tried restarting the Octopus Server itself (to brute force removing any cache if there was some).

I think the issue might be because it still exists in the EventRelatedDocument table.

SELECT TOP (1000) [Id]
      ,[RelatedDocumentId]
      ,[EventId]
  FROM [Octopus].[dbo].[EventRelatedDocument]
  WHERE (EventId = 'Events-17215' or EventId = 'Events-17216' or EventId = 'Events-17218')
  AND RelatedDocumentId = 'certificates'

image

Should these records be removed as well?
Additionally, I can revert the change I’ve already made if required and you want to test another scenario.

Hi Alex,

We were operating under the assumption that “certificates” was an invalid Id as that is usually the cause of these sorts of errors. Do you know if that certificate exists and is valid? If so, I think our best bet here is to remove the scoping of that certificate from the environment. To do that please go to Library->Certificates and find that certificate, and then go to environments and remove the environment. Once that is done please try to delete the environment again.

Please let me know how it goes.

Best,
Jeremy

Hi - I tried this approach early on in my own tests with no luck unfortunately.
When I try to just remove the scope of the specific environment, then click save, the following error is received:

With regards to if I know if the certificate exists and “is valid”. It exists in Octopus and is valid, but has expired.

Hi Alex,

Thanks for the information.

I think our next best bet here is to manually delete the environment mapping from that certificate row in the dbo.Certificate table. As always with manual SQL database manipulation please take a backup and test everything afterward.

Please let me know how it goes.

Best,
Jeremy

1 Like

We’ll make this change in the morning (it’s too late for me to change that at this point). I’ll report back with my findings.

This will not fix the integrity check from the initial post, I’m assuming?
Is it worth removing the records that are tied to that event in the [EventRelatedDocument] table and solve that problem as well? (to ensure that the integrity check task works going forward).

Hey Alex,

Let me reach out to our developers and ask if that’s the best/safest route to remediate your integrity check.

I’ll have someone reach out to you this evening as they are based in Australia.

Please feel free to reach out in the meantime.

Best,
Jeremy

Hi Alex,

Jeremy asked me to get back to you directly.

Interesting, I’m guessing that at some stage we allowed a blank certificate name or there was a bug in migration or something along those lines (did you by any chance used to use version 2.0 or earlier?). Based on the certificate expiration date it’s been there a while.

I would:

  • Delete from EventRelatedDocument where the RelatedDocumentId is certificate
  • Delete the certificate directly from the table

This is what a normal certificate delete would do.

Rob

Ok, thanks Robert (and Jeremy).
I’ll script up the deletion of the EventRelatedDocument rows and once I’ve confirmed the backup, I’ll run it in and see how it goes.

I’m not sure I want to delete the actual certificate though? I’m OK removing the specific environment from it, or even renaming the primary key from certificates to something unique, but with that as a prefix. I’m assuming that will have the same effect?

While I can’t remember the versions we were on, we’ve certainly been on Octopus for a while and pre-date the current year-based naming convention. It looks like the certificate was originally called * - so while not empty, it is not alpha-numeric either. Subsequent renames made it a better name.

I ran in two scripts to change things. The first to remove the EventRelatedDocument rows.

DELETE 
FROM [Octopus].[dbo].[EventRelatedDocument]
WHERE EventId IN ('Events-17215', 'Events-17216', 'Events-17218')
  AND RelatedDocumentId = 'certificates'

The second to rename, rather than delete, the certificate’s ID (as well as removing the specific environment).

UPDATE [Octopus].[dbo].[Certificate] 
SET [EnvironmentIds] = '|Environments-1|Environments-21|Environments-101|' ,
	[Id] = 'certificates-domainname-tld'
WHERE Id IN ('certificates', 'certificates-domainname-tld')

(note: certificates-domainname-tld wasn’t the actual name used, but that is the format where “*.domainname.tld” was the certificate URL - just to anonymise).

Additionally, this only worked because I confirmed before running that there were no certificates with that very specific [Id] value (additionally, there were no more certificates for that domain name either).

That appears to have solved my issue.
I now have a passing integrity check:
image

I’m also able to save and change the certificate and remove/add other environments - so that appears to have resolved the issue across the board.

Thank you very much for your patience and help resolving this issue !

Hey Alex,

You’re very welcome!

I dont think it should cause any issues for you to rename the certificate to a unique ID. I’m assuming you found a Certificates-### that wasn’t in use and chose that?

If you do end up running into any certificate issues that may seem related, please refer to this ticket in that new ticket so that they have the context there.

Please let me know if you have any other questions or concerns, but if I don’t hear from you I hope you have a great rest of your week.

Best,
Jeremy

It didn’t appear to cause an issue - but if I find anything I’ll update this ticket with a note and potentially create start a new thread for whatever that caused. :slight_smile:

And to confirm, I did choose a certificate- suffix that I knew wasn’t already used (verified by browsing the Certificate table) and just performing the update above.

I’ve been able to do everything I was originally trying to do when I found this issue - so your help has definitely resolved my issue and this ticket can be closed. Thank you again and enjoy your week too!

1 Like

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.