Upgrade from 2022.2.8136 to 2022.3.10405 breaks Config as Code

We just upgraded one of our clusters from 2022.2.8136 to 2022.3.10405. The upgrade succeeded; however, the projects are not accessible. In the UI, I am seeing this:
(screenshot of the error attached)

This was working before the upgrade; I'm not sure what to look at now.

Hi @lbrody,

Thank you for reaching out. I’m sorry to hear that you are having trouble with your Config as Code projects since upgrading from Octopus Deploy version 2022.2.8136 to 2022.3.10405, but I would be happy to help take a closer look at the issue.

As a first step, would you be able to upload your Octopus Server logs for review? These should be located within the C:\Octopus directory by default, or you can grab them from the web UI by navigating to Configuration > Diagnostics > Download System Diagnostics Report.
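
If it's easier, something like the following PowerShell snippet will bundle the server logs for upload (a rough sketch that assumes the default log location of C:\Octopus\Logs; adjust the path if you use a custom home directory):

# Zip up the Octopus Server log directory for upload (default location assumed)
Compress-Archive -Path "C:\Octopus\Logs\*" -DestinationPath "$env:TEMP\OctopusServerLogs.zip" -Force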

Once you have this ready, feel free to upload the relevant log(s) to the following secure link.

Would you also be able to upload a HAR file recording of what happens when you navigate to the page that generates the error shown in your screenshot? (In most browsers, you can record and export a HAR from the Network tab of the developer tools.) You should be able to upload this at the same link I provided earlier, but let me know if you have any problems.

Lastly, if you haven't already, could you try clearing the local Octopus Deploy Git cache to see if this changes the behavior you're seeing? The button to do this should be located under Configuration > Git within the Octopus Deploy Server UI.

Sorry you are running into trouble since upgrading, but I’ll do my best to help get things sorted.

Looking forward to hearing back from you,

Britton

I have uploaded the files. Clearing the cache didn't seem to change the behavior.

I am also seeing this.

Hi @lbrody,

Thank you for uploading the log files and HAR recording for review.

In doing some further digging on my side, I came across the following Octopus Deploy GitHub issue, which mentions an almost identical error message to the one in your request. Although the error message isn't very helpful, the source of that particular issue is an expired or invalid version control personal access token (PAT). Would you be able to confirm that the PAT(s) being used for these projects are valid and have not expired, so we can rule this out? One thing you might try is generating a fresh PAT and using it in one of the broken projects to see if that resolves the redirection/authentication issue.
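
For example (just a sketch; the host and repository below are placeholders for your own), you can exercise a PAT directly against the repository from any machine with Git installed, and this should fail with an authentication error if the token is invalid or expired:

# List the remote refs using the PAT as the password; the username
# portion is typically ignored when a PAT is supplied
git ls-remote "https://anyuser:YOUR_PAT@git.example.com/your-org/your-repo.git"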

If the above looks okay, then I will most likely need to escalate your request to our engineering team for further review, but I appreciate you working with me to rule some things out first.

Best,

Britton

The PAT is not expired. I know this because it is being used by the other Octopus cluster, where it is working.

Hi @lbrody,

Thanks for confirming that the PAT is still valid and has not expired.

I will go ahead and escalate this request with our engineering team, but could you also let me know a little bit more about how your Octopus Deploy cluster is configured from an architecture perspective? I don’t know if this aspect has anything to do with the issue you’re currently encountering, but I want to gather as much detail as I can about your particular configuration in case it does come into play (feel free to upload this information to the link I provided previously if you would prefer not to share this on the public forum).

Looking forward to hearing back from you, and I will also reach back out as soon as I have any updates.

Regards,

Britton

This is hosted in AWS. There is a load balancer in front of multiple EC2 instances managed by an auto-scaling group (ASG). I'm not sure what else would be useful to add.

Hi @lbrody,

Thanks for this additional information; I've appended it to the thread I created with our engineering team earlier.

I will reach back out as soon as I have more information from the team; in the meantime, let me know if you come across anything new on your side.

Regards,

Britton

I have uploaded to the secure site the part of the server log file that shows the stack trace for the error I am seeing.

Hey @lbrody,

Just jumping in for Britton, as he is currently off-shift with our US-based team. Thank you for the extra bit of stack trace; I have sent that to our engineers too. Once they have something, we will let you know.

Kind Regards,

Clare

Hi @lbrody,

I wanted to follow up on this thread and let you know that another user ran into this same issue and was able to work around it by supplying a password for the connection to version control (rather than a PAT). I'm not sure whether you are fully blocked right now, but if so, this might be a workaround you could leverage until our team resolves the underlying issue.

I hope this helps, and thanks again for your patience while our team takes a deeper dive into this.

Best,

Britton

Britton,

We use an on-prem version of Microsoft Azure DevOps Server to host our Git repos. Unfortunately, authenticating that connection with a user ID and password no longer works for us. In case it comes up, the version we are using is Azure DevOps Server 2020 Update 1.2.

Hey @lbrody,

Thanks for that info. The engineers are actually working on this now, so it's good to know which version of ADO you are using so they can reproduce the issue closer to your environment specs.

As always we will let you know as soon as they have something for you.

Kind Regards,

Clare

Hey @lbrody,

The engineers have been working hard on this one, and I wanted to update you with some good news: they have found why this is happening. A recent update to LibGit2Sharp in 2022.3 affects Azure DevOps Server (on-prem) in some specific use cases, which we were not aware of.

The engineers have a fix in place, but it is not ready to roll out yet; once it is, we will let you know and get the download out to customers ASAP.

We are sorry for the inconvenience this has caused, but we are hoping to get the fix out to you soon.

If there is anything else you need in the meantime please reach out.

Kind Regards,

Clare

Thank you for the update.

Hey @lbrody,

Some more good news for you this morning. Our engineers have fixed this, and they have created a public GitHub issue you are welcome to take a look at.

The fix shipped in 2022.3.10528, so if you are able to upgrade to 2022.3.10530 (the current build on our downloads page), the fix will be included.

Let us know if you manage to upgrade and whether it fixes this issue.

Kind Regards,

Clare

The upgrade failed. However, one of the nodes did seem to eventually come up. Attached is the log file from the upgrade.
upgradefail.txt (10.0 KB)
When we upgrade, only a single node gets upgraded. Here is the snippet of the script where we do the upgrade:

Set-Location "C:\Program Files\Octopus Deploy\Octopus\"
# Apply any pending database schema changes for the new version
.\Octopus.Server.exe database --upgrade
# Start the Octopus Server Windows service
.\Octopus.Server.exe service --start

Directly before that snippet, we install the MSI.

Eventually that instance did start the service; I'm not sure how. The final line in the log is: The remote script failed with exit code 100
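
For reference, a more defensive variant of that snippet, which fails fast instead of trying to start the service after a failed upgrade, might look something like this (a sketch only, not what we currently run):

Set-Location "C:\Program Files\Octopus Deploy\Octopus\"
.\Octopus.Server.exe database --upgrade
# Stop here rather than starting the service against a half-upgraded database
if ($LASTEXITCODE -ne 0) { throw "database --upgrade failed with exit code $LASTEXITCODE" }
.\Octopus.Server.exe service --start
if ($LASTEXITCODE -ne 0) { throw "service --start failed with exit code $LASTEXITCODE" }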

Our process when the upgrade fails is to restart the service on all the nodes; there are only 2 in this cluster. The 2nd node never came back up.

Hey @lbrody,

Sorry to hear you are now having issues bringing your instances back online.

In your logs I can see this error:

System.ComponentModel.Win32Exception (1069): The service did not start due to a logon failure.

Can you confirm which account you are running the Octopus service as? If it is a domain account, can you make sure that account's password doesn't require changing and that the account hasn't been disabled for some reason?
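
If it helps, you can check which account the service is configured to run under with PowerShell (a quick sketch that assumes the default Windows service name of OctopusDeploy; adjust if yours differs):

# Show the logon account and current state of the Octopus Server service
Get-CimInstance Win32_Service -Filter "Name='OctopusDeploy'" | Select-Object Name, StartName, State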

If the Octopus service is running as Local System, I am not sure what could be causing this, as you are not hitting the login screen, so it shouldn't be the Octopus administrator account that the error is referring to.

Let me know if your instance is running under a domain account and we can go from there.

Kind Regards,

Clare

The service is running, and always has been, as a domain account. The password has not changed, which is evident from the fact that the servers did eventually come up.