Azure step failure - Azure credentials have not been set up or have expired


(Aleksandras) #1

Hi,

after upgrading Octopus Server from the rather old v2018.6.2 to v2018.9.17, we started encountering a recurring error when executing Azure steps:

Invoke-AzureRmResourceAction : Your Azure credentials have not been set up or have expired, please run Connect-AzureRmAccount to set up your Azure credentials.

Few facts:

  • Before upgrading from Octopus v2018.6.2 we didn’t see this issue occurring
  • It’s a transient issue, and a retry always succeeds
  • There are several Azure steps in the same deployment that pass - which step fails is pretty much random
  • Out of 8-10 daily deployments, at least 1-2 fail with this issue

It might be related to Azure SDK/CLI updates bundled with the newer Octopus version.
Or could it be something at our infrastructure level? Any ideas or possible workarounds?

Full log from the failed step:
octopus-azure-failure-log.txt (6.3 KB)


(Mark Siedle) #3

Hi Aleksandras,

Thanks for getting in touch and for including the task log.

To help us reproduce this, would it be possible for you to provide a deployment process JSON export for one of the projects that typically fails? We’ve set up a secure place where you can upload this here. For example, on your project’s deployment process screen, there’s an overflow menu with a “Download as JSON” option.

If you’ve just noticed this occurring after upgrading (and nothing else about your project has changed), then this sounds like a bug with the new Azure CLI updates we have bundled with Octopus. We’ll attempt to reproduce this today with various app services and scripts, but an export from your project will hopefully narrow it down.

Cheers
Mark


(Mark Siedle) #4

Also, when these deployments fail, could you confirm whether there are multiple Azure-related deployments (or Azure steps) running concurrently in your Tasks list?

Cheers
Mark


(Aleksandras) #5

Thank you for your quick response!

I can’t blame the new Octopus version for sure, since we’ve seen many different transient errors related to Azure/Octopus/networking over time. However, this time it roughly matches when we upgraded our Octopus Server - it’s reproducible quite often and hasn’t disappeared completely for ~2 weeks already.

I uploaded the project JSON to the location you provided. At first sight there seems to be no sensitive information included, but to be safe, please apply some retention policy to this file once you finish investigating it.

Regarding additional facts about the failures - I don’t have many exact examples from before now, but I’m investigating closely and collecting more data from here on.
It may well be that there are some Tasks or Projects running concurrently. The JSON file I attached is from the project that failed tonight (just a few hours ago); the failed step wasn’t executed in parallel, but it seems there were other Azure deployment projects running at the same time.

I have a feeling it might be occurring mainly in Azure PowerShell Script steps. However, that might just be because such steps tend to come last in a process (when some connection might have expired), while other Azure steps like resource group deployment or web app deployment come at the beginning.

There is nothing out of the ordinary in our Azure PowerShell Script steps - just plain logic, without any manual manipulation of the Azure context, subscription switching, or anything like that.


(Mark Siedle) #6

Thanks for the additional information and for sending your process export through, Aleksandras.

Sorry, I should have mentioned that the SmartFile upload location is only visible to Octopus staff and will be deleted when we’ve finished analysing your file. Your process won’t be imported anywhere and will only be read manually, so we can see what scripting is being done.

We set up a mini version of your deployment process using our subscription, but unfortunately we haven’t been able to reproduce this yet. We do think there are some options worth exploring, though…

One thing that stood out slightly was the use of Save-AzureRmContext in your “Cleanup old ARM deployments” step. We initially thought this might be causing things to stomp on each other, but if it’s just saving the auth information to a file so it can be used in a separate process (via Start-Job and Import-AzureRmContext), it shouldn’t be a problem. To rule it out, though, you could temporarily disable this cleanup step and see if it has any effect.
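For reference, the pattern we mean looks roughly like the sketch below - the file path and the job body are placeholders of our own, not taken from your process:

    # Save the current Azure auth context to a file so a background process can reuse it.
    Save-AzureRmContext -Path "$env:TEMP\azure-context.json" -Force

    $job = Start-Job -ScriptBlock {
        param($contextPath)
        # Re-hydrate the saved credentials inside the Start-Job child process.
        Import-AzureRmContext -Path $contextPath
        # Placeholder work - e.g. list resource group names using the imported context.
        Get-AzureRmResourceGroup | Select-Object -ExpandProperty ResourceGroupName
    } -ArgumentList "$env:TEMP\azure-context.json"

    Receive-Job -Job $job -Wait -AutoRemoveJob

Used that way, the saved context is only read by the child job, so it shouldn’t interfere with the credentials Octopus sets up for the step itself.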

Between the Octopus versions you mentioned, we upgraded the bundled Azure PowerShell cmdlets from 5.8 to 6.11, so we’re wondering if something there behaves differently now. To test that theory, you could install your own version of the Azure PowerShell modules instead of the ones we bundle (try grabbing the older 5.8, or even the latest version - 6.13, I believe) and configure Octopus to use your custom version instead (see instructions here).
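Very roughly, installing a pinned version on the server would look something like the following, run on the Octopus Server as an administrator - note the exact 5.8.x patch number below is only an example, so please check the PowerShell Gallery for the precise release:

    # Run on the Octopus Server; requires PowerShellGet.
    # Pin the AzureRM modules to an older 5.8.x release (or swap in a newer version to test).
    Install-Module -Name AzureRM -RequiredVersion 5.8.2 -Scope AllUsers -Force

You’d then follow the linked instructions to point Octopus at that installation instead of the bundled modules.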

Another possibility (although we think it’s unlikely) is the Azure CLI that has been introduced since your upgrade. Since you don’t appear to be using the CLI anywhere in your scripts, you could try disabling the Azure CLI feature altogether by creating an OctopusDisableAzureCLI variable and setting its value to True (full details here), then monitor whether that change has any effect on your deployments.

Sorry we don’t have any exact answers at this stage, but hopefully we can get a reproduction to narrow things down.

Cheers
Mark


(Aleksandras) #7

Hi, Mark,

thanks for trying to reproduce it. I will try to clarify some points you raised.

Did you try to reproduce a not-so-mini version? :slight_smile: And maybe with several similar projects running at the same time? I suspect it might be related to timing - our failing step runs about 10-15 minutes after the deployment starts (and the deployment starts with an Azure step).

Regarding Save-AzureRmContext - you are absolutely correct; it saves the context for parallelized jobs running with Start-Job. I will try disabling it, but it was working without problems until recently, so yes - as you say, it’s probably not the cause.

I’m afraid I won’t be able to try installing older Azure PowerShell cmdlets. As I understand it, they would have to be installed on the Octopus Server itself, but our Octopus instance is used across quite a big company, by many teams, and I’m just a regular user there. It would be hard and time-consuming to convince the responsible team to try this while the error remains so unclear and transient.

I will try disabling the Azure CLI with the variable - that part doesn’t seem to be a blocker. :slight_smile:


(John Simons) #8

Hi Aleksandras,

I am sorry that you are running into this issue.
I have done a bit of research and it seems this is a known issue, see https://github.com/Azure/azure-powershell/issues/7110.
The interesting solution Microsoft provides is described in https://github.com/Azure/azure-powershell/issues/7110#issuecomment-426750910.

So I have raised an issue for us to call Disable-AzureRMContextAutosave -Scope Process before authenticating to Azure. For now, would you like to try executing Disable-AzureRMContextAutosave -Scope CurrentUser (you need to run this as the user that executes the PowerShell scripts) and let us know if it does indeed fix the randomness you are seeing?
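To be concrete, the one-off version is just the following, run in a PowerShell console as the service account in question (whichever account your Octopus Server or Tentacle service runs as - I don’t know its name, so that part is up to you):

    # One-off, run as the user account that executes the deployment PowerShell scripts.
    # Stops the AzureRM modules from autosaving and sharing the context across processes for this user.
    Disable-AzureRmContextAutosave -Scope CurrentUser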

Regards
John


(Aleksandras) #9

Thanks for the follow-up! Very nice that you nailed it down!
Any idea how this relates to the fact that it only started appearing recently? It seems to be the same old Azure PowerShell cmdlets…

Can you clarify a bit - where should this command be executed? In some Azure deployment step? In all Azure-related steps? In the first Azure step of the deployment process? Or just once, on the Octopus Server?
Just a note - I don’t have direct access to our Octopus Server, but if the benefits are clear and it’s clear how to do it, I could raise a ticket with our support team.


(John Simons) #10

Hi Aleksandras,

Because you are running an Azure PowerShell step, I would call Disable-AzureRMContextAutosave -Scope Process at the beginning of the script.
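For example, something like this at the top of the step’s script (everything after the first command is just placeholder logic, not a suggestion about your actual script):

    # First thing in the Azure PowerShell Script step:
    # keep the Azure context in-process only, so concurrent deployments can't stomp on it.
    Disable-AzureRmContextAutosave -Scope Process

    # ...then the rest of your existing script, e.g.:
    Get-AzureRmResourceGroup | Select-Object -First 5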
Does this make sense?

Cheers
John


(Aleksandras) #11

It’s not only Azure PowerShell Script steps, but also built-in Azure deployment steps (like “Deploy an Azure Resource Manager template”). In that case, would we need to enable the custom PowerShell scripts feature?

We have a bunch of projects with a bunch of different Azure-related steps, so it would take a while to add your suggested script everywhere. I just want to be sure we’re doing it right. :slight_smile:


(John Simons) #12

Hi Aleksandras,

In that case, maybe run a one-off Disable-AzureRMContextAutosave -Scope CurrentUser. This should do it permanently for the user account that you are running Octopus as.

Hope this helps

Cheers
John