Slow execution of AWS CLI Script process step

Hi,

I recently added a step to my deployment process to execute some AWS CLI commands.
The issue I’m having is that it takes an extremely long time for it to start executing. Once the process step starts executing the actual powershell/aws cli commands it takes a matter of seconds but it spends about 40 seconds just waiting around before starting to execute. This is what the log looks like:

                |       Success: Rollback Detection (on the Octopus Server)

13:38:38 Verbose | Octopus Server version: 2019.5.4+Branch.tags-2019.5.4.Sha.5d1025504b9ea2bf1903ea327f1dfa88d291f709
13:38:38 Verbose | Environment Information:
| OperatingSystem: Microsoft Windows 10.0.14393
| OsBitVersion: x64
| Is64BitProcess: True
| CurrentUser: NT AUTHORITY\SYSTEM
| MachineName: Redacted
| ProcessorCount: 2
| CurrentDirectory: Redacted
| TempDirectory: Redacted
| HostProcessName: Octopus.Server
| PID: Redacted
13:38:38 Verbose | Executing Rollback Detection (type Run an AWS CLI Script) on Octopus Server
13:38:38 Verbose | Using Calamari.Cloud 4.17.6
13:38:38 Verbose | Using Octopus.Dependencies.AWSPS 3.3.390
13:38:38 Verbose | Using Octopus.Dependencies.AWSCLI32 1.16.52
13:38:38 Verbose | Running this script in the Octopus Server security context (NT AUTHORITY\SYSTEM)
13:38:38 Verbose | Starting C:\Windows\system32\WindowsPowershell\v1.0\PowerShell.exe in working directory ‘Redacted’ using ‘OEM United States’ encoding running as ‘NT AUTHORITY\SYSTEM’ with the same environment variables as the launching process
13:38:39 Verbose | Octopus Deploy: Calamari version 4.17.6
13:38:39 Verbose | Environment Information:
13:38:39 Verbose | OperatingSystem: Microsoft Windows NT 10.0.14393.0
13:38:39 Verbose | OsBitVersion: x64
13:38:39 Verbose | Is64BitProcess: True
13:38:39 Verbose | CurrentUser: NT AUTHORITY\SYSTEM
13:38:39 Verbose | MachineName: Redacted
13:38:39 Verbose | ProcessorCount: 2
13:38:39 Verbose | CurrentDirectory: Redacted
13:38:39 Verbose | TempDirectory: Redacted
13:38:39 Verbose | HostProcessName: Calamari
13:38:39 Verbose | Performing variable substitution on Redacted
13:38:39 Verbose | Executing Redacted
13:38:40 Verbose | Name Value
13:38:40 Verbose | ---- -----
13:38:40 Verbose | PSVersion 5.1.14393.2969
13:38:40 Verbose | PSEdition Desktop
13:38:40 Verbose | PSCompatibleVersions {1.0, 2.0, 3.0, 4.0…}
13:38:40 Verbose | BuildVersion 10.0.14393.2969
13:38:40 Verbose | CLRVersion 4.0.30319.42000
13:38:40 Verbose | WSManStackVersion 3.0
13:38:40 Verbose | PSRemotingProtocolVersion 2.3
13:38:40 Verbose | SerializationVersion 1.1.0.1
13:38:40 Verbose | PowerShell Environment Information:
13:38:40 Verbose | OperatingSystem: Microsoft Windows NT 10.0.14393.0
13:38:40 Verbose | OsBitVersion: x64
13:38:40 Verbose | Is64BitProcess: True
13:38:40 Verbose | CurrentUser: NT AUTHORITY\SYSTEM
13:38:40 Verbose | MachineName: Redacted
13:38:40 Verbose | ProcessorCount: 2
13:38:40 Verbose | CurrentDirectory: Redacted
13:38:40 Verbose | CurrentLocation: Redacted
13:38:40 Verbose | TempDirectory: Redacted
13:38:40 Verbose | HostProcessName: powershell
13:38:40 Verbose | TotalPhysicalMemory: 2063964 KB
13:38:40 Verbose | AvailablePhysicalMemory: 310136 KB
13:39:24 Info | Starting PS Execution
13:39:24 Info | Variables set
13:39:24 Info | Defining New and old version
13:39:24 Info | Versions set
13:39:24 Verbose | Process C:\Windows\system32\WindowsPowershell\v1.0\PowerShell.exe in Redacted exited with code 0
13:39:24 Verbose | Updating manifest with output variables
13:39:24 Verbose | Updating manifest with action evaluated variables
13:39:24 Verbose | Successfully finished Rollback Detection on Redacted

As the log show there is no log output between 13:38:40 and 13:39:24. Please help me investigate this odd behavior. If you need any more information don’t hesitate to ask.

Hi @Wildpipe!

Sorry to hear that we’re having these issues - we did make some improvements in this area in version 2019.5.12 to resolve issues with these periods of idle waiting. I’d like to suggest upgrading to a version of Octopus at or greater than 2019.5.12, and see if this is resolved.

Please feel free to reach out if you have any further questions, or this matter isn’t resolved with the upgrade. Happy to dig into this with you further to try and get to the bottom of it!

Thanks for the quick support. I will try updating Octopus to the latest version and get back to you with the results.

I’ve updated to Octopus Deploy 2019.6.7 and just tried to do a deploy which resulted in pretty much the same execution time. It waited around for 41s before starting to execute the PS code.

Thanks for getting back to me, @Wildpipe!

I’ll dig into this further today with our engineers and try to figure out what’s happening at this step that could be causing a slowdown for you. I’ll be in touch as soon as I have something for you.

Hi @Wildpipe!

The engineers and I were discussing this today, and we’re not sure where the delay is coming from inside Octopus, and wondered if perhaps your powershell profile may be (at least) part of the issue. I wonder if you could try disabling the use of the local PS profile to see if that reduces the wait time here.

To do this, simply create a project variable named Octopus.Action.PowerShell.ExecuteWithoutProfile with a value of true, create a new release, and deploy.

Please let me know how you go with this, and if you have any further questions.

Hi,

As you suggested I’ve added the project variable Octopus.Action.PowerShell.ExecuteWithoutProfile to my Project and set it to true. Just created a new release and deployed it which resulted in a 40s wait just as before.

Hi @Wildpipe!

I’m helping Justin investigate this slowness you are experiencing.

I’ve tried to replicate the issue locally but I’ve been unable to see the same delay that you’re seeing.

I’ve also looked at our bootstrap script to see if anything in there would be causing the slowness, but none of the setup we do before invoking the script is causing this delay so it’s the invocation/running of the script that is the source of slowness.

1. The script

Would you be able to share the script that is being executed for us to have a look at? If it contains sensitive information, you can email it to us at support@octopus.com referencing this support ticket.

2. Minimal test to confirm delay

Can you run an AWS CLI script that just does something basic (I just listed the contents of an S3 bucket) and see if that also has the 40 second gap before the script starts running.

3. Possible AntiVirus interference

If you have antivirus running on your Octopus server, could you disable it and try to run the script again and see if the delay persists.

Thank you and best regards,
Henrik

Hi,

Thanks for helping us figure this out.

I’ve created an AWS CLI script containing just one line of code:

write-host “Testing AWS CLI script”

Running a deploy with this process results in the exact same 40s wait time before executing.
I also performed a deploy with the antivirus turned off which made no difference, still a 40s wait before script execution.

The process is configured to use the AWS service role for the EC2 instance and use eu-west-1 as it’s region. Is it possible it takes ~40s to assume this role?

The script:

write-host “Starting PS Execution”
$releaseNumber = $OctopusParameters[“Octopus.Release.Number”];
$oldReleaseNumber = $OctopusParameters[“Octopus.Action[StepNameRedacted].Output.previousRelease”];
$tenantName = $OctopusParameters[“Octopus.Deployment.Tenant.Name”];
write-host “Variables set”
write-host “Defining New and old version”
[system.version]$newRelease=$releaseNumber.split(‘-’)[0]
[system.version]$oldRelease=$oldReleaseNumber.split(‘-’)[1]
write-host “Versions set”
if ($newRelease.Minor -lt $oldRelease.Minor) {
Write-Host “Rollback detected. Logging metric value to CloudWatch”
aws cloudwatch put-metric-data --metric-name redacted --namespace redacted --unit Count --value 1 --dimensions FromVersion=$oldRelease,ToVersion=$newRelease,Tenant=$tenantName
}

Hi,

It is quite possible that getting the credentials when using the AWS service role for the EC2 instance option is what’s causing the delay.

I don’t have an EC2 instance readily available right now, but you can do the same thing we do in code to get the credentials when using that option to see how long it takes. So if you logon to your EC2 instance and open a browser and then do the following:

  1. Call http://169.254.169.254/latest/meta-data/iam/security-credentials/, this should return the role of the instance
  2. Call http://169.254.169.254/latest/meta-data/iam/security-credentials/{InstanceRoleFromStep1}

This should hopefully tell us if it is in fact this that is causing the 40s delay.

If this is still not explaining why there’s a 40 second delay, could you please provide the following information:

  1. EC2 instance size
  2. Screenshot of how you’ve configured your AWS CLI script step

I look forward to hearing how you go.

Thank you and best regards,
Henrik

Hi,

Accessing the role from the ec2 instance is very quick:

measure-command {Invoke-WebRequest http://169.254.169.254/latest/meta-data/
iam/security-credentials/rolename}

TotalMilliseconds : 27.9321

I also ran the same command as the SYSTEM user (had to supply -usebasicparsing to get it to execute):

measure-command {Invoke-WebRequest http://169.254.169.254/latest/meta-data/
iam/security-credentials/rolename -usebasicparsing}

TotalMilliseconds : 21.3896

The Octopus instance type is a t3a.medium.

Screenshot of configuration:

Thanks Karl,

I’ll get an EC2 instance spun up and do some further testing, I’ll get back to you as soon as I can.

Thank you and best regards,
Henrik

Hi again,

I’ve been doing some more investigation into this, I spun up an EC2 instance and configured my process like yours but I am still unable to replicate the 40 second (not even closely) delay on any of my tests…they take max a couple of seconds to run.

One option we have is to use PowerShell debugging to step through the script that is run, this could potentially tell us more about what the delay is.

If you want you can have a go at doing this on your own, or if you want you can schedule a support call and we can assist you with this.

Thank you and best regards,
Henrik

Hi,

I tried activating Powershell debugging for this process and after a couple of deploys got the hang of it.
It takes about 15-20s before I am even able to attach to the debug-runspace - once connected I can let the script run which finishes in a matter of seconds. Seems like it’s taking a really long time just getting the environment ready to run any sort of code. If you’d like to take a closer look at the issue we can setup a support call.

Hi @Wildpipe,

That 15-20s delay feels like it’s more down to setting up the debug environment rather than kicking off the script.

If you run just a normal PowerShell script step on the Octopus server, does it also experience the 40s delay in executing the script?

Could you try and disable the Check for publisher’s certificate revocation option as per the instructions here, my hopes of this rectifying the delay is low but worth a try at least.

Thank you and best regards,
Henrik

Hi,

Running a normal PowerShell script step does not encounter a 40s delay in executing the script. Running a regular PS script step of several hundred lines of code takes merely 5-10s with no delay time at all.

I disabled Check for publisher’s certificate revocation and ran the deploy again but unfortunately no change - still a 40s delay.

Hi,

Thinking it’s an issue with ssl-certificates or the like got me thinking there might be an issue with firewall openings. The security group attached to our Octopus instance only allows outbound https(port 443) traffic towards the internet. I added a temporary opening for http(port 80), ran the deploy again, and voila the 40s delay is gone :slight_smile:

Is port 80 outgoing traffic required for this to work without delay or is there some configuration we can do to eliminate this need?

Hi Karl,

To try and narrow down where the delay is being triggered we have a build of Calamari that has some additional logging. The following steps will deploy the custom build

  1. Update to Octopus 2019.7.11, and run the test script like normal. This will give you version 7.1.1 of Calamari on the EC2 instance.
  2. On the EC2 instance, rename the file C:\Octopus\Tools\Calamari.Cloud\7.1.1\Calamari.Aws.dll to Calamari.Aws.dll.old
  3. Download the file from https://file.ac/21euCO1oZvDRnqLnMG-K-g/ and save it to C:\Octopus\Tools\Calamari.Cloud\7.1.1\Calamari.Aws.dll on the EC2 instance
  4. Perform the AWS script step

The deployment will then display a number of new log entries, and hopefully the gap between these entries will show us where the code is being delayed.

If you could then paste in the log messages here we should be able to narrow down the cause.

Cheers
Matt C

Hi,

Unfortunately we have only got one Octopus Deploy system running which is the one we use for Production. I would like to keep the production system on the LTS version so that means I would have to spend some time setting up a test system. If you could setup a test system and run the custom build on your system it would be much appreciated. If not - it will take some time before I can get back to you with the results of the test.

Hi Karl,

Unfortunately, I’m unable to replicate the delay on an EC2 instance myself, however the supplied DLL can be used with the Octopus LTS version for a single test. The LTS version uses Calamari 6.0.0, so to run the tests you would:

  • On the EC2 instance, rename the file C:\Octopus\Tools\Calamari.Cloud\6.0.0\Calamari.Aws.dll to Calamari.Aws.dll.old
  • Download the file from https://file.ac/21euCO1oZvDRnqLnMG-K-g/ and save it to C:\Octopus\Tools\Calamari.Cloud\6.0.0\Calamari.Aws.dll on the EC2 instance
  • Perform the AWS script step
  • Delete Calamari.Aws.dll and rename Calamari.Aws.dll.old to Calamari.Aws.dll to restore the previous version.

Regards
Matt C