I have a script that creates some text files and then commits them to a git repository. This is working as expected (the commits are visible in the repository), but Octopus doesn’t seem to recognize that the script has actually completed! This doesn’t happen every time the task runs, only sometimes (I can’t reproduce it on demand, but it happens quite often). Below is the final part of the script.
The script runs every day at 5am. When this happens, the only way to end the task is to cancel it manually (last time it ran for 19 hours until I noticed).
The biggest problem is that all other tasks for the same server are queued, waiting for this task to complete.
So 2 questions:
Can we solve the root of the problem? Why is this happening? I updated from 2.5 to the latest version hoping this would be fixed, but it keeps happening.
Until we find the root cause, is there a maximum step execution time I can set, to avoid having this task run for days?
Thanks for getting in touch! This is an interesting problem. I think the strangest part is that it happens sporadically. I would have expected that if it were going to fail, it would fail consistently. That said, I have a couple of questions so hopefully we can isolate and fix the issue.
What version of Octopus did you upgrade to?
What version of Windows Server are you using?
What version of git are you using?
Do you see any errors in your Octopus server logs or the deployment task logs? More information on log file locations is available at the following URL. http://docs.octopusdeploy.com/display/OD/Log+files
Do you see the changes in the local and remote repository? i.e. Does the script execute correctly every time? Or does it fail at the git push step? How are you handling credentials? One of my colleagues thought it could be hanging on credentials if the git push is failing.
Try adding an explicit Exit 0 to the end of the script.
Googling for information on possible reasons why a git push would fail suggests solutions like ensuring you have an up to date version of git installed locally etc.
Windows Server version: 2008 R2
Octopus version: 3.3.26
git version: 2.8.1.windows.1
The commits are visible in the remote repository. I don’t think it’s a git related issue because if the first “if” check evaluates to true, then “Nothing to commit!” is shown and the issue still happens (without having executed a git command).
I’ll get back to you when I check the logs. I’ll also try exit 0.
The raw log of the deployment shows nothing: it shows the last Write-Host from my script and then just hangs. When I click cancel, no error is returned. In the previous version I had (2.5), when I canceled, a message saying “The process with id xxx was not found” (something like that) was displayed.
As for the server logs, there is also nothing in there. Just the “Secure connection established.” at the start of the deployment, and then nothing.
I added the exit 0 at the end of the script. Let’s see if this happens again.
When this happens, after I cancel the task, the server remains unavailable for future deployments. I always get the message “Cannot start this task yet. There is already another task running that cannot be run in conjunction with any other task. Please wait…”.
The only way I have found to fix this is to restart the Tentacle service on the server. After that, the tasks are executed normally again.
Thanks for sending through more details. I’ve had a chat with a few of my teammates and we think it’s likely that it’s hanging on the git push. One thing to note is that git status doesn’t communicate with a remote server, so the hang in the “Nothing to commit!” case doesn’t eliminate git completely. I’d really be interested to see if it’s getting past that git push. If you added another Write-Host after the push, this would confirm it.
That said, I do think it’s very weird that the changes are appearing in the remote repository. The next time this happens, can you connect to the target server and run task manager? I’m guessing that you’ll find git.exe will still be running and if you kill it, then the script should complete.
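The Task Manager check can also be done from a PowerShell prompt on the target server. This is a minimal sketch using only the built-in Get-Process and Stop-Process cmdlets; the function name Get-LingeringProcess is hypothetical, and the default process name simply follows the guess above that git.exe is the one left running.

```powershell
# Hypothetical helper: list processes that may still be running after the
# script has printed its last line. Defaults to git, per the guess above.
function Get-LingeringProcess {
    param([string[]]$Name = @('git'))

    # SilentlyContinue: return nothing (instead of an error) when no process matches.
    Get-Process -Name $Name -ErrorAction SilentlyContinue |
        Select-Object Id, ProcessName
}

Get-LingeringProcess | Format-Table -AutoSize

# To end a process found this way (the equivalent of "End task"),
# substitute the Id reported above:
#   Stop-Process -Id 1234 -Force
```

If the hung task completes as soon as the listed process is stopped, that process is what the task was waiting on.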
Another comment from one of my teammates who had a similar issue previously was that he needed to force git not to prompt for credentials. I found that git 2.3 added support for this: setting the environment variable GIT_TERMINAL_PROMPT=0 on the target server stops git from prompting for credentials on the terminal. More information on this is available at the following URL under ‘The credential subsystem is now friendlier to scripting’.
This all assumes that git is causing the issue. If we find that git isn’t involved, then we’ll need to dig deeper.
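The suggestions so far (a Write-Host after the push, GIT_TERMINAL_PROMPT=0, and an explicit exit) could be combined roughly as follows. This is only a sketch: the function name Invoke-GitPushTraced is hypothetical, the environment variable is set for the script's own process rather than server-wide, and it assumes git is on the PATH and the script runs inside the repository.

```powershell
# Stop git from prompting for credentials on the terminal (git 2.3 or later).
# Set here for this script's process only; child processes inherit it.
$env:GIT_TERMINAL_PROMPT = '0'

# Hypothetical wrapper: log a marker before and after the push so the task
# log shows whether the push itself returned.
function Invoke-GitPushTraced {
    Write-Host 'Before git push'
    # Out-Host keeps git's output out of the function's return value.
    git push | Out-Host
    $code = $LASTEXITCODE
    Write-Host "After git push (exit code $code)"
    return $code
}

# Intended use at the end of the deployment script:
#   $code = Invoke-GitPushTraced
#   exit $code
```

If "After git push" appears in the task log and the task still hangs, the push itself has returned and something else is keeping the task open.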
I thought this was normal, but actually the identity should be there and there shouldn’t be a need to create it from time to time. Maybe something related to ssh-agent.
One other thing I forgot to mention: the project variable OctopusBypassDeploymentMutex is set to True. I’ve set it this way because this is a maintenance task, and I don’t want other deployments (for the same server) to wait for it to finish. But I’m not sure if this is working at all.
I use this line before running git commands (I use posh-git).
. 'D:\repositories\posh-git\profile.example.ps1'
When ssh-agent.exe is not running, posh-git starts it and outputs the “Identity added” line.
This is the case in which Octopus hangs. If I manually stop the ssh-agent process, Octopus completes the task successfully.
If ssh-agent is already running, posh-git doesn’t start it and Octopus runs as expected.
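Given that behaviour, one possible workaround is to have the script stop the agent itself when it was the one that caused it to start. A possible explanation (an assumption, not confirmed in this thread) is that the freshly started ssh-agent keeps running as a child of the script, and the task waits on it. The sketch below uses only built-in cmdlets; both function names are hypothetical, and it stops every ssh-agent process it can see, so it is only appropriate where nothing else on the server relies on a running agent.

```powershell
# Hypothetical helper: true when at least one ssh-agent process exists.
function Test-SshAgentRunning {
    [bool](Get-Process -Name 'ssh-agent' -ErrorAction SilentlyContinue)
}

# Hypothetical helper: stop ssh-agent only if it was NOT running before the
# script loaded posh-git, i.e. only if this run started it.
function Stop-SshAgentIfStartedHere {
    param([Parameter(Mandatory = $true)][bool]$WasRunningBefore)

    if (-not $WasRunningBefore) {
        Get-Process -Name 'ssh-agent' -ErrorAction SilentlyContinue |
            Stop-Process -Force
    }
}

# Intended use in the deployment script:
#   $agentWasRunning = Test-SshAgentRunning
#   . 'D:\repositories\posh-git\profile.example.ps1'
#   ... git commands ...
#   Stop-SshAgentIfStartedHere -WasRunningBefore $agentWasRunning
```

This mirrors the manual fix described above (stopping the ssh-agent process lets the task complete) without touching an agent that was already running before the task started.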
For reference here is what posh-git is doing (not my code):
# Loosely based on bash script from http://help.github.com/ssh-key-passphrases/
function Start-SshAgent([switch]$Quiet) {
    [int]$agentPid = Get-SshAgent
    if ($agentPid -gt 0) {
        if (!$Quiet) {
            $agentName = Get-Process -Id $agentPid | Select -ExpandProperty Name
            if (!$agentName) { $agentName = "SSH Agent" }
            Write-Host "$agentName is already running (pid $($agentPid))"
        }
        return
    }
    if ($env:GIT_SSH -imatch 'plink') {
        Write-Host "GIT_SSH set to $($env:GIT_SSH), using Pageant as SSH agent."
        $pageant = Get-Command pageant -TotalCount 1 -Erroraction SilentlyContinue
        $pageant = if ($pageant) {$pageant} else {Find-Pageant}
        if (!$pageant) { Write-Warning "Could not find Pageant."; return }
        Start-Process -NoNewWindow $pageant
    } else {
        $sshAgent = Get-Command ssh-agent -TotalCount 1 -ErrorAction SilentlyContinue
        $sshAgent = if ($sshAgent) {$sshAgent} else {Find-Ssh('ssh-agent')}
        if (!$sshAgent) { Write-Warning 'Could not find ssh-agent'; return }
        & $sshAgent | foreach {
            if($_ -match '(?<key>[^=]+)=(?<value>[^;]+);') {
                setenv $Matches['key'] $Matches['value']
            }
        }
    }
    Add-SshKey
}
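To make the ssh-agent branch of that function easier to follow: ssh-agent prints shell-style assignments on startup, and the regular expression picks the variable name and value out of each line. The sketch below runs the same pattern against sample lines; the socket path and pid values are made up for illustration, and a hashtable stands in for posh-git's setenv helper.

```powershell
# Sample lines in the format ssh-agent prints on startup (values are made up).
$sampleOutput = @(
    'SSH_AUTH_SOCK=/tmp/ssh-abc123/agent.4567; export SSH_AUTH_SOCK;',
    'SSH_AGENT_PID=4568; export SSH_AGENT_PID;',
    'echo Agent pid 4568;'
)

# Collect the matches in a hashtable instead of calling setenv.
$parsed = @{}
foreach ($line in $sampleOutput) {
    # Same pattern as in Start-SshAgent: "key=value;" at the start of the line.
    if ($line -match '(?<key>[^=]+)=(?<value>[^;]+);') {
        $parsed[$Matches['key']] = $Matches['value']
    }
}

# The third line contains no '=' and is skipped, leaving two entries:
# SSH_AUTH_SOCK and SSH_AGENT_PID.
$parsed
```

These two variables are how later ssh and git commands find the agent that was just started.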
And here is the profile I load (not my code, it is the posh-git example):
Push-Location (Split-Path -Path $MyInvocation.MyCommand.Definition -Parent)
# Load posh-git module from current directory
Import-Module .\posh-git
# If module is installed in a default location ($env:PSModulePath),
# use this instead (see about_Modules for more information):
# Import-Module posh-git
# Set up a simple prompt, adding the git prompt parts inside git repos
function global:prompt {
    $realLASTEXITCODE = $LASTEXITCODE
    Write-Host($pwd.ProviderPath) -nonewline
    Write-VcsStatus
    $global:LASTEXITCODE = $realLASTEXITCODE
    return "> "
}
Pop-Location
Start-SshAgent -Quiet
I find it very difficult to pinpoint what exactly Octopus doesn’t like. I would be glad if you could run some tests and find out whether this is something that can be fixed in Octopus, or something that needs to change in the posh-git scripts.
Thanks for continuing to investigate this and sending through detailed replies. It helps so much! I’ve dug into this issue a bit more and I found a bunch of information pointing to issues with ssh-agent on Windows. I’ll share a couple of key points.
It really sounds like this posh-git issue: https://github.com/dahlbyk/posh-git/issues/258 I’d highly recommend reading through the post as it has a few suggestions. That said, the post makes it clear that the problem lies with ssh-agent itself rather than with posh-git.
I found another post on a Jenkins forum where someone solved a similar issue by creating a new key pair without a password. Obviously, this isn’t ideal but if it worked in your situation, it could be a short term workaround until the issue above is resolved.
This issue has been closed due to inactivity. If you encounter the same or a similar issue and require help, please open a new discussion (if we asked for logs or extra details in this thread, consider including them in the new thread). If you are the creator of this thread and believe it should not be closed let us know via our support email.