Task executing indefinitely

Hello

I have a script that creates some text files and then commits them to a git repository. This works as expected (the commits are visible in the repository), but Octopus doesn’t seem to understand that the script has actually completed! This doesn’t happen every time the task runs, only sometimes (I can’t reproduce it, but it happens quite often). Below is the final part of the script.

function Test-LastExit($cmd) {
    if ($LastExitCode -ne 0) {
        Write-Host "##octopus[stderr-error]"
        write-error "$cmd failed with exit code: $LastExitCode"
    }
}

$status = Get-GitStatus
if(-Not($status.HasWorking))
{
    Write-Host "Nothing to commit!"
}
else
{
  Write-Host "git pull ..."
  git pull
  Test-LastExit "git pull"

  Write-Host "git add ..."
  git add -A
  Test-LastExit "git add"

  Write-Host "git commit ..."
  git commit -m "comment"
  Test-LastExit "git commit"

  Write-Host "git push ..."
  git push
  Test-LastExit "git push"
}

The script runs every day at 5am. When this happens, the only way to end the task is to cancel it manually (last time it ran for 19 hours until I noticed it).
The biggest problem is that all other tasks for the same server are queued, waiting for this task to complete.

So 2 questions:

  1. Can we solve the root of the problem? Why is this happening? I updated from 2.5 to latest hoping this is fixed, but it keeps happening again.
  2. Until we find the root cause, is there a max step execution time i can set, to avoid having this task running for days?

thanks

Hi,

Thanks for getting in touch! This is an interesting problem. I think the strangest part is that it happens sporadically. I would have expected that if it were going to fail, it would fail consistently. That said, I have a couple of questions so hopefully we can isolate and fix the issue.

  • What version of Octopus did you upgrade to?
  • What version of Windows Server are you using?
  • What version of git are you using?
  • Do you see any errors in your Octopus server logs or the deployment task logs? More information on log file locations is available at the following URL. http://docs.octopusdeploy.com/display/OD/Log+files
  • Do you see the changes in the local and remote repository? i.e. Does the script execute correctly every time? Or does it fail at the git push step? How are you handling credentials? One of my colleagues thought it could be hanging on credentials if the git push is failing.
  • Try adding an explicit Exit 0 to the end of the script.

Googling for information on possible reasons why a git push would fail suggests solutions like ensuring you have an up to date version of git installed locally etc.

Looking forward to your reply.

Rob

windows server 2008 r2
octopus version: 3.3.26
git version: 2.8.1.windows.1

The commits are visible in the remote repository. I don’t think it’s a git related issue because if the first “if” check evaluates to true, then “Nothing to commit!” is shown and the issue still happens (without having executed a git command).

I’ll get back to you when i check the logs. I’ll also try exit 0.

Thank you

Ok it happened again, and i was monitoring it.

The raw log of the deployment shows nothing useful; it just shows the last Write-Host from my script and then hangs. When I click cancel, no error is returned. In the previous version I had (2.5), cancelling displayed a message saying “The process with id xxx was not found” (something like that).

As for the server logs, there is also nothing in there. Just the “Secure connection established.” at the start of the deployment, and then nothing.

I added the exit 0 at the end of the script. Let’s see if this happens again.

One more thing:

When this happens, after i cancel the task, the server remains unavailable for future deployments. I always get the message “Cannot start this task yet. There is already another task running that cannot be run in conjunction with any other task. Please wait…”.

The only way i found to fix this, is to restart the tentacle service on the server. After that, the tasks are executed again normally.

edit: i opened another question for this, as it seems to be happening in other occasions too
http://help.octopusdeploy.com/discussions/problems/47121-cannot-start-this-task-yet-there-is-already-another-task-running

Hi,

Thanks for sending through more details. I’ve had a chat with a few of my teammates and we think it’s likely that it’s hanging on the git push. One thing to note is that git status doesn’t communicate with a remote server, so it doesn’t eliminate git completely. I’d really be interested to see if it’s getting past that git push. If you added another Write-Host after the push, this would confirm it.

That said, I do think it’s very weird that the changes are appearing in the remote repository. The next time this happens, can you connect to the target server and run task manager? I’m guessing that you’ll find git.exe will still be running and if you kill it, then the script should complete.
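For a scripted alternative to Task Manager, a minimal sketch along these lines could be run on the target server while the task is hanging. It only lists processes named git; the variable name $stuck is purely illustrative.

```powershell
# List any git processes that are still alive on this machine.
# -ErrorAction SilentlyContinue keeps the command quiet when none exist.
$stuck = @(Get-Process -Name git -ErrorAction SilentlyContinue)

if ($stuck.Count -eq 0) {
    Write-Host "No git process is running."
} else {
    $stuck | Select-Object Id, ProcessName | Format-Table -AutoSize
}
```

If a process is listed, it can be ended with Stop-Process -Id followed by the Id shown, which is the equivalent of killing it in Task Manager.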

Another comment from one of my teammates, who had a similar issue previously, was that he needed to force git not to prompt for credentials. I found that git 2.3 added support for this: setting the environment variable GIT_TERMINAL_PROMPT=0 on the target server stops git from prompting for credentials on the terminal. More information on this is available at the following URL under ‘The credential subsystem is now friendlier to scripting’.
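As a minimal sketch, assuming the variable is set for the script's own process rather than machine-wide, the line below could go near the top of the script, before any git command runs. Child processes such as git.exe inherit it.

```powershell
# Assumption: a per-process setting is sufficient here, because git.exe
# is started by this script and inherits its environment.
# With the value 0, git (2.3 or later) fails instead of waiting for a
# username or password to be typed on the terminal.
$env:GIT_TERMINAL_PROMPT = '0'

Write-Host "GIT_TERMINAL_PROMPT=$env:GIT_TERMINAL_PROMPT"
```

A failed push then surfaces as a non-zero exit code, which the script's existing Test-LastExit check would report.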

This all assumes that git is causing the issue. If we find that git isn’t involved, then we’ll need to dig deeper.

Let me know how you go.

Rob

So I added another Write-Host after the git push. I also added Exit 0 as the last command of the script.

It happened again today and this is what i’ve seen:

  1. The repository has the commits (ok we know that).
  2. In the output, the last write-host is shown (after push).
  3. Exit 0, obviously, didn’t make the script end.
  4. git.exe was not running (checked task manager) when the task was hanging.

But I think I found something. In all the logs of the failed builds I see a line that doesn’t exist in the succeeded builds:

Identity added: /c/Users/tentacleuser/.ssh/id_rsa (/c/Users/tentacleuser/.ssh/id_rsa)

I thought this was normal, but actually the identity should be there and there shouldn’t be a need to create it from time to time. Maybe something related to ssh-agent.

I’ll further investigate it.

One other thing i forgot to mention. The project variable OctopusBypassDeploymentMutex is set to True. I’ve set it this way, because this is a maintenance task, and i don’t want other deployments to wait for it to finish (for the same server). But i’m not sure if this is working at all.

Ok i reproduced it.

I use this line before running git commands (i use posh-git).
. 'D:\repositories\posh-git\profile.example.ps1'

When ssh-agent.exe is not running, posh-git starts it and outputs the “Identity added” line.

This is the case in which Octopus hangs. If I manually stop the ssh-agent process, then Octopus completes the task successfully.

If the ssh-agent is already running, then the posh-git doesn’t start it and octopus runs as expected.
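Based on that observation, one possible workaround is to have the script itself stop any ssh-agent it caused to start. The sketch below is untested against Octopus and rests on the assumption that stopping the agent from inside the script has the same effect as stopping it by hand. The variable name $agentsBefore is illustrative.

```powershell
# Remember which ssh-agent processes exist before posh-git is loaded.
$agentsBefore = @(Get-Process -Name ssh-agent -ErrorAction SilentlyContinue |
    Select-Object -ExpandProperty Id)

# ... dot-source the posh-git profile and run the git commands here ...

# Stop only the agents that appeared during this run, so an agent that
# was already running before the script started is left alone.
Get-Process -Name ssh-agent -ErrorAction SilentlyContinue |
    Where-Object { $agentsBefore -notcontains $_.Id } |
    Stop-Process -Force
```

Comparing process ids before and after keeps the cleanup from touching an agent that another session relies on.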

For reference here is what posh-git is doing (not my code):

# Loosely based on bash script from http://help.github.com/ssh-key-passphrases/
function Start-SshAgent([switch]$Quiet) {
    [int]$agentPid = Get-SshAgent
    if ($agentPid -gt 0) {
        if (!$Quiet) {
            $agentName = Get-Process -Id $agentPid | Select -ExpandProperty Name
            if (!$agentName) { $agentName = "SSH Agent" }
            Write-Host "$agentName is already running (pid $($agentPid))"
        }
        return
    }

    if ($env:GIT_SSH -imatch 'plink') {
        Write-Host "GIT_SSH set to $($env:GIT_SSH), using Pageant as SSH agent."
        $pageant = Get-Command pageant -TotalCount 1 -Erroraction SilentlyContinue
        $pageant = if ($pageant) {$pageant} else {Find-Pageant}
        if (!$pageant) { Write-Warning "Could not find Pageant."; return }
        Start-Process -NoNewWindow $pageant
    } else {
        $sshAgent = Get-Command ssh-agent -TotalCount 1 -ErrorAction SilentlyContinue
        $sshAgent = if ($sshAgent) {$sshAgent} else {Find-Ssh('ssh-agent')}
        if (!$sshAgent) { Write-Warning 'Could not find ssh-agent'; return }

        & $sshAgent | foreach {
            if($_ -match '(?<key>[^=]+)=(?<value>[^;]+);') {
                setenv $Matches['key'] $Matches['value']
            }
        }
    }
    Add-SshKey
}

And in the profile i load (not my code, posh-git example):

Push-Location (Split-Path -Path $MyInvocation.MyCommand.Definition -Parent)

# Load posh-git module from current directory
Import-Module .\posh-git

# If module is installed in a default location ($env:PSModulePath),
# use this instead (see about_Modules for more information):
# Import-Module posh-git


# Set up a simple prompt, adding the git prompt parts inside git repos
function global:prompt {
    $realLASTEXITCODE = $LASTEXITCODE

    Write-Host($pwd.ProviderPath) -nonewline

    Write-VcsStatus

    $global:LASTEXITCODE = $realLASTEXITCODE
    return "> "
}

Pop-Location

Start-SshAgent -Quiet

I find it very difficult to pinpoint what exactly octopus doesn’t like. I would be glad if you could make some tests and find out if it is something that can be fixed in octopus, or is it something that needs to change in posh-git scripts?

thank you

Hi,

Thanks for continuing to investigate this and sending through detailed replies. It helps so much! I’ve dug into this issue a bit more and I found a bunch of information pointing to issues with ssh-agent on Windows. In this situation, I’ll share a couple of key points.

  • It really sounds like this posh-git issue: https://github.com/dahlbyk/posh-git/issues/258 I’d highly recommend reading through the post as it has a few suggestions. That said, the post makes it clear that it’s not an issue with posh-git but simply with ssh-agent.

  • I found another post on a Jenkins forum where someone solved a similar issue by creating a new key pair without a password. Obviously, this isn’t ideal but if it worked in your situation, it could be a short term workaround until the issue above is resolved.

Let me know how you go!

Rob
