Linux deployments get stuck on the same deployment

manoj_singi · 7 February 2019 14:17

I raised a ticket about this issue previously, and I am still facing the same issue with Linux deployments.

For reference, I have pasted the link above.

Alex.Rolley · 8 February 2019 03:16

Hi Manoj,

Are you able to let me know what troubleshooting you have done on these particular targets since you were last in contact? The last message from @Shane is included below:

I can help answer questions or provide details about that particular lock but I feel like some troubleshooting needs to happen on your end to determine why those specific machines are having a problem.

I can confirm that we still haven’t seen any reports of this issue from other clients, which strongly suggests that it is specific to your 4 boxes, so we need to know what has been done in the intervening period.

Thanks Manoj,

Regards,
Alex

sharper56 · 19 February 2019 18:40

Hi, I’m working with Manoj on this issue.

The server task output previously sent was not for the problem we’re having. Please use this ServerTasks-254840.txt instead. ServerTasks-254840.log.txt (196.1 KB)

At line 397, after “Acquire Packages”, we get the error “Cannot start this task yet because task [ServerTasks-254840]” for every step. Even after we get to Step 3, which is run serially, the individual servers in the step still generate the “Cannot start this task yet because task [ServerTasks-254840]” error.

The changes we’ve tried are:

Re-order processes
Change parallel setup for steps to serial steps.
Run deployments on single web and single application with parallel steps.
Run deployments on different environments.

All changes produced the same “Cannot start this task …” errors.

We did have this working correctly at the start of the project, but now, 50 or so iterations later… the parallel execution isn’t working.

I’m also attaching the Octopus task log for ServerTasks-254840: servertasks-252480_h6y2mhh4y6.txt (205.6 KB)

Here is a quick plain-text outline of the process steps:
Orderlab-POC

Process

1.NGINX (Servers OLWEB ordweb-wc-1i, ordweb-wc-2i)
Multi-step deployment across deployment targets in role

1.1.NGINX - Install
Run a script across targets in roles

1.2.NGINX - Stop
Run a script across targets in roles

1.3.NGINX - Deploy Config
Deploy package OrderLabConfig from Octopus Server (built-in)

1.4.NGINX - Install Cert
Run a script across targets in roles

1.5.NGINX - Start or Reload NGINX
Run a script across targets in roles

Parallel (Step 1 & Step 2)

2.Tomcat (Servers OLAPP -> ordapp-wc-1i, ordapp-wc-2i)
Multi-step deployment across deployment targets in role

2.1.Tomcat - Stop
Run a script across targets in roles

2.2.Tomcat - Install
Run a script across targets in roles

2.3.Tomcat - Deploy Config
Deploy package OrderLabConfig from Octopus Server (built-in)

2.4.Tomcat - Java Keystore and properties PEM CER and HMAC
Run a script across targets in roles

2.5.Tomcat - Restart Service
Run a script across targets in roles

3.War Deploy (Servers OLAPP -> ordapp-wc-1i, ordapp-wc-2i)
Multi-step deployment across deployment targets in role

3.1.War Deploy - Stop and Undeploy Tomcat Application
Run a script across targets in roles

3.2.War Deploy - Package Deployment
Deploy package SalesPortalAgent-Web from Octopus Server (built-in)

3.3.War Deploy - Start Application
Run a script across targets in roles

manoj_singi · 28 February 2019 19:45

Any update on this?

Alex.Rolley · 28 February 2019 23:56

Hi @manoj_singi

Apologies for the delayed response; somehow I missed the reply from Stuart.

I’ve had a look at the provided logs and I can see that the individual tasks in this deployment are locking against each other, which is slowing the deployment down even though it does eventually succeed. In your case I suspect that your SSH targets have the same thumbprint, which is one item that will trigger a task lock as part of a deployment.

To resolve this you have two options. The first is to update the thumbprints on your SSH targets so that they are unique. The second is to bypass the deployment mutex (in effect, disabling the locking checks) by adding the following variable and value to the affected project(s).

OctopusBypassDeploymentMutex = TRUE

That should resolve the locking errors for you. Please let me know if you have any further issues or if there is anything else we can assist with.

Regards,
Alex
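
For anyone checking their own targets for the shared-thumbprint condition described above, a minimal Python sketch of the duplicate check follows. The target names are the ones mentioned in this thread, but the thumbprint values are made up for illustration, and how the name/thumbprint pairs get exported from Octopus is left open here.

```python
from collections import defaultdict


def find_shared_thumbprints(targets):
    """Return {thumbprint: [target names]} for thumbprints used by more than one target."""
    by_thumbprint = defaultdict(list)
    for name, thumbprint in targets.items():
        # Normalise case and whitespace so cosmetic differences do not hide a match.
        by_thumbprint[thumbprint.strip().lower()].append(name)
    return {tp: sorted(names) for tp, names in by_thumbprint.items() if len(names) > 1}


# Hypothetical inventory: the thumbprint values below are placeholders, not real data.
targets = {
    "ordweb-wc-1i": "11:22:33:44",
    "ordweb-wc-2i": "11:22:33:44",
    "ordapp-wc-1i": "11:22:33:44",
    "ordapp-wc-2i": "aa:bb:cc:dd",
}

shared = find_shared_thumbprints(targets)
for thumbprint, names in shared.items():
    print(f"{thumbprint} is shared by: {', '.join(names)}")
```

Any thumbprint that this reports against more than one target is a candidate for the locking behaviour described in the post above.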

sharper56 · 1 March 2019 15:46

Alex, where does the ‘OctopusBypassDeploymentMutex = TRUE’ go? Is it in the Project Variables directly?

sharper56 · 1 March 2019 16:49

Alex… thank you for the help on this.

I’ve re-run with ‘OctopusBypassDeploymentMutex = TRUE’ and that works.

You also guessed correctly that all the servers had the same fingerprint… so I’ve recreated the host keys for all the servers, updated the fingerprints in OD, and redeployed a previous deployment… and they deployed in parallel as well.
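
One way to confirm that recreated host keys really differ from each other is to compute each key's fingerprint from its public key line (the single line found in an OpenSSH host public key file). The sketch below is only an illustration: the key lines are placeholders built from dummy bytes, not real keys, and which fingerprint format Octopus shows for a target is an assumption to verify on your own server.

```python
import base64
import hashlib


def fingerprints(public_key_line):
    """Return (md5, sha256) fingerprints for an OpenSSH public key line.

    The line is expected to look like "<key-type> <base64-blob> [comment]".
    """
    blob = base64.b64decode(public_key_line.split()[1])
    md5_hex = hashlib.md5(blob).hexdigest()
    # Legacy format: colon-separated hex pairs.
    md5 = ":".join(md5_hex[i:i + 2] for i in range(0, len(md5_hex), 2))
    # Newer format: unpadded base64 of the SHA-256 digest.
    sha256 = base64.b64encode(hashlib.sha256(blob).digest()).decode().rstrip("=")
    return md5, "SHA256:" + sha256


# Placeholder key lines built from dummy bytes (hypothetical, for illustration only).
line_a = "ssh-rsa " + base64.b64encode(b"dummy-host-key-a").decode() + " host-a"
line_b = "ssh-rsa " + base64.b64encode(b"dummy-host-key-b").decode() + " host-b"

for label, line in (("host-a", line_a), ("host-b", line_b)):
    md5, sha256 = fingerprints(line)
    print(label, md5, sha256)
```

If two servers print the same fingerprint, they are still sharing a host key.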

Thank you again!
