SAN crashed with scheduled jobs


(lance.gamble) #1

So 10 minutes before our scheduled deployments, the SAN crashed and the server went dark. We scheduled the jobs with octo.exe without a value for the timeout, so we are assuming the default of 600 would be used. What is that timeout tied to? Is it 600 seconds from the scheduled time, or 600 seconds from the start of execution?

We want to cancel the deployments but are not sure how to do that if the timeout is tied to a start time. Right now the server service is turned off.


(lance.gamble) #2

The server rebooted again this morning and the jobs had timed out, but I'd still like to know what the expiry time is based on. It looks like it's the scheduled time. So the default for octo.exe would be 10 minutes after the scheduled time?


(Michael Compton) #4

Hi,

Thanks for getting in touch.

There are a few moving parts here, so I'll explain the settings. If I miss something, please ask.

There's a difference between the task running on the server and Octo.exe waiting for it to complete. The --deploymenttimeout option sets how long Octo.exe will observe the task, polling the Octopus server while waiting for it to complete. If that time runs out, the running task on the server isn't cancelled unless --cancelontimeout is also set.
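To make that distinction concrete, here is a minimal Python sketch of a client that waits on a server-side task. It is not Octo.exe's code: `wait_for_task`, its parameters, and the task state names are hypothetical. The 600-second default mirrors the value assumed in the original question, and in this sketch the clock starts when the client begins waiting, not at the scheduled time.

```python
import time
from typing import Callable

# Hypothetical model of the client-side wait described above; names and
# state strings are illustrative, not Octo.exe internals.
FINISHED_STATES = {"Success", "Failed", "Canceled", "TimedOut"}

def wait_for_task(
    get_state: Callable[[], str],      # asks the server for the task's current state
    cancel_task: Callable[[], None],   # asks the server to cancel the task
    timeout_s: float = 600.0,          # analogous to --deploymenttimeout
    cancel_on_timeout: bool = False,   # analogous to --cancelontimeout
    poll_interval_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the task finishes or the client stops waiting."""
    deadline = clock() + timeout_s  # counted from when the client starts waiting
    while clock() < deadline:
        state = get_state()
        if state in FINISHED_STATES:
            return state
        sleep(poll_interval_s)
    if cancel_on_timeout:
        cancel_task()               # only this branch affects the server-side task
        return "CancelRequested"
    return "StoppedWaiting"         # the server-side task keeps running
```

The clock and sleep functions are injectable so the behaviour can be exercised without real waiting; without `cancel_on_timeout`, running out of time only ends the client's observation.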

Server-side, if a task has started and the Octopus server stops while it's running, the task is cancelled when the server restarts.

Hope that helps. Let me know if not.

Michael


(lance.gamble) #5

The jobs were scheduled at 2 PM to start at 9 PM. The server crashed at 8:00 PM. What we want to know is: when will the jobs time out? We had 342 servers down; if this one had come up first, partial deployments would have gone out and then eventually failed when they hit a down server.

We use octo.exe deploy-release, and I can't figure out which switch would guarantee that if a job has not started within x minutes, it times out and does not run.


(Michael Compton) #6

Sorry if my first answer wasn't quite on point.

There is a setting to cancel the task if it doesn't start soon enough. There's the time to start a deployment (I think you are using --deployat in your Octo.exe call), and there's another setting that says: if it hasn't started within x of that time, cancel it. Unfortunately, it looks like we don't expose that second setting through Octo.exe.
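The interaction between a scheduled start and a "cancel if not started by" time can be illustrated with a small decision function. This is a hypothetical Python sketch, not Octopus Server's scheduler: `queued_task_action` and its return values are made up for illustration, and any expiry window you pass in is your own choice.

```python
from datetime import datetime
from typing import Optional

# Hypothetical decision for a task that has not started yet; not the
# actual Octopus Server scheduling logic.
def queued_task_action(now: datetime, queue_time: datetime,
                       queue_time_expiry: Optional[datetime]) -> str:
    if queue_time_expiry is not None and now > queue_time_expiry:
        return "expire"  # missed its start window: cancel instead of running
    if now < queue_time:
        return "wait"    # scheduled start time not reached yet
    return "run"         # inside the window (or no expiry was set)
```

Applied to a 9 PM start like the one in this thread, with an example 30-minute window: checked at 8:55 PM the task waits, at 9:10 PM it runs, and the next morning it expires instead of running. With no expiry set, the next-morning check would still run it.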

For now, the workaround is to set it directly through the API. For example, here's the call Octo.exe is making.

QueueTime is the time to start the deployment (equivalent to --deployat), and the missing QueueTimeExpiry is the time after which the deployment is cancelled if it hasn't started. There are also examples of creating a deployment in PowerShell here and here. You'd have to do something similar and set QueueTimeExpiry.
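A minimal Python sketch of that workaround, using only the standard library. `QueueTime` and `QueueTimeExpiry` come from the reply above; the `/api/deployments` path, the `X-Octopus-ApiKey` header, the `ReleaseId`/`EnvironmentId` fields, and the helper name `build_deployment_request` are assumptions to verify against your server's API documentation (the path may differ between server versions).

```python
import json
import urllib.request
from datetime import datetime, timedelta

def build_deployment_request(server_url: str, api_key: str, release_id: str,
                             environment_id: str, queue_time: datetime,
                             expiry_window: timedelta) -> urllib.request.Request:
    """Build (but do not send) a POST that creates a scheduled deployment.

    Hypothetical helper: the endpoint path, header name and ReleaseId /
    EnvironmentId fields are assumptions, not confirmed by this thread.
    """
    body = {
        "ReleaseId": release_id,
        "EnvironmentId": environment_id,
        # When the deployment should start (what --deployat sets).
        "QueueTime": queue_time.isoformat(),
        # Cancel instead of running if it hasn't started by this time.
        "QueueTimeExpiry": (queue_time + expiry_window).isoformat(),
    }
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/api/deployments",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Octopus-ApiKey": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Building the request separately from sending it (for example with `urllib.request.urlopen(req)`) lets you inspect the JSON body before anything is queued on the server. Pass timezone-aware datetimes so the serialized times carry an explicit offset.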

I've also opened an issue on GitHub about this omission, and I'll work on it as soon as possible so you can get it in your hands.

Michael


(lance.gamble) #7

Thanks, Michael. For now, it looks like the default is 30 minutes, the same as in the UI.


(system) #8

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.