Thanks @Justin_Walsh. Runbooks sound great, especially if they come with recurring scheduling.
We want our deployments to be reliable every time too, which is why we need Step time-outs.
Two use cases for Step time-outs are (1) retries and (2) detecting wedged/failed deployments.
Shockingly, some things on the Internet are not 100% reliable. If we can’t implement retry logic, we can’t make our deploys reliable. As others have noted, things like Azure FTP connections or CloudFormation updates (using the Octopus Step template) can hang forever, or at least for many hours. If we can’t time out, we can’t retry.
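To make the retry case concrete, here is a rough sketch of what we would like a Step to be able to do. It is plain Python rather than anything Octopus provides: `run_with_retries` and its parameters are names I made up for illustration, and the command being run stands in for whatever the Step actually executes.

```python
import subprocess
import sys


def run_with_retries(cmd, timeout_s, attempts):
    """Run cmd, kill it if it exceeds timeout_s seconds, and retry up to `attempts` times.

    Hypothetical helper for illustration only; not an Octopus API.
    Returns True on the first attempt that exits with code 0, else False.
    """
    for attempt in range(1, attempts + 1):
        try:
            # subprocess.run kills the child process before raising TimeoutExpired,
            # so a hung command does not linger after the timeout.
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            print(f"attempt {attempt}: timed out after {timeout_s}s", file=sys.stderr)
            continue
        if result.returncode == 0:
            return True
        print(f"attempt {attempt}: exit code {result.returncode}", file=sys.stderr)
    return False
```

The point is that the retry loop only works because the timeout turns a hang into a failure; without it, the first attempt never returns and there is nothing to retry.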
Second, without a timeout, a wedged deployment never fails. The official Octopus CloudFormation Step is a good example: it has no built-in time-out, retry, or rollback logic. For some resources CloudFormation will ‘stick’ for 3+ hours. When you schedule a deployment in Octopus, you can set a critical start window, but not a critical duration. So without a timeout you never reach the notification Step that would tell you a deployment failed or wedged. It never fails because it is wedged (for hours in this case, but potentially forever).
With a Step timeout we could move to the next Step, detect that the CF action has not completed in any reasonable time, cancel it to roll it back, and notify operators to investigate. Or we could enter Guided Failure mode. Without Step timeouts this is a silent failure we can’t detect or address.
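Roughly, the behaviour we are after looks like the sketch below. Again this is plain Python, not an Octopus feature: `wait_or_cancel` is a made-up name, and `get_status`, `cancel`, and `notify` are placeholder callables you would wire up to whatever checks the CloudFormation stack, cancels the update, and alerts operators. The status strings are assumptions too.

```python
import time


def wait_or_cancel(get_status, cancel, notify, timeout_s, poll_s=1.0):
    """Poll get_status() until it reports a final state or timeout_s elapses.

    Hypothetical sketch: get_status, cancel and notify are caller-supplied
    callables, and the status strings are assumed for illustration.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("succeeded", "failed"):
            return status  # finished on its own, nothing to cancel
        time.sleep(poll_s)
    # Deadline passed without a final state: treat the step as wedged.
    cancel()
    notify(f"step exceeded {timeout_s}s and was cancelled")
    return "timed_out"
```

With something like this, a wedged Step ends in an explicit `timed_out` result that later Steps (or Guided Failure) can act on, instead of blocking the deployment indefinitely.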
If Octopus can provide CloudFormation and other Step templates that are reliable every time, and never ever wedge, then sure, I’ll stop asking for Step timeouts.