No response from support and unable to deploy for over 16 hours

reliability
(Carson) #1

Hello,

There are rare (fortunately) occasions when we are completely blocked and unable to deploy. We’re in this situation now, and I filed a ticket about it 16 hours ago.

Essentially, our cloud instance is queueing all of our deployments, that queue is not progressing, and there’s nothing I can see to do about it.

The most pressing issue right now is that we are unable to deploy to any environment and we have some urgent issues that need addressing.

A secondary issue is that there’s no means to reach out to anyone, save sending an email and hoping someone responds at some point. There’s not even an acknowledgement that the issue was received.

There’s also no means to escalate the ticket, no number to call, nothing. When I’ve raised this issue in the past, I’ve been assured that the email is monitored 24x7 for emergency issues, but that doesn’t seem to be working at the moment.

There needs to be some means to “file a ticket” and escalate that ticket, especially if there’s no number to call. Because the instance is cloud hosted, we can’t restart services or really take any other action; we’re just stuck waiting.

Like I said, issues are few and far between, and in all but two cases (this included) have not been showstoppers, but when we do have showstoppers outside our control we need a means to get urgent support and I’m not seeing it.

(Justin Walsh) #2

Hi @carson,

Sorry to hear that you’ve got a showstopping situation. It looks like your ticket was picked up by one of our support team when it came in. I’ll dig into the ticket and see if it’s something I can get resolved for you ASAP.

Will be in contact soon!

(Carson) #3

Thank you so much!

(Aaron Roydhouse) #5

Hi @carson, I have been looking at the ODC product and wondering the same thing about stuck ODC instances and the risk of not being able to fix them ourselves.

@Justin_Walsh it would be nice if the ODC instance console had the option to restart/reboot the ODC instance. Chances are that would often help.

@carson looking at your situation, it could be that the built-in worker is stuck somehow. Did you try adding an additional external worker to the default worker pool? That might be a workaround until OD support can help you.

(Carson) #6

The ability to recycle our own instance would be great! That said, it’s pretty rare that we run into these sorts of technical issues. Interestingly, this was the result of a feature change that we weren’t aware of: now, when a manual intervention is required, you can no longer deploy builds without addressing it.

It adds additional steps without providing a lot of extra value (or any, really, for us). What would be great is if the previous interruption / build were automatically cancelled in light of the new build; this would better support our particular workflow at least. I’m not sure how others use the system, but for us, when QA deploys a build, there’s an interruption that prevents promotion to the next environment without their sign-off (i.e. they allow the build to proceed to the next step). Because they deploy many, many builds, it leads to many dead interruptions and builds (we had over 1000 of the two combined).

We’ve since cleared the queue, but it does mean extra steps every time going forward. Hopefully that’s something that can be considered. Thanks!

Carson.

(Aaron Roydhouse) #7

Thanks for the background @carson and the info about the changed behavior.

You may be able to add a deployment Step that uses an API key and the OD API to find and cancel all stalled deployments for the same project with an older build.
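A rough sketch of the selection logic such a step might use, in Python. Everything here is an assumption for illustration: the `Deployment` fields, the `"Queued"` and `"Paused"` state names, and the `cancel` callback are hypothetical and not taken from the OD API. The actual HTTP calls (listing deployments and cancelling a task with your API key) are left to the caller, so check the OD API documentation for the real endpoints and field names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Assumption: states that count as "stalled" (e.g. waiting in the queue or
# paused on a manual intervention). Adjust to whatever the API reports.
STALLED_STATES = frozenset({"Queued", "Paused"})


@dataclass(frozen=True)
class Deployment:
    """Hypothetical, simplified view of one deployment task."""
    task_id: str
    project_id: str
    build_number: int
    state: str


def find_stale(deployments: Iterable[Deployment],
               project_id: str,
               current_build: int) -> List[Deployment]:
    """Return stalled deployments of the same project with an older build."""
    return [
        d for d in deployments
        if d.project_id == project_id
        and d.state in STALLED_STATES
        and d.build_number < current_build
    ]


def cancel_stale(deployments: Iterable[Deployment],
                 project_id: str,
                 current_build: int,
                 cancel: Callable[[str], None]) -> List[str]:
    """Call `cancel(task_id)` for each stale deployment; return the ids cancelled.

    `cancel` is supplied by the caller and would wrap the real API request.
    """
    cancelled = []
    for d in find_stale(deployments, project_id, current_build):
        cancel(d.task_id)
        cancelled.append(d.task_id)
    return cancelled
```

Keeping the filter separate from the cancel call means you can log what `find_stale` would select for a few builds before letting the step actually cancel anything.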

(Carson) #8

Oh, good idea, thanks!
