Unable to use "Calamari.AzureScripting" with custom tentacle package directory

Hi Leon,

As Tina stated above, the developers would like to get an overview of your setup and steps leading to the issue. They don’t have any specific requests as to what they will need to see.

I have a link below which will allow you to schedule a meeting with Andrew and John, two of our developers looking into this issue.

https://a.goodtime.io/w/octopus-deploy/daniel.fischer/developer-screen-share

If none of these times work for you or if you have any other thoughts or issues, please don’t hesitate to let me know.

Best regards,
Daniel

Thanks @Daniel_Fischer @tina.bamford - I’ve checked internally here, and it won’t be possible for me to share anything internal with regards to our setup.

I’m hoping to spend some time later this week building a containerized environment to be able to reproduce this, and get some new process dumps.

Hey @Daniel_Fischer - have been able to reproduce, and have the following info

  • stack trace info (Calamari > Powershell Script > ConhostV2)
  • Powershell script (that I believe could be generated from Calamari)
  • Logs from output indicating where it hangs (call to Clear-ItemProperty)
  • dump files for all three processes

Hi Leon,

Thanks for collecting all of this information. Are you or someone with access able to log into Octopus.com and see a “support” tab? This should provide a location for you to upload these files.

We have had to be rather strict with our upload permissions for support files, so you or whoever has access to your license/organization on Octopus.com may not be able to upload any support files.

Would you be able to provide me with your Octopus License Key (not the entire XML, just the first field with the key)? This will allow me to find your license on our end and ensure you or one of the users on the license are able to upload these files.

I have made this conversation private so only yourself and Octopus staff are able to view it and any attachments.

Let me know if you have any questions here.

Best regards,
Daniel

Hey @Daniel_Fischer

First field from license is: 24988-65845-51426-21111

Due to the potential of sensitive data being included in the procdumps, these haven’t been included in the uploads, but I have included screenshots of the callstacks, and the generated script where the issue occurs. If we need to dig into the procdumps, we’re available for a screenshare session.

At the same time, is it possible to allow @miguelelvir and @inboxprchprocurement.us to also access this thread?

Hi Leon,

Thanks for getting back. I have modified this conversation so it is unlisted instead of private. This will allow @miguelelvir and @inboxprchprocurement.us, as well as anyone else with a link to it, to view the thread.

I think our developers are very much interested in seeing the process dump over anything else at this point, so I believe a screenshare session is our best option. You can use the following link to schedule an appropriate time.

https://a.goodtime.io/w/octopus-deploy/daniel.fischer/developer-screen-share

Let me know if you have any issues scheduling.

Best regards,
Daniel

Thanks @Daniel_Fischer - I’m based in Germany, and @miguelelvir is based in New York. Any chance we can get a GMT+1/EST-friendly time? (I know that’s not easy from AEST.)

Hi Leon,

Thanks for getting back. I have discussed this with the developers and our international staff, and it looks like we don’t have any staff near these time zones who are confident in their ability to diagnose the issue. The majority of our programming staff (and the specific developers with expertise in this area of Octopus) are AEST-based. Unfortunately, GMT+1 seems like one of the worst possible matches for organizing calls with our developers. :frowning:

We do have some options available though which may work for you.

The first option discussed by the team is for everyone involved on our side to sign a non-disclosure agreement. This is something our staff have done at times for customers with sensitive information requiring live support. If the NDA option is viable, you could upload the process dump, and our team will take additional care to ensure only those covered by the NDA access the files and that the files are deleted as soon as they are no longer required.

The second option is to schedule the call slightly outside of our developers’ business hours and possibly slightly outside of yours. 8am GMT+1 is 5pm AEST, so we may be able to manage a call around this time, but we cannot guarantee a long session to cover the issue.

A final option would be to have @miguelelvir work with one of our US based support engineers. They may not be able to identify the issue, but could collect some more contextual information and possibly relay it to our developers.

I think the best result for diagnosing this issue would come from our developers having a copy of the process dump or being able to see it live, but if neither of these options works, our support team in the US may at the very least be able to work with @miguelelvir to investigate further.

Let me know what you think of these options and if you have any questions at all, please don’t hesitate to let me know. We would very much like to help you resolve this issue and identify if we need to alter any code to avoid it happening to others in the future.

Best regards,
Daniel

Hey @Daniel_Fischer,

Many thanks for all the options and for checking. Have spoken with @miguelelvir and we’re happy to stick with what you’ve got available, but we really appreciate you trying to help out here.

Will book in slot as soon as we find a time that works.

All the best,

Leon

Hi @leon.io,

I’m giving re-creating this issue another crack. Could I get some details from you, please? Apologies if you have already provided some of this, but I can’t find it right now (it may have been auto-deleted) and Daniel has finished up for today.

  1. Which version of IIS?
  2. Which version of Windows?
  3. Which version of PowerShell (it should be in the Task Log)
  4. For the step(s) that hangs, is it deploying a Web Site, a Virtual Directory or a Web Application? Which options are enabled in the UI (Authentication, start website, etc)?
  5. Are you able to send me the part of the task log showing the hanging of the step? I’m looking for the verbose messages that show which part is being configured, anything related to your configuration can be scrubbed out.
  6. For the PowerShell proc dump you have, could you run this process on it and send the stack trace that looks like it’s hanging? (The tool itself seems to have moved here.)
  7. The only usage of Clear-ItemProperty is around bindings. For the steps that freeze, are they always creating a new website or updating bindings that have changed? If nothing has changed in the bindings, that part of the code should be skipped. The logs should output which path it goes down.

No worries if you can’t get all this, I’m keen to get back into this in the morning, so any information would be helpful.

@leon.io,

Another thought. Do you have any steps or scripts other than the “Deploy to IIS” step that modify IIS on the servers where the problem occurs? For example a custom script step, or one of the IIS steps from the community library?

What I’m thinking is that those scripts are modifying IIS outside of the locks we have in the “Deploy to IIS” step and causing the built-in step to freeze.

Rob

Hey @Daniel_Fischer @Robert_Wagner

We did some more investigation on this yesterday, and your message is spot on - so the step we’ve got this occurring in for this latest instance was the IIS Virtual Directory - Create community step.

A brief overview of the process is

  1. Other setup things…
  2. Deploy to IIS
  3. IIS - App Pool
  4. Setup App_Data (issue in Configure App_Data virtual directory).

We then revisited the original issue we’ve had where we get a timeout on Set-ItemProperty and realized there isn’t a correlation with this issue.

We then switched off our custom timeout logic to try to recreate the original failures (over 20 projects, with over 2000 various deployments in a 24hr period) and we’re happy to report that this seems to have been resolved.

We think that some of the latest updates in Calamari have resolved this (as previously this was caused by PowerShell scripts). Either way, we’re very grateful for all your follow-up and the support offered.

We don’t want to take any more of your time, so we’ll cancel our screen-share session for now. We’ll revert to using the vanilla flavor of Calamari (pun intended) and get back to you if we’re able to capture any strange procdumps of these issues in the future.

Hi Leon,

Awesome! I’m a bit wary about the problem being fully resolved. I’m unaware of any changes to the IIS PowerShell scripts in the past year or so. We did add all the locking to the script (to make it safe for Bypass Mutex) in 3.11.

The other steps don’t have the locking, so if it does occur again, it might be worth trying to add the locking code to those steps. The locking is Alternatively you could scope the OctopusBypassDeploymentMutex variable just to the “Deploy to IIS” step and steps that don’t modify IIS.
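As a rough illustration of the first suggestion, here is a minimal sketch of wrapping an IIS-modifying script block in a named system mutex. This is not the code from the “Deploy to IIS” step: the function name `Invoke-WithNamedMutex`, the mutex name, and the timeout are all hypothetical, and the lock name the built-in step actually uses is not given in this thread. A lock like this only helps if every script that touches IIS waits on the same name.

```powershell
# Hypothetical helper: runs $Action while holding a named (cross-process) mutex.
function Invoke-WithNamedMutex {
    param(
        [Parameter(Mandatory = $true)] [string] $Name,
        [Parameter(Mandatory = $true)] [scriptblock] $Action,
        [int] $TimeoutSeconds = 300   # assumed default, tune as needed
    )

    # $false = do not take ownership on creation; we wait explicitly below.
    $mutex = New-Object System.Threading.Mutex($false, $Name)
    $acquired = $false
    try {
        try {
            $acquired = $mutex.WaitOne([TimeSpan]::FromSeconds($TimeoutSeconds))
        }
        catch [System.Threading.AbandonedMutexException] {
            # A previous holder exited without releasing; we now own the mutex.
            $acquired = $true
        }

        if (-not $acquired) {
            throw "Timed out after $TimeoutSeconds seconds waiting for mutex '$Name'."
        }

        # Run the protected work and pass its output through.
        & $Action
    }
    finally {
        if ($acquired) { $mutex.ReleaseMutex() }
        $mutex.Dispose()
    }
}

# Example usage (mutex name is an assumption, not the one Octopus uses):
# Invoke-WithNamedMutex -Name 'Global\Hypothetical-IIS-Config-Lock' -Action {
#     # IIS changes made by the custom or community step would go here.
# }
```

The timeout makes a stuck holder show up as a clear error in the task log rather than an indefinite hang.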

Regards,

Rob

Thanks @Robert_Wagner, I agree there is still a question mark there about the resolution. We’re doing an upgrade in the coming week and with this we’ll also remove the use of custom tentacles. At the same time, we’ll be on the hunt for any hung deployments so we can grab proc dumps.

Thanks again to you all for the great support and help here

Hi Leon,

Thanks for the update on this. Please feel free to get back in touch at any time if this becomes an issue again. We have archived our internal conversations and our developers can easily reference them in the future.

Best regards,
Daniel
