190 Tenants in Deployment Breaks Server

Hi,

When I attempt to queue a deployment for 190 tenants, the Octopus server becomes unresponsive and eventually ‘resets’. It then becomes responsive again, but none of the requested deployments are queued. It is very similar to our previous issue, except I didn’t see any errors this time.

Version with issue: 2021.1(7316)
Current working version: 2020.4.4

This was discovered right after upgrading to the version mentioned above. Fortunately, we were able to revert after our testing; however, we will be unable to upgrade until this issue is resolved.

Thanks

Tom

Hi @t.illingworth01,

Thanks for reaching out and sorry to hear you’re having issues.
We’ll look into this for you to see if we can identify the problem.

In the meantime, could you please send us a copy of the server logs from when the issue occurred so that we can look into it further?

Kind Regards,

Hi Adam,

Thanks for getting back to me. I have a copy of the log file and the approximate time of the incident for review. What is the preferred method for sending the file?

Thanks

Tom

Hi @t.illingworth01,

You can either send me the log directly via private message or, if the log files are too large, private message me your e-mail address (so I can authorize the upload) and I will send you a link to upload the logs to.

Kind Regards,
Adam

Hi @t.illingworth01,

Just to continue our conversation here - thanks for uploading the log files!
Per your comments, we’ll focus our attention on the log entries around the 17:00 timestamp.

I’ll look into this for you and let you know what I find.

Kind Regards,
Adam

Thanks Adam,

If you’re unable to find anything helpful, I might be able to invest some time stepping through the releases between our current version and the latest to identify where the breaking change was introduced.

Thanks

Tom

Hi @t.illingworth01,

Thank you for being patient while I look into this for you!

It seems likely that you’re running into a recently rediscovered issue that was introduced in 2021.1.x.

Our engineers are currently working on developing a fix for this, which should come bundled into 2021.2.x. Unfortunately, I can’t give you a confirmed release date on this. However, we expect this release to be going out soon.

In the meantime, we’ve seen improvements from increasing the Max Pool Size on the SQL Connection. I can give you some instructions on this; however, as you are currently up and running on an older version of Octopus, I’d recommend you stay on this version until 2021.2 is released.

You will need to find your config file and alter the connection string line in order to increase your Max Pool Size. I recommend taking a backup of Octopus Deploy and your Master Key. You can find more information on this here.

Once you have a backup, proceed with the following:

  • Place your instance into maintenance mode and stop the instance via the Octopus Manager.
    You can find your Octopus home directory within the Octopus Manager also.
  • Navigate to your home directory and find the .config file.
  • Add the parameter ‘Max Pool Size’ to the Database Connection String with a higher number than the default; our docs use 200 as an example (100 should be the default).

Syntax Example:
<set key="Octopus.Storage.ExternalDatabaseConnectionString">Data Source=(local);Initial Catalog=OctopusDeploy-MyDB;Integrated Security=True;Max Pool Size=xxx;</set>

  • Save the file and start up your Octopus Instance.
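
For anyone who prefers to script the connection string change rather than edit it by hand, here is a minimal Python sketch. It only manipulates the string value and does not touch the config file itself. The function name `set_max_pool_size` is hypothetical, and the sketch assumes no value in the connection string contains a quoted semicolon. Take the backup described above before writing anything back to the config file.

```python
def set_max_pool_size(connection_string: str, size: int) -> str:
    """Return the connection string with 'Max Pool Size' set to `size`.

    Assumption: settings are separated by plain ';' characters and no
    value contains a quoted semicolon.
    """
    if size <= 0:
        raise ValueError("Max Pool Size must be a positive integer")

    # Split into individual 'key=value' settings, dropping empty pieces.
    parts = [p.strip() for p in connection_string.split(";") if p.strip()]

    # Remove any existing Max Pool Size setting (keys are case-insensitive).
    kept = [
        p for p in parts
        if p.split("=", 1)[0].strip().lower() != "max pool size"
    ]

    kept.append(f"Max Pool Size={size}")
    return ";".join(kept) + ";"


if __name__ == "__main__":
    original = (
        "Data Source=(local);Initial Catalog=OctopusDeploy-MyDB;"
        "Integrated Security=True;"
    )
    print(set_max_pool_size(original, 200))
```

Removing any existing setting before appending the new one means running the function twice does not leave duplicate `Max Pool Size` entries.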

I must reiterate that, if possible, it would be better for you to remain on 2020.4.4 and await the release of 2021.2, which should be soon.

I hope this helps with your current issues and gives you the reassurance that you can upgrade in the future. If you haven’t had a chance to check out our documentation on performing major upgrades, then it’s a great resource for doing it safely.

Should you have any further questions or issues - please don’t hesitate to reach out!

Kind Regards,
Adam

Thanks Adam,

I’m glad there is a fix lined up, and I’ll probably just hold off on changing versions until 2021.2.x is released. Interestingly, the current version we’re running (2020.4.4) was chosen because it resolved a similar issue in the past where a large number of tenants caused a critical failure. While this does highlight that this is something I need to test for after each version upgrade, will any teams on your end include high tenant usage in pre-release testing going forward? Our current production environment has almost 200 tenants. Do you have a maximum supported number of tenants?

Tom

Hi @t.illingworth01,

Apologies for the delayed response here!
It’s good to hear you’re able to hold out until you can get to 2021.2.
I don’t believe there is a maximum number of supported tenants.

We do run testing for high tenant usage. However, I don’t believe it’s for every build.
I appreciate that if you’ve experienced a similar problem in the past, then it’s less than ideal to re-encounter this in a different, more recent version.

Our investigations have discovered that it’s possible to work around the issue by separating the tenants with tags and staggering the deployments in batches. The issue also appears to manifest more often if the SQL server is low on resources and there are large variable sets, as these get saved during deployment creation.
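
As a rough illustration of the batching idea, here is a minimal Python sketch. It is not an Octopus API: the tag names, the function names, and the `queue_batch` callback are all hypothetical placeholders for however you actually queue a deployment for a set of tenants. It groups tenants by the first matching tag and then hands each batch to the callback with a pause in between.

```python
import time
from typing import Callable, Dict, List, Set


def group_tenants_by_tag(
    tenant_tags: Dict[str, Set[str]], batch_tags: List[str]
) -> List[List[str]]:
    """Return one batch per tag, in the order the tags are given.

    A tenant joins the first batch whose tag it carries. Tenants that
    carry none of the tags are not placed in any batch.
    """
    batches: List[List[str]] = [[] for _ in batch_tags]
    for tenant in sorted(tenant_tags):
        for index, tag in enumerate(batch_tags):
            if tag in tenant_tags[tenant]:
                batches[index].append(tenant)
                break
    return batches


def deploy_in_batches(
    batches: List[List[str]],
    queue_batch: Callable[[List[str]], None],
    pause_seconds: float,
) -> None:
    """Call `queue_batch` for each non-empty batch, pausing between batches."""
    for index, batch in enumerate(batches):
        if not batch:
            continue
        queue_batch(batch)  # hypothetical: queue the deployment for these tenants
        if index < len(batches) - 1:
            time.sleep(pause_seconds)


if __name__ == "__main__":
    # Hypothetical tenants and tags, for illustration only.
    tenants = {
        "tenant-a": {"Batch/1"},
        "tenant-b": {"Batch/2"},
        "tenant-c": {"Batch/1"},
    }
    plan = group_tenants_by_tag(tenants, ["Batch/1", "Batch/2"])
    deploy_in_batches(plan, lambda batch: print("queue", batch), pause_seconds=0)
```

The pause length and batch size would need tuning against your own SQL server load; nothing in this thread specifies suitable values.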

I’ll be happy to pass on your feedback regarding pre-release testing.

Reach out if you have any questions or issues in the future. We’re always happy to help!

Kind Regards,
Adam

Thank you for the help on this. I did consider splitting tenants based on tags, but I’m certain at least one would end up missing a deployment at some point in the future due to human tagging error, so I decided against it.
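
For what it’s worth, the tagging risk could be reduced with a pre-flight check that compares the full tenant list against whatever batches were built. A minimal Python sketch follows, assuming you can obtain both lists by some means; the function name and the tenant names are hypothetical.

```python
from typing import Iterable, List


def find_unbatched_tenants(
    all_tenants: Iterable[str], batches: Iterable[Iterable[str]]
) -> List[str]:
    """Return the tenants that do not appear in any batch, sorted by name."""
    covered = {tenant for batch in batches for tenant in batch}
    return sorted(set(all_tenants) - covered)


if __name__ == "__main__":
    # Hypothetical data: tenant-d was never tagged, so it is in no batch.
    everyone = ["tenant-a", "tenant-b", "tenant-c", "tenant-d"]
    batches = [["tenant-a", "tenant-c"], ["tenant-b"]]
    missing = find_unbatched_tenants(everyone, batches)
    if missing:
        print("Not in any batch:", ", ".join(missing))
```

Failing the run when the returned list is non-empty would surface a mistagged tenant before any deployment is queued.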

Kind Regards,

Thomas

