We had a rough deployment last night, and I’m looking for suggestions on how to make our Octopus server better able to support large-scale deployments.
Our Octopus server is a VM running Windows 2012 with 2 CPUs and 6GB of RAM. We’re running 745 tentacles, and I estimate that about 675 of them are in the production environment.
The deployment last night involved around 500 of the production tentacles. The problems started when the deployment to the largest group, about 400 tentacles, started producing more log messages than usual: about 80 to 100 extra messages per tentacle. This brought Octopus to a halt. The server was unresponsive, and we eventually rebooted it after waiting a couple of hours to see whether it was just a backlog issue. After rebooting, I disabled the anti-virus, which helped performance. There was a backlog of 100,000 log messages to process, and we had to cancel about 20 tasks that had been running for 2 hours without any messages from the tentacles.
After all that, we were able to rerun the failed deployments one at a time. What usually takes a couple of hours took about seven. Needless to say, my operations people aren’t happy.
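For a rough sense of scale, here is a back-of-the-envelope estimate of the extra log volume from that largest group. It uses only the approximate figures above; the function name is made up for illustration.

```python
def extra_messages(tentacles, per_tentacle_low, per_tentacle_high):
    """Return the (low, high) range of extra log messages for a group of tentacles."""
    return tentacles * per_tentacle_low, tentacles * per_tentacle_high

# About 400 tentacles, each emitting roughly 80 to 100 extra messages.
low, high = extra_messages(400, 80, 100)
print(f"{low:,} to {high:,} extra log messages")  # prints "32,000 to 40,000 extra log messages"
```

So the extra output alone would be somewhere around 32,000 to 40,000 messages, on top of whatever the deployment normally logs.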
The log message processing seems to be the key. I saw another thread where the server had 8 CPUs. Is log message processing CPU bound?
Do I need to set up some exclusions for the anti-virus?
Do I need the suggested feature to cap the number of concurrent installs? Do I need some other feature?
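To be clear about what I mean by a cap, here is a minimal generic sketch of bounded concurrency. It is not Octopus code or its API: `deploy_to`, `deploy_all`, and the value of `MAX_CONCURRENT` are all hypothetical, and the "install" is just a placeholder sleep.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 20  # hypothetical cap; not an actual Octopus setting

_lock = threading.Lock()
_active = 0
peak_active = 0

def deploy_to(tentacle_id):
    """Hypothetical stand-in for one tentacle install; it only tracks concurrency."""
    global _active, peak_active
    with _lock:
        _active += 1
        peak_active = max(peak_active, _active)
    time.sleep(0.001)  # placeholder for the real work
    with _lock:
        _active -= 1
    return tentacle_id

def deploy_all(tentacle_ids, max_concurrent=MAX_CONCURRENT):
    # The pool runs at most max_concurrent installs at once; the rest wait in a queue.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(deploy_to, tentacle_ids))

results = deploy_all(range(400))
print(len(results), "installs completed, peak concurrency", peak_active)
```

The idea is that 400 installs still all run, but never more than the cap at a time, so the server would only have to absorb log output from a limited number of tentacles at once.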
We’re running 2.5.8.447. Would upgrading to 2.5.12.666 help?