Octopus backup validation

gregory.blanc · 8 July 2020 10:50

Hello Octopus team,

Every month I need to validate that our Production backups (SQL + data) are usable, by restoring them to a Staging environment (a separate server and database pair) before planning a Production server update.

I created a Runbook with these steps:

1. Stop the Staging server instance
2. Rename the instance home folders
3. Manual step giving time to:
   - Restore the Staging database from the Production dump
   - Restore data from Production to the Staging server
4. Move the Staging instance configuration file back
5. Start the Staging instance
6. Use API calls to the Staging instance to:
   - Change the SMTP configuration
   - Stop health check tasks and modify the machine policy so that health checks never run
   - Remove all triggers from all Spaces (to avoid unattended executions from the Staging server against Production objects)
   - Restore a specific target
7. Use Octopus.Migrator.exe to import a Staging-specific project (with different checks)
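
For readers unfamiliar with the trigger clean-up step, it might look roughly like the sketch below. This is not the poster's actual script: the function name `Remove-AllOctopusTriggers` is hypothetical, and the endpoint paths (`/api/spaces/all`, `/api/{spaceId}/projecttriggers`) and paging parameters are assumptions that should be checked against the API of your own server version.

```powershell
# Sketch only: endpoint paths and paging parameters are assumptions,
# and the function name is hypothetical.
function Remove-AllOctopusTriggers {
    param(
        [Parameter(Mandatory = $true)][string]$BaseUrl,
        [Parameter(Mandatory = $true)][string]$ApiKey
    )
    # Octopus API keys are sent in the X-Octopus-ApiKey header.
    $headers = @{ 'X-Octopus-ApiKey' = $ApiKey }
    $removed = 0

    # Assumed to return every space as a plain array.
    $spaces = Invoke-RestMethod -Method Get -Uri "$BaseUrl/api/spaces/all" -Headers $headers
    foreach ($space in $spaces) {
        # Request one large page so a single call covers the space.
        $listUri = "$BaseUrl/api/$($space.Id)/projecttriggers?skip=0&take=1000"
        $triggers = Invoke-RestMethod -Method Get -Uri $listUri -Headers $headers
        foreach ($trigger in $triggers.Items) {
            $deleteUri = "$BaseUrl/api/$($space.Id)/projecttriggers/$($trigger.Id)"
            Invoke-RestMethod -Method Delete -Uri $deleteUri -Headers $headers | Out-Null
            $removed++
        }
    }
    # Number of triggers deleted across all spaces.
    return $removed
}
```

It would be called against the Staging instance only, for example `Remove-AllOctopusTriggers -BaseUrl 'https://staging.example' -ApiKey $apiKey`, where the URL is a placeholder.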

This works quite nicely, but I’m facing two main issues:

1. Sometimes, if I start Octopus Server Manager while the home directories are moved away, it reports that no instance is configured, and it keeps saying so even after I close it and put the directories back. This does not seem to happen when I don’t start the Manager during that window. Does Server Manager change the Windows Registry at start-up, or does some other process do so? This is very annoying for my workflow.
2. The API call that removes triggers always seems to run too late, after the triggers have already executed automatically at instance start-up. I need to start the instance in order to call the API, but as soon as it starts it fires the triggers that were configured on the Production instance and restored to the Staging instance.

Do you have any ideas for avoiding these problems, or perhaps a better way to order all of this?

Thank you

PS: I have a valid licence but see no place to enter it as a reference.

Derek_Campbell · 8 July 2020 12:51

Hi @gregory.blanc,

Thanks for getting in touch.

That’s an impressive automated backup and restore setup you have going there. I think there might be an easier way to set this up though, so you get a better experience.

You could enable Maintenance mode and set all nodes to drain mode in Production before taking the backup, so that no tasks run on start-up once it is restored. This also stops tasks at the Space level. It would likely resolve your issue with tasks being executed on Staging.
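
As a rough illustration of switching on Maintenance mode and drain mode, the sketch below does both through the REST API. The endpoint paths (`/api/maintenanceconfiguration`, `/api/octopusservernodes`), the `IsInMaintenanceMode` property and both function names are assumptions rather than something confirmed in this thread, so please check them against the API of your server version before relying on it.

```powershell
# Sketch only: endpoint paths and property names are assumptions,
# and both function names are hypothetical.
function Set-OctopusMaintenanceMode {
    param(
        [Parameter(Mandatory = $true)][string]$BaseUrl,
        [Parameter(Mandatory = $true)][string]$ApiKey,
        [bool]$Enabled = $true
    )
    $headers = @{ 'X-Octopus-ApiKey' = $ApiKey }
    $uri = "$BaseUrl/api/maintenanceconfiguration"
    # Read the current resource, flip the flag, and write it back.
    $config = Invoke-RestMethod -Method Get -Uri $uri -Headers $headers
    $config.IsInMaintenanceMode = $Enabled
    $body = $config | ConvertTo-Json -Depth 10
    Invoke-RestMethod -Method Put -Uri $uri -Headers $headers -Body $body -ContentType 'application/json'
}

function Set-OctopusNodeDrain {
    param(
        [Parameter(Mandatory = $true)][string]$BaseUrl,
        [Parameter(Mandatory = $true)][string]$ApiKey,
        [bool]$Drain = $true
    )
    $headers = @{ 'X-Octopus-ApiKey' = $ApiKey }
    $nodes = Invoke-RestMethod -Method Get -Uri "$BaseUrl/api/octopusservernodes" -Headers $headers
    foreach ($node in $nodes.Items) {
        # On a node resource this flag is assumed to represent drain mode.
        $node.IsInMaintenanceMode = $Drain
        $body = $node | ConvertTo-Json -Depth 10
        $nodeUri = "$BaseUrl/api/octopusservernodes/$($node.Id)"
        Invoke-RestMethod -Method Put -Uri $nodeUri -Headers $headers -Body $body -ContentType 'application/json' | Out-Null
    }
}
```

Both functions would be run against Production just before the backup is taken, and again with `$false` afterwards to return Production to normal operation.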

For your issue on start-up, I would also recommend deleting your Staging instance before any restoration. We don’t store anything in the registry, but we do use C:\ProgramData\Octopus\OctopusServer\Instances. Records of previous instances are likely being kept there and causing the issues on start-up.

Another approach would be to automate the procedure documented at https://octopus.com/docs/administration/managing-infrastructure/moving-your-octopus/move-the-database-and-server with one additional key step: I’d put your nodes into drain mode and the server into maintenance mode first. Drain mode means no tasks are run, including tasks such as health checks etc…

The high-level approach would be:

1. Delete the Staging instance.
2. Put the Octopus Production nodes into drain mode.
3. Place Octopus Production into maintenance mode.
4. Take a backup of the Production SQL database.
5. Take a copy of the Octopus config file.
6. Restore the backup to the Staging SQL database. You can do this as a new database or by overwriting the existing one.
7. Copy the Production files (artifacts, logs and packages) to the same locations on the Staging server, e.g. C:\Octopus\Artifacts etc…
8. Automate the installation of Octopus. Chocolatey is a good tool for this: choco install octopusdeploy --version=INPUTREQUIREDVERSION
9. Automate the installation: https://octopus.com/docs/installation/automating-installation.
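
The database backup and restore steps above could be scripted along the following lines. This is a minimal sketch: the two helper functions are hypothetical and only build the T-SQL text, and the server, database and file names shown in the comments are placeholders, not values from this thread.

```powershell
# Sketch only: helper names, database names and paths are hypothetical.
function New-BackupQuery {
    param(
        [Parameter(Mandatory = $true)][string]$Database,
        [Parameter(Mandatory = $true)][string]$BackupFile
    )
    # Escape ] in the identifier and ' in the file path.
    $db = $Database.Replace(']', ']]')
    $file = $BackupFile.Replace("'", "''")
    # COPY_ONLY leaves the regular backup chain untouched; INIT overwrites the file.
    return "BACKUP DATABASE [$db] TO DISK = N'$file' WITH COPY_ONLY, INIT;"
}

function New-RestoreQuery {
    param(
        [Parameter(Mandatory = $true)][string]$Database,
        [Parameter(Mandatory = $true)][string]$BackupFile
    )
    $db = $Database.Replace(']', ']]')
    $file = $BackupFile.Replace("'", "''")
    # REPLACE overwrites an existing database of the same name.
    return "RESTORE DATABASE [$db] FROM DISK = N'$file' WITH REPLACE;"
}

# Possible usage, assuming the SqlServer module's Invoke-Sqlcmd is available:
#   Invoke-Sqlcmd -ServerInstance 'prod-sql' -Query (New-BackupQuery -Database 'Octopus' -BackupFile 'D:\Backups\Octopus.bak')
#   Invoke-Sqlcmd -ServerInstance 'staging-sql' -Query (New-RestoreQuery -Database 'OctopusStaging' -BackupFile 'D:\Backups\Octopus.bak')
```

If the data and log file paths differ between the two SQL servers, the restore statement will probably also need `MOVE` clauses, which this sketch leaves out.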

We have a PowerShell script that helps here, which you could use and edit for your purposes. You will need to copy the config file to the new Staging Octopus server before running this script. You could do this using the File System - Copy File community step template. Alternatively, you could copy the full Production Octopus folder using the File System - Backup Directory community step template.

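# Recreates an Octopus Server instance from an existing config file.
param (
    [Parameter(Mandatory=$true)]
    [string]
    $instanceName,

    [Parameter(Mandatory=$true)]
    [string]
    $configFilePath
)

Write-Host "Setting up customer database, instance: '$instanceName', configFilePath: '$configFilePath'"

$octopusExePath = "C:\Program Files\Octopus Deploy\Octopus\Octopus.Server.exe"

# Runs a command and reports a non-zero exit code. Write-Error does not halt
# the script unless $ErrorActionPreference is set to 'Stop'.
function exec
{
    [CmdletBinding()]
    param(
        [Parameter(Position=0,Mandatory=$true)][scriptblock]$cmd
    )
    & $cmd
    if ($LASTEXITCODE -ne 0)
    {
        Write-Error "Command returned exit code $LASTEXITCODE. Command was: $cmd"
    }
}

# Remove any existing instance with this name.
exec { & $octopusExePath delete-instance --instance $instanceName }

# Recreate the instance from the copied config file.
exec { & $octopusExePath create-instance --instance $instanceName --config $configFilePath --nologo --console }

# Reset the master key on the restored database. As the flag names say,
# this resets all sensitive data, so only run it against the Staging copy.
exec { & $octopusExePath lost-master-key --instance $instanceName --iReallyWantToResetAllMySensitiveData --upgradeDatabase --scrubPii --iHaveBackedUpMyDatabase --skipCurrentMasterKeyTest }

# Enable username/password sign-in and disable Okta and Active Directory.
exec { & $octopusExePath configure --instance=$instanceName --webCorsWhitelist=* --usernamePasswordIsEnabled=true --oktaIsEnabled=false --activeDirectoryIsEnabled=false }

# Point package, artifact and task log storage at the ./Server folders.
exec { & $octopusExePath path --instance=$instanceName --nugetRepository=./Server/Packages --artifacts=./Server/Artifacts --taskLogs=./Server/TaskLogs }

# Set the Admin user's credentials (replace this example password).
exec { & $octopusExePath admin --instance=$instanceName --username=Admin --password=Password01! }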

Finally, start the Staging Octopus Server instance. Because the backup was taken in Maintenance mode with all nodes set to drain mode, the restored instance should come up in the same state, meaning no tasks will run on your Production Tentacles.

Please let me know what you think,

Thanks

Derek

gregory.blanc · 8 July 2020 16:05

Hi @Derek_Campbell,

Thanks for taking the time to write such a detailed answer.

I didn’t want to delete and recreate the Staging Server instance from scratch every time (fewer actions, fewer problems), but your proposal might do the trick and avoid my random issue. I will add it to my steps.

Concerning my other issue with triggers on the Staging instance, your answer made me realise that we take our backups without first enabling Maintenance mode or draining the nodes, because no one is supposed to change anything during our backup time slots. Restoring the Staging database from Production backups taken in that state will greatly help to avoid many “cleaning” steps.

Thanks for helping me take a step back, and for the good advice!

Gregory.

Derek_Campbell · 8 July 2020 17:38

Hi @gregory.blanc,

Thanks for the feedback.

Let me know how you get on, and if I can help further.

All the best,

Derek
