How do I set up Octopus Deploy for Disaster Recovery?

mark.harrison · 17 October 2019 15:28

We have some general information on planning a Disaster Recovery procedure on the Backup and Restore documentation page.

However, to get Octopus Deploy Server up and running in a DR scenario, you will need at least the following:

Backup of the file system

Typically, this will consist of three main Octopus folders:

Artifacts -> usually located in C:\Octopus\Artifacts
TaskLogs -> usually located in C:\Octopus\TaskLogs
Packages -> usually located in C:\Octopus\Packages
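
As a rough illustration of backing up these folders, here is a minimal Python sketch that zips each one into a timestamped backup directory. It assumes the default C:\Octopus layout listed above; the function name and the backup location are hypothetical, so adjust them to your own environment. It is only a starting point, not an official Octopus tool.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path

# Folder names taken from the list above; everything else is an assumption.
OCTOPUS_FOLDERS = ("Artifacts", "TaskLogs", "Packages")


def backup_octopus_folders(octopus_root, backup_root, stamp=None):
    """Zip each Octopus folder found under octopus_root into backup_root/<stamp>.

    Returns the archive paths that were written. Folders that do not
    exist are skipped rather than treated as an error.
    """
    # Default to a UTC timestamp so each run lands in its own directory.
    stamp = stamp or datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = Path(backup_root) / stamp
    target.mkdir(parents=True, exist_ok=True)

    archives = []
    for name in OCTOPUS_FOLDERS:
        source = Path(octopus_root) / name
        if not source.is_dir():
            continue
        # make_archive appends ".zip" to the base name it is given.
        archive = shutil.make_archive(str(target / name), "zip", root_dir=str(source))
        archives.append(Path(archive))
    return archives
```

For example, backup_octopus_folders(r"C:\Octopus", r"D:\OctopusBackups") would write Artifacts.zip, TaskLogs.zip and Packages.zip under a new timestamped folder (D:\OctopusBackups is a made-up path). You would normally run something like this on a schedule and copy the result off the server.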

You could also configure the three folders above on a network share and then use a file replication technology (such as DFS) to keep those files in sync at your DR site.

You can configure this using the command line:

Octopus.Server.exe path --artifacts \\Octoshared\OctopusData\Artifacts
Octopus.Server.exe path --taskLogs \\Octoshared\OctopusData\TaskLogs
Octopus.Server.exe path --nugetRepository \\Octoshared\OctopusData\Packages

Please see the Octopus documentation for further considerations when using shared storage.

Backup of your Master Key
Without a backup of your master key, you won’t be able to make use of your Octopus database backups, since there is no way to decrypt the sensitive values stored within them.
Take a copy of the master key and store it in a secure location, such as a password manager.

Backup of your SQL database
Most of the data and settings used by Octopus are stored in the SQL database. Consider carefully which recovery model is acceptable to your company. If you want point-in-time restore capabilities for Octopus, you would probably want to configure your solution to back up every 15-30 minutes.
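
If you go with a frequent backup schedule, it helps to check that backups are actually arriving. The sketch below is one possible way to do that in Python: it looks at the newest backup file in a folder and reports whether it is within the 30-minute window mentioned above. The folder, the *.bak pattern and the function names are assumptions for illustration; your SQL backup job may write different file types (for example log backups) to a different place.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path


def newest_backup_age(backup_dir, pattern="*.bak", now=None):
    """Return the age of the most recently modified backup file, or None if none exist."""
    now = now or datetime.now(timezone.utc)
    files = [p for p in Path(backup_dir).glob(pattern) if p.is_file()]
    if not files:
        return None
    newest_mtime = max(p.stat().st_mtime for p in files)
    return now - datetime.fromtimestamp(newest_mtime, tz=timezone.utc)


def backup_is_fresh(backup_dir, max_age=timedelta(minutes=30), pattern="*.bak", now=None):
    """True only if a backup exists and the newest one is no older than max_age."""
    age = newest_backup_age(backup_dir, pattern, now)
    return age is not None and age <= max_age
```

A scheduled task could call backup_is_fresh on the backup folder and raise an alert when it returns False. This only checks file timestamps; it does not prove the backup can be restored, which is what a DR test is for.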

Which DR plan you implement is ultimately up to you.
Here are a few ideas to get you started:

A cold-standby. You have Octopus configured to use the network shares as described above, replicating to your DR location. You would then have Octopus Server installed on another VM at your DR site, but not running. You could have the service configured for a database located within your DR site, where data is being replicated using SQL HA. When DR is invoked, replication is stopped (either manually or via a script) and the Octopus service on the standby VM is started.
Full backup and restore. This is similar to the cold-standby, but the replication is essentially done “manually” from backups when the disaster hits. You would install Octopus on a fresh VM in your DR site, restore the database from a backup, and copy the files into the configured folders. For this option, you need the master key to be able to decrypt the sensitive values stored within the database backup.
Configuration as code. You could look at storing your Octopus configuration in code. Octopus is API-first, so anything you can do in the UI can be automated. You could then create a new Octopus Server from a Terraform template when a disaster strikes.
Live/Live using Octopus Deploy’s High Availability. You’d leave Octopus Deploy running in the DR site, but with the nodes in the DR site drained. When a DR event occurs, you’d drain the nodes on the primary server and disable the drain on the DR server. The downside is that you will need to keep both servers running on the same Octopus Deploy version.

Test your DR plan!
Lastly, the best way to make sure your DR plan works is to practice it multiple times, so that if the unlikely event does happen and you need to invoke disaster recovery, it’s not a painful experience.
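
One thing you might check during a DR test is that the files at the DR site really match the primary. The following Python sketch compares two folder trees by SHA-256 hash and reports what is missing, extra or different. The function names and the idea of comparing trees this way are my own illustration, not something Octopus ships; treat it as one possible check among others.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large packages are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root, with forward slashes) to its SHA-256."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in root.rglob("*")
        if path.is_file()
    }


def compare_trees(primary, restored):
    """Return (missing, extra, changed) relative paths in restored compared to primary."""
    a, b = tree_digests(primary), tree_digests(restored)
    missing = sorted(set(a) - set(b))
    extra = sorted(set(b) - set(a))
    changed = sorted(name for name in set(a) & set(b) if a[name] != b[name])
    return missing, extra, changed
```

You could run compare_trees against, say, the primary Packages folder and its restored copy; three empty lists mean the two trees hold the same files with the same content. Files that change between the backup and the comparison (such as new task logs) will naturally show up as differences.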

Some other links which you may find useful:

https://octopus.com/docs/administration/high-availability
https://octopus.com/docs/administration/managing-infrastructure/server-configuration-and-file-storage/moving-octopus-server-folders
Treating your Octopus Server like cattle - https://www.youtube.com/watch?v=bYrNx_gypsE