How can we better prepare for an incident like this so deployments are not left in an inconsistent state? If this had happened in Production, the impact could have been much larger.
It looks like the virtual machine running your instance stopped responding, so the instance was re-deployed. The timeline (all times UTC) is:
17:24:45 - Last Server log entry recorded
17:29:27 - An automated process failed to contact the server
17:31:00 - Our monitoring raised an alert
17:33:07 - AWS terminated the EC2 instance
17:33:52 - Provisioning of a new EC2 instance started
17:48:00 - Our monitoring indicated the instance was available again
17:50:47 - Provisioning completed
The best way I can think of to detect this happening is to create a subscription that listens for the Task Cancelled event and raises an alert in your monitoring system. At the moment, though, a dirty shutdown like the one that happened here does not raise the event needed to trigger the subscription. I have raised an issue and will look at getting that into the product soon. The issue also shows the subscription configuration you would need.
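As a rough illustration of the monitoring side, below is a minimal sketch of a webhook receiver that such a subscription could post to. The endpoint URL, the `MONITORING_ALERT_URL`, and the payload field names (`Payload` -> `Event` -> `Category`) are assumptions for illustration, not confirmed API details; adjust them to match the payload your instance actually sends.

```python
# Minimal sketch of a webhook receiver for a subscription that forwards
# cancelled-task events to a monitoring system.
# NOTE: the payload shape and MONITORING_ALERT_URL below are assumptions.
from flask import Flask, request
import requests

app = Flask(__name__)

# Hypothetical alert endpoint in your monitoring system.
MONITORING_ALERT_URL = "https://monitoring.example.com/api/alerts"


@app.route("/deployment-events", methods=["POST"])
def deployment_events():
    body = request.get_json(force=True) or {}

    # Assumed payload shape: event details sit under Payload -> Event.
    event = body.get("Payload", {}).get("Event", {})
    category = str(event.get("Category", ""))
    message = event.get("Message", "")

    # Forward cancelled-task events so an operator can check whether the
    # deployment was left in an inconsistent state.
    if category.lower() in ("taskcanceled", "taskcancelled"):
        requests.post(
            MONITORING_ALERT_URL,
            json={"source": "deployment-server",
                  "summary": message or "Task cancelled"},
            timeout=10,
        )

    return ("", 204)


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

Once the product change in the issue above lands, a dirty shutdown should raise the same cancelled-task event, and a receiver along these lines would surface it in your monitoring without any further changes.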