Backboards: 
Posts: 151

SlackerTalk 2018-02-08 Outage: Root Cause Analysis and Post Mortem -- (edited)

At about 7:30, the server rebooted due to scheduled maintenance with the ISP.

When it came up, things went poorly.

It seems that the HTTP server didn't load properly, which I fixed with a manual intervention.

When HTTP came back up, HTTPS was supposed to - but it didn't. This turned out to be because the firewall re-read its configuration from disk and said "Ignore everything except HTTP", so I had to manually add a rule to allow HTTPS.

Total downtime (including partial downtime): about 1 hour 8 minutes.

Root Cause: Configuration is not stable enough to withstand reboots.


Responses:
Post a message   top
Replies are disabled on threads older than 7 days.