All systems are back to regular operations.
Root cause:
A scheduled automated task to apply the latest TLS security certificates to our load balancer began at 2:45AM. At 2:51AM, the changes were applied to the load balancer, however, failed to save correctly, which resulted in all HTTPS traffic failing to be served. Automatic alerts did not reach an engineer due to iOS sleep mode being incorrectly configured to silence these notifications, resulting in a delay to notice the issue.
On discovering the issue, traffic was immediately redirected away from the load balancer to a single backend server, to reduce user-facing impact of this issue. As of 8:55AM, all user requests were being served as usual. We encountered some delays in getting services fully restored, due to lacking error messaging from our infrastructure provider's interface. The load balancer HTTPS components were deleted and re-deployed from scratch, which remedied the issue.