JobStaq - JobStaq offline – Incident details

JobStaq offline

Resolved
Operational
Started 9 days ago
Lasted about 19 hours

Affected

Web interface

Major outage from 1:51 AM to 7:55 AM, Degraded performance from 7:55 AM to 8:25 PM

API

Major outage from 1:51 AM to 7:55 AM, Degraded performance from 7:55 AM to 8:25 PM

Customer portals

Major outage from 1:51 AM to 7:55 AM, Degraded performance from 7:55 AM to 8:25 PM

Updates
  • Resolved

    All systems are back to regular operations.

    Root cause:

    A scheduled automated task to apply the latest TLS security certificates to our load balancer began at 2:45 AM. At 2:51 AM, the changes were applied to the load balancer but failed to save correctly, which caused all HTTPS traffic to fail. Automatic alerts did not reach an engineer because iOS sleep mode was incorrectly configured to silence these notifications, which delayed detection of the issue.

    On discovering the issue, we immediately redirected traffic away from the load balancer to a single backend server to reduce the user-facing impact. As of 8:55 AM, all user requests were being served as usual. Fully restoring services took longer because our infrastructure provider's interface gave insufficient error messages. We deleted the load balancer's HTTPS components and re-deployed them from scratch, which resolved the issue.

  • Identified

    A temporary fix has been put in place to bring things online while we identify the root cause of the problem. At this stage, all user-facing impact should be resolved. The affected components remain marked as "degraded performance" because our automatic fault tolerance will not be in effect until the load balancer is healthy again.

  • Investigating

    HTTPS traffic towards our API, web interface, and customer portals is currently not being served.