JobStaq – System outage – Incident details

System outage

Resolved
Major outage
Started about 1 month ago
Lasted 39 minutes

Affected

Web interface

Major outage from 10:51 AM to 11:15 AM, Operational from 11:15 AM to 11:30 AM

API

Major outage from 10:51 AM to 11:15 AM, Operational from 11:15 AM to 11:30 AM

Customer portals

Major outage from 10:51 AM to 11:15 AM, Operational from 11:15 AM to 11:30 AM

Updates
  • Resolved

    The cause of this outage has been identified: database instances crashed due to insufficient resources.

    JobStaq is served from a database cluster of 3 servers, and the system continues to operate without issues if one server crashes. At 2024-10-14T10:50:35.559404Z, server 3 of our UK cluster crashed due to insufficient resources. This was detected at 2024-10-14T10:50:35.566518Z, and server 2 took over as the primary (leader) database server. However, at 2024-10-14T10:51:51.217720Z, before server 3 had recovered, server 2 also went offline, again after running out of resources. With two of the three servers down at once, the database was left in a state that required manual intervention to bring the system back online.

    This evening we will upgrade all database servers to increase their available resources and prevent this from happening again.

  • Monitoring

    The database has been brought back online. We're continuing to investigate what happened so we can confirm the issue is resolved and prevent it from recurring.

  • Investigating

    We've run into an issue with our database. We're actively investigating and will provide updates as we learn more.
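The report does not say which database or failover mechanism JobStaq uses. The Python below is a minimal sketch that assumes a majority-quorum leader election (the rule used by Raft-style clusters). Under that assumption it shows why a 3-server cluster survives one crash but not two. It also computes the gaps between the timestamps quoted in the Resolved update. The function names `quorum_size` and `can_elect_leader` are hypothetical and used only for illustration.

```python
from datetime import datetime, timedelta

# Timestamps quoted in the incident report (UTC).
SERVER3_CRASH = "2024-10-14T10:50:35.559404Z"
FAILOVER_DETECTED = "2024-10-14T10:50:35.566518Z"
SERVER2_CRASH = "2024-10-14T10:51:51.217720Z"

CLUSTER_SIZE = 3  # from the report: 3 database servers


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp ending in 'Z' as an aware UTC datetime."""
    # fromisoformat only accepts a trailing 'Z' from Python 3.11, so swap it for +00:00.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def quorum_size(cluster_size: int) -> int:
    """Smallest strict majority of the cluster (assumed election rule)."""
    return cluster_size // 2 + 1


def can_elect_leader(cluster_size: int, servers_up: int) -> bool:
    """True if enough servers are up to form a majority and elect a primary."""
    return servers_up >= quorum_size(cluster_size)


# Whole microseconds between events, via timedelta floor division.
detection_us = (parse_utc(FAILOVER_DETECTED) - parse_utc(SERVER3_CRASH)) // timedelta(microseconds=1)
second_crash_us = (parse_utc(SERVER2_CRASH) - parse_utc(SERVER3_CRASH)) // timedelta(microseconds=1)

if __name__ == "__main__":
    print(f"Failover detected {detection_us / 1000:.3f} ms after server 3 crashed")
    print(f"Server 2 crashed {second_crash_us / 1e6:.3f} s after server 3")
    for down in range(CLUSTER_SIZE + 1):
        up = CLUSTER_SIZE - down
        status = "primary can be elected" if can_elect_leader(CLUSTER_SIZE, up) else "manual intervention needed"
        print(f"{down} of {CLUSTER_SIZE} servers down: {status}")
```

Run as a script, this reports that the failover was detected about 7.114 ms after the first crash and that the second crash followed about 75.658 s later. With a quorum of 2 out of 3, the cluster can still elect a primary with one server down, but not with two down. That matches the state described in the Resolved update.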