The cause of this outage has been identified: database instances crashed because they ran out of resources.
JobStaq is served from a database setup consisting of 3 servers, and the system continues to operate without issues if one server crashes. At 2024-10-14T10:50:35.559404Z, server 3 of our UK cluster crashed due to insufficient resources. This was detected at 2024-10-14T10:50:35.566518Z, and server 2 took over as the primary (leader) database server. However, at 2024-10-14T10:51:51.217720Z, before server 3 had been able to recover, server 2 also went offline, again because it ran out of resources. With two of the three servers down, the database was left in a state where manual intervention was required to bring the system back online.
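For illustration only, the sequence above is consistent with a majority-quorum rule, which is common in three-server database clusters. The sketch below assumes that rule; the function name `has_quorum` and the simplified model are hypothetical and do not describe the cluster's actual configuration.

```python
def has_quorum(total_servers: int, healthy_servers: int) -> bool:
    """Return True if a strict majority of servers is healthy.

    Assumption: the cluster needs a strict majority to keep operating
    without manual recovery. This is an illustrative model only.
    """
    return healthy_servers > total_servers // 2


TOTAL_SERVERS = 3

# Number of healthy servers after each event described in this report.
timeline = [
    ("before the incident", 3),
    ("server 3 crashed", 2),
    ("server 2 crashed", 1),
]

for event, healthy in timeline:
    status = "operational" if has_quorum(TOTAL_SERVERS, healthy) else "needs manual recovery"
    print(f"{event}: {healthy}/{TOTAL_SERVERS} healthy -> {status}")
```

Under this assumption, losing one server out of three leaves a majority (2 of 3), while losing a second server before the first recovers does not (1 of 3), which matches the point at which manual intervention became necessary.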
We'll be performing upgrades on all database servers this evening to increase the resources available, so that this doesn't occur again.