[Solved] Colossus unresponsive between 01:00 and 09:30 on 2021-06-30.

The Slurm queue system was unresponsive between approximately 01:00 and 09:30 on 2021-06-30, as indicated by the following Slurm error: "slurm_load_jobs error: Socket timed out on send/recv operation".

No jobs started during that period, but running jobs should not have been affected. If you had jobs running during this period, we advise you to check your job results for errors.

The issue has been resolved.

Published June 30, 2021 9:52 AM - Last modified June 30, 2021 9:52 AM