
Operational log

Published May 20, 2019 8:47 AM

We are experiencing issues with some services, which may lead to some users being unable to log in to TSD and receiving a 504 Gateway Timeout error. We are investigating the cause and working on a fix.

Published May 15, 2019 3:57 PM

https://data.tsd.usit.no is currently not allowing logins. Apart from this, services should be running as expected. We are working on fixing the issue as quickly as possible.

--
Best regards,
TSD

Published May 13, 2019 4:00 PM

Dear TSD users,

Unfortunately, the machine exporting the network file system to the submit hosts crashed again, causing these machines to hang.

As a result, we will have to restart the hanging machines so that you can log in to them again.

We're working on fixing the issue, and we apologize for the frequent crashes affecting the /cluster mounts lately. Finding a permanent solution to this issue is currently our top priority.

 

-- 
Our sincere apologies,
The TSD team

Published May 10, 2019 3:39 PM

There was a temporary outage of some TSD services between 15:13 and 15:20 today due to storage issues.

-- 
Our sincere apologies,
The TSD team

Published May 8, 2019 2:16 PM

Dear TSD users,

Unfortunately, the machine exporting the network file system to the submit hosts crashed, causing these machines to hang.

As a result, we will have to restart the hanging machines so that you can log in to them again.

We're working on fixing the issue, and we apologize for the frequent crashes affecting the /cluster mounts lately. Finding a permanent solution to this issue is currently our top priority.

 

-- 
Our sincere apologies,
The TSD team

Published May 6, 2019 10:47 AM

The machine doing the NFS export of the /cluster file system crashed on Saturday night. This causes some login machines to freeze and makes /cluster and modules unavailable on the submit hosts.

We're looking into it.

-- 
Best regards,
TSD

Published May 2, 2019 11:21 AM

A network error has prevented jobs from starting on Colossus over the last day. The error has been fixed, and jobs are starting as they should again. The error may also have affected running jobs, so you may want to check the status of your jobs.


-- 
The TSD Team

Published Apr. 30, 2019 10:02 AM

There was a short-lived issue with Windows login, which may have interrupted active login sessions. This has now been solved. Apologies for the disturbance.

Published Apr. 26, 2019 9:59 AM

Published Apr. 16, 2019 1:04 PM

ThinLinc is currently not functioning properly.
You might be able to log in and get a session, but it will quickly freeze and stop working. We're aware of the problem and working on fixing it as quickly as possible.

The Windows platform in TSD is still working. As a workaround, please log in to TSD through the Windows platform.

Our apologies for the inconvenience.

-- 
Best regards,
The TSD Team

Published Apr. 8, 2019 10:51 AM

2019-05-08

  • A bug in qsumm that caused wrong numbers to be shown for running or pending jobs has been fixed.

2019-04-24

  • A bug in cost that made it show incorrect numbers has been fixed.
  • /cluster/projects/pNN/cluster_disk_usage.txt is now updated nightly again.

2019-04-11

  • At least one of the old hugemem nodes is now in production on Colossus 3.

2019-04-10

  • MPI jobs with OpenMPI should now work properly. The preferred way to start MPI programs is with srun. The documentation has been updated to reflect this.

2019-04-09

  • The cost command has now been fixed.
  • "srun" in job scripts produced errors like /var/spool/slurmd/job01122/slurm_script: line 21:...

Published Apr. 5, 2019 11:16 AM

*UPDATE*

/cluster should now be available on the new p<NUM>-submit.tsd.usit.no hosts. You can reach these hosts by SSH from your login hosts, and from Windows you can access them using PuTTY.

Modules are also available on these hosts, so that you can test your pipelines with the new software.

*END UPDATE*

 

Dear TSD users,

Unfortunately, there is a problem with the /cluster mounts on the new RHEL7 submit hosts created for the new Colossus cluster. We're working on it and will update this notice as soon as it is fixed.

Our apologies for the inconvenience.

--
Best regards,
The TSD team

Published Mar. 27, 2019 11:13 AM

The new cluster, Colossus 3, is now up, and the old cluster is turned off.

Published Mar. 18, 2019 9:09 AM

We are experiencing issues with some services, which may lead to some users being unable to log in to TSD. We are investigating the cause and working on a fix.

Published Mar. 13, 2019 11:41 AM

Some of our users are currently experiencing problems with Modules on the Linux VMs.

We are working on resolving this issue. 

Published Mar. 5, 2019 2:23 PM

The TSD self service portal https://selfservice.tsd.usit.no is currently unavailable, and attempted logins will result in a 502 error.

We are investigating, and will update this message as we make progress.

-- 
Best regards,
The TSD-team

Published Feb. 25, 2019 9:49 AM

Dear TSD users,

We are experiencing issues with ThinLinc login and are working to fix this. Until then, it will not be possible to log in to Linux VMs.

Regards

TSD

Published Feb. 18, 2019 9:27 AM

We are currently experiencing issues with one of the BeeGFS file system nodes. To fix this, we will try to restart part of the I/O system, which may cause VMs to hang and parts of /cluster to become unavailable. If this does not work, we will have to reboot the node.

Published Feb. 12, 2019 9:55 AM

There will be a scheduled upgrade of PostgreSQL to version 11 on 13.02.2019, between 07:00 and 15:00 CET.

During this downtime, applications using PostgreSQL will not work, as we will be restarting the database in your project. Other services inside TSD will continue to work as normal.

Published Jan. 25, 2019 9:24 AM

We're currently experiencing issues with the export of /cluster from Colossus. Something went wrong during our nightly builds and we are working on solving the issue.

--
The TSD-team

Published Jan. 22, 2019 9:12 AM

Update to web-based file uploads: after this change, files uploaded with https://data.tsd.usit.no and the tsd-api-client will be located in /data/durable/file-import/pXX-member-group instead of the previous location, /data/durable/file-api.

Published Jan. 7, 2019 3:01 PM

The SPSS license is currently not valid. We are trying to update it as soon as possible. Thanks for your patience.

TSD

Published Jan. 7, 2019 1:32 PM

Dear TSD-users,

Unfortunately, our services are currently unavailable due to a DNS issue. We're aware of the problem and working on solving it as quickly as possible.

Our apologies for the inconvenience.

Best regards,
The TSD-team.

Published Jan. 2, 2019 10:30 AM

There are currently problems with two-factor authentication, which make new logins to TSD impossible. (Existing connections are not affected.)

A side effect is that synchronization of new QR code keys has also stopped.

Update: The reason for the downtime was a failed synchronization. Everything should be working now, including newly generated QR codes. If you are still experiencing problems, please contact us.

Published Dec. 10, 2018 9:48 AM

We will perform a scheduled system upgrade of Colossus from 2019-01-03, 10:00 until 2019-01-04, 10:00.

UPDATE (2019-01-04, 12:35)

The upgrade of Colossus is complete.

UPDATE (2019-01-04, 09:45)

We are experiencing a slight delay, and hopefully Colossus will be available for use by 13:00 CET.

 

TSD@USIT