ML nodes

The AI HUB provides resources and services for machine learning and deep learning tasks at UiO. This page describes the available resources, how to get access to them, how to use them and how to get support in using them.

Available hardware resources

Name | Status | CPUs / RAM (GiB) | GPU | Shared home area | OS and software
ml1.hpc.uio.no, ml2.hpc.uio.no, ml3.hpc.uio.no | Production | 28 cores (Intel Xeon) / 128 | 4 x RTX 2080 Ti | Yes | RHEL 8.3 with module system
ml4.hpc.uio.no | Production | 32 cores (AMD) / 128 | 2 x AMD Vega 10 XL/XT | Yes | RHEL 8.3 with module system
ml6.hpc.uio.no, ml7.hpc.uio.no | Production | 32 cores (AMD) / 128 | 8 x RTX 2080 Ti | Yes | RHEL 8.3 with module system
ml8.hpc.uio.no | Production | 2 x 48 cores (AMD) / not listed | 4 x Nvidia A100 | Yes | RHEL 8.3 with module system
bluemaster01 | Production | 32 cores (Power9) / 512 | 4 x Nvidia V100 | Yes | RHEL 8.3 with module system

How to get access

Apply for access at the following nettskjema.

How to login

The ML nodes are behind a jump host as a security measure. This means that you must first log in to a computer on the UiO network and then SSH from there to an ML node:

  1. Log in to a computer inside the UiO network (e.g. smaug.uio.no).
  2. Log in to the ML node from that computer.

UIO-USER-NAME is your user name at the University of Oslo.

[MYUSER@laptop ~]$ ssh UIO-USER-NAME@smaug.uio.no

[UIO-USER-NAME@smaug ~]$ ssh ml1.hpc.uio.no

You can also combine the two steps into a single command with the -J (jump host) option:

ssh -J UIO-USER-NAME@smaug.uio.no UIO-USER-NAME@ml1.hpc.uio.no
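To avoid typing the jump host every time, OpenSSH can store it in your client configuration (~/.ssh/config) through the ProxyJump option. The snippet below is a minimal sketch, assuming OpenSSH 7.3 or newer on your own machine; UIO_USER, SSH_CONFIG and the short alias smaug are our own illustrative choices, not names defined by UiO. You can equally paste the two Host blocks into the file by hand.

```shell
# Append jump-host settings for the ML nodes to your SSH client configuration.
# UIO-USER-NAME is a placeholder: set UIO_USER to your own UiO user name first.
UIO_USER="${UIO_USER:-UIO-USER-NAME}"
SSH_CONFIG="${SSH_CONFIG:-$HOME/.ssh/config}"

# Create the directory (mode 700) only if it does not exist yet.
mkdir -p -m 700 "$(dirname "$SSH_CONFIG")"

# Only append once, so running this again does not duplicate the entries.
if ! grep -q '^Host smaug$' "$SSH_CONFIG" 2>/dev/null; then
  cat >> "$SSH_CONFIG" <<EOF

Host smaug
    HostName smaug.uio.no
    User $UIO_USER

# %h expands to the short name you typed, e.g. "ml1" -> ml1.hpc.uio.no
Host ml1 ml2 ml3 ml4 ml6 ml7 ml8
    HostName %h.hpc.uio.no
    User $UIO_USER
    ProxyJump smaug
EOF
  chmod 600 "$SSH_CONFIG"
fi
```

Afterwards, ssh ml1 should log in through smaug.uio.no, and scp and rsync accept ml1: as a destination. ssh -G ml1 prints the settings SSH will use for that host without connecting, which is a quick way to check the result.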


Login problems

If you cannot log in to the ML nodes, there can be many causes. If your request for help only says "I cannot log in", it is difficult for us to find a solution. Please go through the list below and gather the information it asks for.

  1. Wrong username or password. Use your UiO username and password for the ML nodes. If you enter a wrong username-password combination more than three times, your account is blocked on that machine for one hour.
  2. Your password is case sensitive.
  3. Jump host. Make sure that you follow the jump host instructions above.
  4. Wrong host name. Check that you typed the host name correctly; the valid names are listed in the table under Available hardware resources.
  5. When sending support requests, please include the details below.
    1. The exact command you used to log in, including the username and the host name (the ML machine you are trying to log in to). Never include your password.
    2. Where you are logging in from: your office, or your laptop at home? Please send the IP address of the machine if you know how to find it (if you do not know what that is, do not worry).
    3. If you are logging in from a terminal, please send the full debug output, e.g. from:
      1. ssh -vvv UIO-USER-NAME@ml1.hpc.uio.no (your usual login command with -vvv added)
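If you are comfortable with a terminal, the details asked for above can be gathered into a single file to attach to your request. This is a sketch only: UIO_USER, TARGET and the log file name are placeholders you should adjust, and it assumes an OpenSSH client that supports -J.

```shell
# Collect the details the support team asks for into one file.
# UIO-USER-NAME and ml1.hpc.uio.no are placeholders: set UIO_USER and TARGET
# to the user name and node you are actually trying to reach.
UIO_USER="${UIO_USER:-UIO-USER-NAME}"
TARGET="${TARGET:-ml1.hpc.uio.no}"
LOG="ssh-debug-$(date +%Y%m%d-%H%M%S).log"

{
  echo "Date:       $(date)"
  echo "Local host: $(uname -n)"
  echo "Command:    ssh -vvv -J ${UIO_USER}@smaug.uio.no ${UIO_USER}@${TARGET}"
  echo "SSH client version:"
  ssh -V 2>&1          # ssh prints its version on stderr
} > "$LOG"

# -vvv writes the most detailed debug output to stderr; 2>> appends it to the log.
# Password prompts go to the terminal, not stderr, so you can still type your
# password. The remote command "exit" logs out again as soon as login succeeds.
ssh -vvv -J "${UIO_USER}@smaug.uio.no" "${UIO_USER}@${TARGET}" exit 2>> "$LOG"

echo "Debug output saved to $LOG; please review it before attaching it to your request."
```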


Note that you also need to go through the jump host when uploading or downloading files.
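Both scp and rsync run over SSH, so they can use the same jump host. The helper functions below are hypothetical (the names ml_upload and ml_download are ours), assume an OpenSSH client that supports the ProxyJump option, and copy to and from your home area by default.

```shell
# Hypothetical helpers for copying files to and from an ML node via the
# smaug.uio.no jump host. Set UIO_USER to your UiO user name first.
UIO_USER="${UIO_USER:-UIO-USER-NAME}"
ML_JUMP="${ML_JUMP:-${UIO_USER}@smaug.uio.no}"

# ml_upload NODE LOCAL_PATH [REMOTE_PATH]
# Copies a local file or directory to REMOTE_PATH on NODE
# (default: your home folder, which is shared between the ML nodes).
ml_upload() {
  local node="$1" src="$2" dest="${3:-}"
  scp -r -o ProxyJump="$ML_JUMP" "$src" "${UIO_USER}@${node}:${dest}"
}

# ml_download NODE REMOTE_PATH [LOCAL_PATH]
# Copies a remote file or directory (relative to your home folder on NODE)
# to LOCAL_PATH (default: the current directory).
ml_download() {
  local node="$1" src="$2" dest="${3:-.}"
  scp -r -o ProxyJump="$ML_JUMP" "${UIO_USER}@${node}:${src}" "$dest"
}
```

For example, ml_upload ml1.hpc.uio.no results.csv copies a (hypothetical) local file to your home folder, and ml_download ml1.hpc.uio.no experiments/run1 ./run1 fetches a directory back. For large or repeated transfers, rsync -av -e "ssh -J UIO-USER-NAME@smaug.uio.no" data/ UIO-USER-NAME@ml1.hpc.uio.no:data/ only sends files that have changed since the last run.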

How to load software

Module system

We use the Lmod module system on all AI hub machines. Please refer to the modules document for details.
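The Lmod commands most people need first are module list, module avail, module spider and module load. The wrapper below is a hypothetical convenience function (module_overview is not part of Lmod) that simply runs the first three; the default search term python is only an example.

```shell
# Hypothetical helper: print an overview of the Lmod modules on this node.
# Usage: module_overview [SEARCH_TERM]   (SEARCH_TERM defaults to "python")
module_overview() {
  local term="${1:-python}"
  module list              # modules currently loaded in this shell
  module avail             # modules that can be loaded right now
  module spider "$term"    # search all modules whose name matches SEARCH_TERM
}
```

Once spider has shown the exact name and version, load it with module load <Name>/<version>, and use module purge to return to a clean environment. Lmod prints these listings on standard error by default, so add 2>&1 if you want to pipe them into grep or less.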

How to use Jupyter

Please see here for using Jupyter with GPU support.

How to install additional python packages

See the document: install-additional-python-packages

Home area

The HOME area is shared between ml1, ml2, ml3, ml4, ml6, ml7 and ml8, i.e. you will see the same content when you log in to any of these machines.

The home area is backed up every night. To recover files, look in /itf-fi-ml/home/.snapshots/<time stamp>/<username>, which holds the copy of your home folder taken at that time.
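Restoring a file amounts to copying it out of the snapshot folder. The function below is a hypothetical helper (not a UiO-provided tool) that does this without overwriting anything. It assumes the snapshot layout given above; take the snapshot names from ls /itf-fi-ml/home/.snapshots/, since their exact format is not described here.

```shell
# Hypothetical helper for restoring files from the nightly snapshots.
# List the available snapshots first with:  ls /itf-fi-ml/home/.snapshots/
SNAPSHOT_ROOT="${SNAPSHOT_ROOT:-/itf-fi-ml/home/.snapshots}"

# restore_from_snapshot SNAPSHOT REL_PATH [TARGET]
# Copies REL_PATH (relative to your home folder) out of snapshot SNAPSHOT.
# TARGET defaults to the same path in your home folder; an existing file is
# never overwritten, so pass another TARGET to keep both versions.
restore_from_snapshot() {
  local snap="$1" rel="$2" target="${3:-$HOME/$2}"
  local src="$SNAPSHOT_ROOT/$snap/${USER:-$(id -un)}/$rel"

  if [ ! -e "$src" ]; then
    echo "not found in snapshot: $src" >&2
    return 1
  fi
  if [ -e "$target" ]; then
    echo "refusing to overwrite $target; give another TARGET" >&2
    return 1
  fi
  mkdir -p "$(dirname "$target")"
  cp -a "$src" "$target"    # -a keeps timestamps and permissions
}
```

For example, restore_from_snapshot <time stamp> project/train.py ~/project/train.py.restored puts the backed-up copy of a (hypothetical) file next to your current one, so you can compare the two before replacing anything.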

Using /scratch for large datasets

Since the home area of the ML machines is shared, it may be slow when working with large datasets. To accommodate such workflows, each ML node has its own private scratch folder where users can store data temporarily while working on it. The scratch folder is local to each machine, so users will see different content when logging in to different ML machines.

To start using the scratch folder, simply upload data to /scratch/users/<username> and access it from there. The scratch area is useful if you need to read and/or write a lot of data to files.

There are currently no usage limits on the scratch folders, but we reserve the right to remove data that is not in active use when the scratch area of a machine is nearly full. If your workflow requires writing a lot of data to files, we recommend that you read from and write to the scratch area, and then move the results to your home area when the experiment is done.
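The scratch workflow described above (copy in, work locally, move results home) can be sketched as three small shell functions. The names scratch_dir, stage_in and stage_out are hypothetical, and SCRATCH_ROOT only exists so the location can be overridden. Because each node has its own scratch folder, run them on the node you will compute on.

```shell
# Hypothetical helpers for working in the node-local scratch area.
SCRATCH_ROOT="${SCRATCH_ROOT:-/scratch/users}"

# scratch_dir: print your scratch folder on this node, creating it if needed.
scratch_dir() {
  local dir="$SCRATCH_ROOT/${USER:-$(id -un)}"
  mkdir -p "$dir" && printf '%s\n' "$dir"
}

# stage_in PATH: copy a file or directory (e.g. a dataset in your home area)
# into scratch before a run.
stage_in() {
  local dir
  dir="$(scratch_dir)" || return 1
  cp -a "$1" "$dir/"
}

# stage_out NAME DEST: move NAME from scratch into the directory DEST
# (e.g. in your home area) once the experiment is done.
stage_out() {
  local dir
  dir="$(scratch_dir)" || return 1
  mkdir -p "$2"
  mv "$dir/$1" "$2/"
}
```

A typical run might be stage_in ~/datasets/images, then point your experiment at $(scratch_dir)/images and write its output to $(scratch_dir)/results, and finish with stage_out results ~/experiments. The dataset and folder names here are placeholders.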

Upload/Download files

https://www.uio.no/tjenester/it/forskning/kompetansehuber/uio-ai-hub-node-project/it-resources/ml-nodes/file-transfer.html

Software requests

If you need additional software or want us to upgrade an existing software package, we are happy to do this for you (or help you to install it yourself if you prefer that). So that we get all the relevant information and can take care of the installation as quickly as possible, we have created a software request form. After you fill in the form, a ticket is created in RT and we will get back to you with the installation progress.

https://nettskjema.no/a/usit-sw-request

Key changed when trying to log in

UiO has updated the SSH host key policy, which decides the hashing function used during the SSH key exchange. For some of you, this means that SSH now reports that the host key has changed and that you might be the victim of a man-in-the-middle attack. When you see such a message, check trusted sources to make sure that you are not being attacked, and then proceed from there.

In this case it simply means that you need to refresh the stored host key of the ML node in question. To do this:

  • Remove the old key with ssh-keygen -R ml1.hpc.uio.no (replace ml1.hpc.uio.no with the applicable ML node).
  • Connect again as usual through SSH. When SSH asks whether you want to continue connecting, check that the fingerprint it shows matches the one for that node in the table below, then type yes or paste the fingerprint from the table.
ML node RSA key ED25519 key
ML1 SHA256:pAw0j5DjOvXrgKO3DlGvTvF3EAzaxw2/tEPGaygayGw SHA256:rMc5mseHIDPcwPZCWlE3fAEK155ad8sJ7kQUSgVPWVY
ML2 SHA256:yogcKQBA8uZDap7bIqS8xtwhzXxM3JI7UyEHCItzLJU SHA256:/QaY71pRnimBkUWb+H/NGv4b+EGf91sQdk1h8Z3/kKU
ML3 SHA256:9ETM32UFHBJC6BQfmqnE0R0ECQts/RYQGDNN/lqUmYs SHA256:PXTnLgrMueFcPGuKgb8TyP2s+eBmeXJzSvEEb7rq19A
ML4 SHA256:dv5VKLHZ/IIAmj5aCUqQ5IAmVgnq/EXcyQcZjoRBAjk SHA256:zHr4djVT4zu2fGlI6pdjAH9yOjG1a1ifwOwxe8GA1A8
ML6 SHA256:0zRe9JqlhDZwDgJwdXBNF6KIfs7Y81GaiEMx7cdL0iw SHA256:2o+eqB6cltnXuMXTSv+87xSijdtBSisRts840hAs9iQ
ML7 SHA256:I1FeqkoKGsUEJ8B7jNQZsMVQjXsct7oCRTvKDvXqIJk SHA256:QTpQ3sY5rF84gQDMend8KhXP6Y7aWEhJ/Rgl5wQcRC4
Bluemaster01 SHA256:Sn7I6tHz9OeL9PkBLorS24LrILUMbH4l5fydaTlzl+g SHA256:biNo079CAkTDvPhQzNL9yVWaGRkfcff9eMSBFz1DLpQ
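The two steps above can also be scripted. refresh_hostkey wraps ssh-keygen -R; run it on the machine that makes the final SSH connection to the node (your own computer when you use ssh -J, smaug.uio.no when you log in in two steps). show_fingerprints is an optional extra check, assuming you run it from inside the UiO network: it fetches the node's public keys with ssh-keyscan and prints their SHA256 fingerprints for comparison with the table. Both function names are hypothetical.

```shell
# refresh_hostkey HOST [KNOWN_HOSTS]
# Removes every stored key for HOST so that the next SSH connection asks you
# to accept the new key. ssh-keygen keeps a backup in KNOWN_HOSTS.old.
refresh_hostkey() {
  local host="$1" known="${2:-$HOME/.ssh/known_hosts}"
  ssh-keygen -R "$host" -f "$known"
}

# show_fingerprints HOST
# Prints the SHA256 fingerprints of HOST's RSA and ED25519 keys.
# Run it from a machine inside the UiO network (e.g. smaug.uio.no),
# since the ML nodes cannot be reached directly from outside.
show_fingerprints() {
  ssh-keyscan -t rsa,ed25519 "$1" 2>/dev/null | ssh-keygen -lf -
}
```

For example, refresh_hostkey ml3.hpc.uio.no followed by show_fingerprints ml3.hpc.uio.no on smaug should print two lines whose SHA256 values you can compare with the ML3 row above.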

Citations and acknowledgements

If you use the ML nodes in your research, please acknowledge them in the following format:

Machine learning infrastructure (ML Nodes), University Centre for Information Technology, University of Oslo, Norway.

Contact

If you need help with the ML nodes please contact:
itf-ai-support@usit.uio.no

Published Nov. 16, 2020 11:24 AM - Last modified June 3, 2022 2:01 PM