ML nodes
The AI HUB provides resources and services for machine learning and deep learning tasks at UiO. This page describes the available resources, how to get access to them, how to use them and how to get support in using them.
Available hardware resources
Name | Status | CPUs / RAM (GiB) | GPU | Shared home area | OS and software | Comments |
---|---|---|---|---|---|---|
ml1.hpc.uio.no, ml2.hpc.uio.no, ml3.hpc.uio.no | Production | 28 cores (Intel Xeon) / 128 | 4 x RTX 2080 Ti | Yes | RHEL 8.3 with module system | |
ml4.hpc.uio.no | Production | 32 cores (AMD) / 128 | 2 x AMD Vega 10 XL/XT | Yes | RHEL 8.3 with module system | |
ml6.hpc.uio.no, ml7.hpc.uio.no | Production | 32 cores (AMD) / 128 | 8 x RTX 2080 Ti | Yes | RHEL 8.3 with module system | |
ml8.hpc.uio.no | Production | 2 x 48 cores (AMD) | 4 x Nvidia A100 | Yes | RHEL 8.3 with module system | |
bluemaster01 | Production | 32 cores (Power9) / 512 | 4 x Nvidia V100 | Yes | RHEL 8.3 with module system | |
How to get access
Apply for access using the following Nettskjema form.
How to login
The ML nodes are behind a jump host as a security measure, which means that you need to be logged in to a UiO computer before you SSH to an ML node. You can do this either in two separate steps or with a single combined command. The two steps are:
- Log in to a computer inside the UiO network (e.g. smaug.uio.no)
- Log in to the ML nodes from that computer
In the commands below, UIO-USER-NAME is your user name at the University of Oslo.
[MYUSER@laptop ~]$ ssh UIO-USER-NAME@smaug.uio.no
[UIO-USER-NAME@smaug ~]$ ssh ml1.hpc.uio.no
Alternatively, you can combine the two steps into a single command:
ssh -J UIO-USER-NAME@smaug.uio.no UIO-USER-NAME@ml1.hpc.uio.no
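If you log in often, the jump host can also be stored in your SSH client configuration. The following is a minimal sketch, assuming OpenSSH 7.3 or newer (which provides ProxyJump) on your own machine. The function name ml_ssh_config and the alias uio-jump are made up for illustration, and the host pattern only covers nodes named ml<N>.hpc.uio.no.

```shell
# Print an SSH client configuration that routes ML node logins through
# the jump host. ml_ssh_config and the alias "uio-jump" are illustrative
# names, not part of any UiO tooling.
ml_ssh_config() {
  uio_user="$1"   # your UiO user name
  cat <<EOF
Host uio-jump
    HostName smaug.uio.no
    User ${uio_user}

Host ml*.hpc.uio.no
    User ${uio_user}
    ProxyJump uio-jump
EOF
}

# Usage (appends to your personal SSH configuration):
#   ml_ssh_config UIO-USER-NAME >> ~/.ssh/config
#   ssh ml1.hpc.uio.no
```

With such an entry in place, a plain ssh ml1.hpc.uio.no should behave like the combined -J command above.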
Login problems
If you cannot log in to the ML nodes, there can be many causes. If you send us a mail asking for help that says only "I cannot log in", it is difficult for us to provide a solution. Please go through the list below and see what information you should gather.
- Wrong username or password. For the ML nodes you should use your UiO username and password. If you get the username and password combination wrong more than three times, your account will be blocked on that machine for one hour.
- Your password is case sensitive.
- Jump host. Make sure that you follow the jump host instructions above.
- Host name. Make sure that you typed the correct host name. Please check the names in the table above (Available hardware resources).
- When sending support requests, please include the following details:
  - The exact command you used to log in, including the username and the host name (the ML machine you are trying to log in to). Never include your password.
  - Where you are logging in from: the office, or your laptop at home? Please send the IP address of the machine if you know how to find it (if you do not know what that is, do not worry).
  - If you are logging in from a terminal, please send the full debug output, e.g. from:
    - ssh -vvv MY_USERNAME@ml1.hpc.uio.no
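The details requested above can be gathered into a single file before you write to support. This is only a sketch: the function name ml_ssh_debug, the variable ML_DRY_RUN and the default log file name ssh-debug.log are assumptions made for illustration, and the jump host is assumed to be smaug.uio.no as in the login instructions.

```shell
# Collect login details and the ssh debug output in one file.
# ml_ssh_debug and ML_DRY_RUN are illustrative names. Setting ML_DRY_RUN=1
# prints the ssh command instead of running it.
ml_ssh_debug() {
  uio_user="$1"                  # your UiO user name
  node="${2:-ml1.hpc.uio.no}"    # the ML node you are trying to reach
  log="${3:-ssh-debug.log}"      # where to store the debug output
  {
    echo "Date: $(date)"
    echo "Local machine: $(uname -n)"
    echo "Command: ssh -vvv -J ${uio_user}@smaug.uio.no ${uio_user}@${node}"
  } > "$log"
  # ssh writes its -vvv debug messages to stderr, so append stderr to the log.
  ${ML_DRY_RUN:+echo} ssh -vvv -J "${uio_user}@smaug.uio.no" "${uio_user}@${node}" 2>> "$log"
}

# Usage:
#   ml_ssh_debug UIO-USER-NAME ml1.hpc.uio.no
#   (then attach ssh-debug.log to your mail)
```

Read through the log before sending it, and add your IP address by hand if you know it.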
Please note that you also need to use the jump host when uploading or downloading files.
How to load software
Module system
We use the Lmod module system on all AI HUB machines. Please refer to the modules document for details.
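As a rough orientation, a typical Lmod session might look like the sketch below. The wrapper function ml_module_tour is a made-up name, and no module names are assumed: pass one that you found with module avail. The modules document remains the authoritative reference.

```shell
# Walk through the basic Lmod commands. ml_module_tour is an illustrative
# name; the optional first argument is the name of a module to load.
ml_module_tour() {
  if ! type module >/dev/null 2>&1; then
    echo "The module command is not available here; run this on an ML node." >&2
    return 1
  fi
  module avail                 # list the modules that can be loaded
  module list                  # list the modules loaded in this shell
  if [ -n "${1:-}" ]; then
    module spider "$1"         # search for a module and its versions
    module load "$1"           # load it into the current shell
    module list
  fi
  # "module purge" unloads all loaded modules again.
}

# Usage on an ML node:
#   ml_module_tour <module name>
```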
How to use Jupyter
Please see here for using Jupyter with GPU support.
How to install additional python packages
See the document: install-additional-python-packages
Home area
The home area is shared between ml1, ml2, ml3, ml4, ml6, ml7 and ml8, i.e. you will see the same content when you log in to any of these machines.
The home area is backed up each night. To recover files, look in /itf-fi-ml/home/.snapshots/<time stamp>/<username>, where your home folder has been backed up.
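Recovering a file from a snapshot could look like the following sketch. The function name ml_restore and the variable ML_SNAPSHOT_ROOT are made up for illustration, the format of the time stamp folders is not assumed (list the snapshot folder to see them), and the user name defaults to the account you are logged in with.

```shell
# Restore one file or folder from a nightly snapshot of the home area.
# ml_restore and ML_SNAPSHOT_ROOT are illustrative names; the default root
# is the snapshot path given above.
ml_restore() {
  stamp="$1"                     # snapshot time stamp folder
  relpath="$2"                   # path relative to your home folder
  user="${3:-$(id -un)}"         # assumed to match your UiO user name
  root="${ML_SNAPSHOT_ROOT:-/itf-fi-ml/home/.snapshots}"
  src="${root}/${stamp}/${user}/${relpath}"
  dest="${HOME}/${relpath}.restored"
  if [ ! -e "$src" ]; then
    echo "Not found in snapshot: $src" >&2
    return 1
  fi
  if [ -e "$dest" ]; then
    echo "Refusing to overwrite: $dest" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dest")"
  # -a keeps permissions and time stamps. The copy gets a .restored suffix
  # so that the current version is not overwritten.
  cp -a "$src" "$dest" && echo "Restored to $dest"
}

# Usage:
#   ls /itf-fi-ml/home/.snapshots/        # see the available time stamps
#   ml_restore <time stamp> path/to/file
```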
Using /scratch for large datasets
Since the home area of the ML machines is shared, the performance might not be the fastest when working with large datasets. To accommodate such workflows, each ML node has its own private scratch folder where users can store data temporarily when working on it. The scratch folder is local to each machine so when logging in to different ML machines users will see different content.
To start using the scratch folder, simply upload data to /scratch/users/<username> and access it from there. The scratch area is useful if you need to read and/or write a lot of data to files.
There are currently no usage limits on the scratch folders, but we retain the right to remove data that is not in active use when the scratch area of a machine is nearing full. If your workflow requires writing a lot of data to files we recommend you read and write from the scratch area and then move the results to your home area when the experiment is done.
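The recommended workflow (work in scratch, then move results to the home area) can be sketched as follows. The function names and the variable ML_SCRATCH_ROOT are assumptions made for illustration, and the user name defaults to the account you are logged in with.

```shell
# Stage a dataset into node-local scratch, and move results back to the
# shared home area afterwards. Function names and ML_SCRATCH_ROOT are
# illustrative; the default is the scratch path given above.
ml_scratch_dir() {
  echo "${ML_SCRATCH_ROOT:-/scratch/users}/${1:-$(id -un)}"
}

ml_stage_in() {    # ml_stage_in <dataset> [username]
  scratch="$(ml_scratch_dir "${2:-}")"
  mkdir -p "$scratch" && cp -a "$1" "$scratch/"
}

ml_stage_out() {   # ml_stage_out <name in scratch> <folder in home> [username]
  scratch="$(ml_scratch_dir "${3:-}")"
  mkdir -p "$2" && mv "$scratch/$1" "$2/"
}

# Usage on an ML node:
#   ml_stage_in ~/datasets/images
#   (run the experiment against /scratch/users/<username>/images)
#   ml_stage_out results ~/experiments
```

Remember that scratch is local to each machine, so stage the data on the same node where you run the experiment.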
Upload/Download files
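File transfers have to pass through the same jump host as interactive logins. The following is a minimal sketch using scp with the ProxyJump option, assuming OpenSSH 7.3 or newer on your own machine and smaug.uio.no as the jump host. The names ml_upload, ml_download, ML_JUMP_HOST and ML_DRY_RUN are made up for illustration.

```shell
# Copy files to and from an ML node through the jump host.
# ml_upload, ml_download, ML_JUMP_HOST and ML_DRY_RUN are illustrative
# names. Setting ML_DRY_RUN=1 prints the command instead of running it.
ML_JUMP_HOST="${ML_JUMP_HOST:-smaug.uio.no}"

ml_upload() {     # ml_upload <uio-user> <node> <local path> <remote path>
  ${ML_DRY_RUN:+echo} scp -r -o "ProxyJump=$1@${ML_JUMP_HOST}" "$3" "$1@$2:$4"
}

ml_download() {   # ml_download <uio-user> <node> <remote path> <local path>
  ${ML_DRY_RUN:+echo} scp -r -o "ProxyJump=$1@${ML_JUMP_HOST}" "$1@$2:$3" "$4"
}

# Usage:
#   ml_upload UIO-USER-NAME ml1.hpc.uio.no dataset.tar /scratch/users/UIO-USER-NAME/
#   ml_download UIO-USER-NAME ml1.hpc.uio.no results.tar .
```

If you prefer rsync, the same jump host can be passed with -e "ssh -J UIO-USER-NAME@smaug.uio.no".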
Software requests
If you need additional software or want us to upgrade an existing software package, we are happy to do this for you (or to help you install it yourself if you prefer that). So that we get all the relevant information and can take care of the installation as quickly as possible, we have created a software request form. After you fill in the form, a ticket will be created in RT and we will get back to you with the installation progress.
https://nettskjema.no/a/usit-sw-request
Key changed when trying to log in
UiO has updated the SSH host key policy, which decides the appropriate hashing function to use during SSH key exchange. For some of you this might mean that your previous setup now tells you that the key has changed and that you might be the victim of a man-in-the-middle attack. When you encounter such messages, please check trusted sources to ensure that you are not being attacked, and then proceed from there.
In the current case it simply means that you need to refresh the host key of the ML node in question. To do this:
- Remove the old host key with the following command (exchange ml1 for the applicable ML node):
ssh-keygen -R ml1.hpc.uio.no
- Connect again as usual, through SSH, and paste the corresponding key from the table below.
ML node | RSA key | ED25519 key |
---|---|---|
ML1 | SHA256:pAw0j5DjOvXrgKO3DlGvTvF3EAzaxw2/tEPGaygayGw | SHA256:rMc5mseHIDPcwPZCWlE3fAEK155ad8sJ7kQUSgVPWVY |
ML2 | SHA256:yogcKQBA8uZDap7bIqS8xtwhzXxM3JI7UyEHCItzLJU | SHA256:/QaY71pRnimBkUWb+H/NGv4b+EGf91sQdk1h8Z3/kKU |
ML3 | SHA256:9ETM32UFHBJC6BQfmqnE0R0ECQts/RYQGDNN/lqUmYs | SHA256:PXTnLgrMueFcPGuKgb8TyP2s+eBmeXJzSvEEb7rq19A |
ML4 | SHA256:dv5VKLHZ/IIAmj5aCUqQ5IAmVgnq/EXcyQcZjoRBAjk | SHA256:zHr4djVT4zu2fGlI6pdjAH9yOjG1a1ifwOwxe8GA1A8 |
ML6 | SHA256:0zRe9JqlhDZwDgJwdXBNF6KIfs7Y81GaiEMx7cdL0iw | SHA256:2o+eqB6cltnXuMXTSv+87xSijdtBSisRts840hAs9iQ |
ML7 | SHA256:I1FeqkoKGsUEJ8B7jNQZsMVQjXsct7oCRTvKDvXqIJk | SHA256:QTpQ3sY5rF84gQDMend8KhXP6Y7aWEhJ/Rgl5wQcRC4 |
Bluemaster01 | SHA256:Sn7I6tHz9OeL9PkBLorS24LrILUMbH4l5fydaTlzl+g | SHA256:biNo079CAkTDvPhQzNL9yVWaGRkfcff9eMSBFz1DLpQ |
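Comparing long fingerprints by eye is error prone, so the check against the table above can be scripted. This is only a sketch: the function names are made up for illustration, and the fingerprints are copied from the table, so the table remains the authoritative source if the two ever differ.

```shell
# Published fingerprints per node, copied from the table above
# (RSA first, then ED25519). Function names are illustrative.
ml_known_fingerprints() {
  case "$1" in
    ml1) echo "SHA256:pAw0j5DjOvXrgKO3DlGvTvF3EAzaxw2/tEPGaygayGw SHA256:rMc5mseHIDPcwPZCWlE3fAEK155ad8sJ7kQUSgVPWVY" ;;
    ml2) echo "SHA256:yogcKQBA8uZDap7bIqS8xtwhzXxM3JI7UyEHCItzLJU SHA256:/QaY71pRnimBkUWb+H/NGv4b+EGf91sQdk1h8Z3/kKU" ;;
    ml3) echo "SHA256:9ETM32UFHBJC6BQfmqnE0R0ECQts/RYQGDNN/lqUmYs SHA256:PXTnLgrMueFcPGuKgb8TyP2s+eBmeXJzSvEEb7rq19A" ;;
    ml4) echo "SHA256:dv5VKLHZ/IIAmj5aCUqQ5IAmVgnq/EXcyQcZjoRBAjk SHA256:zHr4djVT4zu2fGlI6pdjAH9yOjG1a1ifwOwxe8GA1A8" ;;
    ml6) echo "SHA256:0zRe9JqlhDZwDgJwdXBNF6KIfs7Y81GaiEMx7cdL0iw SHA256:2o+eqB6cltnXuMXTSv+87xSijdtBSisRts840hAs9iQ" ;;
    ml7) echo "SHA256:I1FeqkoKGsUEJ8B7jNQZsMVQjXsct7oCRTvKDvXqIJk SHA256:QTpQ3sY5rF84gQDMend8KhXP6Y7aWEhJ/Rgl5wQcRC4" ;;
    bluemaster01) echo "SHA256:Sn7I6tHz9OeL9PkBLorS24LrILUMbH4l5fydaTlzl+g SHA256:biNo079CAkTDvPhQzNL9yVWaGRkfcff9eMSBFz1DLpQ" ;;
    *) return 1 ;;
  esac
}

# Compare the fingerprint that ssh shows with the published ones.
ml_check_fingerprint() {   # ml_check_fingerprint <node> <fingerprint shown by ssh>
  known="$(ml_known_fingerprints "$1")" || {
    echo "No published fingerprint for: $1" >&2
    return 2
  }
  for fp in $known; do
    if [ "$fp" = "$2" ]; then
      echo "match"
      return 0
    fi
  done
  echo "MISMATCH"
  return 1
}

# Usage:
#   ml_check_fingerprint ml1 "SHA256:<fingerprint shown by ssh>"
```

If the result is MISMATCH, do not accept the key.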
Citations and acknowledgements
Please use the following format when acknowledging ML nodes, if you use them in your research.
Machine learning infrastructure (ML Nodes), University Centre for Information Technology, University Of Oslo, Norway.
Contact
If you need help with the ML nodes please contact:
itf-ai-support@usit.uio.no