AI HUB

The AI HUB provides resources and services for machine learning and deep learning tasks at UiO. This page describes the available resources, how to get access to them, how to use them and how to get support in using them.

Available hardware resources

There are three servers, ml{1,2,3}.hpc.uio.no, each with 28 CPU cores, 128 GiB RAM, 4 x NVIDIA RTX 2080 Ti GPUs and about 160 GiB of local SSD that holds both scratch data (/work) and the home directories, which are local to each machine. In addition, a shared BeeGFS On Demand (BeeOND) directory is mounted as /shared. It is about 17 TB in size, performs well and can be used as scratch storage during work. Note! There is no backup for any of the file systems. Because /work and /home are on a relatively small local disk, we recommend that you store only small data there. Large data should be stored under /shared.

Each GPU provides about 29 TFLOPS of performance and more than 500 GiB/s of memory bandwidth. Hence, the 12 GPUs together provide nearly double the performance of the whole Abel cluster (or about 1/3 of Fram). The comparison is only approximate, because the GPU numbers are for lower precision than what is run on Abel and Fram.
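
The aggregate figure behind this comparison can be worked out directly. The sketch below uses only the numbers stated above (3 servers, 4 GPUs each, about 29 TFLOPS per GPU). The Abel and Fram values it prints are merely what the stated ratios imply; they are not official peak figures.

```python
# Back-of-the-envelope check of the aggregate GPU performance.
SERVERS = 3
GPUS_PER_SERVER = 4
TFLOPS_PER_GPU = 29  # approximate, lower-precision figure from the text

total_gpus = SERVERS * GPUS_PER_SERVER
total_tflops = total_gpus * TFLOPS_PER_GPU

# Values implied by "nearly double Abel" and "1/3 of Fram".
# These are illustrative assumptions, not measured or official numbers.
implied_abel_tflops = total_tflops / 2  # Abel would be somewhat above this
implied_fram_tflops = total_tflops * 3

print("GPUs in total:          {}".format(total_gpus))
print("Aggregate performance:  {} TFLOPS".format(total_tflops))
print("Implied Abel (at least): {:.0f} TFLOPS".format(implied_abel_tflops))
print("Implied Fram (approx.):  {} TFLOPS".format(implied_fram_tflops))
```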

Important note! Consider the servers a beta/experimental service that provides an opportunity to get used to such hardware and to the tools for machine learning and deep learning. There is no module system; software is installed system-wide with only one version available. Intentionally, there are currently few restrictions on their use. The servers support interactive work only; there is no batch queue system. Thus, we kindly ask everyone to share them with fellow users. We may need to change the usage policy and how the servers are operated based on experience and needs (e.g., if the AI hub-node project needs these resources).

Feedback on the servers

As this is a beta/experimental service, we are very interested in learning from you what works well, what does not and how we could improve the service. Please contact us at itf-ai-support@usit.uio.no.

How to get access

Currently (mid Dec 2018), you need an account on Abel. If you don't have one, please see Getting Access.

How to use them

Log in from anywhere with one of

ssh ml1.hpc.uio.no
ssh ml2.hpc.uio.no
ssh ml3.hpc.uio.no

Please process data under the local /work/users/YOUR_USERNAME or the shared /shared/users/YOUR_USERNAME. The local file system is small (about 160 GiB for data, code and $HOME), so please clean up any data you no longer need.

The BeeOND global parallel file system is mounted under /shared. It is a temporary file system with reasonably large space (approx. 17 TB) and good performance, and it is suited for scratch storage during work. Because it is a non-persistent file system, it is important to copy data back to your home directory after work.
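
Because the local disk is small and no file system is backed up, it can help to check the free space before writing large data. The following is a minimal sketch that assumes only the mount points named above (/work and /shared); the helper names free_gib and report are hypothetical, and the code avoids f-strings so that it also runs on the system-wide Python 3.5.

```python
import os
import shutil

GIB = 1024 ** 3


def free_gib(path):
    """Return the free space on the file system containing `path`, in GiB."""
    return shutil.disk_usage(path).free / GIB


def report(paths):
    """Build one line per path: its free space, or a note if it is absent."""
    lines = []
    for path in paths:
        if os.path.isdir(path):
            lines.append("{:<10} {:8.1f} GiB free".format(path, free_gib(path)))
        else:
            lines.append("{:<10} not mounted here".format(path))
    return lines


if __name__ == "__main__":
    # /work and /shared are the mount points described in the text.
    for line in report(["/work", "/shared"]):
        print(line)
```

Run it on one of the ml servers before starting a job that writes a lot of data. On any other machine it simply reports that the directories are not mounted.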

Software

Currently the following software is installed. If you need different software, please contact us at itf-ai-support@usit.uio.no.

  • CUDA 9.0 and 10.0, cuDNN 7.4.1.5
    • CUDA is installed under /usr/local/cuda which is a symlink to /usr/local/cuda-10.0.
    • See available GPUs and their current use with
      nvidia-smi
      
  • Miniconda with an environment named python3.6 which contains several popular packages. To get access to conda do
    . /opt/conda/etc/profile.d/conda.sh

    Add this line to your shell startup script, e.g., .bash_profile, to have conda available right after you log in to an ml server.

    Do conda env list to get an overview of available environments.

    thomarob@ml1:~$ conda env list
    # conda environments:
    #
    base      * /opt/conda
    python3.6   /opt/conda/envs/python3.6
    python3.7   /opt/conda/envs/python3.7
                /shared/users/thomarob/conda/bm_tf1.14
    

    Do conda activate python3.6 to use the environment named python3.6 (if the command succeeds, the prompt changes to show the active environment).

    thomarob@ml1:~$ conda activate python3.6
    (python3.6) thomarob@ml1:~$

    Do conda list to obtain a list of all packages installed under the current environment. Many packages are installed in python3.6, for example,

    tensorflow   1.12.0
    scikit-learn 0.20.2
    keras        2.2.4
    cudatoolkit  9.2
    

    If you wish to create your own environment, please create it under /shared with the command

    conda create --prefix /shared/users/$USER/conda/tf1.13

    Don't create it under your $HOME because the disk for the home directories is very small and conda environments can consume large disk space.

    You may wish to pin certain packages to specific versions. For example, do so by

    echo 'tensorflow-gpu 1.13.*' >> /shared/users/$USER/conda/tf1.13/conda-meta/pinned

    Then, when you install tensorflow-gpu, versions will be limited to 1.13.x. For example, try

    conda install tensorflow-gpu

    You can have multiple packages pinned to specific versions. Note, however, pinning may result in no set of packages satisfying your specification and package dependencies.

  • TensorFlow 1.12.0
    • Use via Python
      $ python3
      Python 3.5.2 (default, Nov 12 2018, 13:43:14) 
      [GCC 5.4.0 20160609] on linux
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import tensorflow as tf
      >>> print(tf.__version__)
      1.12.0
      
  • Keras
    • Use via Python
      $ python3
      Python 3.5.2 (default, Nov 12 2018, 13:43:14) 
      [GCC 5.4.0 20160609] on linux
      Type "help", "copyright", "credits" or "license" for more information.
      >>> from keras.models import Sequential
      Using TensorFlow backend.
      
  • MXNet 1.5.0
    • Use via Python
      $ python3
      Python 3.5.2 (default, Nov 12 2018, 13:43:14) 
      [GCC 5.4.0 20160609] on linux
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import mxnet as mx
      >>> print(mx.__version__)
      1.5.0
      
  • PyTorch 1.0.0
    • Use via Python
      $ python3
      Python 3.5.2 (default, Nov 12 2018, 13:43:14) 
      [GCC 5.4.0 20160609] on linux
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import torch
      >>> print(torch.__version__)
      1.0.0
      
  • Python virtualenv
    The example below shows how to set up a Python virtual environment (virtualenv) and install a specific numpy version in it:

    # Create a virtualenv with Python 3 and some basic tools
    $ virtualenv -p python3 myvenv

    # Activate the environment
    $ source myvenv/bin/activate

    # Verify that the python available from the virtualenv is the one being used
    $ which python

    # This should point to the path of the python binary in the virtualenv, e.g.
    /cluster/home/..../myvenv/bin/python

    # Install numpy version 1.14 in this environment
    $ pip install numpy==1.14

    # Test that the correct numpy is used
    $ python -c "import numpy as np; print(np.__version__); print(np.__file__);"
    1.14.0
    /cluster/...myvenv/lib/python3.5/site-packages/numpy/__init__.py

    # Deactivate the virtualenv
    $ deactivate
  • Jupyter notebooks 5.7.4
  • Python 3.5, R 3.2.3 and Julia 0.4.5
  • Intel compiler 2019.1
    • To use it, do
      source /opt/intel/bin/compilervars.sh intel64
  • PGI/Portland compiler 18.10, with full OpenACC support
    • To use it, do (this should be enough)
      export PATH=$PATH:/opt/pgi/linux86-64/18.10/bin/
      export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/pgi/linux86-64/18.10/lib
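
Since the GPUs are shared and there is no batch queue, one courteous approach is to check the nvidia-smi output mentioned under CUDA above and run on a GPU that is currently idle. The sketch below is one possible way to do this, not an official tool: it calls nvidia-smi with its CSV query options, picks the GPU with the lowest utilization, and restricts the current process to it through the CUDA_VISIBLE_DEVICES environment variable. The function names are hypothetical, and the code assumes that nvidia-smi reports numeric values for every GPU.

```python
import os
import subprocess

# Fields requested from nvidia-smi, in this order.
QUERY = "index,utilization.gpu,memory.used"


def parse_gpus(csv_text):
    """Parse 'index, util, mem' CSV lines into (index, util %, mem MiB) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, util, mem = (field.strip() for field in line.split(","))
        gpus.append((int(index), int(util), int(mem)))
    return gpus


def query_gpus():
    """Ask nvidia-smi for the current utilization and memory use of every GPU."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=" + QUERY, "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_gpus(out)


def least_busy(gpus):
    """Return the index of the GPU with the lowest utilization, then least memory."""
    return min(gpus, key=lambda gpu: (gpu[1], gpu[2]))[0]


if __name__ == "__main__":
    try:
        gpu = least_busy(query_gpus())
    except (OSError, subprocess.CalledProcessError, ValueError):
        print("Could not query GPUs with nvidia-smi")
    else:
        # Must be set before TensorFlow, PyTorch or MXNet is imported.
        os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu)
        print("Using GPU {}".format(gpu))
```

The selection is only a snapshot: another user may start a job on the same GPU a moment later, so it remains worth checking nvidia-smi during long runs.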
      

Getting support

Please send any support request to itf-ai-support@usit.uio.no.

By Thomas Röblitz
Published Dec. 13, 2018 2:59 PM - Last modified Sep. 25, 2019 9:56 PM