Abel Newsletter #2, 2017
In this issue: the fall course week, the application deadline for CPU time through Notur, interesting conferences and external courses, and the usual list of new and updated tools and applications available on Abel or in the Lifeportal.
- USIT's Seksjon for IT i forskning (ITF), in English the Department for Research Computing (RC), is responsible for delivering IT support for research at the University of Oslo.
- The department's groups operate infrastructure for research and support researchers in the use of computational resources, data storage, application portals, parallelization and optimization of code, and advanced user support.
- The Abel High Performance Computing (HPC) cluster, Notur, and the NorStore storage resources are central components of the USIT IT support for researchers.
- Announcements of this newsletter are made on the abel-users mailing list. All users with an account on Abel are automatically added to the abel-users list; membership is mandatory. The newsletter is issued at least twice a year.
News and announcements
HPC course week
This November we offer a shortened version of the course, starting 2 November 2017 at 12:00. The course is designed for Notur as well as local users. It is especially suitable for scientists who wish to learn how to use the Abel computer cluster and the high-performance facilities of Tjenester for Sensitive Data (TSD) for their research.
The course will have two tracks, one for beginners and one for advanced users. All participants are expected to have a basic working knowledge of Unix. If you are new to Unix, please attend one of the Software Carpentry (http://www.uio.no/english/services/it/research/events/) seminars before the course starts.
Registration: Registration will open on 1 September 2017, once the schedule is finalized. Please check the programme page for more details.
Questions? Ideas? Contact email@example.com.
NeIC training calendar
Looking for more training events? NeIC is maintaining a shared calendar for training events in the Nordics, see https://neic.no/training/ for more information.
New Notur allocation period 2017.2, application deadline 21 August 2017
Kind reminder: If you have many CPU hours remaining in the current period, you should of course try to use them as soon as possible. Since many users will be doing the same, expect a resource squeeze and potentially long queue times. Quotas are allocated according to several criteria; one important criterion is publications registered in Cristin (in addition to historical usage). The quotas assume even use throughout the allocation period. If you think you will not be able to spend all your allocated CPU hours, please notify firstname.lastname@example.org so that the hours can be released; you may get extra hours later if you need more. If you have already run out of hours, or are about to, you may contact email@example.com and ask for a little more. No guarantees, of course.
to list project accounts you are able to use.
cost -p nn0815k
to check your allocation (replace 0815 with your own project's number).
cost -p nn0815k --detail
to check your allocation and print consumption for all users of that allocation.
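Because quotas assume even use over the period, one quick way to see whether you are on track is to compare your consumption with the pro-rata share of your quota. The bash sketch below only does that arithmetic; the function names are hypothetical, and you supply the numbers yourself (for example, quota and usage in CPU hours as read off the cost output). The 182-day period is just an example value.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not an Abel or Notur tool): compare CPU-hour usage
# with an even spread of the quota over the allocation period.
# All inputs are whole numbers you read off yourself, e.g. from cost.

# Print the CPU hours that even use implies after days_elapsed days.
even_use_target() {
    local quota_hours=$1   # CPU hours allocated for the whole period
    local days_elapsed=$2  # days since the period started
    local days_total=$3    # length of the period in days
    # Integer arithmetic, rounded down to whole CPU hours.
    echo $(( quota_hours * days_elapsed / days_total ))
}

# Report whether usage is behind or ahead of the even-use target.
usage_status() {
    local used_hours=$1 quota_hours=$2 days_elapsed=$3 days_total=$4
    local target
    target=$(even_use_target "$quota_hours" "$days_elapsed" "$days_total")
    if (( used_hours < target )); then
        echo "behind even use by $(( target - used_hours )) CPU hours"
    else
        echo "on or ahead of even use"
    fi
}

# Example: 100000-hour quota, halfway through a 182-day period.
even_use_target 100000 91 182        # prints 50000
usage_status 20000 100000 91 182     # prints: behind even use by 30000 CPU hours
```

If you are far behind the target late in the period, that is a good moment to release hours as described above.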
The new supercomputer Fram is nearing stable waters
The operations team, Sigma2 and the vendors have been working hard over the summer to navigate Fram into safe and stable waters. After a manufacturing issue with the interconnect fabric, the sails are now repaired, and to make sure we're shipshape for a long journey, all bulkheads are now tightly sealed with copper tubing. To avoid an incident like the Wasa in 1628, we will run some final stability tests. If all goes well, the pilot phase will restart on 21 August, and Fram is expected to enter production with the 2017.2 allocation period (starting 1 October 2017).
Norstore->NIRD data migration
On 1 July, the migration of NorStore data to NIRD, the new storage located in Tromsø and Trondheim, started.
The process of migrating the 2.5PB of data on the Norstore disks will take until at least early September.
After that the migration of the 1.5PB of data on tape will start.
Norstore project leaders will be contacted separately to organize the migration of their data.
For more information see https://www.sigma2.no/migration-nird.
Support of Linux Containers on Abel
Linux containerization is an operating-system-level technology that offers lightweight virtualization. An application that runs as a container has its own root file system but shares the kernel with the host operating system. Abel now supports Singularity, a user-friendly tool for building, running, and managing containers on HPC systems. A user guide is available. If you are more familiar with Docker containers, Singularity can convert them, but this feature is not yet mature: not all converted Docker containers work as expected. We plan to provide full support for Docker containers on Abel later.
Supercomputing 2017 is in Denver, USA 12-17 November
Read more at http://sc17.supercomputing.org
A small contingent from USIT is attending. Contact us if you have any information to convey to a vendor or if you want lecture notes from any of the tutorials.
New organizational structure at USIT
To emphasize that IT for research is one of USIT's top priorities, the Research Computing Services Department has become a Division and now reports directly to the IT Director. The former Head of Department is now Head of Division.
Reminder: New recommendation for parallel jobs
This is a reminder of a change in the recommended way to run parallel jobs:
If your job needs at least 16 CPU cores, it is now recommended to ask for whole nodes.
This applies to all normal jobs (i.e., everything except hugemem, GPU, long and lowpri jobs).
The benefit for you is that your job will likely run faster (especially if there is a lot of communication between its processes) and will likely start sooner. The benefit for all users is that the job puts less strain on the queue system; heavy strain sometimes prevents jobs from starting.
This can be done in two ways: Either ask for 16 cores per node (with --ntasks-per-node and/or --cpus-per-task) or make sure the job asks for between 61 and 61.5 GiB RAM per node. The principle is: make sure --ntasks-per-node * --cpus-per-task is 16, or --ntasks-per-node * --cpus-per-task * --mem-per-cpu is between 61 and 61.5 GiB (the default for --cpus-per-task is 1).
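The principle can be expressed as a small bash check. This is our reading of the rule, not an official tool: it assumes a normal node offers 16 cores and that 61.5 GiB (62976 MiB) is the most a job can request per node, so a request must also stay within both limits, as the examples below imply. The function name fills_node is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the whole-node rule for normal Abel jobs, as we read it.
# Assumption: a normal node offers 16 cores and at most 61.5 GiB for jobs.
MIN_NODE_MIB=$(( 61 * 1024 ))         # 61 GiB   = 62464 MiB
MAX_NODE_MIB=$(( 615 * 1024 / 10 ))   # 61.5 GiB = 62976 MiB

# Succeeds (exit status 0) if the per-node request fills a whole node.
# Arguments: --ntasks-per-node, --cpus-per-task, --mem-per-cpu (in MiB).
fills_node() {
    local ntasks_per_node=$1 cpus_per_task=$2 mem_per_cpu_mib=$3
    local cores=$(( ntasks_per_node * cpus_per_task ))
    local node_mib=$(( cores * mem_per_cpu_mib ))
    # Requests larger than one node cannot be placed on it at all.
    if (( cores > 16 || node_mib > MAX_NODE_MIB )); then
        return 1
    fi
    # Whole node: all 16 cores, or at least 61 GiB of its memory.
    (( cores == 16 || node_mib >= MIN_NODE_MIB ))
}

# Example taken from the text: 12 tasks per node with 5248 MiB each.
if fills_node 12 1 5248; then
    echo "12 x 5248 MiB fills a node"
fi
```

All the option combinations listed below pass this check, while for instance 8 single-threaded tasks at 3936 MiB (half a node) do not.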
- MPI jobs with single-threaded tasks (ranks) that need no more than 3936 MiB RAM per task can use --ntasks-per-node=16 --mem-per-cpu=3936 in combination with either --nodes=N or --ntasks=M (where M is a multiple of 16).
- MPI jobs with single-threaded tasks that need more than 3936 MiB per task should request a number of tasks per node such that the total memory requirement is between 61 and 61.5 GiB per node. For instance, if the job needs at least 5 GiB per task, it can specify --ntasks-per-node=12 --mem-per-cpu=5248 (resulting in 12 * 5248 / 1024 = 61.5 GiB per node), in combination with --nodes or --ntasks as above.
- MPI jobs with multi-threaded tasks can choose --ntasks-per-node and --cpus-per-task such that --ntasks-per-node * --cpus-per-task is 16, provided --mem-per-cpu is no more than 3936. For instance, use --ntasks-per-node=2 --cpus-per-task=8 --mem-per-cpu=3936 for MPI tasks with 8 threads each. If the required memory per core is higher, the number of tasks * threads must be reduced as above.
- Single-threaded jobs needing more than 61 GiB RAM can simply specify --mem-per-cpu=61G or --mem-per-cpu=62976 (which is 61.5 GiB).
- Multi-threaded jobs can specify --cpus-per-task=16 if they don't need more than 3936 MiB per core. If they need more, they should specify --cpus-per-task and --mem-per-cpu such that --cpus-per-task * --mem-per-cpu is between 61 and 61.5 GiB. For instance, if the job needs 6 GiB per core, it could use --cpus-per-task=10 --mem-per-cpu=6297 (resulting in 10 * 6297 / 1024 = 61.49 GiB).
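The memory arithmetic in these recipes follows one pattern: fit as many cores as the per-core memory need allows into 61.5 GiB (at most 16), then spread 61.5 GiB over those cores. A minimal bash sketch, assuming 16-core normal nodes with 62976 MiB available to jobs; whole_node_split is a hypothetical name. It reproduces the examples above: 3936 MiB gives 16 cores at 3936, 5120 MiB gives 12 at 5248, and 6144 MiB gives 10 at 6297.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given the memory each core needs (in MiB), print how
# many cores to request per node and the --mem-per-cpu value that spreads
# 61.5 GiB over them. Assumes 16-core normal nodes with 61.5 GiB for jobs.
CORES_PER_NODE=16
NODE_MIB=62976   # 61.5 GiB

whole_node_split() {
    local need_mib=$1
    if (( need_mib < 1 )); then
        echo "error: memory per core must be at least 1 MiB" >&2
        return 1
    fi
    # How many cores' worth of memory fit in 61.5 GiB, capped at 16.
    local cores=$(( NODE_MIB / need_mib ))
    if (( cores > CORES_PER_NODE )); then
        cores=$CORES_PER_NODE
    fi
    if (( cores < 1 )); then
        echo "error: more than 61.5 GiB per core does not fit on a node" >&2
        return 1
    fi
    # Round down so the node total stays at or below 61.5 GiB.
    echo "$cores $(( NODE_MIB / cores ))"
}

# MPI job whose single-threaded tasks need 5 GiB (5120 MiB) each.
read -r cores mem <<< "$(whole_node_split 5120)"
echo "#SBATCH --ntasks-per-node=$cores --mem-per-cpu=$mem"
```

The printed core count maps to --ntasks-per-node for single-threaded MPI jobs and to --cpus-per-task for a single multi-threaded process. For hybrid jobs, divide it by the threads per task and round down; the mem-per-cpu value then still keeps the node total within 61.5 GiB.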
Pilot service on visualization nodes connected to Abel/Norstore
We plan to start a pilot service for remote visualization on Abel and NorStore and have for that purpose set up several Linux nodes with 8 CPUs, 32 GB of RAM and one or two NVIDIA Tesla cards (M2090, 6 GB).
If you are interested, please contact us (firstname.lastname@example.org) and let us know which visualization software you would like to see installed.
Availability of accelerated computing resources
Intel Xeon Phi (AKA Knights Landing):
We are happy to announce that two nodes with the stand-alone Xeon Phi Knights Landing processor and Mellanox InfiniBand (EDR, 100 Gbit/s) are installed and available for testing and development. Please contact us if you would like assistance with porting an application to KNL. All the Intel tools are ready for KNL.
The Intel developer site provides more information on products for HPC development.
Other hardware needs
If you need particular types of hardware (fancy GPUs, kunluns, dragons, etc.) not provided through Abel, please contact us (email@example.com), and we'll try to help you as best we can.
Also, if you have a computational challenge where your laptop is too small but a full-blown HPC solution is overkill, it might be worth checking out UH-IaaS.
If you want to be informed about day-to-day operations, you can subscribe to the abel-operations list by emailing "subscribe abel-operations <Your Name>" to firstname.lastname@example.org. You can also follow us on Twitter (abelcluster): http://twitter.com/#!/abelcluster
New and updated software packages
The following is a list of new or updated software packages available on Abel with the module command.
=== R 3.4.1 ===
module load R/3.4.1
=== amplicon_processing 1.5 ===
module load amplicon_processing/1.5
=== bcbio 1.0.1 ===
module load bcbio/1.0.1
=== beagle-lib 2.1.2 ===
module load beagle-lib/2.1.2
=== binutils 2.28 ===
module load binutils/2.28
=== blast+ 2.6.0 ===
module load blast+/2.6.0
=== bowtie 1.1.2 ===
module load bowtie/1.1.2
=== bowtie2 2.3.1 ===
module load bowtie2/2.3.1
=== busco v3.0.1 ===
module load busco/v3.0.1
=== cesm 1.2.2 ===
module load cesm/1.2.2
=== dynet 2.0 ===
module load dynet/2.0
=== esmf 6.3.0rp1 ===
module load esmf/6.3.0rp1
=== flexpart 9.2.3 ===
module load flexpart/9.2.3
=== fsl 5.0.10 ===
module load fsl/5.0.10
=== galaxy-python 3.0 ===
module load galaxy-python/3.0
=== gatk 3.7 ===
module load gatk/3.7
=== gcc 6.3.0 ===
module load gcc/6.3.0
=== grib_api 1.21.0 ===
module load grib_api/1.21.0
=== hisat2 2.1.0 ===
module load hisat2/2.1.0
=== hpcx 18.104.22.168.0.0 ===
module load hpcx/22.214.171.124.0.0
=== idba 1.1.3 ===
module load idba/1.1.3
=== intel 2017.4 ===
module load intel/2017.4
=== lifeportal 1.0 ===
module load lifeportal/1.0
=== masurca 3.2.2 ===
module load masurca/3.2.2
=== md43 2016.03.01 ===
module load md43/2016.03.01
=== netcdf.intel 126.96.36.199p ===
module load netcdf.intel/188.8.131.52p
=== openmpi.gnu 1.10.6 ===
module load openmpi.gnu/1.10.6
=== openmpi.gnu 2.1.0 ===
module load openmpi.gnu/2.1.0
=== openmpi.hpcx 1.10.7rc1 ===
module load openmpi.hpcx/1.10.7rc1
=== openmpi.hpcx 2.1.1 ===
module load openmpi.hpcx/2.1.1
=== paraview 5.3.0 ===
module load paraview/5.3.0
=== pgi 17.1 ===
module load pgi/17.1
=== picard-tools 2.10.4 ===
module load picard-tools/2.10.4
=== roary 20170309 ===
module load roary/20170309
=== samtools 1.4 ===
module load samtools/1.4
=== singularity 2.3.1 ===
module load singularity/2.3.1
=== sortmerna 2.1 ===
module load sortmerna/2.1
=== spades 3.10.1 ===
module load spades/3.10.1
=== tensorflow 1.0.1 ===
module load tensorflow/1.0.1
=== ucx master ===
module load ucx/master
=== usearch 9.2.64 ===
module load usearch/9.2.64
=== vasp 5.4.4 ===
module load vasp/5.4.4
=== vcflib 1.0 ===
module load vcflib/1.0
=== vsearch 2.4.3 ===
module load vsearch/2.4.3
=== wrf-hydro 184.108.40.206 ===
module load wrf-hydro/220.127.116.11
Phew! Questions? Contact email@example.com.
The USIT Department for Research Computing (RC) is interested in keeping track of publications that involve computation on Abel (or Titan) or the use of any other RC services. We greatly appreciate an email to:
Abel Operations mailing list
To receive extensive system messages and information please subscribe to the "Abel Operations" mailing-list. This can be done by emailing "subscribe abel-operations <Your Name>" to firstname.lastname@example.org.
Follow us on Twitter
Follow us on Twitter (abelcluster). Twitter is the place for short notices about Abel operations.