Abel Newsletter #1, 2017
In this issue: the spring course week, the application deadline for CPU time through Notur, interesting conferences and external courses, and the usual list of updated tools and applications now available on Abel or in the Lifeportal.
- USIT's Department for Research Computing (RC; in Norwegian, Seksjon for IT i forskning, ITF) is responsible for delivering IT support for research at the University of Oslo.
- The department's groups operate infrastructure for research, and support researchers in the use of computational resources, data storage, application portals, parallelization and optimization of code, and advanced user support.
- The Abel High Performance Computing (HPC) cluster, Notur, and the NorStore storage resources are central components of the USIT IT support for researchers.
- This newsletter is announced on the abel-users mailing list. All users with an account on Abel are automatically added to the abel-users list; membership is mandatory. The newsletter is issued at least twice a year.
News and announcements
HPC basics for research training course 04-06 April 2017
Training in using High-performance Computing (HPC) efficiently, for Abel and Colossus (TSD) users. The course is open for all users of Notur systems, but examples and specifics will pertain to Abel and Colossus.
This time there are four sessions tailored to different user groups.
Session 1: HPC for beginners - 04-04-2017 (targeting users beginning to use Abel/Colossus)
Session 2: HPC for advanced users - 05-04-2017, 06-04-2017 (targeting experienced users with good UNIX knowledge)
Session 3: Portals: HPC through a graphical user interface, no UNIX needed (targeting users who would like to use Abel through a web browser instead of the UNIX terminal)
Session 4: Code refinery - Optimize code and code management for HPC - 06-04-2017 (there will be a special seminar; the programme is not yet finalised, as we are waiting for replies from the invited speakers)
Please note that we are collaborating with the Software Carpentry initiative at the University of Oslo to help our users build up their UNIX competence. Please attend one of their workshops before the start of the course if you feel that you need to refresh your UNIX knowledge (not a requirement for Session 3: Portals).
Registration: Registration opens 06-02-2017. Please check the programme page for more details.
Questions? Ideas? Contact email@example.com.
New Notur allocation period 2017.1, application deadline 1 February 2017
A kind reminder: If you have many CPU hours remaining in the current period, you should of course try to use them as soon as possible. Since many users will be doing the same, however, there is likely to be a resource squeeze and potentially long queue times. The quotas are based on even use throughout the allocation period. If you think you will not be able to spend all your allocated CPU hours, we would highly appreciate it if you notified firstname.lastname@example.org so that the CPU hours can be released. You may get extra hours if you need more later. Those of you who have already run out of hours, or are about to, may contact email@example.com and ask for a little more. No guarantees, of course.
to list project accounts you are able to use.
cost -p nn0815k
to check your allocation (replace nn0815k with your project's account name).
cost -p nn0815k --detail
to check your allocation and print consumption for all users of that allocation.
The new supercomputer Fram is moving forward
The new Sigma2 supercomputer that is to replace Vilje and Hexagon now has a name: Fram. Work is progressing steadily, and the system is currently in acceptance testing. Fram is expected to enter production during the 2017.1 computing period (starting 1 April 2017).
Planned maintenance stop on Abel 27-28 February 2017
Abel is not getting any younger and needs another maintenance stop. The stop will take place 27-28 February, and in this period Abel and the BeeGFS file systems will be completely unavailable. The main reasons for the stop are to do maintenance on the BeeGFS file systems (/work and /cluster) and to upgrade the accounting database.
The exact start time and duration of the maintenance period can change, but the current plan is to close off user access to Abel on Monday February 27 at 7am. Prior to this we will put a limit on batch job submissions so that jobs scheduled to last beyond 7am on February 27 will not start. Abel should return to normal operation by Tuesday February 28, 4pm or earlier.
More information and reminders will be sent to all Abel users closer to the event.
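The submission limit described above comes down to simple arithmetic: a job can start only if its start time plus its requested wall time does not go past the beginning of the stop. The following is a minimal sketch of that check, not an Abel or Slurm tool. The helper names ends_before_stop and max_hours_before_stop are hypothetical, and the sketch assumes the limit is applied to the requested wall time.

```shell
# Sketch only: ends_before_stop and max_hours_before_stop are hypothetical
# helpers, not part of Slurm or Abel.

# Succeeds if a job starting at START (epoch seconds) with a requested wall
# time of WALLTIME seconds ends no later than STOP (epoch seconds).
# Usage: ends_before_stop START WALLTIME STOP
ends_before_stop() {
    ebs_end=$(( $1 + $2 ))
    [ "$ebs_end" -le "$3" ]
}

# Prints the largest wall time, in whole hours, that still ends before STOP
# for a job starting at START.
# Usage: max_hours_before_stop START STOP
max_hours_before_stop() {
    ebs_left=$(( $2 - $1 ))
    if [ "$ebs_left" -lt 0 ]; then ebs_left=0; fi
    echo $(( ebs_left / 3600 ))
}

# Illustrative numbers: the stop begins 50 hours (180000 s) after the start.
start=0
stop=180000
echo "Longest wall time that can still run: $(max_hours_before_stop "$start" "$stop") hours"
if ends_before_stop "$start" $(( 48 * 3600 )) "$stop"; then
    echo "A 48-hour job fits before the stop"
fi
```

On a system with GNU date, the real values can be obtained with date +%s for the current time and date -d "2017-02-27 07:00" +%s for the planned start of the stop.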
Big Data on Abel - Spark now available
The research field of Big Data has grown rapidly in the last decade, but has been excluded from general HPC clusters since the most common tools have required specialised distributed data storage. Luckily, a new star has entered the scene of Big Data tools and sparked new hope for Big Data on HPC. So we're happy to announce that Spark is now available on Abel, and we would be very interested to hear about any experiences, success stories, issues, etc. you might have with this installation.
NeIC 2017 conference
Building on the enthusiasm from the first two Nordic e-Infrastructure Conferences in 2013 and 2015, the next NeIC Conference will be hosted by NeIC and SNIC at the Umeå Folkets Hus May 29 to June 1. Note that the early-bird registration ends March 10.
NorduGrid 2017 conference
The NorduGrid 2017 conference will be held in Tromsø 26-30 June. Registration is open: http://indico.lucas.lu.se/event/573/. This year's theme is: From Data Factories to Insight as a Service. Tuesday June 27 until midday Wednesday June 28 is dedicated to a technical workshop, while the conference will take place from Wednesday afternoon until Friday noon.
PRACE Best Practice Guide to Knights Landing, now even featured on HPCwire
We are happy to see our own resident accelerator expert, Dr. Ole-Widar Saastad, as editor and author of the PRACE Best Practice Guide to Knights Landing, which is now even featured on HPCwire.
EUDAT events coming up
- Open calls as part of the EINFRA-12 (A) project proposal:
  - for business pilots
  - for expressions of interest to establish a competence center
- Upcoming 9th RDA Plenary, Barcelona, Spain, 5 - 7 April
- EGI Conference and INDIGO Summit, Catania, Italy, 9 - 12 May
New queue system recommendation
It has been a long-standing recommendation for submitting jobs that one should not ask for specific placements of the allocated CPU cores unless it is really needed. I.e., for MPI jobs, one should not ask for a specific number of tasks (CPUs) per node. The reason for this is that such jobs often must wait longer in the queue, and also that they frequently create problems for the queue system itself, resulting in jobs not being started as they should.
This recommendation is still valid, with one major exception: Due to an increase in jobs requesting whole nodes, we have set aside a part of the cluster to only run jobs requesting whole nodes. This makes it easier for such jobs to start, and easier for the queue system to schedule them. Thus, if your job needs at least 16 CPU cores, it is now recommended to ask for whole nodes.
This can be done in two ways: Either ask for 16 cores per node (with --ntasks-per-node and/or --cpus-per-task) or make sure the job asks for between 61 and 61.5 GiB RAM per node. The principle is: make sure --ntasks-per-node * --cpus-per-task is 16, or --ntasks-per-node * --cpus-per-task * --mem-per-cpu is between 61 and 61.5 GiB (the default for --cpus-per-task is 1).
- MPI jobs with single-threaded tasks (ranks) that need no more than 3936 MiB RAM per task can use --ntasks-per-node=16 --mem-per-cpu=3936 in combination with either --nodes=N or --ntasks=M (where M is a multiple of 16).
- MPI jobs with single-threaded tasks that need more than 3936 MiB per task should ask for so many tasks per node that the total memory requirement is between 61 and 61.5 GiB per node. For instance, if the job needs at least 5 GiB per task, it can specify --ntasks-per-node=12 --mem-per-cpu=5248 (resulting in 12 * 5248 / 1024 = 61.5 GiB per node), in combination with --nodes or --ntasks as above.
- MPI jobs with multi-threaded tasks can ask for --ntasks-per-node and --cpus-per-task such that --ntasks-per-node * --cpus-per-task is 16 if --mem-per-cpu is no more than 3936. For instance, --ntasks-per-node=2 --cpus-per-task=8 --mem-per-cpu=3936 for MPI tasks with 8 threads. If the required memory per core is higher, the number of tasks * threads must be reduced as above.
- Single-threaded jobs needing more than 61 GiB RAM can simply specify --mem-per-cpu=61G or --mem-per-cpu=62976 (which is 61.5 GiB).
- Multi-threaded jobs can specify --cpus-per-task=16 if they don't need more than 3936 MiB per core. If they need more, they should specify --cpus-per-task and --mem-per-cpu such that --cpus-per-task * --mem-per-cpu is between 61 and 61.5 GiB. For instance, if the job needs 6 GiB per core, it could use --cpus-per-task=10 --mem-per-cpu=6297 (resulting in 10 * 6297 / 1024 = 61.49 GiB).
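The arithmetic behind these examples can be checked before submitting a job. The following is a minimal sketch, assuming 16-core nodes and memory values given in MiB as in the examples above. The helper whole_node_check is hypothetical (it is not an Abel or Slurm command), and treating a request above 61.5 GiB per node as not fitting is our reading of the rule rather than a statement about how the queue system decides.

```shell
# Sketch only: whole_node_check is a hypothetical helper, not an Abel command.
# Usage: whole_node_check NTASKS_PER_NODE CPUS_PER_TASK MEM_PER_CPU_MIB
# Returns 0 and prints "whole node: ..." if the request follows the rule:
#   cores per node is 16, or memory per node is between 61 and 61.5 GiB.
# Assumption: a request above 16 cores or 61.5 GiB per node does not fit.
whole_node_check() {
    wn_cores=$(( $1 * $2 ))              # cores requested per node
    wn_mem=$(( wn_cores * $3 ))          # total MiB requested per node
    wn_min=$(( 61 * 1024 ))              # 61 GiB   = 62464 MiB
    wn_max=$(( 61 * 1024 + 512 ))        # 61.5 GiB = 62976 MiB
    if [ "$wn_cores" -le 16 ] && [ "$wn_mem" -le "$wn_max" ] &&
       { [ "$wn_cores" -eq 16 ] || [ "$wn_mem" -ge "$wn_min" ]; }; then
        echo "whole node: $wn_cores cores, $wn_mem MiB per node"
        return 0
    fi
    echo "not a whole node: $wn_cores cores, $wn_mem MiB per node"
    return 1
}

whole_node_check 16 1 3936            # single-threaded MPI tasks, 16 per node
whole_node_check 12 1 5248            # tasks needing at least 5 GiB each
whole_node_check 2 8 3936             # MPI tasks with 8 threads each
whole_node_check 1 10 6297            # multi-threaded job, 6 GiB per core
whole_node_check 8 1 3936 || true     # half a node, does not qualify
```

The first three arguments correspond to --ntasks-per-node, --cpus-per-task and --mem-per-cpu; for a job without MPI, pass 1 as the first argument.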
Pilot service on visualization nodes connected to Abel/NorStore
We plan to start a pilot service for remote visualisation on Abel and NorStore, and have for that purpose set up several Linux nodes with 8 CPUs, 32 GB of RAM and 1 or 2 NVIDIA Tesla cards (M2090, 6 GB).
If you are interested, please contact us (firstname.lastname@example.org) and let us know what visualisation software you would like to see installed.
Availability of accelerated computing resources
Intel Xeon Phi (also known as Knights Landing, KNL):
We are happy to announce that two nodes with standalone Intel Xeon Phi Knights Landing processors and Mellanox InfiniBand (EDR, 100 Gbit/s) are installed and available for testing and development. Please contact us if you would like assistance with porting an application to KNL. All the Intel tools are ready for KNL.
The Intel developer site provides more information on products for HPC development.
If you want to be informed about day-to-day operations, you can subscribe to the abel-operations list by emailing "subscribe abel-operations <Your Name>" to email@example.com. You can also follow us on Twitter, abelcluster: http://twitter.com/#!/abelcluster
New and updated software packages
The following is a list of new or updated software packages available on Abel with the module command.
=== R 3.3.2 ===
module load R/3.3.2
=== R 3.3.2.gnu ===
module load R/3.3.2.gnu
=== R 3.3.2.profmem ===
module load R/3.3.2.profmem
=== abyss 2.0.2 ===
module load abyss/2.0.2
=== abyss 2.0.2-MPI ===
module load abyss/2.0.2-MPI
=== augustus 3.2.2 ===
module load augustus/3.2.2
=== augustus 3.2.2-intel ===
module load augustus/3.2.2-intel
=== bamtools 2.4.0 ===
module load bamtools/2.4.0
=== bismark 0.17.0 ===
module load bismark/0.17.0
=== boost 1.62.0-intel ===
module load boost/1.62.0-intel
=== busco v2.0 ===
module load busco/v2.0
=== cmake 3.7.1 ===
module load cmake/3.7.1
=== cuda 8.0 ===
module load cuda/8.0
=== emacs 25.1 ===
module load emacs/25.1
=== fastx-toolkit 0.0.14 ===
module load fastx-toolkit/0.0.14
=== flexbar 2.7.0 ===
module load flexbar/2.7.0
=== flexpart 10.1 ===
module load flexpart/10.1
=== freesurfer 6.0.0 ===
module load freesurfer/6.0.0
=== fsl 5.0.9 ===
module load fsl/5.0.9
=== ghc 8.0.1 ===
module load ghc/8.0.1
=== gnuplot 5.0.4 ===
module load gnuplot/5.0.4
=== gromacs 5.1.4 ===
module load gromacs/5.1.4
=== gsl 2.2 ===
module load gsl/2.2
=== hdf5 1.10.0_patch1_gnu ===
module load hdf5/1.10.0_patch1_gnu
=== hdf5 1.10.0_patch1_intel ===
module load hdf5/1.10.0_patch1_intel
=== hmmer 3.1b2 ===
module load hmmer/3.1b2
=== icu 58.1_intel ===
module load icu/58.1_intel
=== intel 2017.1 ===
module load intel/2017.1
=== java jdk1.7.0_80 ===
module load java/jdk1.7.0_80
=== java jdk1.8.0_112 ===
module load java/jdk1.8.0_112
=== libmatheval 1.1.11 ===
module load libmatheval/1.1.11
=== maker 2.31.9 ===
module load maker/2.31.9
=== matlab R2016a ===
module load matlab/R2016a
=== netcdf.gnu 18.104.22.168 ===
module load netcdf.gnu/22.214.171.124
=== netcdf.intel 126.96.36.199 ===
module load netcdf.intel/188.8.131.52
=== openifs 40r1v1 ===
module load openifs/40r1v1
=== openmpi.gnu 2.0.1 ===
module load openmpi.gnu/2.0.1
=== orthograph 0.5.14 ===
module load orthograph/0.5.14
=== papi 5.5.1 ===
module load papi/5.5.1
=== penncnv 1.0.3 ===
module load penncnv/1.0.3
=== pgdspider 184.108.40.206 ===
module load pgdspider/220.127.116.11
=== pgi 16.7 ===
module load pgi/16.7
=== pigz 2.3.4 ===
module load pigz/2.3.4
=== poretools 0.5.1 ===
module load poretools/0.5.1
=== protobuf 3.0.0 ===
module load protobuf/3.0.0
=== quast 4.3 ===
module load quast/4.3
=== scala 2.12.0 ===
module load scala/2.12.0
=== schrodinger 2016.3 ===
module load schrodinger/2016.3
=== scorep 3.0 ===
module load scorep/3.0
=== smrtlink 1.0.7 ===
module load smrtlink/1.0.7
=== spark 2.0.2-bin-hadoop2.7 ===
module load spark/2.0.2-bin-hadoop2.7
=== stata 14 ===
module load stata/14
=== tbl2asn 25.0 ===
module load tbl2asn/25.0
=== totalview 2016.06.21 ===
module load totalview/2016.06.21
=== transdecoder 3.0.0 ===
module load transdecoder/3.0.0
=== trinityrnaseq 2.3.2 ===
module load trinityrnaseq/2.3.2
=== trinotate 3.0.1 ===
module load trinotate/3.0.1
=== vesta 3.3.8 ===
module load vesta/3.3.8
=== wrf 18.104.22.168 ===
module load wrf/22.214.171.124
Phew! Questions? Contact firstname.lastname@example.org.
USIT Department for Research Computing (RC) is interested in keeping track of publications that involve computation on Abel (or Titan) or the use of any other RC service. We greatly appreciate an email to:
Abel Operations mailing list
To receive extensive system messages and information, please subscribe to the "Abel Operations" mailing list. This can be done by emailing "subscribe abel-operations <Your Name>" to email@example.com.
Follow us on Twitter
Follow us on Twitter: abelcluster. Twitter is the place for short notices about Abel operations.