MF9155 – Introduction to statistics and bioinformatics for the analysis of large-scale biological data

Course content

The course covers methods integral to data analysis in modern molecular medical research. It is planned as part 1 of a series of two courses on this topic. It is relevant to all PhD students and researchers who need to analyse large-scale molecular data themselves, as well as to those who need to interpret results and understand publications in the molecular life sciences.

High-throughput techniques are becoming increasingly prevalent in life science research and in the clinic. However, making effective use of the resulting large datasets requires understanding and applying more advanced statistical methods. We will introduce the statistical concepts behind typical data analysis tasks for large-scale biological data, including the following topics:

a) high-throughput screening (multiple testing and group tests),

b) unsupervised learning and data visualization (clustering and heatmaps, dimension reduction methods),

c) supervised learning (classification and prediction, cross-validation and bootstrapping).
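
As an illustration of why topic a) matters, the following is a minimal sketch of the Benjamini-Hochberg adjustment for multiple testing. It is not course material: the course practicals use R and Bioconductor, where the built-in `p.adjust(p, method = "BH")` does the same job. The function name `benjamini_hochberg` and the p-values are hypothetical and chosen only for this example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices that would sort the p-values in ascending order
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone (a smaller p never gets a larger q)
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from five tests (illustrative only)
p = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(p)
# Flag tests that remain significant at a 5 % false discovery rate
significant = [q <= 0.05 for q in adj]
```

In this toy example four of the five raw p-values fall below 0.05, but only two remain below 0.05 after adjustment. With thousands of genes tested at once, the gap between raw and adjusted results is typically much larger.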

We will also introduce reference sources and biological databases that can aid interpretation and will show how they can be accessed and integrated into a data analysis.

Methods will be demonstrated by replicating analyses from publications, and real-life gene expression data will be used in the computer labs.

To encourage continued learning after the course, we will also provide an overview of available web-based courses and exercises.

Learning outcome

Learn important statistical and bioinformatics concepts for analysing molecular data. Know the specific statistical challenges associated with the analysis of high-throughput biological data. Know important biological databases and relevant statistics and bioinformatics software tools. Understand some of the challenges you will face when applying this knowledge to the analysis of real datasets.

Be able to identify the data analysis problem at hand and match it to the appropriate type of statistical method and corresponding software. Perform basic analyses of high-throughput biological data using R and Bioconductor. Be able to understand and critically evaluate the data analysis procedures in publications in molecular biology and molecular medicine.


Admission

PhD candidates at UiO will have first priority for admission to the course. The maximum number of participants is 30-35 (limited by the capacity of the computer lab).

How to apply:

  • PhD candidates admitted to a PhD programme at UiO:
  • Applicants who are not admitted to a PhD programme at UiO:

Reply to course application:

  • This course has registration type Application.
  • Applicants must wait for a reply to the course application. A reply will be given in StudentWeb and sent by e-mail about 1 week after the application deadline.


Formal prerequisite knowledge

Students should have passed the exam in an introductory course in statistics (e.g. MF9130). They should also have some experience with the statistical programming language R and basic familiarity with the Unix shell, for example from having completed a Software Carpentry workshop.

To gain sufficient experience with R, students could, for example, complete an introductory online course or follow a Software Carpentry course at UiO.

Recommended previous knowledge

Students should have a basic understanding of molecular biology, at least roughly corresponding to 5-10 university study points in molecular biology, biochemistry, or similar.


The teaching will be organized as an intensive course over seven days. There will be lectures coupled with hands-on practicals and example data analyses in the computer labs. Students will need to allow for sufficient time in advance for course preparations, which include some required reading, as well as after the course for the take-home exam. The practicals will take place in the same lecture hall as the lectures. Students will need to bring their own laptops with R/Bioconductor and RStudio installed to be able to follow the computer exercises.

You have to participate in at least 80 % of the teaching to be allowed to take the exam. Attendance will be registered.


The exam is a take-home exam in the form of a comprehensive data analysis task based on a recent publication, to be submitted four weeks after completion of the course.

Submit assignments in Inspera

You submit your assignment in the digital examination system Inspera. Read about how to submit your assignment.

Use of sources and citation

You should familiarize yourself with the rules that apply to the use of sources and citations. If you violate the rules, you may be suspected of cheating/attempted cheating.

Language of examination

The examination text is given in English, and you submit your response in English.

Grading scale

Grades are awarded on a pass/fail scale. Read more about the grading system.

Explanations and appeals

Resit an examination

Withdrawal from an examination

It is possible to take the exam up to 3 times. If you withdraw from the exam after the deadline or during the exam, this will be counted as an examination attempt.

Special examination arrangements

Application form, deadline and requirements for special examination arrangements.


The course is subject to continuous evaluation. At regular intervals we also ask students to participate in a more comprehensive evaluation.

Facts about this course

Every autumn

6-day course.

Teaching autumn 2022: 21.11. - 23.11. and 28.11. - 30.11.

Application period: 1.6.2022 - 1.10.2022

Course registration: See information on how to apply in the section "Admission" in the course description below.

Teaching language