MF9155 – Introduction to statistics and bioinformatics for the analysis of large-scale biological data
The course covers methods that are integral to data analysis in modern molecular medical research. It is planned as part 1 of a series of two courses on this topic. The course is relevant to all PhD students and researchers who need to analyze large-scale molecular data themselves, as well as to those who need to interpret results and understand publications in the molecular life sciences.
High-throughput techniques are becoming increasingly prevalent in life science research and in the clinic. However, making effective use of the resulting large datasets requires understanding and applying more advanced statistical methods. We will introduce the statistical concepts behind typical data analysis tasks for large-scale biological data, including the following topics:
a) high-throughput screening (multiple testing and group tests),
b) unsupervised learning and data visualization (clustering and heatmaps, dimension reduction methods),
c) supervised learning (classification and prediction, cross-validation and bootstrapping).
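As an illustration of topic a), here is a minimal sketch of the Benjamini-Hochberg adjustment, one common way to control the false discovery rate when many genes are tested at once. It is not taken from the course material: the function name `benjamini_hochberg` and the example p-values are hypothetical, and Python is used only for illustration, since the course itself works in R and Bioconductor (where `p.adjust(p, method = "BH")` performs the same adjustment).

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Hypothetical helper for illustration only.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, scaling each by
    # m / rank and enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four hypothetical genes.
example_pvalues = [0.01, 0.04, 0.03, 0.005]
example_adjusted = benjamini_hochberg(example_pvalues)
```

Genes whose adjusted p-value falls below a chosen threshold (for example 0.05) would be reported as significant at that false discovery rate.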
We will also introduce reference sources and biological databases that can aid interpretation and will show how they can be accessed and integrated into a data analysis.
Methods will be demonstrated by replicating analyses from publications, and real-life gene expression data will be used in the computer labs.
To encourage continued learning after the course, we will also provide an overview of available web-based courses and exercises.
Learn important statistical and bioinformatics concepts for analysing molecular data. Have knowledge of the specific statistical challenges associated with the analysis of high-throughput biological data. Know important biological databases and relevant statistics/bioinformatics software tools. Understand some of the challenges you will face when trying to apply this knowledge to the analysis of real datasets.
Be able to identify the data analysis problem and match it with the appropriate type of statistical method and corresponding software. Perform basic analyses of high-throughput biological data using R and Bioconductor. Be able to understand and critically evaluate the data analysis procedures in publications in molecular biology/molecular medicine.
PhD candidates at UiO will have first priority for admission to the course. The maximum number of participants is 30-35 (limited by the capacity of the computer lab).
How to apply:
- PhD candidates admitted to a PhD programme at UiO apply in StudentWeb
- Applicants who are not admitted to a PhD programme at UiO must apply for a right to study PhD courses in medicine and health sciences in SøknadsWeb before they can apply for this course. External applicants should apply for a right to study at least 3 weeks before the course application deadline. For information about how to apply for a right to study and how to apply for PhD courses, see: How external applicants can apply for elective PhD courses in medicine and health sciences.
Formal prerequisite knowledge
Students should have passed the exam in an introductory course in statistics (e.g. MF9130). They should also have some experience with the statistical programming language R and basic familiarity with the Unix shell, for example from having completed a Software Carpentry workshop.
Recommended previous knowledge
Students should have a basic understanding of molecular biology, at least roughly corresponding to 5-10 university study points in molecular biology, biochemistry, or similar.
The teaching will be organized as an intensive course over seven days. There will be lectures coupled with hands-on practicals and example data analyses in the computer labs. Students will need to allow sufficient time before the course for preparations, which include some required reading, and after the course for the take-home exam. The practicals will take place in the same lecture hall as the lectures. Students will need to bring their own laptops with R/Bioconductor and RStudio installed to be able to follow the computer exercises.
You have to participate in at least 80 % of the teaching to be allowed to take the exam. Attendance will be registered.
The examination is a take-home exam in the form of a comprehensive data analysis task based on a recent publication, to be submitted four weeks after completion of the course.
Submit assignments in Inspera
You submit your assignment in the digital examination system Inspera. Read about how to submit your assignment.
Language of examination
The examination text is given in English, and you submit your response in English.
Grades are awarded on a pass/fail scale. Read more about the grading system.
Withdrawal from an examination
It is possible to take the exam up to 3 times. If you withdraw from the exam after the deadline or during the exam, this will be counted as an examination attempt.
The course is subject to continuous evaluation. At regular intervals we also ask students to participate in a more comprehensive evaluation.