MF9155 – Introduction to statistics and bioinformatics for the analysis of large-scale biological data

Course content

The course provides an introduction to methods that are integral to data analysis in modern molecular medical research. As such it is relevant to all PhD students and researchers who need to analyze large-scale molecular data themselves, as well as those who need to interpret results and understand publications in the molecular life sciences.

High-throughput techniques are becoming increasingly prevalent in life science research and in the clinic. However, making effective use of the resulting large datasets requires understanding and applying more advanced statistical methods. We will introduce the statistical concepts behind typical data analysis tasks for large-scale biological data, including the following topics:

a) high-throughput screening (multiple testing and group tests),

b) unsupervised learning and data visualization (clustering and heatmaps, dimension reduction methods),

c) supervised learning (classification and prediction, cross-validation and bootstrapping).

We will also introduce reference sources and biological databases that can aid interpretation and will show how they can be accessed and integrated into a data analysis.

Methods will be demonstrated using examples from publications, and real-life data sets will be used in the computer labs. Data examples will be chosen to cover diverse technologies, including next-generation sequencing and microarrays, as well as different molecular data types, such as DNA sequencing, gene expression profiling and microRNA analyses.

To encourage continued learning after the course, we will also provide an overview of available web-based courses and exercises.

Learning outcome

Knowledge:
Know important statistical and bioinformatics concepts for analyzing molecular data. Have knowledge of the specific statistical challenges associated with the analysis of high-throughput biological data. Know important biological databases and relevant statistics/bioinformatics software tools. Understand some of the challenges you will face when trying to apply this knowledge to the analysis of real datasets.

Skills:
Be able to identify the data analysis problem and match it with the appropriate type of statistical method and corresponding software. Perform basic analyses of high-throughput biological data using R and Bioconductor. Be able to understand and critically evaluate the data analysis procedures in publications in molecular biology/molecular medicine.

Admission

PhD candidates at UiO will have first priority for admission to the course. The maximum number of participants is 30-35 (limited by the capacity of the computer lab).

PhD candidates admitted to a PhD programme at UiO apply in StudentWeb.

Applicants who are not admitted to a PhD programme at UiO must apply for a right to study in SøknadsWeb before they can apply for PhD courses in medicine and health sciences. The deadline for applying for a right to study for this course is 15 September.

Prerequisites

Formal prerequisite knowledge

A passed exam in an introductory course in statistics (e.g. MF9130).

Recommended previous knowledge

Students should have a basic understanding of molecular biology, at least roughly corresponding to 5-10 university study points in molecular biology, biochemistry, or similar. Some experience performing data analysis with statistical analysis software (e.g. SPSS, Stata, R) is strongly recommended.

Teaching

The teaching will be organized as an intensive course over five full days. Lectures will be coupled with hands-on practicals and example data analyses in the computer labs. Students will need to allow sufficient time before the course for preparation, which includes some required reading, and after the course for the take-home exam. The practicals will take place in a PC room, but students are encouraged to bring their own laptops with R/Bioconductor and RStudio installed to make the most of the course.

You have to participate in at least 80 % of the teaching to be allowed to take the exam. Attendance will be registered.

Examination

Take-home exam in the form of a comprehensive data analysis task based on a recent publication, to be submitted four weeks after completion of the course.

Language of examination

The examination text is given in English, and you submit your response in English.

Grading scale

Grades are awarded on a pass/fail scale. Read more about the grading system.

Explanations and appeals

Resit an examination

Withdrawal from an examination

It is possible to take the exam up to three times. If you withdraw from the exam after the deadline or during the exam, this will be counted as an examination attempt.

Evaluation

The course is subject to continuous evaluation. At regular intervals we also ask students to participate in a more comprehensive evaluation.

Facts about this course

Credits

5

Level

PhD

Teaching

Autumn 2018

3.12. - 7.12.

Application period

1.6. - 1.10.

Applicants will be notified after the course registration deadline.

Examination

Every autumn

Teaching language

English