This course is discontinued

STV9020H – Quantitative Text Analysis

Course content

This course introduces students to approaches for quantitatively analyzing text relevant to core political science questions.

With the technological advancements of recent years, textual footprints have become abundantly available: from governments and legislatures offering digitized and well-structured summaries of their activities (including proposals, laws, and debates), through the increased use of social media by political candidates, to easily accessible media archives. These new data sources can help both to re-evaluate theories of political behavior and to extend them using more refined contextual and communication measurements. Accordingly, the course is intended to make sure that students:

  • have the tools to gather, manipulate, and prepare these texts
  • conditional on the research goals, are able to pick approaches and models suitable for testing particular hypotheses
  • carry out these steps following principles behind the quantitative analysis of textual data.

The topics should also be of general interest to those who want to carry out only basic work with text: storing and loading text into statistical software for coding (creating new variables), extracting word or term occurrences (for example, whether a text mentions a particular party leader), and generating summary statistics on word occurrences, text length, and other features.
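The basic tasks listed above can be sketched in a few lines of base R. This is only an illustration, not course material: the three texts, the data frame `speeches`, and the party leader name "Hansen" are hypothetical.

```r
# Hypothetical mini-corpus; texts and the name "Hansen" are invented for illustration.
speeches <- data.frame(
  id = 1:3,
  text = c(
    "Hansen presented the budget proposal to parliament.",
    "The opposition criticised the proposal.",
    "Hansen and the finance minister defended the budget, and Hansen answered questions."
  ),
  stringsAsFactors = FALSE
)

# New variable: does the text mention the party leader at all?
speeches$mentions_leader <- grepl("Hansen", speeches$text, fixed = TRUE)

# Number of times the term occurs in each text.
# gregexpr() returns -1 when there is no match, so only positive positions are counted.
matches <- gregexpr("Hansen", speeches$text, fixed = TRUE)
speeches$n_mentions <- vapply(matches, function(m) sum(m > 0), integer(1))

# Text length in words, using a simple split on whitespace.
tokens <- strsplit(speeches$text, "\\s+")
speeches$n_words <- lengths(tokens)

# Summary statistics for text length.
summary(speeches$n_words)
```

Splitting on whitespace is the crudest possible tokenizer (punctuation stays attached to words); dedicated text analysis packages handle tokenization more carefully.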

Learning outcome


Participants will:

  • acquire in-depth knowledge of the possible uses of text for analyzing political phenomena
  • be exposed to and learn about relevant sources of text and a representative set of quantitative approaches to analyzing these texts
  • acquire specific and detailed knowledge on various methodologies for quantitatively extracting meaning from text relevant for political actors


Participants will:

  • be able to critically evaluate cutting-edge quantitative text analysis methodologies in a comparative manner
  • be able to gather and analyze text in a scalable manner
  • be able to link current results from quantitative text analysis to relevant political science questions and better map societal and political dynamics


Participants will:

  • be able to select and adapt pre-existing quantitative text analysis techniques for their own research
  • be able to gather, structure, pre-process and manipulate large corpora of text
  • be able to contrast qualitative and quantitative approaches to the analysis of political text


This course is a combined Master's and PhD course.

PhD candidates from UiO: Apply for the course in StudentWeb
Other PhD candidates: Application form

Application deadline: 1 October 2017


Formal prerequisite knowledge

STV4020A – Forskningsmetode og statistikk or comparable, i.e. understanding of quantitative methods/statistics. Participants who did STV4020A with the R component will be at an advantage, simply in terms of experience with implementation.

Students are expected to have working knowledge of R, which should cover at least: loading data and packages, recoding data, fitting at least simple regression models (with lm(), for example), and extracting quantities of interest from these model objects.
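As a rough self-check of the expected R level, the following sketch walks through those steps. It uses the `mtcars` data set that ships with R, which is an assumption made here for illustration and not a data set used in the course; the variable name `transmission` is likewise hypothetical.

```r
library(stats)   # attached by default; shown only to illustrate loading a package
data(mtcars)     # built-in example data, not course material

# Recode the 0/1 transmission indicator into a labelled factor.
mtcars$transmission <- factor(
  mtcars$am,
  levels = c(0, 1),
  labels = c("automatic", "manual")
)

# Fit a simple linear regression model.
fit <- lm(mpg ~ wt + transmission, data = mtcars)

# Extract quantities of interest from the model object.
estimates <- coef(fit)                # point estimates
intervals <- confint(fit)             # 95% confidence intervals
r_squared <- summary(fit)$r.squared   # share of variance explained
```

Participants who can read and adapt a script like this without difficulty most likely meet the stated prerequisite.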

Overlapping courses

10 credits overlap with STV4020H – Quantitative Text Analysis


The course runs for five weeks with two meetings held each week.

These meetings will be a mixture of a) instructor-led introduction of the problem, b) review of the readings/approaches, and c) software demonstration. Accordingly, in-class activity presupposes that participants have read the required readings and follow along with the implementation examples and exercises.

The meetings will be based on substantive political science problems that have been analyzed using text data (for example party manifestos, government bills, speeches, and open-ended survey responses), with supporting readings on the methods employed, and software demonstrations and exercises. We will work extensively with texts (both human-coded and not) available through the Comparative Manifesto Project and the Comparative Agendas Project.

Refresh your R knowledge

Before the class is scheduled to start, an R refresher will be offered for those who are uncertain about their working knowledge of R. This longer session reviews the basic principles of working with R, up to the point of fitting simple linear regression models. The refresher is voluntary and will be discussed on the Fronter page of the course prior to its start.


Portfolio examination.

The portfolio consists of:

  • four home assignments during the course (excluding the first week of the course).
  • These home assignments contain both discussion of a particular problem and software application.
  • The home assignments will be between 2000 and 3000 words, with R code supplied separately to reproduce the results reported and discussed in the assignments.
  • Short feedback is offered on each of the home assignments, with additional detailed feedback available during office hours or on request.
  • There is no possibility to re-take the assignments.
  • As we progress with the course, assignments get slightly more complex.

The final grade will reflect the overall performance throughout the course. You must pass all the assignments in the portfolio in the same semester.

Grading scale

Grades are awarded on a pass/fail scale. Read more about the grading system.

Explanations and appeals

Resit an examination

Special examination arrangements

Application form, deadline and requirements for special examination arrangements.

Facts about this course






Every autumn


Every autumn

Teaching language