Corpora and computation in language studies

Teacher, spring term 2013: Emiliano Guevara, ILN

Brief description

This course introduces the basic methodological and theoretical elements of Corpus Linguistics and Computational Linguistics, their practice, and their relation to language studies in general. The course will consist of a combination of traditional lectures, guided seminars and practical work in the computer laboratory.

Corpora and computation allow us to carry out sound, empirically based research on language, regardless of the theoretical framework we choose. The course will be relevant for students interested in Generative Linguistics, Cognitive Linguistics, Construction Grammar, Psycholinguistics and Discourse Analysis, as well as related fields such as Literary Criticism.

What you will learn

First, you will learn what a language corpus is, how to build one, what choices must be made when you design your own corpus, what sort of evidence can be obtained from a corpus (frequency lists, concordances, collocations) and how you can exploit that information in your own research. You will also use large electronic corpora available through online interfaces.
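
To make the first two kinds of evidence concrete, here is a minimal Python sketch of a frequency list and a keyword-in-context (KWIC) concordance. It is an illustration under stated assumptions, not course material: the tokenizer is deliberately naive (lowercased runs of letters), and the function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption, not a course tool): lowercase the text and
    # keep runs of letters; [^\W\d_] matches any Unicode letter, so æ, ø, å survive.
    return re.findall(r"[^\W\d_]+", text.lower())

def frequency_list(tokens):
    # (word, count) pairs, most frequent first; ties keep first-seen order.
    return Counter(tokens).most_common()

def concordance(tokens, node, width=4):
    # KWIC lines: `width` tokens of context on each side of every hit of `node`.
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{token}] {right}")
    return lines

tokens = tokenize("The cat sat on the mat. The dog sat on the cat.")
print(frequency_list(tokens)[:3])
for line in concordance(tokens, "sat", width=2):
    print(line)
```

Real corpus tools use more careful tokenization (clitics, hyphens, punctuation), but the principle is the same: count token types for the frequency list, and align every occurrence of a node word with its context for the concordance.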

Second, you will acquire expertise in handling text in electronic format: character encoding, annotation (mark-up) and conversion between different formats. You will also be taught how to process and manipulate electronic text with open-source tools and software.
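
As an illustration of encoding and format conversion, the Python sketch below re-encodes Latin-1 bytes as UTF-8, strips tags from annotated text, and converts tagged tokens into a plain "word_TAG" format. The `<w pos="...">` mark-up is a hypothetical, simplified scheme chosen for the example; real annotation schemes (for instance TEI-based corpora) are richer.

```python
import re

def reencode(data, source="latin-1", target="utf-8"):
    # Decode bytes from the source encoding and re-encode them in the target one.
    return data.decode(source).encode(target)

# Hypothetical, simplified mark-up: one <w pos="TAG">word</w> element per token.
TOKEN = re.compile(r'<w pos="([^"]+)">([^<]+)</w>')

def strip_markup(annotated):
    # Drop every tag, keeping only the running text.
    return re.sub(r"<[^>]+>", "", annotated)

def to_word_tag(annotated):
    # Convert the XML-style tokens into the plain-text "word_TAG" format.
    return " ".join(f"{word}_{pos}" for pos, word in TOKEN.findall(annotated))

sample = '<s><w pos="DT">the</w> <w pos="NN">dog</w> <w pos="VBD">barked</w></s>'
print(strip_markup(sample))
print(to_word_tag(sample))
print(reencode("blåbær".encode("latin-1")))
```

Regular expressions are adequate for a flat, well-behaved format like this one; for nested or irregular XML a proper parser is the safer choice.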

Finally, you will learn to analyze the data you obtain from language corpora statistically. This will allow you to test your research hypotheses against real usage data and to present your results clearly and effectively.
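
One common statistic in collocation analysis is the log-likelihood ratio G² computed on a 2×2 contingency table. The Python sketch below is one illustrative choice among several association measures, not necessarily the one used in the course (the Gries handbook works in R); the cell layout in the comments is an assumption of the example.

```python
import math

def log_likelihood(o11, o12, o21, o22):
    # G2 statistic for a 2x2 table of observed counts (cell layout assumed here):
    #   o11 = word1 with word2,     o12 = word1 without word2,
    #   o21 = word2 without word1,  o22 = neither word.
    observed = [[o11, o12], [o21, o22]]
    n = o11 + o12 + o21 + o22
    row_totals = [o11 + o12, o21 + o22]
    col_totals = [o11 + o21, o12 + o22]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute 0 (limit of o*ln(o/e) as o -> 0)
                expected = row_totals[i] * col_totals[j] / n
                g2 += o * math.log(o / expected)
    return 2 * g2

print(round(log_likelihood(10, 0, 0, 10), 2))  # 27.73
```

A table whose observed counts equal the counts expected under independence gives G² = 0; values above about 3.84 exceed the chi-square critical value for one degree of freedom at p < 0.05.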

Course language

The course will be taught in English.

Teaching

Fourteen double-period seminars.

Prior knowledge required

No previous knowledge of the field is required for the course, besides the basic concepts of linguistic theory (word and phrase categories, basic syntax, morphology and semantics). However, some familiarity with programming and/or statistics could be useful.

Evaluation and exam

A short written essay on a research project of your choice (five to ten pages) and a two-hour classroom exam.

The essay and exam can be written in English or Norwegian.

Literature

The course will mainly use the following handbook:

McEnery, Tony, and Hardie, Andrew (2012). Corpus Linguistics. Cambridge: Cambridge University Press. (294 pages)

The parts on data analysis and processing will instead be based on parts of:

Gries, Stefan Th. (2009). Quantitative Corpus Linguistics with R. New York and London: Routledge. (248 pages)

Other material, like case-study articles, will be provided during the course.

A good primer on text manipulation is Kenneth W. Church’s “Unix for Poets”, which is easy to find online in various formats. This is the original link to the document (download it and save it as a PostScript .ps file, which can then be converted to PDF or printed):


Published 5 Oct. 2012 10:35 - Last modified 5 Oct. 2012 10:35