This course has been replaced by IN4080 – Natural Language Processing.

Syllabus

Background

Natural Language Processing is an interdisciplinary field that builds on insights from several areas, including

  • Language and Linguistics
  • Computer Science in general and programming in particular
  • Statistics
  • Machine Learning and "Data Science"

Students who come to this class have different backgrounds. Some are familiar with some of the fields, others with different ones. We will try to cover much of the background material, but not all of it, so you might have to read some on your own. What we cover in class will be adapted to what can be assumed of first-year master's students in Informatics: Language and Computation, since this course is mandatory for them.

Here is some more on assumed background and recommendations on what to read.

Language and linguistics

You need to be familiar with some core concepts of linguistics, like "parts of speech" and "sentence structure". If you have not taken any courses in linguistics or NLP/Computational Linguistics, you should consult some of the following.

  • Chapter 3, "Linguistic Essentials", pp. 81-115, in Manning and Schütze: Foundations of Statistical Natural Language Processing. This is the best overview of what will be assumed in the course. Unfortunately, the book is not online, but you can find it in the library.
  • You are advised to acquire Jurafsky and Martin, Speech and Language Processing, in any case. Sections 3.1 and 12.1-12.3 introduce some of the key concepts of morphology and syntax.
  • You are also advised to read sections 8.1-8.3 of the NLTK book: Natural Language Processing with Python, by Bird, Klein and Loper.

Programming in Python

We assume you have programming experience in some language(s), and that if you have no experience with Python, you are able to catch up. We have given some advice here, and will go into more detail in the first group session.

Statistics

Since we don't presuppose any background in statistics, we will give a crash course in the first three lectures. Do you need a book on statistics? We will cover all the concepts on the slides, so a book is not strictly required. But it could be useful to have more explanations and examples than we have time to cover in class.

  • If you already own a book on statistics, that will probably suffice, e.g. the STK1000 book, Moore and McCabe, Introduction to the Practice of Statistics.
  • I like Gonick and Smith's The Cartoon Guide to Statistics. It is mostly drawings, with not too many words, but it covers the essentials.
  • Statistics in a Nutshell by Sarah Boslaugh covers what we need in not too many pages, and in roughly the same order as we will present the material.
  • There are several free books and courses on statistics on the internet; I don't have any particular recommendations.
  • Last time I gave the course, some students recommended Khan Academy.

Week to week

   
1 week, 17 Aug

Introduction

Recommended reading

Looking at data

Presentation

Recommended reading

Each of the following covers the lecture

  • Cartoon Guide: Ch. 2, Data Description
  • Nutshell: Ch. 4, Descriptive Statistics and Graphical Displays, p. 83-120
  • Moore and McCabe: Ch. 2, Looking at Data - Relationships (ca 100 pages)
2 week, 24 Aug

Probabilities

Presentation

Recommended reading

Each of the following covers the lecture

  • Cartoon Guide: Ch. 3, 4, parts of 5
  • Nutshell: Ch. 2, Probability, p. 21-53
  • Moore and McCabe: Ch. 4, Probability: The Study of Randomness (ca 85 pages)
2 week, 27 Aug

Lab: Python and NLTK

Program

Mandatory reading

Natural Language Processing with Python (=NLTK book)

  • Ch. 1
  • Sec. 2.1-2.2
  • Sec. 2.3
3 week, 1 Sep

Statistics

Presentation (corrected 18 Sept.)

Recommended reading

Each of the following covers the lecture

  • Cartoon Guide: Ch. 5, 6
  • Nutshell: Ch. 3, Inferential Statistics
  • Moore and McCabe:

It is also a good idea to review the parts of INF1080 Logic on "Kombinatorikk" (combinatorics).

3 week, 3 Sep

Exercises on whiteboard

Moved to room Java!

4 week, 10 Sep

Working with texts

Moved to room Java!

Presentation

Mandatory reading

5 week, 14 Sep

Classification, evaluation and more statistics - mostly statistics

Presentation (corrected 18 Sept.)

Mandatory reading

Recommended reading

Parts of each of the following cover (most of) the statistical part

  • Cartoon Guide: Ch. 6-9
  • Nutshell: Ch. 3, 5, 6
  • Moore and McCabe:
5 week, 17 Sep

Lab

6 week, 21 Sep

Classification, evaluation and more statistics, contd.

Presentation (preliminary)

Mandatory reading

Recommended reading

  • Manning, Raghavan, Schütze: Introduction to Information Retrieval
    • Section 13.4
6 week, 24 Sep

Lab

7 week, 28 Sep

Information extraction

Presentation (preliminary)

Mandatory reading

7 week, 1 Oct

Lab

 

8 week, 5 Oct

Dependency Grammar

Presentation (screen, handout)

Mandatory reading

  • Nivre, Joakim: "Dependency grammar and dependency parsing" in MSI report 05133, 2005. Växjö University: School of Mathematics and Systems Engineering. Sections 1, 2, 4. On-line copy.

  • Arnold Zwicky: "Heads" in Journal of Linguistics, Vol 21, 1985. Sections 1-2. On-line copy.

8 week, 8 Oct

Reading group

9 week, 12 Oct

Dependency parsing

Presentation (screen, handout)

Mandatory reading

  • Nivre, Joakim: "Two Strategies for Text Parsing" in A Man of Measure: Festschrift in Honour of Fred Karlsson on his 60th Birthday, 2006. On-line copy.
  • McDonald, Ryan and Nivre, Joakim: "Analyzing and Integrating Dependency Parsers" in Computational Linguistics, 2011. Sections 1-2. On-line copy.
  • Nivre, Joakim et al.: "MaltParser: A language-independent system for data-driven dependency parsing" in Natural Language Engineering, 2007. Sections 1-4.1. On-line copy.

 

9 week, 15 Oct

Lab

Mini-lecture on experimental methodology

Oblig 3a

10 week, 19 Oct

Semantic roles

Presentation (screen, handout)

Mandatory reading

  • Jurafsky and Martin, 3rd ed., ch. 22, sec. 22.1-22.3
  • David Dowty: "Thematic Proto-Roles and Argument Selection" in Language Vol. 67, No. 3, 1991. Sections 1-9. On-line copy.

10 week, 22 Oct

No group

Study questions on Dowty

11 week, 26 Oct

Semantic Role Labeling

Presentation (screen, handout)

Mandatory reading

  • Jurafsky and Martin, 3rd ed., ch. 22, sec. 22.4-22.6
  • Marquez et al.: "Semantic Role Labeling: An Introduction to the Special Issue" in Computational Linguistics, 2008. Association for Computational Linguistics. On-line copy.
  • Richard Johansson and Pierre Nugues: "Dependency-based Syntactic–Semantic Analysis with PropBank and NomBank" in Proceedings of the Twelfth Conference on Computational Natural Language Learning, 2008. On-line copy.

     
11 week, 29 Oct

Lab

12 week, 2 Nov

Machine Learning in NLP, Logistic Regression

Presentation (preliminary)

Mandatory reading

12 week, 5 Nov

Lab

13 week, 9 Nov

Statistical significance, chi square, collocations

Presentation (preliminary)

  • Evaluation and significance (Oblig. 2, ex 1.7-9, ex 2.3)
  • Chi square
  • Collocations
  • Feature selection

Mandatory reading

13 week, 12 Nov

No group

14 week, 16 Nov

More on collocations, feature selection and maximum entropy

Presentation (preliminary)

14 week, 19 Nov

Lab

15 week, 23 Nov

No class

16 week, 26 Nov

No class

17 week, 30 Nov

Exercises from earlier exams

Room Java!

   

 

Published 24 Aug. 2015 11:23 - Last modified 30 Nov. 2015 14:45