Syllabus

    Weekly program

    Wed
    24 Aug

    Introduction, Machine Translation

    Lecturer: Jan Tore

    Presentation

    Mandatory reading

    • Jurafsky and Martin, Speech and Language Processing, Ch. 25-25.2
    • Koehn, Statistical Machine Translation (K:SMT), Ch. 1

    Thu
    25 Aug

    Probabilities

    Lecturer: Jan Tore

    Presentation

    Recommended reading

    It is a good idea to review the parts of INF1080 Logic on combinatorics ("Kombinatorikk").

    Wed
    31 Aug

    Machine translation evaluation

    Presentation

    Mandatory reading

    • Jurafsky and Martin, Speech and Language Processing, Ch. 25.9
    • Koehn, Statistical Machine Translation Ch. 8 up to sec. 8.2.3 (BLEU)

    Thu
    1 Sep

    Probabilities and mathematical notation

    Presentation

    Exercises

    • Probability
    • Conditional probability
    • Bayes’ rule
    • Independent events
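
    For quick reference while working through the exercises, the standard definitions in standard notation (a convenience summary, not course-specific material):

      P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad
      P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}, \qquad
      A \perp B \iff P(A \cap B) = P(A)\,P(B)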

    Wed
    7 Sep

    MT evaluation, the noisy channel model, language models

    Presentation

    Mandatory reading

    • Koehn, Statistical Machine Translation Ch. 8, sec. 8.2 and 8.4, with errata page
    • Jurafsky and Martin, Speech and Language Processing, Sec. 25.3-25.4
    • Koehn, Statistical Machine Translation Ch. 4, sec. 4.3
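
    The noisy channel decision rule at the heart of this week's material, summarized here in standard notation:

      \hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\, P(e)

    where P(f \mid e) is the translation model and P(e) the language model.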

    Recommended reading

    • Koehn, Statistical Machine Translation sec 8.3 (for those with sufficient background)
    • Kishore Papineni et al.: "Bleu: a Method for Automatic Evaluation of Machine Translation". Fulltext.
    • Chris Callison-Burch, Miles Osborne and Philipp Koehn: "Re-evaluating the Role of Bleu in Machine Translation Research". Fulltext.
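
    Since both papers above debate BLEU, a minimal single-reference sketch of its core (clipped n-gram precision plus brevity penalty) may be useful; the toy sentences are made up:

      from collections import Counter
      from math import exp, log

      def ngrams(tokens, n):
          return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

      def bleu(candidate, reference, max_n=4):
          # Sentence-level BLEU against one reference (Papineni et al., 2002).
          precisions = []
          for n in range(1, max_n + 1):
              cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
              # Clip: an n-gram earns credit at most as often as it occurs in the reference.
              clipped = sum(min(c, ref[g]) for g, c in cand.items())
              precisions.append(clipped / max(sum(cand.values()), 1))
          if min(precisions) == 0:
              return 0.0
          # Brevity penalty punishes candidates shorter than the reference.
          bp = min(1.0, exp(1 - len(reference) / len(candidate)))
          return bp * exp(sum(log(p) for p in precisions) / max_n)

      print(bleu("the cat is on the mat".split(),
                 "there is a cat on the mat".split(), max_n=2))
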
    Thu
    8 Sep

    Work on obligatory assignment 1

    Wed
    14 Sep

    Word-based models and alignment

    Presentation

    Mandatory reading

    • Jurafsky and Martin, Speech and Language Processing, Sec. 25.4-25.6
    • Koehn, Statistical Machine Translation Ch. 4:
      • Sec 4.1-4.2 (except the technical details of 4.2.4)
      • Sec 4.4 (except technical details, to be explained)
      • Sec 4.5-4.6
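
    A compact sketch of the EM training loop for IBM Model 1, following the pseudocode in K:SMT Ch. 4 (NULL word and convergence test omitted; the toy data is made up):

      from collections import defaultdict

      def ibm_model1(corpus, iterations=10):
          # corpus: list of (foreign, english) token-list pairs.
          f_vocab = {f for fs, _ in corpus for f in fs}
          t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform t(f|e) to start
          for _ in range(iterations):
              count = defaultdict(float)  # expected counts c(f, e)
              total = defaultdict(float)  # expected counts c(e)
              for fs, es in corpus:
                  for f in fs:
                      z = sum(t[(f, e)] for e in es)  # marginal over alignments
                      for e in es:
                          c = t[(f, e)] / z
                          count[(f, e)] += c
                          total[e] += c
              for (f, e), c in count.items():  # M-step: re-estimate t(f|e)
                  t[(f, e)] = c / total[e]
          return t

      toy = [("das Haus".split(), "the house".split()),
             ("das Buch".split(), "the book".split()),
             ("ein Buch".split(), "a book".split())]
      t = ibm_model1(toy)
      print(round(t[("Haus", "house")], 2))  # converges towards 1
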
    Thu
    15 Sep
    No class
    Wed
    21 Sep

    More on alignment, higher-order models

    Presentation

    Thu
    22 Sep

    Completing obligatory assignment 1

    Fri
    23 Sep

    Obligatory assignment 1 due 18:00

    Wed
    28 Sep

    Phrase-based alignment

    Presentation

    Mandatory reading

    • Jurafsky and Martin, Speech and Language Processing, Sec. 25.4, 25.6
    • Koehn, Statistical Machine Translation Ch. 5, except
      • The technical details of sec. 5.3.3-5.3.6
      • Sec. 5.5
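
    A central operation in Ch. 5 is extracting all phrase pairs consistent with a word alignment; a minimal sketch of the consistency check (extension over unaligned foreign words and foreign-side length limits omitted):

      def extract_phrases(alignment, e_len, max_len=4):
          # alignment: set of (e_pos, f_pos) links. A phrase pair is consistent
          # if no alignment link crosses its boundaries in either direction.
          pairs = []
          for e1 in range(e_len):
              for e2 in range(e1, min(e1 + max_len, e_len)):
                  linked = [f for (e, f) in alignment if e1 <= e <= e2]
                  if not linked:
                      continue
                  f1, f2 = min(linked), max(linked)
                  if all(e1 <= e <= e2 for (e, f) in alignment if f1 <= f <= f2):
                      pairs.append(((e1, e2), (f1, f2)))
          return pairs

      # toy: "the house" / "das Haus" aligned 0-0, 1-1
      print(extract_phrases({(0, 0), (1, 1)}, e_len=2))
      # [((0, 0), (0, 0)), ((0, 1), (0, 1)), ((1, 1), (1, 1))]
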
    Thu
    29 Sep

    Work on obligatory assignment 2

    Wed
    5 Oct

    Decoding

    Presentation

    Mandatory reading

    • Jurafsky and Martin, Speech and Language Processing, Sec. 25.8
    • Koehn, Statistical Machine Translation Ch. 6: sec 6.0-6.3
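
    The search problem the decoder solves, stated here in the standard phrase-based form (phrase translation, distortion, and a trigram language model):

      \hat{e} = \arg\max_{e} \prod_{i=1}^{I} \phi(\bar{f}_i \mid \bar{e}_i)\,
                d(\mathrm{start}_i - \mathrm{end}_{i-1} - 1)
                \prod_{j=1}^{|e|} p_{\mathrm{LM}}(e_j \mid e_{j-2}, e_{j-1})

    Enumerating all e is intractable, which is why Ch. 6 develops hypothesis expansion with stacks and pruning.
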
    Thu
    6 Oct

    Program

    Wed
    12 Oct

    Refinements

    Presentation

    Mandatory reading

    • Koehn, Statistical Machine Translation Ch. 2:
      • Sec. 2.1.1 Tokenization
      • Sec. 2.3 Corpora
    • Koehn, Statistical Machine Translation Ch. 9:
      • Sec. 9.0
      • Sec. 9.2, except 9.2.4
      • Sec. 9.3 up to "finding threshold points", p. 266
    • Koehn, Statistical Machine Translation Ch. 10:
      • Sec. 10.0
      • Sec. 10.1.1 and 10.1.2
      • Sec. 10.2
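
    The log-linear model behind the discriminative training methods of Ch. 9, combining feature functions h_m with tuned weights \lambda_m:

      p(e \mid f) = \frac{\exp \sum_{m=1}^{M} \lambda_m h_m(e, f)}
                         {\sum_{e'} \exp \sum_{m=1}^{M} \lambda_m h_m(e', f)}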

    Recommended reading

    • Koehn, Statistical Machine Translation Ch. 2

    Thu
    13 Oct
    No class! Dagen (career day) at IFI
    Wed
    19 Oct

    Alternative translation strategies

    Presentation

    Thu
    20 Oct
    Work on obligatory assignment 2

    Distributional Semantics: Extracting Meaning from Data

    Wed

    26 Oct

    Introduction: linguistic foundations of distributional semantics

    Lecturer: Andrei Kutuzov

    Presentation

    Mandatory reading

    1. (Optional) Distributional Structure. Zellig Harris, 1954.
    2. The Distributional Hypothesis. Magnus Sahlgren, 2008.
    3. Speech and Language Processing. Daniel Jurafsky and James Martin. 3rd edition draft of April 9, 2016. Chapter 15, 'Vector Semantics'.
    4. From Frequency to Meaning: Vector Space Models of Semantics. Peter Turney and Patrick Pantel, 2010. Skip Section 5.
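
    A toy illustration of the count-based vector space model surveyed by Turney and Pantel: co-occurrence counts within a window, compared by cosine similarity (the corpus is made up):

      import numpy as np

      def cooccurrence(sentences, window=2):
          # Symmetric word-by-word co-occurrence counts within a fixed window.
          vocab = sorted({w for s in sentences for w in s})
          idx = {w: i for i, w in enumerate(vocab)}
          M = np.zeros((len(vocab), len(vocab)))
          for s in sentences:
              for i, w in enumerate(s):
                  lo, hi = max(0, i - window), min(len(s), i + window + 1)
                  for j in range(lo, hi):
                      if j != i:
                          M[idx[w], idx[s[j]]] += 1
          return vocab, idx, M

      def cosine(u, v):
          return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

      sents = [["the", "cat", "purrs"], ["the", "kitten", "purrs"],
               ["the", "dog", "barks"]]
      vocab, idx, M = cooccurrence(sents)
      print(cosine(M[idx["cat"]], M[idx["kitten"]]))  # high: shared contexts
      print(cosine(M[idx["cat"]], M[idx["barks"]]))   # lower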

    Thu

    27 Oct

    Exercises on MT

    Wed

    2 Nov

    Distributional and distributed: inner mechanics of modern word embedding models (including word2vec)

    Lecturer: Andrei Kutuzov

    Presentation

    Mandatory reading

    1. (Optional) A neural probabilistic language model. Bengio, Yoshua, et al., 2003
    2. Extracting semantic representations from word co-occurrence statistics: A computational study. Bullinaria, John A., and Joseph P. Levy, 2007
    3. Distributed representations of words and phrases and their compositionality. Mikolov, Tomas, et al., 2013.
    4. Word2vec parameter learning explained. Rong, Xin, 2014
    5. Speech and Language Processing. Daniel Jurafsky and James Martin. 3rd edition draft of April 11, 2016. Chapter 16, 'Semantics with dense vectors'.
    6. (Optional) Glove: Global Vectors for Word Representation. Pennington, Jeffrey, Richard Socher, and Christopher D. Manning, 2014.
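
    The skip-gram negative-sampling objective from Mikolov et al. (2013, reading 3 above), maximised for each observed (word w, context c) pair against k sampled negatives:

      \log \sigma(\mathbf{v}_c^{\top} \mathbf{v}_w) +
      \sum_{i=1}^{k} \mathbb{E}_{c_i \sim P_n(c)}
      \left[ \log \sigma(-\mathbf{v}_{c_i}^{\top} \mathbf{v}_w) \right]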

    Thu

    3 Nov

    No class

    Wed

    9 Nov

    Practical aspects of training and using distributional models

    Presentation

    Mandatory reading

    1. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. Marco Baroni, Georgiana Dinu, and Germán Kruszewski. ACL 2014.
    2. Improving Distributional Similarity with Lessons Learned from Word Embeddings. Omer Levy, Yoav Goldberg, and Ido Dagan. TACL 2015.
    3. SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation. Felix Hill, Roi Reichart and Anna Korhonen. Computational Linguistics. 2015
    4. (Optional) Correlation-based Intrinsic Evaluation of Word Vector Representations. Yulia Tsvetkov, Manaal Faruqui, and Chris Dyer. RepEval, ACL 2016
    5. Word2vec in Gensim tutorial
    6. (Optional) Vector representation of words in TensorFlow
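
    A minimal example of the Gensim workflow from the tutorial above (toy corpus; note that the parameter vector_size was called size in the Gensim versions current when this page was written):

      from gensim.models import Word2Vec

      sentences = [["the", "cat", "purrs"], ["the", "kitten", "purrs"],
                   ["the", "dog", "barks"]]          # toy corpus
      model = Word2Vec(sentences, vector_size=50,    # 'size' in Gensim < 4.0
                       window=2, min_count=1, sg=1)  # sg=1: skip-gram
      print(model.wv.most_similar("cat", topn=2))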

    Thu

    10 Nov

    Work on obligatory assignment 3.

    Slides on setting up your isolated Python environment on the IFI cluster.

    Wed

    16 Nov

    Beyond words: distributional representations of texts

    Presentation

    Mandatory reading

    1. Distributed Representations of Sentences and Documents. Quoc Le, Tomas Mikolov. ICML 2014
    2. Learning Distributed Representations of Sentences from Unlabelled Data. Felix Hill, Kyunghyun Cho, Anna Korhonen. arXiv:1602.03483, 2016
    3. (Optional) Composition in Distributional Models of Semantics. Jeff Mitchell, Mirella Lapata. Cognitive Science, 2010
    4. (Optional) Document Classification by Inversion of Distributed Language Representations. Matt Taddy. ACL 2015
    5. (Optional) An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation. Jey Lau, Timothy Baldwin. ACL 2016
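
    A minimal Gensim sketch of the paragraph-vector model from Le and Mikolov (reading 1 above); the documents are toy data and the hyperparameters are illustrative only:

      from gensim.models.doc2vec import Doc2Vec, TaggedDocument

      docs = [TaggedDocument(["the", "cat", "purrs"], tags=["d0"]),
              TaggedDocument(["the", "dog", "barks"], tags=["d1"])]
      model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40)
      # Infer a vector for an unseen document with the frozen model.
      print(model.infer_vector(["the", "kitten", "purrs"]))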

    Thu

    17 Nov

    Work on obligatory assignment 3.

    Wed

    23 Nov

    Kings and queens, men and women: semantic relations between word embeddings

    Presentation

    Mandatory reading

    1. Exploiting similarities among languages for machine translation. Tomas Mikolov, Quoc Le, Ilya Sutskever. arXiv:1309.4168, 2013
    2. Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change. William Hamilton, Jure Leskovec, Dan Jurafsky. ACL 2016
    3. (Optional) Do Supervised Distributional Methods Really Learn Lexical Inference Relations? Omer Levy et al. NAACL 2015
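
    The analogy arithmetic behind the lecture title, as exposed by Gensim; the vector file name is a placeholder for any pretrained word2vec-format model:

      from gensim.models import KeyedVectors

      # placeholder path: any pretrained vectors in word2vec binary format
      wv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)
      # king - man + woman: the 3CosAdd scheme from Mikolov et al.
      print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))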

    Thu

    24 Nov

    No class

    Wed

    30 Nov

    What's going on: recent advances and trends in the word embeddings world

    (+ exam information and discussion of the obligatory assignment)

    Presentation

    Mandatory reading

    1. (Optional) Defining words with words: beyond the distributional hypothesis. Pontus Stenetorp et al. RepEval, ACL 2016.

    Thu

    1 Dec

    Discussion of exam-like problems.

    Attention: the room has changed to Datastue Fortress!


    Thu

    8 Dec

    Discussion of exam problems in MT.

    Attention: the room has changed to seminar room Pascal!

