Published 8 Dec. 2020 13:47

When I listed the curriculum I originally wrote:

All chapters from the textbook that have been covered by the lectures: Chapters 1-9, 14,15, 19-21.

But I forgot to list Chapter 13, which has indeed been covered by a lecture and is also the basis for one of the obligatory exercises! So this should be corrected to:

All chapters from the textbook that have been covered by the lectures: Chapters 1-9, 13-15, 19-21.

Published 3 Dec. 2020 16:35

You will be able to reach me during the final exam ("digital trøsterunde", i.e., a digital question round) if you have any clarification questions regarding the problem sets on the exam: I will be in a Zoom meeting (details below) starting exactly one hour after the exam begins. Zoom's waiting room function will be used so that only one student is admitted to the meeting at a time.

Aleksander Øhrn is inviting you to a scheduled Zoom meeting.

Topic: IN3120/IN4120: Eksamensrunde
Time: Dec 18, 2020 04:00 PM Amsterdam, Berlin, Rome, Stockholm, Vienna

Join Zoom Meeting

Meeting ID: 697 8989 0042

Documentation on how to use Zoom can be found here:

One tap mobile
+46844682488,,69789890042# Sweden
+46850500828,,69789890042# Sweden

Dial by your location
        +46 8 4468 2488 Sweden
        +46 8 5050 0828 Sweden
        +45 32 70 12 06 Den...

Published 29 Nov. 2020 23:01

The following is considered to be part of the course curriculum:

  • All chapters from the textbook that have been covered by the lectures: Chapters 1-9, 14,15, 19-21.
  • All slides used as part of the lectures.
  • All supplementary papers discussed in the lectures. You will not be expected to know technical minutiae from these papers, but should be able to retell the gist of what those papers are about and their basic ideas.
  • All the obligatory exercises.

The science fair is not considered part of the curriculum, i.e., topics presented at the science fair that are not covered by the items above will not appear on the exam.

Published 29 Nov. 2020 22:50

The exam is an open-book exam, i.e., all aids (textbook, online resources, notes, and so on) are allowed. It is strictly forbidden to collaborate or communicate with others about the exam during the exam. You may be randomly selected for a conversation to check ownership of your answer. This conversation does not affect the grading or the grade, but may result in IFI opting to pursue a case for cheating. You can read more about what is considered cheating on UiO’s website. Moreover, the information on the website about exams at MN in autumn 2020 applies.

Published 11 Nov. 2020 16:55

To complement the slides I used during the lecture on October 29th, here are some additional slides on web search and link analysis. They cover the same topics and chapters from the textbook that I covered during the lecture, but might supply additional context and clarifications.

Published 5 Nov. 2020 17:42

In today's topic medley lecture I went through the three slide decks found here, here, and here. A couple of papers were also referenced, which you can deep-dive into for additional depth and color: 

Published 30 Oct. 2020 16:12

An email with exam information stating the wrong time was sent out today. A new email with the correct time has been sent out.

The exam in IN3120/IN4120 will be held on December 18, 15:00-19:00.

Sorry for the mistake!

Studadm, IFI.

Published 30 Oct. 2020 11:54

Due to a scheduling conflict I will have to delay the lecture this coming Thursday from 10:15-12:00 to 11:15-13:00. That is, this coming Thursday the lecture will commence one hour later than originally scheduled. 

Published 29 Oct. 2020 13:41

The IN4120 science fair will take place on November 12th. Looking forward to it! Some practical information:

  • Each group should spend no more than 10 minutes on presenting their topic. I'll be a time cop.
  • Please mail me all your presentations by November 11th, i.e., the day before the science fair. I will then upload them to the course GitHub repository, so that everyone has access to them.
  • When mailing me your presentations, please send me a PDF file named science-fair-n.pdf, where n is your group number.
  • Realistically, given the 10-minute limit, when presenting your topic you will not be able to go through more than, say, 3 slides or so. Your PDF file above may be longer and contain more details/examples than what you have time to present, if you want.
  • There are no fewer than 14 groups, and we only have a 10:15-12:00 slot s...

Published 27 Oct. 2020 14:45

Tomorrow's group session has been moved and will run from 17:15 to 19:00 instead of 12:15-14:00.

-Markus S. H.


Published 22 Oct. 2020 11:37

To supplement today's lecture on "learning to rank" (i.e., looking at document ranking as an ML problem) here is a good overview paper which has been added to the course's GitHub repository. 

Published 19 Oct. 2020 14:53

The solution to assignment C is published on GitHub in the "solutions" folder.

Published 15 Oct. 2020 15:34

The solution to assignment B is published on GitHub in the "solutions" folder.

Published 14 Oct. 2020 20:36

I will be going through assignment D during the group lesson tomorrow. If you have not started on assignment D or have any questions, consider attending the group lesson :)


Haiyue Chen

Published 8 Oct. 2020 15:45

The solution to assignment A is published on GitHub in the "solutions" folder.

Published 28 Sep. 2020 14:22

The old version of the tests for assignment C tested code from assignment D. Please pull the newest version from GitHub before you start working on assignment C.

Published 10 Sep. 2020 12:50

Next Thursday I'll cover the topic of index compression. Applying good compression techniques to an inverted index yields a number of performance benefits in practice. Index compression is covered by this chapter in the textbook, and I'll additionally use this deck to illustrate two integer compression techniques not mentioned in the textbook: Simple9 and PFOR-DELTA. This paper is very implementation-oriented, but its Related Work section gives a good summary of several families of compression algorithms, if you are interested.
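
To give a feel for why compressing posting lists pays off, here is a minimal Python sketch of gap encoding combined with variable-byte encoding. Note that this is neither Simple9 nor PFOR-DELTA, but a simpler, well-known byte-aligned scheme. The function names (to_gaps, vb_encode_number, vb_decode, compress, decompress) are made up for this illustration and are not taken from the course repository or the deck.

```python
from itertools import accumulate
from typing import List


def to_gaps(doc_ids: List[int]) -> List[int]:
    """Replace a sorted posting list by its first ID followed by the gaps between neighbours."""
    if not doc_ids:
        return []
    return [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]


def vb_encode_number(n: int) -> bytes:
    """Encode one non-negative integer using 7 payload bits per byte.

    The high bit is set on the last byte of each number only, which
    marks where the number ends.
    """
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    chunks = [n & 0x7F]
    n >>= 7
    while n:
        chunks.append(n & 0x7F)
        n >>= 7
    chunks.reverse()      # most significant 7-bit chunk first
    chunks[-1] |= 0x80    # flag the terminating byte
    return bytes(chunks)


def vb_decode(data: bytes) -> List[int]:
    """Decode a byte stream produced by concatenating vb_encode_number outputs."""
    numbers = []
    n = 0
    for byte in data:
        if byte < 0x80:
            n = (n << 7) | byte                       # continuation byte
        else:
            numbers.append((n << 7) | (byte & 0x7F))  # terminating byte
            n = 0
    return numbers


def compress(doc_ids: List[int]) -> bytes:
    """Gap-encode a sorted posting list, then variable-byte encode the gaps."""
    return b"".join(vb_encode_number(gap) for gap in to_gaps(doc_ids))


def decompress(data: bytes) -> List[int]:
    """Invert compress: decode the gaps and sum them back into document IDs."""
    return list(accumulate(vb_decode(data)))


postings = [824, 829, 215406]
encoded = compress(postings)
print(len(encoded), decompress(encoded) == postings)
```

In this sketch the three document IDs occupy 6 bytes, versus 12 bytes if each were stored as a fixed 32-bit integer. The gain comes from the gaps being small numbers, which need fewer bytes than the IDs themselves.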

Published 10 Sep. 2020 12:35

In today's lecture I mentioned MapReduce while discussing distributed indexing. Although the textbook covers this already, two supplementary papers can be found below. You can read them if you're interested and want additional detail and depth beyond what's already covered by the textbook.

  • This paper is the original MapReduce paper, as originally published by Google.
  • This paper describes a similar system, as used internally in Microsoft.

Published 7 Sep. 2020 22:06

I've been asked which of the supplementary papers in the deck presented last week you should read. I would recommend reading all of them, but I will not ask you about deep technical details from any of the papers on the exam. If you understand the takeaway summaries in the deck, then you're good to go. That is, my expectation is only that you're able to clearly articulate the main ideas presented in the papers.

That said, some further comments about the papers mentioned in that deck:

Published 31 Aug. 2020 12:55

On Thursday this week I'll be talking about the topics covered in Chapter 3 of the textbook, which addresses some string processing algorithms for tolerant retrieval and how to represent sets of strings. I will also deviate a bit from the book and pull in additional topics mentioned in some of the supplementary papers that have been distributed: In particular, I will discuss what's mentioned in this slide deck.

Published 27 Aug. 2020 14:43

Miscellaneous supplementary links related to today's lecture:

  • This paper gives a more in-depth description of skip lists, for those interested in going beyond what's mentioned in the textbook.
  • You can find some source code here, if you're interested in learning more about the details of the Porter stemmer and its heuristics.
  • See here for more information about Snowball stemmers for various languages, including online demos.
  • I mentioned Double Metaphone as an example of a phonetic algorithm, besides Soundex. You can find some example Java source code...

Published 26 Aug. 2020 23:39

The Padlet (a replacement for Piazza) for this course is available here: Link

Published 26 Aug. 2020 10:15

Just a quick reminder: both group sessions are digital and hosted at the same address, which can be found here (the second Zoom invite, below "Gruppetimer").

- Markus S. H.

Published 20 Aug. 2020 13:30

As mentioned today, please let me know if you are unable to follow the lectures unless they are in English. Thanks!

Published 20 Aug. 2020 13:02

All lectures will be recorded and made available from here. The recording of today's lecture is now available.

If you for some reason object to or do not consent to being included in a recording, then you can choose to not speak during the lecture and instead submit questions via the chat or offline.