Mandatory problem #1
INF3110/4110 Programming Languages
To be completed before 27th September 2004 at noon

Background
----------
FrameMaker is an advanced text-processing system, and Ifi has
purchased several licences for this product. To keep track of how much
FrameMaker is used, it keeps a log of who uses it and when (but
nothing of what is in the documents).

The directory ~inf3110/oblig-1 contains logs from the years 2001-2004
together with a total log (which is just a concatenation of the yearly
logs).

Each log entry is one line long. We are only interested in the entries
indicating when a user starts using FrameMaker (obtaining a licence)
and when he or she has finished (releasing the licence). These lines
look like this:

2003/01/06-16:59:16 CI siweh@hervard.ifi.uio.no/:0.0 19199 0 661022102
2003/01/08-15:34:40 CO siweh@hervard.ifi.uio.no/:0.0 23935 10 661022102 0 Adobe FrameMaker-USUK 6 # 20-3-01-00-6-3741A

The lines contain the following data:

1) Date and time

2) CI when starting ("check in") or CO when leaving ("check out").

3) User name @ computer

4) Other information (irrelevant to us)

Specifications
--------------
You are to write a Perl program analyzing the log files. It should
read one or more log files and produce information on

1) number of check-ins and check-outs.

2) maximum number of simultaneous users, and on which date this
   occurred. (Choose the last date if there are several dates on which
   the maximum occurred.)

3) the last day anybody used FrameMaker.

4) who has been the most active user. (If there are several users
   equally active, print a sorted list of them.)

5) how many different users have tried FrameMaker. These should be
   divided into three groups: "testers" (who have used FrameMaker five
   times or less), "small-time users" (who have used it 6-25 times)
   and "big-time users" (who have used it more than 25 times). The
   big-time users should be named, in an alphabetically sorted list.

6) daily distribution, in the form of a diagram showing total use
   every hour of the day. We only consider the starting time, i.e.,
   the CI records. For instance, if at least 10 people have started
   using FrameMaker between 19:00 and 20:00 hours, print an "*" in the
   19 column (as shown in the example below). Each "*" represents at
   least 10 users.

Note that there are errors in the log files; for instance, there are
CI records without a matching CO record, and vice versa. Your program
should do the best it can if it encounters an error: it should at
least not crash!

Example
-------
If your program is called "fm-analysis", the command

   fm-analysis ~inf3110/oblig-1/2003-log

should produce the following:

The log contains: 212 check-ins and 266 check-outs.
Max simultaneous users: 4 on 2003.03.11.
Last used on 2003.12.15.
Most active user (with 69 check-ins): griff
There are 49 different users: 44 testers, 3 small-time users and
2 big-time users: griff, siweh.
Daily distribution:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
* * * * * * * * * * * *

(The actual wording and spacing is not important.)

Delivery details
----------------
Your delivery should be the Perl program text, sent as an e-mail
attachment to your tutor.

The code should be easy to read and understand. To achieve this,
comments are nearly always required. If the tutor has problems
understanding how your program works, it will be rejected.

This problem should be solved individually. You may discuss solution
methods with others, but all code must be written by you. If you copy
code from others you are likely to be suspected of plagiarism.

22nd September 2004

Good luck!
Dag L

Note (for those who might be interested)
----------------------------------------
(Others may disregard this.)

The log files are real, and this has two implications:

1) The log files are undocumented, which means that I had to guess at
   how to interpret them. For instance, the CI and CO records may have
   the opposite meaning.
   You may choose to solve the problem based on this supposition if
   you like, but please include a comment about this.

2) The files contain errors. For instance there are more CI records
   than CO records. In other words, some start records or some stop
   records -- or both -- are missing.

   If you solve the assignment based on the original assumption
   (CI-CO), you can disregard this problem; the worst that will happen
   is that a few sessions will be too long. Using the CO-CI
   interpretation, you will have to invent some way of handling
   sessions without the stop record.
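
Illustration: matching a log record
-----------------------------------
The record format described under Background can be picked apart with
a single regular expression. The following is only a sketch of that
one step, not a model solution. The sub name parse_record and the hash
keys are invented for this illustration, and the pattern assumes that
a user name never contains "@" or white space, which the handout does
not guarantee. Lines that do not look like CI/CO records are skipped
rather than treated as fatal, in keeping with the "should not crash"
requirement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the assignment text):
# split one log line into its parts.
# Returns a hash reference, or undef if the line is not a CI/CO record.
sub parse_record {
    my ($line) = @_;
    return unless $line =~ m{
        ^(\d{4})/(\d\d)/(\d\d)      # date as year/month/day
        -(\d\d):\d\d:\d\d           # time of day; only the hour is kept
        \s+(CI|CO)                  # record type
        \s+([^\@\s]+)\@(\S+)        # user name, at-sign, computer
    }x;
    return {
        date => "$1.$2.$3",         # same date style as the example output
        hour => $4 + 0,             # numeric hour, 0..23
        type => $5,
        user => $6,
        host => $7,
    };
}

# Try it on the first sample line from the Background section.
my $rec = parse_record(
    '2003/01/06-16:59:16 CI siweh@hervard.ifi.uio.no/:0.0 19199 0 661022102');
print "$rec->{date} $rec->{hour} $rec->{type} $rec->{user}\n" if $rec;
```

Everything after the computer name is ignored, matching item 4 of the
field list ("irrelevant to us"). How the parsed records are counted
and matched up is deliberately left open.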