Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks

In this assignment, you will take advantage of distributed computing to accelerate video encoding.

You are supposed to:

 

Codec63

Codec63 (c63) is a modified variant of Motion JPEG that supports inter-frame prediction. It does not comply with any standard, so the precode contains both an example encoder and a decoder (which converts an encoded file back to YUV). C63's inter-frame prediction works by encoding, for every macroblock independently, whether the macroblock uses a motion vector or not. If a motion vector is used, it refers to the previous frame.

If no motion vector is used, a macroblock is encoded according to the JPEG standard [1] and stored in the output file. If a motion vector is used, the motion vector is stored first, followed by the residual, which is encoded in the same manner. An illustrative overview of the steps involved in JPEG encoding can be found on Wikipedia [2].
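To illustrate what "residual" means here, the following minimal C sketch computes the residual of one block as the sample-wise difference between the current block and the block that the motion vector points to in the previous frame. The function name residual_8x8, the 8x8 block size, and the plane layout (one byte per sample, rows of width bytes) are assumptions made for this example and are not taken from the precode.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the precode): residual of the 8x8 block whose
 * top-left corner is (x, y) in the current plane, predicted from the previous
 * plane displaced by the motion vector (mv_x, mv_y). */
void residual_8x8(const uint8_t *cur, const uint8_t *ref,
                  int width, int height, int x, int y,
                  int mv_x, int mv_y, int16_t residual[64])
{
    /* Both the current block and the displaced block must lie inside the plane. */
    assert(x >= 0 && x + 8 <= width && y >= 0 && y + 8 <= height);
    assert(x + mv_x >= 0 && x + mv_x + 8 <= width);
    assert(y + mv_y >= 0 && y + mv_y + 8 <= height);

    for (int row = 0; row < 8; ++row) {
        for (int col = 0; col < 8; ++col) {
            int c = cur[(y + row) * width + (x + col)];
            int p = ref[(y + mv_y + row) * width + (x + mv_x + col)];
            /* The difference fits in [-255, 255], so int16_t is wide enough. */
            residual[row * 8 + col] = (int16_t)(c - p);
        }
    }
}
```

A decoder reverses this step by adding the decoded residual back onto the same displaced block of the previous frame.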

It is your task to optimize the c63 encoder using two machines.

The c63 encoder is very basic and shows behavior that you wouldn't allow a standard encoder to have. This concerns in particular the Huffman tables and the unconditional use of motion vectors in non-I-frames. You should not modify these Huffman tables. You can decide to use conditional motion vectors, but you must search for motion vectors, and you must write code that potentially uses the whole motion vector search range (hard-coded to 16 in the precode).
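As an illustration of what "potentially uses the whole motion vector search range" implies, the sketch below performs an exhaustive search over every candidate position within 16 samples of the block and keeps the best match. It is a sketch under stated assumptions, not the precode's implementation: the cost metric (sum of absolute differences), the 8x8 block size, the inclusive window bounds, the clamping to the plane, and all names are chosen for this example only.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define SEARCH_RANGE 16 /* the range the assignment says is hard-coded */

/* Sum of absolute differences between two 8x8 blocks (assumed cost metric). */
static int sad_8x8(const uint8_t *a, const uint8_t *b, int stride)
{
    int sad = 0;
    for (int r = 0; r < 8; ++r)
        for (int c = 0; c < 8; ++c)
            sad += abs(a[r * stride + c] - b[r * stride + c]);
    return sad;
}

/* Hypothetical full search: tries every position in the clamped window
 * around (x, y) and returns the displacement with the lowest SAD. */
void full_search_8x8(const uint8_t *cur, const uint8_t *ref,
                     int width, int height, int x, int y,
                     int *best_mv_x, int *best_mv_y)
{
    assert(x >= 0 && y >= 0 && x + 8 <= width && y + 8 <= height);

    /* Clamp the search window so that candidate blocks stay inside the plane. */
    int left   = x - SEARCH_RANGE < 0 ? 0 : x - SEARCH_RANGE;
    int top    = y - SEARCH_RANGE < 0 ? 0 : y - SEARCH_RANGE;
    int right  = x + SEARCH_RANGE > width - 8 ? width - 8 : x + SEARCH_RANGE;
    int bottom = y + SEARCH_RANGE > height - 8 ? height - 8 : y + SEARCH_RANGE;

    int best = INT_MAX;
    for (int ry = top; ry <= bottom; ++ry) {
        for (int rx = left; rx <= right; ++rx) {
            int sad = sad_8x8(cur + y * width + x, ref + ry * width + rx, width);
            if (sad < best) {
                best = sad;
                *best_mv_x = rx - x;
                *best_mv_y = ry - y;
            }
        }
    }
}
```

Each candidate position is evaluated independently of the others, and so is each block, which is what makes this step a natural target for parallelization.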

The video scenario is live streaming. You should not have an encoder pipeline of more than 3 frames. In addition, you should not use parallelization techniques that severely degrade the video quality.
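One way to respect the 3-frame limit is to count frames in flight and refuse to accept a new frame while three are still being processed. The following sketch shows only that bookkeeping in a single process. The type pipeline_t and the functions are hypothetical names, and in a two-machine setup the counters would have to live in memory that both sides can access and be updated with proper synchronization, which this example does not attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PIPELINE_DEPTH 3 /* maximum number of frames in flight */

/* Hypothetical bookkeeping structure: frames handed to the pipeline
 * and frames that have left it again. */
typedef struct {
    uint64_t submitted;
    uint64_t completed;
} pipeline_t;

/* True if another frame may enter without exceeding the depth limit. */
bool pipeline_can_submit(const pipeline_t *p)
{
    return p->submitted - p->completed < PIPELINE_DEPTH;
}

/* Registers a new frame and returns the buffer slot (0..2) it should use. */
int pipeline_submit(pipeline_t *p)
{
    assert(pipeline_can_submit(p));
    return (int)(p->submitted++ % PIPELINE_DEPTH);
}

/* Marks the oldest frame in flight as finished, freeing its slot. */
void pipeline_complete(pipeline_t *p)
{
    assert(p->completed < p->submitted);
    p->completed++;
}
```

With three buffer slots reused round-robin, the slot returned for a new frame is always the one released by the frame that completed three submissions earlier.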

You should not replace the algorithms that you find in c63. Alternative motion vector search algorithms and DCT encoding algorithms provide large speedup potential, but they distract from the main goal of this home exam, which is to identify and implement parallelization options. You should also not focus on improving your GPU implementation from Home Exam 2.

Two test sequences in YUV format are available in the /mnt/sdcard directory on the lab machines:

These should be used as input to the provided c63 encoder, and can be used to test your implementations.

 

Precode

The precode consists of the reference c63 code including:

The precode is written in C. You should not touch the decoder or c63pred.

The precode can be downloaded from a Git repository here:

git clone https://bitbucket.org/mpg_code/inf5063-codec63.git

You must log in to the lab machines connected with PCI Express for this assignment. Information about how to access the machines can be found in the Dolphin FAQ.

You are free to adapt, modify or completely rewrite the provided encoder to take full advantage of the target architecture. You are, however, not allowed to replace the algorithms for Motion Estimation, Motion Compensation or DCT/iDCT. You are not allowed to paste any other pre-written code into your implementation. You are also not allowed to post any code from the home exam on the Internet.

Some usage examples:

To encode the foreman test sequence

$ ./c63enc -w 352 -h 288 -o /tmp/test.c63 foreman.yuv

To decode a sequence

$ ./c63dec /tmp/test.c63 /tmp/test.yuv

To play back a raw YUV file

$ vlc --rawvid-fps 25 --rawvid-width 352 --rawvid-height 288 --rawvid-chroma J420 /tmp/test.yuv

Evaluation

Write a short report where you discuss your results. The exam will be graded on how well you are able to take advantage of the distributed architecture to solve the task at hand.

In evaluation, we will consider (in order):

  1. A program that works (on a Tegra and an x86 machine). (**)
  2. PCI Express is used to transport data between the machines. (*)
  3. Effective use of PCI Express:
    • Efficient use of SISCI to move video between I/O and processing machines.
    • Efficient synchronization with SISCI between the two machines.
    • Moving data efficiently from the I/O machine to the GPU in the processing machine.
  4. Use of the potential of a distributed 3-frame pipeline. 
  5. Good documentation:
    • Readable, well-commented code.
    • Optimization steps and performance results.
    • Comparison of / reflection about alternative approaches.
    • Complete and well-presented document.
  6. Output video has a similar or better PSNR and file size compared to the reference encoder's output.
  7. Bonus points for other non-obvious optimizations such as Motion Compensation and/or VLC.

(*) Automatic fail if this is not fulfilled. (**) We do not debug code before testing; correctness and effectiveness are not evaluated if this is not fulfilled.

 

Report

You must write up the results as a technical report of no more than 4 pages in ACM format. The report should serve as a guide to the code modifications you have made and the resulting performance changes.  

 

Machine Setup

The PCI Express cluster is situated at Simula Research Laboratory. Machine names and how to access them can be found in the Dolphin FAQ.

Contact inf5063@ifi.uio.no or use the Slack space if you have problems logging in.

 

Formal Information

The deadline for handing in your assignment is: Friday, December 1st at 16:00:00.00.

Deliver your code and report (as PDF) at https://devilry.ifi.uio.no/. Submit the poster (as PDF) to inf5063@ifi.uio.no.

The groups should also prepare a poster (2 x A3 pages) and a quick 2-minute talk (without slides) in which you pitch your poster at the group session on December 7th. Name the poster with your group name, and email the poster to inf5063@ifi.uio.no no later than noon (12:00) on December 6th. We will then print the poster for you.

For questions and course related chats, we have created a Slack space: https://inf5063.slack.com

There will be a prize for best poster/presentation (awarded by an independent panel and independent of the grade).

Please check the Dolphin FAQ page for updates and answers to frequently asked questions.

For questions please contact:

inf5063@ifi.uio.no

 

[1] http://www.w3.org/Graphics/JPEG/itu-t81.pdf

[2] http://en.wikipedia.org/wiki/JPEG#JPEG_codec_example