hisangke / Video-Compression

Documentation

Problem Description

This is a project for encoding and decoding a compressed video file by taking up to 100 images of various formats and stitching them together in sequence. We took MPEG compression methods as a reference, but chose to create our own codec to explore options without strict restrictions.

Group Members

* Qifan He (heqf@bu.edu)
* Vrushali Mahajan (vmahajan@bu.edu)
* Alex Bennet (gottbenn@bu.edu)
* Muhammad Zuhayr Raghib (mzraghib@bu.edu)
* Akash Mehta (amehta22@bu.edu)

Implementation

Main

The purpose of this class is to create the GUI and the pipeline flow for the rest of the code. It creates a JFrame that initializes the buttons, labels, and fields for user input. Through button ActionListeners it runs the following encoding pipeline: read all images into Image_Objects (which internally preprocess the data into macroblocks), normalize the DCT coefficients via DCT/iDCT, run the IntraFrame class, and call the bitstream class to save the encoded file. In parallel, it also provides the decoder pipeline, which reads the encoded file and calls the Player class to display the images on screen.
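
As a rough illustration of this wiring, the sketch below hooks a button's ActionListener to the encoding steps. The class name, field layout, and commented step names are illustrative placeholders rather than the project's actual code.

```java
import javax.swing.*;
import java.awt.BorderLayout;

// Illustrative sketch of the GUI-driven pipeline; MainSketch is not the project's Main class.
public class MainSketch {
    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Video Compression");
            JTextField inputDir = new JTextField(20);
            JButton encodeButton = new JButton("Encode");

            encodeButton.addActionListener(e -> {
                // 1. Read all images into Image_Objects (each split into 8x8 macroblocks)
                // 2. Normalize the DCT coefficients via DCT/iDCT
                // 3. Run the interframe pass to flag redundant P-frame blocks
                // 4. Call the bitstream class to write the encoded file
                System.out.println("Encoding images from " + inputDir.getText());
            });

            frame.setLayout(new BorderLayout());
            frame.add(inputDir, BorderLayout.NORTH);
            frame.add(encodeButton, BorderLayout.SOUTH);
            frame.setSize(400, 120);
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setVisible(true);
        });
    }
}
```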

Encoder

Each image is first read and divided into 8x8 pixel blocks (macroblocks) for easy computation. Every frame is read this way and saved as an 'Image_Object', and all of the frames are stored in an ArrayList of Image_Objects called "allFrames".

The allFrames data structure was designed for quick access. By dividing the frames into blocks, finding the exact indices of interest is much easier. The overall ArrayList has an amortized access time of O(1), and each block is a 2D array, which also has O(1) access time.

This base data structure is passed through several methods in stages and is modified in place at each stage.
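
A minimal sketch of the block division, assuming the frame has already been converted to a luma plane and that its dimensions are multiples of 8; the method name and block layout are illustrative, not the project's exact code.

```java
// Split one luma plane into 8x8 macroblocks; blocks are indexed row by row,
// so block (bx, by) lives at index by * blocksX + bx and is found in O(1).
class MacroblockSketch {
    static int[][][] toMacroblocks(int[][] y, int width, int height) {
        int blocksX = width / 8, blocksY = height / 8;
        int[][][] blocks = new int[blocksX * blocksY][8][8];
        for (int by = 0; by < blocksY; by++) {
            for (int bx = 0; bx < blocksX; bx++) {
                int[][] block = blocks[by * blocksX + bx];
                for (int i = 0; i < 8; i++) {
                    for (int j = 0; j < 8; j++) {
                        block[i][j] = y[by * 8 + i][bx * 8 + j];
                    }
                }
            }
        }
        return blocks;
    }
}
```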

DCT / IDCT Transforms

The DCT method takes a macroblock and performs the DCT with a quantization table determined by the quality setting entered in the GUI. The DCT is used to remove high-frequency noise within the image and to assist with the interframe compression. The 2D DCT is implemented with the common row-column method.
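
The sketch below shows a row-column 2D DCT on one 8x8 block followed by quantization; the table q stands in for the quality-scaled quantization table chosen from the GUI, and the class and method names are illustrative.

```java
// Row-column 2D DCT-II (orthonormal scaling) followed by quantization.
class DctSketch {
    static int[][] dctQuantize(double[][] block, int[][] q) {
        final int N = 8;
        double[][] rows = new double[N][N];
        double[][] coeffs = new double[N][N];
        // 1D DCT applied to each row
        for (int i = 0; i < N; i++) {
            for (int u = 0; u < N; u++) {
                double sum = 0;
                for (int x = 0; x < N; x++) {
                    sum += block[i][x] * Math.cos((2 * x + 1) * u * Math.PI / (2.0 * N));
                }
                rows[i][u] = sum * (u == 0 ? Math.sqrt(1.0 / N) : Math.sqrt(2.0 / N));
            }
        }
        // 1D DCT applied to each column of the row result
        for (int j = 0; j < N; j++) {
            for (int v = 0; v < N; v++) {
                double sum = 0;
                for (int y = 0; y < N; y++) {
                    sum += rows[y][j] * Math.cos((2 * y + 1) * v * Math.PI / (2.0 * N));
                }
                coeffs[v][j] = sum * (v == 0 ? Math.sqrt(1.0 / N) : Math.sqrt(2.0 / N));
            }
        }
        // Quantize: divide each coefficient by its table entry and round
        int[][] quantized = new int[N][N];
        for (int v = 0; v < N; v++) {
            for (int u = 0; u < N; u++) {
                quantized[v][u] = (int) Math.round(coeffs[v][u] / q[v][u]);
            }
        }
        return quantized;
    }
}
```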

Compression

Reading the raw RGB values of JPEG images, for example, greatly increases the size of the resulting byte array (see Bitstream Generation). To compensate for this, we decided to use lossy compression methods to achieve higher levels of compression, rather than lossless methods such as Lempel-Ziv compression.

Inter-Frame Compression

For inter-frame compression, the goal is to take advantage of temporal redundancies in the image data. In other words, areas of the image that do not change over a group of pictures are not saved for every frame.

For the project, a group of pictures (GOP) was defined as N frames. If N = 4, the dataset of pictures is divided into groups of 4 images (the remaining explanation assumes N = 4). The first image is called an I frame, and the remaining 3 frames are called P frames. This naming convention was inspired by terms in MPEG interframe compression; however, the algorithm was developed in house.

For each GOP:
    For each P frame:
        For each block in the P frame:
            If the block at index i in the P frame is similar to the block at index i in the I frame:
                Only the block in the I frame is saved.

When a block in a P frame is similar to the block at the same location in the GOP's I frame, the first pixel in the block is changed to '9999'. This acts as a flag for bytestream generation.

To gauge similarity between two blocks, the sum of the absolute errors of all the pixel luma (Y) values between the two blocks is calculated. If this sum of absolute errors is below a threshold, the two blocks are categorized as similar.
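
A minimal sketch of this test on 8x8 luma blocks, assuming illustrative names; the threshold is the value driven by the GUI quality bar, and the 9999 flag matches the scheme described above.

```java
// Compare a P-frame block with the co-located I-frame block using the sum of
// absolute luma differences; if similar enough, flag the P block for reuse.
class InterframeSketch {
    static void markIfSimilar(int[][] iBlock, int[][] pBlock, int threshold) {
        int sumAbsError = 0;
        for (int i = 0; i < 8; i++) {
            for (int j = 0; j < 8; j++) {
                sumAbsError += Math.abs(iBlock[i][j] - pBlock[i][j]);
            }
        }
        if (sumAbsError < threshold) {
            pBlock[0][0] = 9999;  // flag: the decoder reuses the I-frame block instead
        }
    }
}
```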

Chroma subsampling

Because the human visual system is less sensitive to the position and motion of color than of luminance [1], it is possible to sample the color details at a lower rate without noticeable changes to the video quality.

This was the main reason for converting RGB to YCrCb: to separate the luminance (luma) and chroma (color) components.

A 4:2:0 subsampling method is used for the project. In this variant, the chroma components are sub-sampled by a factor of 2 both horizontally and vertically, reducing the number of chroma samples to a quarter of the original.
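
The sketch below subsamples one chroma plane 4:2:0 by averaging each 2x2 neighborhood; the averaging choice and the method name are assumptions (simple decimation would also satisfy the description above).

```java
// 4:2:0 chroma subsampling: one sample per 2x2 block, i.e. a quarter of the
// original chroma samples.
class ChromaSketch {
    static int[][] subsample420(int[][] chroma, int width, int height) {
        int[][] out = new int[height / 2][width / 2];
        for (int y = 0; y < height / 2; y++) {
            for (int x = 0; x < width / 2; x++) {
                out[y][x] = (chroma[2 * y][2 * x] + chroma[2 * y][2 * x + 1]
                           + chroma[2 * y + 1][2 * x] + chroma[2 * y + 1][2 * x + 1]) / 4;
            }
        }
        return out;
    }
}
```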

Bitstream Generation

The bitstream was generated by converting the ArrayList produced by our interframe data structure. We converted each Y, Cr, and Cb value to a byte and stored it in a file. Along with this information we stored a header that contained the number of macroblocks per frame, the GOP size, the width, the height, the frames per second, and the number of frames. On the decoder side, this information was used to piece the bitstream back into the original data structure so the frames could be recreated.
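
A sketch of this serialization step using a DataOutputStream; the field order, integer widths, and method name are illustrative rather than the project's exact file format.

```java
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Write the header fields described above, then the raw Y/Cr/Cb bytes.
class BitstreamSketch {
    static void writeBitstream(String path, int blocksPerFrame, int gopSize,
                               int width, int height, int fps, int frameCount,
                               byte[] yCrCbData) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(path))) {
            out.writeInt(blocksPerFrame);
            out.writeInt(gopSize);
            out.writeInt(width);
            out.writeInt(height);
            out.writeInt(fps);
            out.writeInt(frameCount);
            out.write(yCrCbData);  // one byte per Y, Cr, and Cb sample
        }
    }
}
```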

Decoder

The Decoder class, once instantiated, converts the compressed file back into an ArrayList of macroblocks. The decoder does the following:

  • Decodes the Y, Cr, and Cb arrays associated with each macroblock.
  • Converts YCrCb back to RGB (see the sketch after this list).
  • Creates a data structure containing all the macroblocks in sequence: ArrayList<ArrayList<myColor>> macroblocks = new ArrayList<>(); where myColor is a class containing the R, G, B values of each pixel in the macroblock.
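
For the RGB conversion step, a sketch of the standard per-pixel YCrCb-to-RGB formula (BT.601 full-range coefficients, as used by JPEG); the class and helper names are illustrative.

```java
// Convert one pixel from YCrCb back to clamped 8-bit RGB.
class ColorSketch {
    static int[] yCrCbToRgb(int y, int cr, int cb) {
        int r = clamp((int) Math.round(y + 1.402 * (cr - 128)));
        int g = clamp((int) Math.round(y - 0.714136 * (cr - 128) - 0.344136 * (cb - 128)));
        int b = clamp((int) Math.round(y + 1.772 * (cb - 128)));
        return new int[] { r, g, b };
    }

    static int clamp(int v) {
        return Math.max(0, Math.min(255, v));
    }
}
```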

Bytestream Inputs

The bytestream class required the inputs that were necessary for the header. The data structure that contained the frame Image_Objects and block types was passed as a global variable throughout the code.

Video Player

PC

The PC player is written with JFrame alongside the main GUI. The player lives in its own frame that displays a BufferedImage sequence and repaints every 100 ms. The repainting is done in a separate thread. Video effects are applied to the BufferedImage before display, and six different effects are selected through a JComboBox. The image processing algorithms are implemented using the Catalano Framework, which is licensed under the LGPL. For convenience and performance, this framework is integrated into both the PC client and the Android app.
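
A minimal sketch of the playback loop, using a javax.swing.Timer in place of the project's dedicated repaint thread; the frame list and panel name are illustrative, and the effect step is omitted.

```java
import javax.swing.*;
import java.awt.Graphics;
import java.awt.image.BufferedImage;
import java.util.List;

// Repaint the panel with the next decoded frame every 100 ms (10 fps).
class PlayerPanelSketch extends JPanel {
    private final List<BufferedImage> frames;
    private int current = 0;

    PlayerPanelSketch(List<BufferedImage> frames) {
        this.frames = frames;
        new Timer(100, e -> {           // fires on the Swing event dispatch thread
            current = (current + 1) % frames.size();
            repaint();                  // effects would be applied to the frame here
        }).start();
    }

    @Override
    protected void paintComponent(Graphics g) {
        super.paintComponent(g);
        g.drawImage(frames.get(current), 0, 0, null);
    }
}
```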

Android

The Android client was developed with Android Studio.

  • UI: A video background and ConstraintLayout make the app look nicer. The constraint-based layout adapts to screens of different sizes and resolutions.
  • Backend: The video renderer and decoder run in two separate threads for better performance and share data through a global ConcurrentLinkedQueue. Once the decoder is instantiated, it starts parsing the file into blocks; an ArrayList holds each group of four pictures, keeping the I frame so the P frames can be reconstructed from it. After each frame is generated, a frame-builder function concatenates its blocks into a Bitmap image. As long as the global queue holds fewer than 10 images, the decoder offers new images into the queue, and the renderer polls the queue about 10 times per second while it is not empty (see the sketch after this list).
  • Video effects: the same as in the PC client.
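
A sketch of the decoder/renderer hand-off described above, written in plain Java with a placeholder frame type; the real client builds Android Bitmap images and renders them to the screen.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Producer/consumer pipeline: the decoder offers frames while fewer than 10
// are buffered; the renderer polls roughly 10 times per second.
class PlaybackPipelineSketch {
    private final ConcurrentLinkedQueue<Object> queue = new ConcurrentLinkedQueue<>();

    void start() {
        Thread decoder = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                if (queue.size() < 10) {
                    queue.offer(decodeNextFrame());   // keep the buffer topped up
                } else {
                    sleepQuietly(10);                 // buffer full, back off briefly
                }
            }
        });
        Thread renderer = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                Object frame = queue.poll();          // null when the queue is empty
                if (frame != null) {
                    draw(frame);
                }
                sleepQuietly(100);                    // ~10 polls per second
            }
        });
        decoder.start();
        renderer.start();
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private Object decodeNextFrame() { return new Object(); }  // placeholder decode step
    private void draw(Object frame) { }                        // placeholder render step
}
```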

Implementation Description

Final Features

1) Ability to utilize three other image formats

The Java BufferedImage class is used, which accepts the following image formats: JPEG, PNG, BMP, WBMP, and GIF.

2) Develop a Graphical User Interface that provides an intuitive method for entering and/or reorganizing files, a progress bar for encoding, and controls for viewing the encoding.

A GUI was developed for the project. Please refer to INSTALL.txt for details on usage.

3) Provide "interesting" real-time video effects during playback.

  • Sepia, Sharpen, HeatMap, FakeHDR, Blur, and FilmGrain.
  • Both the PC client and the Android client support these effects.
  • Effects can be changed in real time.

4) Allow user to adjust parameters on movie creation in order to trade disk space for quality

The user can adjust a quality bar on the graphical user interface. This bar changes the allowable error threshold between the I and P frames in our interframe compression scheme.
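
The sketch below shows one way such a slider could drive the interframe threshold; the constants and the direction of the mapping are assumptions, not the project's actual values.

```java
// Hypothetical mapping from a 0..100 quality slider to the allowable summed
// luma error per block: higher quality tolerates less error, so fewer P-frame
// blocks are dropped and the file stays larger.
class QualitySketch {
    static int qualityToThreshold(int quality) {
        final int maxThreshold = 4096;  // assumed upper bound for the error sum
        return (int) Math.round(maxThreshold * (1.0 - quality / 100.0));
    }
}
```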

5) Develop an Android client for viewer

  • Nice and clean UI.
  • Real-time effects support.

References and Background Materials

  1. Livingstone, Margaret (2002). "The First Stages of Processing Color and Luminance: Where and What". Vision and Art: The Biology of Seeing. New York: Harry N. Abrams. pp. 46–67. ISBN 0-8109-0406-3.

  2. Haskell, B.G., & Puri, A. (2012). MPEG Video Compression Basics. In: Chiariglione, L. (ed.), The MPEG Representation of Digital Media. Springer, New York, NY, Ch. 2.2.

  3. https://msdn.microsoft.com/en-us/libr

  4. http://www.vtvt.ece.vt.edu/research/references/video/DCT_Video_Compression/Feig92a.pdf

  5. http://www.nyx.net/~smanley/dct/DCT.java (used to assist with the development of the 2D DCT and quantization functions)

  6. Jridi, M., Ouerhani, Y., & Alfalou, A. (2013). Low complexity DCT engine for image and video compression. Real-Time Image and Video Processing 2013. doi:10.1117/12.2006174

  7. https://developer.android.com/studio/index.html --Android Developer Website

  8. docs.oracle.com --Java/JFrame/AWT official documentation

Code

All code and data (images) considered in this project can be found in the 'Master' branch of the project repository on Bitbucket.

Please follow the instructions in INSTALL.txt to execute the project.

Unit tests were also made for a few methods using JUnit 4.

Work Breakdown

The following lists the areas of responsibility that were led by each group member. Signatures of all group members have been attached in 'signatures.jpg'.

Akash Mehta

  • Main function
  • Graphical User Interface
  • Image Object preprocessing, and macroblock division
  • Generate frames for video players from macroblocks

Muhammad Zuhayr Raghib

  • Interframe compression
  • Compression via chrominance subsampling
  • Data Normalization for matching code memory limits

Vrushali

  • Decoding Bytestream from video file format
  • Integrate file format with DCT values
  • Integrate file format with Interframe class

Alex

  • DCT transform
  • Generating bytestream and video file format
  • Integrate file format with DCT values
  • Integrate file format with Interframe class

Qifan

  • Android client for viewer
  • Special real-time playback effects
  • Java JFrame Player
  • Generate frames for video players from macroblocks
