yasoob / ICSPCodec

Implementation of video codec based on H.261

ICSP Codec

Introduction

This is an implementation of a video codec written in C/C++.
It is called the ICSP codec because I implemented it while I was working in the ICSP lab (http://icsp.hanyang.ac.kr).
The ICSP codec is built on basic video compression theory. Its characteristics are as follows.

  • 16x16 macroblock processing unit (each 16x16 macroblock is divided into four 8x8 blocks in each compression step)
  • Intra Prediction / Inter Prediction
  • All Intra Prediction Mode
  • DCT Transformation / Quantization
  • Entropy Coding
  • AVX Intrinsics Parallel Processing
  • Supports CIF (4:2:0) only

ICSP Codec Encoding Process

Encoding Process

Intra Prediction

The ICSP codec supports three intra prediction modes: Vertical, Horizontal, and DC.

Intra prediction

The green squares are reconstructed reference pixels around the current block. Intra prediction uses the reference pixels corresponding to each direction mode to generate a prediction block. DC mode generates the prediction block from the mean value of the vertical and horizontal reference pixels. If reference pixels do not exist for some reason (first block, first row of blocks, first column of blocks, ...), they are replaced by the integer value 127.

You can set the period of intra prediction frames within a sequence by initializing 'intraPeriod' to an integer value. For example, if you initialize 'intraPeriod' to 10 and a sequence consists of 300 frames, then every 10th frame is encoded with intra prediction and the other frames are encoded with inter prediction.
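The three modes and the 127 fallback described above can be sketched as follows. This is a minimal illustration, not the repository's code: the 8x8 block size, the rounding of the DC mean, and the names `predictIntra`, `RefLine`, and `PredBlock` are assumptions made for the example.

```cpp
#include <array>
#include <cstdint>

constexpr int kN = 8;                    // block size assumed for this sketch
constexpr std::uint8_t kFallback = 127;  // used when a reference pixel is missing

enum class IntraMode { Vertical, Horizontal, DC };

using RefLine = std::array<std::uint8_t, kN>;
using PredBlock = std::array<std::array<std::uint8_t, kN>, kN>;

// Build a prediction block from the reconstructed row above (`top`) and the
// reconstructed column to the left (`left`). A null pointer means that the
// neighbour does not exist, so 127 is substituted for its pixels.
PredBlock predictIntra(IntraMode mode, const RefLine* top, const RefLine* left) {
    RefLine t, l;
    t.fill(kFallback);
    l.fill(kFallback);
    if (top != nullptr) t = *top;
    if (left != nullptr) l = *left;

    // DC value: rounded mean of all vertical and horizontal reference pixels.
    int sum = 0;
    for (int i = 0; i < kN; ++i) sum += t[i] + l[i];
    const std::uint8_t dc = static_cast<std::uint8_t>((sum + kN) / (2 * kN));

    PredBlock pred{};
    for (int y = 0; y < kN; ++y) {
        for (int x = 0; x < kN; ++x) {
            switch (mode) {
                case IntraMode::Vertical:   pred[y][x] = t[x]; break;  // copy downwards
                case IntraMode::Horizontal: pred[y][x] = l[y]; break;  // copy rightwards
                case IntraMode::DC:         pred[y][x] = dc;   break;  // flat block
            }
        }
    }
    return pred;
}
```

An encoder would typically build all three candidates and keep the one with the smallest difference from the current block; how the ICSP codec selects the mode is not described here.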

All Intra Prediction Mode

The ICSP codec supports a special mode, All Intra Prediction Mode, which encodes every frame in a sequence using intra prediction. You can run the ICSP codec in All Intra Prediction Mode by initializing the 'intraPeriod' variable to the ALL_INTRA macro value.
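The frame-type decision implied by 'intraPeriod' can be sketched like this. The constant `kAllIntra` stands in for the ALL_INTRA macro, whose real value is not given in this document, and the rule that frame 0 starts each period is also an assumption.

```cpp
// Placeholder for the ALL_INTRA macro; the actual value used by the codec
// is not stated in this document.
constexpr int kAllIntra = 0;

// Returns true when the frame should be coded with intra prediction.
// Assumes frame indices start at 0 and that intraPeriod is either
// kAllIntra or a positive integer.
bool isIntraFrame(int frameIndex, int intraPeriod) {
    if (intraPeriod == kAllIntra) return true;  // All Intra Prediction Mode
    return frameIndex % intraPeriod == 0;       // frames 0, P, 2P, ...
}
```

With a period of 10 and 300 frames, this rule marks frames 0, 10, 20, ..., 290 as intra frames, which is 30 frames in total.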

Inter Prediction

The ICSP codec predicts only backward in time. That is, inter prediction blocks are generated from the previous frame alone (which is, of course, a reconstructed frame).

image

Basically, inter prediction finds the location of the block in the previous frame (fn-1) that is most similar to the current block in the current frame (fn). In the figure above, the red block is the current block in the current frame. The green block is the one closest to the current block among the neighboring blocks around the red block's position in the previous frame. The ICSP codec uses a 'spiral search': it searches by circling around the current block position in a spiral, one pixel at a time.

image

I attached the picture above to give a conceptual understanding of spiral search; I hope it causes no misunderstanding. The ICSP codec moves the search block one pixel at a time and finds the block with the minimum SAD (Sum of Absolute Differences) between the current block and the search block.
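A minimal sketch of a SAD-driven search follows. It approximates the spiral by visiting square rings of growing radius around the current block position; the exact visiting order in the ICSP codec may differ. The types `Frame` and `MotionVector`, the function names, and the `range` parameter are hypothetical names chosen for this example.

```cpp
#include <algorithm>
#include <climits>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct Frame {
    int width, height;
    std::vector<std::uint8_t> pix;  // row-major samples
    std::uint8_t at(int x, int y) const { return pix[y * width + x]; }
};

struct MotionVector { int dx, dy, sad; };

// Sum of absolute differences between the n x n block at (cx, cy) in `cur`
// and the n x n block at (rx, ry) in `ref`.
int blockSad(const Frame& cur, int cx, int cy,
             const Frame& ref, int rx, int ry, int n) {
    int sad = 0;
    for (int y = 0; y < n; ++y)
        for (int x = 0; x < n; ++x)
            sad += std::abs(int(cur.at(cx + x, cy + y)) - int(ref.at(rx + x, ry + y)));
    return sad;
}

// Visit candidate positions ring by ring (radius 0, 1, 2, ...) around the
// current block position and keep the displacement with the smallest SAD.
MotionVector spiralSearch(const Frame& cur, const Frame& ref,
                          int cx, int cy, int n, int range) {
    MotionVector best{0, 0, INT_MAX};
    for (int r = 0; r <= range; ++r) {
        for (int dy = -r; dy <= r; ++dy) {
            for (int dx = -r; dx <= r; ++dx) {
                if (std::max(std::abs(dx), std::abs(dy)) != r) continue;  // ring only
                const int rx = cx + dx, ry = cy + dy;
                if (rx < 0 || ry < 0 || rx + n > ref.width || ry + n > ref.height)
                    continue;  // candidate block must lie inside the frame
                const int sad = blockSad(cur, cx, cy, ref, rx, ry, n);
                if (sad < best.sad) best = {dx, dy, sad};  // ties keep the nearer one
            }
        }
    }
    return best;
}
```

Because the rings grow outward and only a strictly smaller SAD replaces the current best, candidates closer to the current position win ties, which tends to favour short motion vectors.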

Up to this point, this has been a conceptual explanation of inter prediction. Inter prediction consists of motion estimation and motion compensation. I will briefly explain these two processes.

Motion Estimation

Simply put, motion estimation generates a motion vector: the displacement between the current block position and the most similar block, that is, the one with the smallest SAD.

image

Motion Compensation

In the ICSP codec, motion compensation builds prediction blocks using the motion vectors and then produces differential blocks (current block - prediction block). The differential blocks are passed on to the other encoding/decoding stages such as DCT, quantization, and entropy coding.
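The differential block can be sketched as below. The `Plane` type, the function name `motionCompensate`, and the flat row-major layout of the result are assumptions for illustration, and the caller is assumed to pass a motion vector that keeps the reference block inside the frame.

```cpp
#include <cstdint>
#include <vector>

struct Plane {
    int width, height;
    std::vector<std::uint8_t> pix;  // row-major samples
};

// Differential block = current block - prediction block, where the
// prediction is the n x n block of the reconstructed previous frame
// displaced by the motion vector (mvx, mvy).
std::vector<int> motionCompensate(const Plane& cur, const Plane& ref,
                                  int bx, int by, int mvx, int mvy, int n) {
    std::vector<int> residual(n * n);
    for (int y = 0; y < n; ++y) {
        for (int x = 0; x < n; ++x) {
            const int c = cur.pix[(by + y) * cur.width + (bx + x)];
            const int p = ref.pix[(by + mvy + y) * ref.width + (bx + mvx + x)];
            residual[y * n + x] = c - p;  // may be negative, so use int
        }
    }
    return residual;
}
```

The residual is signed, which is why it is stored as `int` rather than as 8-bit pixels.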

DCT

The Discrete Cosine Transform (DCT) decomposes a signal in the time or spatial domain into frequency components, like the Fourier transform. That is, the DCT transforms an image from the spatial domain into the frequency domain. When an image is transformed by the DCT, it is decomposed into a DC (direct current) component with a frequency of 0 and AC (alternating current) components with nonzero frequencies. Since most images consist of many low-frequency components and few high-frequency components, applying the DCT concentrates most of the energy in the DC and low-frequency components, and only a little energy is spread across the high-frequency AC components. This phenomenon is called 'energy compaction' and is very widely used in the video compression field.
The ICSP codec supports the DCT on 8x8 blocks. An 8x8 image block is decomposed into one DC component with a frequency of 0 and sixty-three AC components with higher frequencies.

image

When an image block is transformed from the spatial domain into the frequency domain by the DCT, most of the energy is concentrated in the DC component and the low-frequency AC components, and the remaining energy is widely distributed over the high-frequency AC components.
The figure above is only meant to illustrate the DCT, so it differs from an exact DCT result. In the ICSP codec, the block fed to the DCT is actually a difference block, and the DCT result is stored as a floating-point type.
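For reference, a direct (unoptimized) 8x8 forward DCT-II can be written as below. It uses the common orthonormal scaling with a factor of 1/4 and C(0) = 1/sqrt(2); whether the ICSP codec uses exactly this scaling is an assumption, and the names `BlockF` and `forwardDct8x8` are hypothetical.

```cpp
#include <array>
#include <cmath>

using BlockF = std::array<std::array<double, 8>, 8>;

// F(u,v) = 1/4 * C(u) * C(v) * sum_x sum_y f(x,y)
//          * cos((2x+1)u*pi/16) * cos((2y+1)v*pi/16),  C(0) = 1/sqrt(2), else 1.
// Input and output are indexed as [row][column], i.e. in[y][x] and out[v][u].
BlockF forwardDct8x8(const BlockF& in) {
    const double pi = std::acos(-1.0);
    BlockF out{};
    for (int v = 0; v < 8; ++v) {
        for (int u = 0; u < 8; ++u) {
            double sum = 0.0;
            for (int y = 0; y < 8; ++y)
                for (int x = 0; x < 8; ++x)
                    sum += in[y][x]
                         * std::cos((2 * x + 1) * u * pi / 16.0)
                         * std::cos((2 * y + 1) * v * pi / 16.0);
            const double cu = (u == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
            const double cv = (v == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
            out[v][u] = 0.25 * cu * cv * sum;  // out[0][0] is the DC component
        }
    }
    return out;
}
```

A flat block shows energy compaction in its extreme form: all of the energy ends up in `out[0][0]` and every AC coefficient is (numerically) zero.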

Quantization

Quantization divides a transformed block by a constant (the quantization step, QStep). After quantization, most of the high-frequency AC components become 0 or very small. The more AC components have very small values, the fewer bits are allocated in the entropy coding stage, which increases compression efficiency. However, this process directly damages image quality: the larger the QStep, the larger the loss of image quality.
The ICSP codec supports independent quantization of the AC and DC components (QStepAC, QStepDC). It provides QStep values of 1, 8, and 16.
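Separate DC and AC steps can be sketched as follows. Rounding to the nearest integer is an assumption (the document only says the block is divided by QStep), and the function and type names are hypothetical.

```cpp
#include <array>
#include <cmath>

using CoeffBlock = std::array<std::array<double, 8>, 8>;  // DCT output
using LevelBlock = std::array<std::array<int, 8>, 8>;     // quantized levels

// Divide the DC coefficient by qStepDC and every AC coefficient by qStepAC,
// rounding to the nearest integer (rounding rule assumed for this sketch).
LevelBlock quantize(const CoeffBlock& coeff, int qStepDC, int qStepAC) {
    LevelBlock level{};
    for (int v = 0; v < 8; ++v)
        for (int u = 0; u < 8; ++u) {
            const int step = (u == 0 && v == 0) ? qStepDC : qStepAC;
            level[v][u] = static_cast<int>(std::lround(coeff[v][u] / step));
        }
    return level;
}

// Inverse step used on the decoder side: multiply each level by its step.
// The rounding error introduced by quantize() is not recoverable.
CoeffBlock dequantize(const LevelBlock& level, int qStepDC, int qStepAC) {
    CoeffBlock coeff{};
    for (int v = 0; v < 8; ++v)
        for (int u = 0; u < 8; ++u) {
            const int step = (u == 0 && v == 0) ? qStepDC : qStepAC;
            coeff[v][u] = static_cast<double>(level[v][u]) * step;
        }
    return coeff;
}
```

For example, with QStepDC = 8 and QStepAC = 16, a DC value of 488 becomes 61 and is restored exactly, while an AC value of 23 becomes 1 and is restored as 16; that difference is the quality loss the text refers to.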

image

ZigZag Reordering

Zig-zag reordering arranges the elements of a quantized block in a specific order. The order is not simply sequential but diagonal, in a zig-zag pattern. Please refer to the figure below.

image

After zig-zag reordering, the DC component and the low-frequency AC components are placed at the front, and the high-frequency AC components are placed at the rear. If QStep is large (16 or more), most high-frequency elements are 0 or very small, so many 0 values are placed at the rear.
In the case of the figure above, the result of zig-zag reordering looks like 61,13,12,0,11,7,0,6,0,...,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0... Most of the values at the rear are 0. In the ICSP codec, if all of the AC components are 0, a special process is applied to the bitstream using an AC flag to improve compression efficiency. This technique is explained in detail in the Bit Stream Syntax part.
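The scan can be generated by walking the anti-diagonals of the block and alternating direction, as in the sketch below. It produces the conventional 8x8 zig-zag order (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), ...; that the ICSP codec uses exactly this order is an assumption, and `zigzagScan` and `allAcZero` are hypothetical names. The second function shows the kind of check that an AC flag would be based on.

```cpp
#include <algorithm>
#include <array>

using LevelBlock = std::array<std::array<int, 8>, 8>;  // indexed [row][col]

// Read the block along anti-diagonals (row + col = s), alternating direction:
// even diagonals run bottom-left to top-right, odd ones top-right to bottom-left.
std::array<int, 64> zigzagScan(const LevelBlock& block) {
    std::array<int, 64> out{};
    int k = 0;
    for (int s = 0; s <= 14; ++s) {
        const int lo = std::max(0, s - 7);  // smallest valid row on this diagonal
        const int hi = std::min(s, 7);      // largest valid row on this diagonal
        if (s % 2 == 0) {
            for (int row = hi; row >= lo; --row) out[k++] = block[row][s - row];
        } else {
            for (int row = lo; row <= hi; ++row) out[k++] = block[row][s - row];
        }
    }
    return out;
}

// True when every AC coefficient (all entries after the DC at index 0) is 0.
bool allAcZero(const std::array<int, 64>& scanned) {
    return std::all_of(scanned.begin() + 1, scanned.end(),
                       [](int v) { return v == 0; });
}
```

Generating the order in a loop avoids a hand-typed 64-entry table; a real encoder would usually precompute the order once and reuse it for every block.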

Entropy Coding

In entropy coding, the list of numbers from zig-zag reordering is converted into a bitstream. Each number in the list is converted into a bundle of bits. The number of bits allocated depends on how frequently the number occurs. In the case above, few bits are allocated to 0 because 0 occurs most often. Conversely, many bits are allocated to other numbers such as 61, 13, 12.... The more 0 values the list contains, the smaller the bitstream becomes. The ICSP codec uses code words (bit patterns) that are already determined for certain ranges of integers; I didn't implement Huffman coding myself. I just convert each number into the predetermined code word for the range to which the number belongs.
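The actual ICSP code-word table is not reproduced in this document. As a stand-in, the sketch below shows one common way to map integers to ranges: the size category used in baseline JPEG, where the category is the number of bits needed for the magnitude. It only illustrates the idea of "range to which the number belongs" and is not the ICSP table; `sizeCategory` is a hypothetical name.

```cpp
#include <cstdlib>

// Size category of a coefficient: the bit length of |value|.
//   0        -> 0
//   +-1      -> 1
//   +-2..3   -> 2
//   +-4..7   -> 3, and so on.
// A range-based coder can assign one prefix code word per category and
// then append `category` extra bits to identify the value within the range.
int sizeCategory(int value) {
    unsigned magnitude = static_cast<unsigned>(std::abs(value));
    int bits = 0;
    while (magnitude != 0) {
        ++bits;
        magnitude >>= 1;
    }
    return bits;
}
```

Under such a scheme a 0 needs no extra bits at all, whereas 61 falls into category 6 and 13 and 12 into category 4, which matches the observation that zeros are cheap and large values are expensive.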

Bit Stream Syntax

Stream Header

The header of the ICSP bitstream has a fixed structure, as shown in the figure below. The size of the header is 14 bytes.

image

Stream Body

Intra Prediction Frame Body

image

Inter Prediction Frame Body

image

Vectorization Mode(SIMD, AVX Intrinsics)

The ICSP codec was originally implemented with per-pixel (scalar) processing. Recently, I converted this per-pixel processing into vectorized processing using AVX intrinsic functions. Since I am not yet familiar with SIMD, there may be some unskillful logic and code. However, this conversion significantly reduced the encoding time at similar quality. I will describe the detailed comparison results in the following chapter.

image

Computational Complexity Comparison

All Intra Prediction Mode

| Sequence | Scalar Encoding Time (sec) | Vector Encoding Time (sec) | Vector Time / Scalar Time (%) |
|---|---|---|---|
| akiyo | 13.65 | 12.12 | 89% |
| children | 16.9 | 12.91 | 76% |
| coastguard | 13.98 | 12.36 | 88% |
| container | 14.84 | 11.38 | 77% |
| football | 5.05 | 4.04 | 80% |
| foreman | 17.21 | 12.42 | 72% |
| hall monitor | 14.31 | 12.83 | 90% |
| mobile | 15.3 | 12.01 | 79% |
| mother_daughter | 13.72 | 12.01 | 88% |
| news | 14.26 | 11.41 | 80% |
| stefan | 16.35 | 13.06 | 80% |
| table | 13.62 | 11.33 | 83% |
| Average | - | - | 82% |

Inter/Intra Hybrid Prediction(intra period : 10)

| Sequence | Scalar Encoding Time (sec) | Vector Encoding Time (sec) | Vector Time / Scalar Time (%) |
|---|---|---|---|
| akiyo | 30.52 | 13.71 | 45% |
| children | 25.9 | 11.27 | 44% |
| coastguard | 30.71 | 12.51 | 41% |
| container | 28.58 | 13.31 | 47% |
| football | 9.59 | 4.4 | 46% |
| foreman | 29.33 | 11.94 | 41% |
| hall monitor | 30.53 | 12.31 | 40% |
| mobile | 29.08 | 12.36 | 42% |
| mother_daughter | 30.95 | 12.36 | 40% |
| news | 26.34 | 11.43 | 43% |
| stefan | 29.04 | 13.09 | 45% |
| table | 28.51 | 12.81 | 45% |
| Average | - | - | 43% |

I will upload a detailed comparison of the speed of each encoding stage.

Contents

| Folder | Description |
|---|---|
| data | test sequences |
| source | C++ source code |

License:GNU General Public License v3.0