ICSP Codec
Introduction
This is an implementation of a video codec written in C/C++.
It is called the ICSP codec because I implemented it while working in the ICSP lab (http://icsp.hanyang.ac.kr).
The ICSP codec is built on basic video compression theory.
Its characteristics are as follows.
- Processing in 16x16 macroblock units (each 16x16 macroblock is divided into four 8x8 blocks in each compression step)
- Intra Prediction / Inter Prediction
- All Intra Prediction Mode
- DCT Transformation / Quantization
- Entropy Coding
- Parallel processing with AVX intrinsics
- Supports CIF (4:2:0) only
ICSP Codec Encoding Process
Intra Prediction
The ICSP codec supports three prediction modes for intra prediction: vertical, horizontal, and DC.
The green squares are the reconstructed reference pixels around the current block. Intra prediction generates a prediction block from the reference pixels that correspond to the selected mode. DC mode fills the prediction block with the mean value of the vertical and horizontal reference pixels. If reference pixels do not exist (first block, blocks in the first row, blocks in the first column, and so on), they are replaced by the integer value 127. You can set how often an intra-predicted frame occurs in a sequence by initializing 'intraPeriod' to an integer value. For example, if 'intraPeriod' is 10 and a sequence consists of 300 frames, one frame in every 10 is encoded with intra prediction and the remaining frames are encoded with inter prediction.
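As a rough illustration of the three modes, here is a minimal sketch for one 8x8 block. It assumes the reference pixels arrive as the row above (`top`) and the column to the left (`left`), with a null pointer marking a side that is unavailable. The names `IntraMode` and `predictIntra8x8` and the rounding used in DC mode are my own assumptions for this example, not the codec's actual code.

```cpp
#include <array>
#include <cstdint>

constexpr int kBlock = 8;               // 8x8 block, as described above
constexpr std::uint8_t kMissing = 127;  // substitute for unavailable reference pixels

enum class IntraMode { Vertical, Horizontal, DC };

using Block8x8 = std::array<std::array<std::uint8_t, kBlock>, kBlock>;

// top:  8 reconstructed pixels directly above the block, or nullptr if unavailable
// left: 8 reconstructed pixels directly left of the block, or nullptr if unavailable
Block8x8 predictIntra8x8(IntraMode mode, const std::uint8_t* top, const std::uint8_t* left) {
    std::array<std::uint8_t, kBlock> t{}, l{};
    int sum = 0;
    for (int i = 0; i < kBlock; ++i) {
        t[i] = top ? top[i] : kMissing;
        l[i] = left ? left[i] : kMissing;
        sum += t[i] + l[i];
    }
    // Rounded mean of the 16 reference pixels (the rounding is an assumption).
    const std::uint8_t dc = static_cast<std::uint8_t>((sum + kBlock) / (2 * kBlock));

    Block8x8 pred{};
    for (int y = 0; y < kBlock; ++y) {
        for (int x = 0; x < kBlock; ++x) {
            switch (mode) {
                case IntraMode::Vertical:   pred[y][x] = t[x]; break;  // copy the row above downward
                case IntraMode::Horizontal: pred[y][x] = l[y]; break;  // copy the left column rightward
                case IntraMode::DC:         pred[y][x] = dc;   break;  // flat block of the mean value
            }
        }
    }
    return pred;
}
```

An encoder would typically build all three candidate blocks and keep the one closest to the current block; how the ICSP codec makes that choice is not covered here.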
All Intra Prediction Mode
The ICSP codec also supports a special mode, All Intra Prediction Mode, which encodes every frame in a sequence using intra prediction. To run the codec in this mode, initialize the 'intraPeriod' variable to the ALL_INTRA macro value.
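The frame-type decision implied by the intra period and the all-intra mode can be sketched as follows. This is only an illustration: `kAllIntra` is a placeholder standing in for the ALL_INTRA macro (its real value is not stated here), `isIntraFrame` is a hypothetical helper, and I assume the first frame of each period is the intra frame.

```cpp
// Placeholder standing in for the ALL_INTRA macro; the real value is not stated.
constexpr int kAllIntra = 0;

// Returns true if the frame at frameIndex (0-based) should use intra prediction.
// Assumption: the first frame of every period is the intra-predicted one.
bool isIntraFrame(int frameIndex, int intraPeriod) {
    if (intraPeriod == kAllIntra) {
        return true;  // all-intra mode: every frame is intra predicted
    }
    return frameIndex % intraPeriod == 0;
}
```

With a period of 10 and a 300-frame sequence, this marks frames 0, 10, 20, ..., 290 as intra frames, 30 in total.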
Inter Prediction
The ICSP codec supports only backward prediction, meaning prediction from a past frame (MPEG standards call this forward prediction). That is, inter prediction blocks are generated using only the previous frame, which is always a reconstructed frame.
Basically, inter prediction finds the location in the previous frame (fn-1) of the block most similar to the current block in the current frame (fn). In the figure above, the red block is the current block in the current frame. The green block is the block in the previous frame, among the neighboring positions around the red block, that is closest to the current block. The ICSP codec uses a 'spiral search': it examines candidate blocks by circling outward from the current block position in a spiral, one pixel at a time.
I attached the above picture to illustrate the concept of spiral search; I hope it causes no misunderstanding. The ICSP codec moves the search block one pixel at a time and selects the candidate with the minimum SAD (Sum of Absolute Differences) between the current block and the search block.
So far this has been a conceptual explanation of inter prediction. Inter prediction consists of motion estimation and motion compensation, which are briefly explained below.
Motion Estimation
Simply put, motion estimation generates a motion vector that represents the displacement from the current block position to the position of the similar block with the smallest SAD.
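Putting the spiral search, the SAD criterion, and the motion vector together, motion estimation for one block could look roughly like the sketch below. The `Frame` and `MotionVector` types, the function names, the block size `n`, and the search `range` are all assumptions made for this example; the codec's real search window and data layout may differ. Candidates that fall outside the reference frame are simply skipped here.

```cpp
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

struct Frame {
    int width, height;
    std::vector<std::uint8_t> pix;  // samples in row-major order
    std::uint8_t at(int x, int y) const { return pix[y * width + x]; }
};

struct MotionVector { int dx, dy; };

// SAD between the n x n block at (cx, cy) in cur and the one at (rx, ry) in ref.
long sad(const Frame& cur, int cx, int cy, const Frame& ref, int rx, int ry, int n) {
    long sum = 0;
    for (int y = 0; y < n; ++y)
        for (int x = 0; x < n; ++x)
            sum += std::abs(int(cur.at(cx + x, cy + y)) - int(ref.at(rx + x, ry + y)));
    return sum;
}

// Walks a square spiral around offset (0, 0), one pixel per step, and returns
// the offset with the smallest SAD. Ties keep the offset visited first.
MotionVector spiralSearch(const Frame& cur, const Frame& ref, int cx, int cy, int n, int range) {
    MotionVector best{0, 0};
    long bestSad = std::numeric_limits<long>::max();
    int dx = 0, dy = 0;
    int stepX = 1, stepY = 0;                  // current walking direction
    int legLen = 1, walked = 0, legsDone = 0;  // spiral legs: 1, 1, 2, 2, 3, 3, ...
    const int total = (2 * range + 1) * (2 * range + 1);
    for (int i = 0; i < total; ++i) {
        const int rx = cx + dx, ry = cy + dy;
        if (rx >= 0 && ry >= 0 && rx + n <= ref.width && ry + n <= ref.height) {
            const long s = sad(cur, cx, cy, ref, rx, ry, n);
            if (s < bestSad) { bestSad = s; best = {dx, dy}; }
        }
        dx += stepX;
        dy += stepY;
        if (++walked == legLen) {
            walked = 0;
            const int t = stepX;  // turn 90 degrees
            stepX = -stepY;
            stepY = t;
            if (++legsDone % 2 == 0) ++legLen;  // lengthen the leg every second turn
        }
    }
    return best;
}
```

Because the spiral starts at the co-located block, small motions are found first, which is convenient if the search is later extended with an early-exit threshold.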
Motion Compensation
In the ICSP codec, motion compensation builds prediction blocks from the motion vectors and then computes difference blocks (current block - prediction block). The difference blocks are passed to the remaining encode/decode steps: DCT, quantization, and entropy coding.
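A minimal sketch of these two steps, assuming frames are stored as row-major byte arrays and the motion vector always points inside the previous frame. `MotionVector`, `predictionBlock`, and `differenceBlock` are hypothetical names used only for this example.

```cpp
#include <cstdint>
#include <vector>

struct MotionVector { int dx, dy; };

// Prediction block: the n x n block in the reconstructed previous frame that the
// motion vector points to, relative to the current block position (bx, by).
std::vector<std::uint8_t> predictionBlock(const std::vector<std::uint8_t>& prevRecon, int width,
                                          int bx, int by, MotionVector mv, int n) {
    std::vector<std::uint8_t> pred(n * n);
    for (int y = 0; y < n; ++y)
        for (int x = 0; x < n; ++x)
            pred[y * n + x] = prevRecon[(by + mv.dy + y) * width + (bx + mv.dx + x)];
    return pred;
}

// Difference block: current block minus prediction block. Values can be
// negative, so a signed type is needed.
std::vector<std::int16_t> differenceBlock(const std::vector<std::uint8_t>& cur, int width,
                                          int bx, int by,
                                          const std::vector<std::uint8_t>& pred, int n) {
    std::vector<std::int16_t> diff(n * n);
    for (int y = 0; y < n; ++y)
        for (int x = 0; x < n; ++x)
            diff[y * n + x] = static_cast<std::int16_t>(
                int(cur[(by + y) * width + (bx + x)]) - int(pred[y * n + x]));
    return diff;
}
```

A decoder reverses the second step: it rebuilds the same prediction block from the motion vector and adds the decoded difference block to it.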
DCT
The Discrete Cosine Transform (DCT), like the Fourier transform, decomposes a signal in the time or spatial domain into frequency components.
That is, the DCT transforms an image from the spatial domain into the frequency domain. An 8x8 block transformed by the DCT is decomposed into one DC (direct current) component with a frequency of 0 and sixty-three AC (alternating current) components with nonzero frequencies.
Since most images consist largely of low-frequency content with little high-frequency content, applying the DCT concentrates most of the energy in the DC component and the low-frequency AC components, and only a small amount of energy is spread across the high-frequency AC components. This phenomenon is called 'energy compaction' and is widely used in video compression.
The ICSP codec applies the DCT in 8x8 block units.
The above figure is intended only to explain the DCT, so it differs from an exact DCT result. In fact, the block passed to the DCT in the ICSP codec is a difference block, and the DCT result is stored as floating-point values.
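For reference, a direct (unoptimized) 8x8 forward DCT can be sketched as below. It uses the common orthonormal DCT-II definition; whether the ICSP codec uses exactly this scaling, and whether it stores results as `float` or `double`, is an assumption on my part. `BlockD` and `forwardDct8x8` are names chosen for this example.

```cpp
#include <array>
#include <cmath>

using BlockD = std::array<std::array<double, 8>, 8>;

// Direct 8x8 DCT-II:
//   F(u,v) = 1/4 * C(u) * C(v) * sum_x sum_y f(x,y)
//            * cos((2x+1)*u*pi/16) * cos((2y+1)*v*pi/16)
// with C(0) = 1/sqrt(2) and C(k) = 1 otherwise.
// out[0][0] is the DC component; the other 63 entries are AC components.
BlockD forwardDct8x8(const BlockD& in) {
    const double pi = std::acos(-1.0);
    BlockD out{};
    for (int v = 0; v < 8; ++v) {
        for (int u = 0; u < 8; ++u) {
            double sum = 0.0;
            for (int y = 0; y < 8; ++y)
                for (int x = 0; x < 8; ++x)
                    sum += in[y][x] * std::cos((2 * x + 1) * u * pi / 16.0)
                                    * std::cos((2 * y + 1) * v * pi / 16.0);
            const double cu = (u == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
            const double cv = (v == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
            out[v][u] = 0.25 * cu * cv * sum;
        }
    }
    return out;
}
```

A perfectly flat block shows energy compaction in its extreme form: all of its energy lands in the DC component and every AC component is zero.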
Quantization
Quantization divides a transformed block by a constant (the quantization step, QStep). After quantization, most of the high-frequency AC components become 0 or very small. The more AC components with very small values, the fewer bits the entropy coding step needs, so compression efficiency increases. However, this step directly degrades image quality: the larger the QStep, the greater the loss.
The ICSP codec quantizes the DC and AC components independently (QStepDC, QStepAC) and provides QStep values of 1, 8, and 16.
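The following sketch shows quantization with separate DC and AC steps, plus the matching dequantization a decoder would perform. Rounding to the nearest integer is my assumption, since the rounding rule is not stated here; `quantize`, `dequantize`, and the block types are hypothetical names.

```cpp
#include <array>
#include <cmath>

using BlockD = std::array<std::array<double, 8>, 8>;
using BlockI = std::array<std::array<int, 8>, 8>;

// Divides the DC coefficient by qStepDC and every AC coefficient by qStepAC,
// rounding to the nearest integer (the rounding rule is an assumption).
BlockI quantize(const BlockD& coeff, int qStepDC, int qStepAC) {
    BlockI q{};
    for (int v = 0; v < 8; ++v) {
        for (int u = 0; u < 8; ++u) {
            const int step = (u == 0 && v == 0) ? qStepDC : qStepAC;
            q[v][u] = static_cast<int>(std::lround(coeff[v][u] / step));
        }
    }
    return q;
}

// Multiplies back by the same steps. The rounding error is not recoverable,
// which is where the quality loss comes from.
BlockD dequantize(const BlockI& q, int qStepDC, int qStepAC) {
    BlockD coeff{};
    for (int v = 0; v < 8; ++v) {
        for (int u = 0; u < 8; ++u) {
            const int step = (u == 0 && v == 0) ? qStepDC : qStepAC;
            coeff[v][u] = static_cast<double>(q[v][u]) * step;
        }
    }
    return coeff;
}
```

For example, with QStepAC = 16 an AC coefficient of -20.0 becomes -1 and is restored as -16.0, while a coefficient of 3.0 becomes 0 and is lost entirely.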
ZigZag Reordering
Zig-zag reordering arranges the elements of a quantized block in a specific order. The order is not simple row-by-row order but a diagonal, zig-zag order. Please refer to the figure below.
After zig-zag reordering, the DC component and the low-frequency AC components are at the front and the high-frequency AC components are at the rear. If QStep is large (16 or more), most high-frequency elements are 0 or very small, so many 0 values gather at the rear.
In the case of the above figure, the result of zig-zag reordering is 61,13,12,0,11,7,0,6,0,...,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0... Most of the values at the rear are 0. In the ICSP codec, if all AC components are 0, an AC flag is used to signal this in the bitstream for better compression efficiency. This technique is explained in detail in the Bit Stream Syntax part.
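A zig-zag scan of an 8x8 block can be sketched as below. I am assuming the conventional JPEG-style scan order (first step to the right, then down the anti-diagonal), which may not match the figure exactly. `zigzagScan` and `allAcZero` are hypothetical names; the second helper only shows the check that would drive an AC flag, not how the flag is written to the bitstream.

```cpp
#include <array>

using BlockI = std::array<std::array<int, 8>, 8>;

// Reads an 8x8 block along its anti-diagonals, alternating direction on each
// diagonal, so low-frequency coefficients come first (JPEG-style order).
std::array<int, 64> zigzagScan(const BlockI& b) {
    std::array<int, 64> out{};
    int k = 0;
    for (int s = 0; s <= 14; ++s) {          // s = row + column of the diagonal
        const int lo = (s > 7) ? s - 7 : 0;  // smallest valid row on this diagonal
        const int hi = (s < 7) ? s : 7;      // largest valid row on this diagonal
        if (s % 2 == 1) {
            for (int r = lo; r <= hi; ++r) out[k++] = b[r][s - r];  // walk down-left
        } else {
            for (int r = hi; r >= lo; --r) out[k++] = b[r][s - r];  // walk up-right
        }
    }
    return out;
}

// True if every AC coefficient (everything after the first element) is zero.
bool allAcZero(const std::array<int, 64>& z) {
    for (int i = 1; i < 64; ++i)
        if (z[i] != 0) return false;
    return true;
}
```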
Entropy Coding
In entropy coding, the list of numbers from zig-zag reordering is converted into a bitstream. Each number in the list is converted into a group of bits, and the number of bits allocated depends on how frequently the number occurs. In the above case, few bits are allocated to 0 because 0 occurs most often. Conversely, more bits are allocated to other numbers such as 13 and 12. The more 0 values the list contains, the smaller the bitstream becomes. The ICSP codec uses predetermined code words (bit patterns) assigned to ranges of integers; I did not implement Huffman coding myself. Each number is simply converted into the predetermined code word for the range it belongs to.
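The codec's actual code word table is not reproduced here, so the sketch below swaps in a signed Exp-Golomb code as a stand-in. It has the same general shape (fixed code words, shorter for values near 0, with each code length shared by a range of integers), but it is not the ICSP table. The function names are hypothetical, and the bits are returned as a string of '0' and '1' characters purely for readability.

```cpp
#include <string>
#include <vector>

// Signed Exp-Golomb code word for v, as a string of '0'/'1' characters.
// Mapping: 0 -> 0, 1 -> 1, -1 -> 2, 2 -> 3, -2 -> 4, ... then the mapped
// number plus one is written in binary, preceded by (length - 1) zeros.
std::string signedExpGolomb(int v) {
    const unsigned codeNum = (v > 0) ? 2u * static_cast<unsigned>(v) - 1u
                                     : 2u * static_cast<unsigned>(-v);
    const unsigned x = codeNum + 1u;
    int bits = 0;
    for (unsigned t = x; t != 0; t >>= 1) ++bits;  // number of binary digits of x
    std::string out(bits - 1, '0');                // prefix of leading zeros
    for (int i = bits - 1; i >= 0; --i) out += ((x >> i) & 1u) ? '1' : '0';
    return out;
}

// Concatenates the code words of all values in a zig-zag ordered list.
std::string encodeList(const std::vector<int>& values) {
    std::string bitstream;
    for (int v : values) bitstream += signedExpGolomb(v);
    return bitstream;
}
```

With this stand-in, 0 costs a single bit while 61 costs 13 bits, which illustrates why a long run of zeros at the rear of the list keeps the bitstream small.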
Bit Stream Syntax
Stream Header
The header of an ICSP bitstream has a fixed structure, as shown in the figure below. The size of the header is 14 bytes.
Stream Body
Intra Prediction Frame Body
Inter Prediction Frame Body
Vectorization Mode(SIMD, AVX Intrinsics)
The ICSP codec was originally implemented with per-pixel (scalar) processing.
Recently, I converted this per-pixel processing into vectorized processing using AVX intrinsic functions.
Since I am not yet familiar with SIMD, some of the logic and code may be clumsy.
However, this conversion significantly reduced encoding time at similar quality.
The following section compares the results in detail. In its tables, the last column is the vector encoding time as a percentage of the scalar encoding time (for akiyo in All Intra mode, 12.12 / 13.65 ≈ 89%), so lower values mean a larger speedup.
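To give a feel for this kind of scalar-to-vector conversion, here is a small sketch that computes a SAD both ways. It is not taken from the codec's source. The function names are my own, and the vector path uses `_mm256_sad_epu8`, which requires AVX2 rather than plain AVX; when the file is compiled without AVX2 enabled (for example without `-mavx2`), the vector function falls back to the scalar loop so the example still runs.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Scalar reference: one pixel pair per iteration.
std::uint64_t sadScalar(const std::uint8_t* a, const std::uint8_t* b, std::size_t n) {
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += std::abs(int(a[i]) - int(b[i]));
    return sum;
}

// Vector version: 32 pixel pairs per iteration when AVX2 is enabled at compile
// time; any remaining tail (or everything, without AVX2) uses the scalar loop.
std::uint64_t sadVector(const std::uint8_t* a, const std::uint8_t* b, std::size_t n) {
    std::size_t i = 0;
    std::uint64_t sum = 0;
#if defined(__AVX2__)
    __m256i acc = _mm256_setzero_si256();
    for (; i + 32 <= n; i += 32) {
        const __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a + i));
        const __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b + i));
        // Produces four 64-bit partial sums, each covering 8 byte pairs.
        acc = _mm256_add_epi64(acc, _mm256_sad_epu8(va, vb));
    }
    alignas(32) std::uint64_t lanes[4];
    _mm256_store_si256(reinterpret_cast<__m256i*>(lanes), acc);
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    return sum + sadScalar(a + i, b + i, n - i);
}
```

Both functions return the same value for the same input, which makes the scalar version a convenient reference when checking a vectorized routine.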
Computational Complexity Comparison
All Intra Prediction Mode
Sequences | Scalar Encoding Time (sec) | Vector Encoding Time (sec) | Vector / Scalar Encoding Time (%) |
---|---|---|---|
akiyo | 13.65 | 12.12 | 89% |
children | 16.9 | 12.91 | 76% |
coastguard | 13.98 | 12.36 | 88% |
container | 14.84 | 11.38 | 77% |
football | 5.05 | 4.04 | 80% |
foreman | 17.21 | 12.42 | 72% |
hall monitor | 14.31 | 12.83 | 90% |
mobile | 15.3 | 12.01 | 79% |
mother_daughter | 13.72 | 12.01 | 88% |
news | 14.26 | 11.41 | 80% |
stefan | 16.35 | 13.06 | 80% |
table | 13.62 | 11.33 | 83% |
Average | x | x | 82% |
Inter/Intra Hybrid Prediction(intra period : 10)
Sequences | Scalar Encoding Time (sec) | Vector Encoding Time (sec) | Vector / Scalar Encoding Time (%) |
---|---|---|---|
akiyo | 30.52 | 13.71 | 45% |
children | 25.9 | 11.27 | 44% |
coastguard | 30.71 | 12.51 | 41% |
container | 28.58 | 13.31 | 47% |
football | 9.59 | 4.4 | 46% |
foreman | 29.33 | 11.94 | 41% |
hall monitor | 30.53 | 12.31 | 40% |
mobile | 29.08 | 12.36 | 42% |
mother_daughter | 30.95 | 12.36 | 40% |
news | 26.34 | 11.43 | 43% |
stefan | 29.04 | 13.09 | 45% |
table | 28.51 | 12.81 | 45% |
Average | x | x | 43% |
I will upload a detailed comparison of the speed of each encoding step later.
Contents
Folder | Description |
---|---|
data | test sequences |
source | C++ source code |