wolverinn / HEVC-CU-depths-dataset

A dataset that contains the Coding Unit image files and their corresponding depths for HEVC intra-prediction.

HEVC-CU-depths-dataset

What's in the dataset?

In HEVC intra-prediction, each I-frame is divided into 64x64 Coding Tree Units (CTUs). For each 64x64 CTU, the depth prediction is represented by a 16x16 matrix. Each element of the matrix is 0, 1, 2 or 3 and gives the CU depth of one 4x4 block in the CTU.
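As a small illustration of what the depth values mean: in the usual HEVC quadtree, each extra depth level halves the CU edge length, so with a 64x64 CTU the depths 0/1/2/3 correspond to 64x64, 32x32, 16x16 and 8x8 CUs. The helper name `cu_size` below is made up for this sketch and is not part of the dataset code.

```python
CTU_SIZE = 64  # CTU edge length in pixels, as described above


def cu_size(depth: int) -> int:
    """Return the edge length in pixels of a CU at the given quadtree depth."""
    if depth not in (0, 1, 2, 3):
        raise ValueError(f"depth must be 0, 1, 2 or 3, got {depth}")
    # Each split halves the edge length: 64 -> 32 -> 16 -> 8.
    return CTU_SIZE >> depth
```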

The dataset contains images and their corresponding labels. There are three folders: train, validation and test.

Unzip the images and label files first!

  • Image files: Each image is one frame extracted from a video, so image sizes may differ. To use an image, split it into 64x64 blocks (or 32x32 blocks, and so on).
  • Labels: The labels are in the pkl folder. For one CTU, which is a 64x64 image, the label is a Python list of length 16. Why a length-16 vector instead of a 16x16 matrix? Because the 16x16 matrix is redundant: the deepest level, depth 3, corresponds to 8x8 CUs, so all 4x4 blocks inside the same 16x16 image block share one depth value. The matrix can therefore be reduced to a 16x1 vector, with one label for each 16x16 image block in the CTU.
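To make the redundancy concrete, here is a minimal sketch that expands a length-16 label list back into the 16x16 depth matrix. It assumes the 16 labels are stored in row-major order (left to right, top to bottom) over the 4x4 grid of 16x16 image blocks; that ordering is an assumption, so check it against the dataset before relying on it. The function name `expand_labels` is hypothetical.

```python
def expand_labels(label_vector, blocks_per_side=4, cells_per_block=4):
    """Expand 16 per-block labels into a 16x16 matrix of per-4x4-block depths.

    ASSUMPTION: label_vector is in row-major order over the 4x4 grid of
    16x16 image blocks inside the CTU.
    """
    if len(label_vector) != blocks_per_side ** 2:
        raise ValueError("expected one label per 16x16 image block")
    matrix = []
    for block_row in range(blocks_per_side):
        start = block_row * blocks_per_side
        row_labels = label_vector[start:start + blocks_per_side]
        # Repeat each label horizontally for the 4 cells it covers.
        expanded_row = [depth for depth in row_labels for _ in range(cells_per_block)]
        # Repeat the expanded row vertically for the 4 cell rows it covers.
        for _ in range(cells_per_block):
            matrix.append(list(expanded_row))
    return matrix
```

Every 4x4 group of cells in the result holds a single value, which is why the 16 labels carry the same information as the full matrix.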

If you split the image files into 64x64 CTUs, the training set contains around 110K images and the validation set around 40K images.
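The splitting step could look roughly like the sketch below, which only computes crop boxes and leaves the actual cropping to your image library (Pillow's `Image.crop`, for example, accepts boxes in this `(left, upper, right, lower)` form). Two things are assumptions here, not facts about the dataset: the boxes are produced in raster order, and partial CTUs at the right and bottom borders are skipped. The function name `ctu_boxes` is hypothetical.

```python
def ctu_boxes(width, height, ctu_size=64):
    """Return (left, upper, right, lower) boxes for every full CTU in a frame.

    ASSUMPTIONS: raster order (left to right, then top to bottom), and
    border CTUs that do not fit completely inside the frame are dropped.
    """
    boxes = []
    for top in range(0, height - ctu_size + 1, ctu_size):
        for left in range(0, width - ctu_size + 1, ctu_size):
            boxes.append((left, top, left + ctu_size, top + ctu_size))
    return boxes
```

The position of a box in this list is not guaranteed to match the CtuNumber used in the dataset; verify the numbering before pairing boxes with labels.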

How to relate images and labels

An image file name looks like v_0_42_104_.jpg, which follows the pattern v_VideoNumber_FrameNumber_CtuNumber_.jpg.
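A minimal sketch of parsing such a file name is shown below. The function name `parse_image_name` is made up for illustration, and the three numbers are returned as strings on purpose so they can be used directly as dictionary keys later.

```python
from pathlib import Path


def parse_image_name(file_name):
    """Split 'v_<Video>_<Frame>_<Ctu>_.jpg' into (video, frame, ctu) strings."""
    stem = Path(file_name).stem      # e.g. "v_0_42_104_"
    parts = stem.split("_")          # e.g. ["v", "0", "42", "104", ""]
    if len(parts) != 5 or parts[0] != "v" or parts[4] != "":
        raise ValueError(f"unexpected image file name: {file_name!r}")
    video_number, frame_number, ctu_number = parts[1:4]
    return video_number, frame_number, ctu_number
```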

You can use the VideoNumber to find the corresponding .pkl file, like v_0.pkl. Then, when you load the pickle file, you will get a Python dict:

{
    "2": {
        "0": [...],
        "1": [...],
        ...
        "103": [...]
    },
    "27": {
        ...
    }
}

To get the label for a particular 64x64 CTU, index the dict with label_vector = video_dict[FrameNumber][CtuNumber], for example label_vector = video_dict["42"]["104"]. Note that both keys are strings. The resulting label_vector is a Python list of length 16.
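Putting the lookup together, a sketch might look like this. The function name `load_label` and the `pkl_dir` parameter are hypothetical; only the v_VideoNumber.pkl naming and the two-level dict with string keys come from the description above.

```python
import pickle
from pathlib import Path


def load_label(pkl_dir, video_number, frame_number, ctu_number):
    """Return the length-16 label list for one CTU.

    pkl_dir is the folder holding the v_<VideoNumber>.pkl files
    (for example the train folder inside pkl).
    """
    pkl_path = Path(pkl_dir) / f"v_{video_number}.pkl"
    with open(pkl_path, "rb") as f:
        video_dict = pickle.load(f)
    # Both levels of the dict are keyed by strings, so convert just in case.
    return video_dict[str(frame_number)][str(ctu_number)]
```

This reloads the pickle file on every call, which keeps the sketch short; in a training loop you would probably load each video's dict once and keep it in memory.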

Example for loading the dataset

Here's an example of loading the dataset in a deep learning project implemented in PyTorch; you can find it in load_example.py. Note that the example loads 32x32 image blocks and predicts the 4 labels that correspond to each block.
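To show why a 32x32 block goes with 4 labels: a 32x32 block covers a 2x2 group of 16x16 image blocks. The sketch below groups the 16 labels of one CTU by quadrant. It is not the code from load_example.py; the function name `quadrant_labels` is hypothetical, and it assumes the 16 labels are in row-major order over the 4x4 grid of 16x16 blocks.

```python
def quadrant_labels(label_vector):
    """Group the 16 labels of a CTU into four lists of 4, one per 32x32 quadrant.

    ASSUMPTION: label_vector is in row-major order over the 4x4 grid of
    16x16 image blocks. Keys of the result are (quadrant_row, quadrant_col).
    """
    if len(label_vector) != 16:
        raise ValueError("expected 16 labels for one 64x64 CTU")
    grid = [label_vector[r * 4:(r + 1) * 4] for r in range(4)]
    quadrants = {}
    for q_row in range(2):
        for q_col in range(2):
            quadrants[(q_row, q_col)] = [
                grid[2 * q_row + d_row][2 * q_col + d_col]
                for d_row in range(2)
                for d_col in range(2)
            ]
    return quadrants
```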

How to use the dataset in deep learning?

You can refer to these documents:

In HEVC intra-prediction, the encoder spends a lot of time searching for the best CU depths (the 16x16 matrix) for each 64x64 CTU. A deep learning model can predict the CU depths for a 64x64 CTU instead, which avoids that search.

Advanced Option: build your own dataset

I provide my source code for generating the dataset here. You can modify my code gen_dataset.py to build your own dataset. It's better to download the whole Advanced folder. Here are some tips:

TIP 1: Download YUV file resources

YUV files are the input to the HEVC encoder. Its output is the 16x16 depth matrix for each CTU, which you can process later. You can also use FFmpeg to extract each frame from the YUV files.
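Because a raw YUV file has no header, FFmpeg needs to be told the frame size and pixel format. The sketch below only builds the command line and does not run it. It is not taken from gen_dataset.py: the function name `ffmpeg_extract_command`, the output pattern and the default `yuv420p` pixel format (8-bit 4:2:0) are all assumptions, so adjust them to your sequences.

```python
def ffmpeg_extract_command(yuv_path, width, height, out_pattern, pixel_format="yuv420p"):
    """Build an FFmpeg command that dumps every frame of a raw YUV file as images.

    ASSUMPTION: the input is headerless raw video in the given pixel format.
    out_pattern is an image sequence pattern such as 'temp-frames/frame_%d.jpg'.
    """
    return [
        "ffmpeg",
        "-f", "rawvideo",                      # input is headerless raw video
        "-pixel_format", pixel_format,         # raw files carry no format info
        "-video_size", f"{width}x{height}",    # raw files carry no size info
        "-i", str(yuv_path),
        str(out_pattern),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once FFmpeg is installed and on your PATH.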

Here are some sites to find YUV resources:

TIP 2: Check the directories in the code for:

  • The directory of image files and pickle files: /dataset/img/train, /dataset/img/test, /dataset/img/validation, /dataset/pkl/train, /dataset/pkl/test, /dataset/pkl/validation
  • The directory of YUV files: /yuv-file/train, /yuv-file/test, /yuv-file/validation
  • The directory of the config files for HEVC encoder: /config
  • The directory to store temporary frames extracted from YUV files: /temp-frames
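Before starting a long generation run, it can help to confirm that all of these directories exist. The sketch below treats the paths listed above as relative to a project root that you pass in; that interpretation, and the names `EXPECTED_DIRS` and `missing_dirs`, are assumptions made for illustration.

```python
from pathlib import Path

# Directory layout from the list above, taken relative to the project root.
EXPECTED_DIRS = [
    "dataset/img/train", "dataset/img/test", "dataset/img/validation",
    "dataset/pkl/train", "dataset/pkl/test", "dataset/pkl/validation",
    "yuv-file/train", "yuv-file/test", "yuv-file/validation",
    "config",
    "temp-frames",
]


def missing_dirs(root):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

An empty result from `missing_dirs(".")` means the layout is complete; otherwise the returned names can be created with `Path.mkdir(parents=True)`.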

TIP 3: Here are the YUV files already used in the dataset:

Type   Train                              Validation                   Test
2K     NebutaFestival_2560x1600_60        PeopleOnStreet_2560x1600_30  Traffic_2560x1600_30
       SteamLocomotiveTrain_2560x1600_60
1080p  BasketballDrive_1920x1080_50       BQTerrace_1920x1080_60       Cactus_1920x1080_50
       Kimono1_1920x1080_24
       Tennis_1920x1080_24
       ParkScene_1920x1080_24
720p   FourPeople_1280x720_60             SlideShow_1280x720_20        KristenAndSara_1280x720_60
       SlideEditing_1280x720_30
480p   BasketballDrill_832x480_50         Flowervase_832x480_30        BQMall_832x480_60
       Keiba_832x480_30                   Mobisode2_832x480_30         PartyScene_832x480_50
       RaceHorses_832x480_30
288    waterfall_352x288_20               akiyo_352x288_20             container_352x288_20
       flower_352x288_20                  coastguard_352x288_20
       highway_352x288_20
       news_352x288_20
       paris_352x288_20
240    BasketballPass_416x240_50          BlowingBubbles_416x240_50    BQSquare_416x240_60
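The sequence names above embed the frame size, which is handy when a tool needs the width and height of a raw YUV file. Here is a minimal sketch for pulling those fields out. The function name `parse_sequence_name` is hypothetical, and the meaning of the trailing number is not stated here (it is presumably the frame rate), so the sketch simply returns it as an integer.

```python
import re

# Name_<width>x<height>_<number>, e.g. "BasketballDrive_1920x1080_50".
_SEQUENCE_RE = re.compile(r"^(?P<name>.+)_(?P<width>\d+)x(?P<height>\d+)_(?P<last>\d+)$")


def parse_sequence_name(sequence):
    """Return (name, width, height, trailing_number) for a sequence name."""
    match = _SEQUENCE_RE.match(sequence)
    if match is None:
        raise ValueError(f"unexpected sequence name: {sequence!r}")
    return (
        match["name"],
        int(match["width"]),
        int(match["height"]),
        int(match["last"]),  # presumably the frame rate (assumption)
    )
```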

The markdown file in the Advanced folder explains how my TAppEncoder.exe was built. It shows how to modify the HEVC source code to output the information you need, such as depth info.

Generating the dataset will take some time. Be prepared.

License:GNU General Public License v3.0

