dzy-cxy / SEN_CSLR

Self-Emphasizing Network for Continuous Sign Language Recognition (AAAI2023)

SEN_CSLR

This repo holds the code for the paper: Self-Emphasizing Network for Continuous Sign Language Recognition (AAAI 2023). [paper]

This repo is based on VAC (ICCV 2021). Many thanks for their great work!

Prerequisites

  • This project is implemented in PyTorch (>1.8), so please install PyTorch first.

  • ctcdecode==0.4 [parlance/ctcdecode], for beam search decoding.

  • sclite [kaldi-asr/kaldi]: install the Kaldi tools to get sclite for evaluation. After installation, create a soft link to sclite:
    mkdir ./software
    ln -s PATH_TO_KALDI/tools/sctk-2.4.10/bin/sclite ./software/sclite

  • SeanNaren/warp-ctc, for CTC supervision.
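Before training, it can help to confirm that the prerequisites above are in place. The following is a minimal sketch, not part of this repo: the helper names are hypothetical, the module name warpctc_pytorch is an assumption about how the warp-ctc binding is imported, and ">1.8" is read loosely as "the 1.8 line or newer".

```python
import importlib.util
import os
import re


def parse_version(text):
    """Turn a version string such as '1.13.1+cu117' into a tuple of ints."""
    numbers = re.findall(r"\d+", text.split("+")[0])
    return tuple(int(n) for n in numbers[:3])


def missing_prerequisites(modules=("torch", "ctcdecode", "warpctc_pytorch"),
                          sclite_path="./software/sclite"):
    """Return a list of problems; an empty list means every check passed.

    The module names are assumptions about how each package is imported.
    """
    problems = []
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python module not found: {name}")
    # os.path.exists follows symlinks, so a dangling link is reported too.
    if not os.path.exists(sclite_path):
        problems.append(f"sclite link not found: {sclite_path}")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
    if importlib.util.find_spec("torch") is not None:
        import torch
        # Loose reading of the ">1.8" requirement: accept 1.8.x and newer.
        if parse_version(torch.__version__)[:2] < (1, 8):
            print(f"PyTorch {torch.__version__} is older than 1.8")
```

Run it from the repo root so that the relative path ./software/sclite resolves to the link created above.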

Implementation

The implementations of SSEM (line 47) and TSEM (line 23) are given in ./modules/resnet.py.

Both modules are then plugged into the ResNet BasicBlock at line 93 of ./modules/resnet.py.

We later found that a multi-scale architecture for TSEM performs on par with the version reported in the paper, so the code implements TSEM that way.

Data Preparation

You can choose any one of following datasets to verify the effectiveness of SEN.

PHOENIX2014 dataset

  1. Download the RWTH-PHOENIX-Weather 2014 Dataset [download link]. Our experiments are based on phoenix-2014.v3.tar.gz.

  2. After downloading the dataset, extract it. We suggest creating a soft link to the extracted dataset:
    ln -s PATH_TO_DATASET/phoenix2014-release ./dataset/phoenix2014

  3. The original frames are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dict and resize the image sequences.

    cd ./preprocess
    python data_preprocess.py --process-image --multiprocessing
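To make the "gloss dict" step concrete, here is a minimal sketch of one way to build a gloss vocabulary from annotation sentences. It is an illustration only: the function name is hypothetical, and the (index, count) layout and the choice to leave index 0 free for the CTC blank are assumptions, not a description of what data_preprocess.py writes to disk.

```python
from collections import Counter


def build_gloss_dict(annotations):
    """Map each gloss to (index, count).

    Indices start at 1 so that 0 stays free for the CTC blank label
    (an assumption made for this sketch).
    """
    counts = Counter()
    for sentence in annotations:
        # Glosses are assumed to be separated by whitespace.
        counts.update(sentence.split())
    return {gloss: (index, counts[gloss])
            for index, gloss in enumerate(sorted(counts), start=1)}


if __name__ == "__main__":
    sample = ["MORGEN REGEN NORD", "REGEN SUED"]  # made-up annotations
    for gloss, (index, count) in build_gloss_dict(sample).items():
        print(index, gloss, count)
```

Sorting the glosses before assigning indices keeps the mapping stable across runs, which matters when a trained model is later loaded against a regenerated dictionary.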

PHOENIX2014-T dataset

  1. Download the RWTH-PHOENIX-Weather 2014 T Dataset [download link].

  2. After downloading the dataset, extract it. We suggest creating a soft link to the extracted dataset:
    ln -s PATH_TO_DATASET/PHOENIX-2014-T-release-v3/PHOENIX-2014-T ./dataset/phoenix2014-T

  3. The original frames are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dict and resize the image sequences.

    cd ./preprocess
    python data_preprocess-T.py --process-image --multiprocessing

CSL dataset

  1. Request the CSL Dataset from this website [download link].

  2. After downloading the dataset, extract it. We suggest creating a soft link to the extracted dataset:
    ln -s PATH_TO_DATASET ./dataset/CSL

  3. The original frames are 1280x720; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dict and resize the image sequences.

    cd ./preprocess
    python data_preprocess-CSL.py --process-image --multiprocessing

CSL-Daily dataset

  1. Request the CSL-Daily Dataset from this website [download link].

  2. After downloading the dataset, extract it. We suggest creating a soft link to the extracted dataset:
    ln -s PATH_TO_DATASET ./dataset/CSL-Daily

  3. The original frames are 1280x720; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dict and resize the image sequences.

    cd ./preprocess
    python data_preprocess-CSL-Daily.py --process-image --multiprocessing

Inference

PHOENIX2014 dataset

Backbone    Dev WER    Test WER    Pretrained model
Baseline    21.2%      22.3%       ---
ResNet18    19.5%      21.0%       [Baidu] (passwd: jnii), [Google Drive]

PHOENIX2014-T dataset

Backbone    Dev WER    Test WER    Pretrained model
Baseline    21.1%      22.8%       ---
ResNet18    19.3%      20.7%       [Baidu] (passwd: kqhx), [Google Drive]

CSL-Daily dataset

Backbone    Dev WER    Test WER    Pretrained model
Baseline    32.8%      32.3%       ---
ResNet18    31.1%      30.7%       [Baidu] (passwd: xkhu), [Google Drive]

To evaluate the pretrained model, run the command below:

python main.py --device your_device --load-weights path_to_weight.pt --phase test
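The Dev and Test columns above report word error rate (WER): substitutions, deletions and insertions divided by the number of reference glosses. Official scoring in this repo goes through sclite; the sketch below is only a plain edit-distance illustration of the metric, with hypothetical function names, and its numbers may differ from sclite's output in edge cases.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two token lists."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_token in enumerate(reference, start=1):
        current = [i]
        for j, hyp_token in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_token != hyp_token)
            deletion = previous[j] + 1      # reference token left unmatched
            insertion = current[j - 1] + 1  # extra hypothesis token
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_error_rate(references, hypotheses):
    """Corpus-level WER: total edit errors over total reference tokens."""
    total = sum(len(ref.split()) for ref in references)
    if total == 0:
        raise ValueError("references contain no tokens")
    errors = sum(edit_distance(ref.split(), hyp.split())
                 for ref, hyp in zip(references, hypotheses))
    return errors / total


if __name__ == "__main__":
    refs = ["MORGEN REGEN NORD SUED"]  # made-up gloss sequences
    hyps = ["MORGEN REGEN SUED"]
    print(f"WER: {word_error_rate(refs, hyps):.1%}")
```

The error count is summed over the whole corpus before dividing, so long sentences weigh more than short ones.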

Training

Configuration values are resolved with the following priority: command line > config file > argparse default values. To train the SLR model, run the command below:

python main.py --device your_device

Note that you can choose the target dataset (phoenix2014, phoenix2014-T, CSL, or CSL-Daily) on line 3 of ./config/baseline.yaml.
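The priority order (command line > config file > argparse defaults) can be reproduced with argparse alone. The sketch below is an assumption about how such layering is commonly done, not a copy of main.py: a plain dict stands in for the parsed YAML file, the --dataset flag and all default values are hypothetical, and only --device and --phase appear in the commands above.

```python
import argparse


def build_parser():
    """Parser whose add_argument defaults form the lowest-priority layer."""
    parser = argparse.ArgumentParser(description="config priority sketch")
    parser.add_argument("--device", default="0")
    parser.add_argument("--dataset", default="phoenix2014")  # hypothetical flag
    parser.add_argument("--phase", default="train")
    return parser


def resolve_args(config, argv):
    """Layer config-file values over defaults, then let the CLI override both."""
    parser = build_parser()
    known = vars(parser.parse_args([]))
    unknown = set(config) - set(known)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    # Parser-level defaults override the per-argument defaults set above.
    parser.set_defaults(**config)
    # Anything given explicitly in argv wins over both lower layers.
    return parser.parse_args(argv)


if __name__ == "__main__":
    file_config = {"dataset": "CSL-Daily", "device": "1"}  # stand-in for YAML
    args = resolve_args(file_config, ["--device", "2"])
    print(args.device, args.dataset, args.phase)
```

In this run, device comes from the command line, dataset from the config dict, and phase from the argparse default.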

Citation

If you find this repo useful in your research, please consider citing:

@inproceedings{hu2023self,
  title={Self-Emphasizing Network for Continuous Sign Language Recognition},
  author={Hu, Lianyu and Gao, Liqing and Liu, Zekang and Feng, Wei},
  booktitle={Thirty-seventh AAAI conference on artificial intelligence},
  year={2023},
}

About

License: Apache License 2.0

