dengyl20 / SCOPE

Source code for the paper "Improving Chinese Spelling Check by Character Pronunciation Prediction: The Effects of Adaptivity and Granularity" in EMNLP 2022

Environment

  • Python: 3.8
  • CUDA: 11.7 (NVIDIA GeForce RTX 3090)
  • Packages: pip install -r requirements.txt
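A quick way to compare a local setup against the versions listed above is sketched below. It assumes that requirements.txt installs PyTorch, which this README does not state explicitly; the helper name check_environment is hypothetical and not part of the repository.

```python
import sys


def check_environment(expected_python=(3, 8)):
    """Return a small report comparing this interpreter to the README's setup."""
    report = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "python_matches": sys.version_info[:2] == tuple(expected_python),
        "torch_cuda": None,      # stays None when PyTorch is not installed
        "gpu_available": None,   # stays None when PyTorch is not installed
    }
    try:
        # Assumption: requirements.txt installs PyTorch.
        import torch

        report["torch_cuda"] = torch.version.cuda  # CUDA version PyTorch was built with
        report["gpu_available"] = torch.cuda.is_available()
    except ImportError:
        pass
    return report


if __name__ == "__main__":
    print(check_environment())
```

A mismatch in the report does not necessarily mean the code will fail; it only shows where the local setup differs from the one listed here.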

Data

Raw Data

Data Processing

  • The data-cleaning code follows REALISE.

We recommend downloading the cleaned data directly from this link and placing it in the data directory.

  • Convert the data to the training format:
python data_process/get_train_data.py \
    --data_path data \
    --output_dir data
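The same command can be launched from Python, for example as one step in a larger pipeline. The sketch below only wraps the invocation shown above and first checks that the data directory exists; the helper names build_command and run_data_processing are hypothetical and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def build_command(data_path="data", output_dir="data"):
    """Build the argument list for the data-processing command shown above."""
    return [
        sys.executable,
        "data_process/get_train_data.py",
        "--data_path", str(data_path),
        "--output_dir", str(output_dir),
    ]


def run_data_processing(data_path="data", output_dir="data"):
    """Run the processing script, failing early if the data directory is missing."""
    if not Path(data_path).is_dir():
        raise FileNotFoundError(f"Data directory not found: {data_path}")
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(build_command(data_path, output_dir), check=True)


# Example (run from the repository root):
# run_data_processing("data", "data")
```

Using sys.executable keeps the script in the same Python environment as the caller, which matters when the packages were installed into a virtual environment.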

Further Pre-train

We recommend directly downloading the checkpoint obtained after further pre-training (FPT).

Finetune

After the above steps are completed, modify the path parameters of the script and run:

bash train.sh
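Because the script depends on paths that have to be edited by hand, a small pre-flight check can catch a wrong path before a long run starts. The sketch below is an assumption-laden illustration: the names missing_paths and run_script are hypothetical, and the list of required paths must be filled in by the user, since the variables inside train.sh are not listed here.

```python
import subprocess
from pathlib import Path


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [str(p) for p in paths if not Path(p).exists()]


def run_script(script="train.sh", required_paths=()):
    """Launch a shell script with bash after checking that its inputs exist."""
    missing = missing_paths([script, *required_paths])
    if missing:
        raise FileNotFoundError("Missing before launch: " + ", ".join(missing))
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(["bash", script], check=True)


# Example (the paths are placeholders for whatever train.sh is configured to use):
# run_script("train.sh", required_paths=["data"])
```

The same helper can be pointed at predict.sh for inference.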

Inference

Please modify the path parameters of the script and run:

bash predict.sh

Citation

If you find this work useful for your research, please cite our paper:

Improving Chinese Spelling Check by Character Pronunciation Prediction: The Effects of Adaptivity and Granularity

@article{li2022improving,
  title={Improving Chinese Spelling Check by Character Pronunciation Prediction: The Effects of Adaptivity and Granularity},
  author={Li, Jiahao and Wang, Quan and Mao, Zhendong and Guo, Junbo and Yang, Yanyan and Zhang, Yongdong},
  journal={arXiv preprint arXiv:2210.10996},
  year={2022}
}
