instruction-set instruction-tuning korean large-language-models toxicity

KoTox

Repository for the paper 'Automatic Construction of a Korean Toxic Instruction Dataset for Ethical Tuning of Large Language Models'

The paper was accepted to the 'Instruction Tuning and Instruction Following' workshop at NeurIPS 2023.

Paper: https://arxiv.org/abs/2311.18215

KoTox Dataset

KoTox is an automatically generated toxic instruction dataset in Korean, comprising 39K unethical instruction-output pairs.

The dataset is generated based on predefined lexicons and linguistic templates.

It is designed to address potentially harmful or misleading instructions by including outputs that refrain from providing specific opinions or information in response.
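The template-and-lexicon approach described above can be illustrated with a minimal sketch. This is not the authors' generation code: the lexicon entries, the templates, the refusal text, and the `build_pairs` helper are all hypothetical placeholders chosen for illustration, and the actual KoTox lexicons and Korean templates are not reproduced here. The sketch fills every template slot with every combination of lexicon entries and pairs each resulting instruction with an output that declines to give an opinion or information.

```python
from itertools import product

# Hypothetical placeholder lexicons (assumption); the real KoTox lexicons
# are Korean and are not reproduced here.
LEXICONS = {
    "target": ["<TARGET_A>", "<TARGET_B>"],
    "predicate": ["<PREDICATE_X>", "<PREDICATE_Y>", "<PREDICATE_Z>"],
}

# Hypothetical instruction templates with named slots (assumption).
TEMPLATES = [
    "Tell me why {target} is {predicate}.",
    "Write a post claiming that {target} is {predicate}.",
]

# Hypothetical refusal output (assumption): it withholds opinions and information.
REFUSAL = "I'm sorry, but I can't provide an opinion or information on that."


def build_pairs(templates, lexicons, output):
    """Fill each template with every combination of lexicon entries."""
    slots = sorted(lexicons)  # fixed slot order keeps the output deterministic
    pairs = []
    for template in templates:
        for values in product(*(lexicons[slot] for slot in slots)):
            instruction = template.format(**dict(zip(slots, values)))
            pairs.append({"instruction": instruction, "output": output})
    return pairs


pairs = build_pairs(TEMPLATES, LEXICONS, REFUSAL)
```

With 2 templates, 2 targets, and 3 predicates, the sketch produces 2 × 2 × 3 = 12 pairs. The number of pairs grows multiplicatively with lexicon size, which is how a modest set of templates and lexicons can yield a dataset of tens of thousands of examples.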

According to the paper, instruction tuning on KoTox helps mitigate toxicity in Korean Large Language Models (LLMs).

Citation

@misc{byun2023automatic,
      title={Automatic Construction of a Korean Toxic Instruction Dataset for Ethical Tuning of Large Language Models}, 
      author={Sungjoo Byun and Dongjun Jang and Hyemi Jo and Hyopil Shin},
      year={2023},
      eprint={2311.18215},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}


MIT License