birolkuyumcu / toxic_spans

Detect toxic spans in toxic texts

Toxic Spans Detection (SemEval 2021 Task 5)

The Toxic Spans Detection task concerns the evaluation of systems that detect the spans that make a text toxic, when detecting such spans is possible. Moderation is crucial to promoting healthy online discussion. Although several toxicity (a.k.a. abusive language) detection datasets (Wulczyn et al., 2017; Borkan et al., 2019) and models (Schmidt and Wiegand, 2017; Pavlopoulos et al., 2017b; Zampieri et al., 2019) have been released, most of them classify whole comments or documents and do not identify the spans that make a text toxic. Highlighting such toxic spans can assist human moderators (e.g., at news portals) who often deal with lengthy comments and who prefer attribution over an unexplained, system-generated toxicity score per post. Evaluating systems that can accurately locate toxic spans within a text is thus a crucial step towards successful semi-automated moderation.
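Concretely, a system outputs the set of toxic character offsets for each post, and submissions are scored with a character-offset F1 averaged over posts. Below is a minimal sketch of the per-post score (the helper name span_f1 is ours; see the repository's evaluation code for the official implementation):

```python
def span_f1(pred, gold):
    """Character-offset F1 between predicted and gold toxic spans for one post.

    pred, gold: iterables of character offsets (ints). When the gold set is
    empty, the score is 1.0 only if the prediction is empty too.
    """
    pred, gold = set(pred), set(gold)
    if not gold:
        return 1.0 if not pred else 0.0
    if not pred:
        return 0.0
    precision = len(pred & gold) / len(pred)
    recall = len(pred & gold) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

The leaderboard score is the mean of this value over all posts in the test set.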

See more about this task here or directly on our Codalab site.

  • In this repository you will find a notebook with code to prepare a valid submission.
  • Evaluation code and baseline methods are included.
  • The trial, train, and test data used in the 2021 SemEval challenge are also included (see the loading sketch below).
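
The released data pairs each post's text with its gold toxic character offsets. A minimal loading sketch, assuming the standard SemEval-2021 Task 5 file layout (the tsd_train.csv name and the spans/text column names are the usual task format, not verified against this repository's copy):

```python
import ast
import pandas as pd

# Assumed layout: a CSV with a "text" column and a "spans" column holding a
# Python-style list of toxic character offsets (e.g. "[3, 4, 5]").
df = pd.read_csv("tsd_train.csv")
df["spans"] = df["spans"].apply(ast.literal_eval)

# Recover the toxic characters of the first post from its offsets.
text, spans = df.loc[0, "text"], df.loc[0, "spans"]
print("".join(text[i] for i in spans))
```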

About

Detect toxic spans in toxic texts

License: Creative Commons Zero v1.0 Universal


Languages

Language: Jupyter Notebook 54.4%
Language: Python 45.6%