HamedBabaei / PAN2019_cross_domain

PAN 2019, Cross-Domain Authorship Attribution Task

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Cross-domain Authorship Attribution: Author Identification using a Multi-Aspect Ensemble Approach

This project contains our paper's codes in python, used to indetifying authors in PAN 2019 competition.

Usage

Parameters:

authorship_attribution.py [-h] [-i INPUT] [-o OUTPUT] [-n N] [-ft FT]
                                 [-pt PT]
optional arguments:
  -h, --help                      show this help message and exit
  -i INPUT, --input INPUT         path to input dataset
  -o OUTPUT, --output OUTPUT      path to output directory
  -n N                            n gram
  -ft FT                          frequency term for tfidf and ngram
  -pt PT                          threshold for UNK authors

and the default parameters are:

  • n : 4
  • ft: 5
  • pt: 0.08

running the script usage:

python authorship_attribution.py -i path_to_input_dataset_dir -o path_to_output_dir

Citation

Please cite us as:

M Rahgouy, HB Giglou, T Rahgouy, MK Sheykhlan, E Mohammadzadeh. Cross-domain Authorship Attribution: Author Identification using a Multi-Aspect Ensemble Approach - Notebook for PAN at CLEF 2019. In CLEF 2019 Evaluation Labs and Workshop–Working Notes Papers. CEUR-WS. org.

About

PAN 2019, Cross-Domain Authorship Attribution Task


Languages

Language:Python 100.0%