andreabac3 / Bot-Gender-Profiling-Pan2019

🤖 In this work, we present our approach for the Author Profiling task of PAN 2019. The task is divided into two sub-problems, bot and gender detection, for two different languages: English and Spanish.

Home Page:https://pan.webis.de/clef19/pan19-web/author-profiling.html

🤖 Bot and Gender Detection of Twitter Accounts Using Distortion and LSA

License: GPL v3

Citation

To cite this work please use:

@article{bacciu2019bot,
  title={Bot and Gender Detection of Twitter Accounts Using Distortion and LSA},
  author={Bacciu, Andrea and La Morgia, Massimo and Mei, Alessandro and Nemmi, Eugenio Nerio and Neri, Valerio and Stefa, Julinda},
  year={2019}
}

Link to the original paper @ ceur-ws.org

Abstract

In this work, we present our approach to the Author Profiling task of PAN 2019. The task is divided into two sub-problems, bot detection and gender detection, for two languages: English and Spanish. We address each sub-problem and each language differently. For bot detection, we use an ensemble architecture for accounts that write in English and a single SVM for those that write in Spanish. For gender detection, we use a single SVM architecture for both languages, but we pre-process the tweets differently. Our final models achieve an accuracy above 90% in the bot detection task and, in the gender detection task, 84.17% for English and 77.61% for Spanish.

Getting Started

How to install

Requirements

  • git
  • Python 3.7
  • pip

After cloning the repository, install all the dependencies.
We suggest using a dedicated Python environment.

pip3 install --user -r requirements.txt

Install spaCy

Install spaCy globally with admin permissions, then run the following command to download the Spanish model:

python -m spacy download es_core_news_sm 

Open a python3 shell and try to load the 'es_core_news_sm' model:

import spacy
spacy.load('es_core_news_sm')
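
The same check can be done without an interactive shell. The sketch below only tests whether the model package can be located, relying on the fact that models fetched with `spacy download` are installed as regular Python packages. The helper name `is_model_installed` is hypothetical and not part of this repository.

```python
import importlib.util


def is_model_installed(name="es_core_news_sm"):
    """Return True if a top-level package called `name` can be located.

    This does not load the model; it only checks that the package exists.
    """
    return importlib.util.find_spec(name) is not None
```

Calling `is_model_installed()` returns `False` when the download step was skipped or went to a different Python environment than the one running the check.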

Dataset Directory Structure

dataset
│
├───en
│   │   id1.xml
│   │   id2.xml
│   │   ...
│   │   truth-train.txt
│   │   truth-dev.txt
│
└───es
    │   id1.xml
    │   id2.xml
    │   ...
    │   truth-train.txt
    │   truth-dev.txt
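
To illustrate how this layout might be read, here is a minimal sketch using only the standard library. It is not the repository's own loader. It assumes the PAN 2019 conventions: each truth line has the form `id:::bot|human:::bot|male|female`, and each author file wraps the tweets in `<document>` elements. The function names `read_truth`, `read_tweets`, and `load_split` are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_truth(truth_path):
    """Parse a truth file into {author_id: (kind, gender)}.

    Assumed line format: id:::bot|human:::bot|male|female
    """
    labels = {}
    for line in Path(truth_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        author_id, kind, gender = line.split(":::")
        labels[author_id] = (kind, gender)
    return labels


def read_tweets(xml_path):
    """Return the text of every <document> element in one author file."""
    root = ET.parse(str(xml_path)).getroot()
    return [doc.text or "" for doc in root.iter("document")]


def load_split(dataset_dir, lang, truth_name="truth-train.txt"):
    """Map each author id listed in the truth file to (tweets, kind, gender)."""
    lang_dir = Path(dataset_dir) / lang
    labels = read_truth(lang_dir / truth_name)
    return {
        author_id: (read_tweets(lang_dir / f"{author_id}.xml"), kind, gender)
        for author_id, (kind, gender) in labels.items()
    }
```

For example, `load_split("dataset", "es", "truth-dev.txt")` would read the Spanish development split, assuming the dataset sits in a `dataset` folder next to the script.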

Authors
