Repository accompanying the CLEF 2021 eRisk Workshop Working Notes for the UNSL team (Universidad Nacional de San Luis). This repository contains the code necessary to:
- create the conda environment to run the code;
- create training and testing datasets based on Reddit for the tasks T1 and T2;
- pre-process datasets;
- train and test the proposed models (EarlyModel, EARLIEST and SS3);
- evaluate their performance using the measures ERDE and F Latency.
Make sure you have conda
installed. If that's not the case, you can install it following this guide.
Then, to create the conda environment necessary to run the code, run:
conda env create -f environment.yml
# Activate the environment using:
conda activate unsl_erisk_2021
To generate the datasets for T1, for example, run:
python -m datasets.make_reddit_corpus t1 keep
The resulting raw datasets will be stored in datasets/interim/t1/
: t1-train-raw.txt
for the training dataset, and t1-test-raw.txt
for the testing dataset.
To pre-process the datasets for T1, run:
python -m datasets.clean_corpus t1
The resulting pre-processed datasets will be stored in datasets/interim/t1/
: t1-train-clean.txt
for the training dataset, and t1-test-clean.txt
for the testing dataset.
To train and test the models on the datasets, use the scripts under the folder examples/
.
To evaluate the EarlyModel, run:
python -m examples.earlymodel_test t1 -n 30
To evaluate the EARLIEST, run:
python -m examples.earliest_test t1 cpu -n 30
To evaluate the SS3, run:
python -m examples.ss3_test t1 -n 30
If you use this code in a scientific publication, we would appreciate citations to the following paper:
J. M. Loyola, S. Burdisso, H. Thompson, L. Cagnina, M. Errecalde, UNSL at eRisk 2021: A comparison of three early alert policies for early risk detection, in: Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum, Bucarest, Romania, September 21-24, 2021.