This repository contains reproducible experiments accompanying our article Applications of deep language models for reflective writings, centered on classifying the types of reflection in reflective writings.
We hope that the easy reproducibility of our results gives other researchers a head start on further work on related topics.
For each step, we provide a Google Colab notebook that sets everything up ready for use.
However, the core package `reflection_classification`, as well as the evaluation scripts, also works locally. Here are the instructions to run the package in a new environment, for example on your own machine.
We presume you have a Linux-like system with Python 3.8 installed.
```shell
git clone {this repository}
cd reflection-classification

# Do not forget to create and activate an appropriate environment here.
# Here is how you create it (presuming you have python3.8 installed;
# you can check with `which python`, or on Windows `where python`):
python -m venv reflection
# on Windows: `py -m venv reflection`
source reflection/bin/activate
# on Windows: `.\reflection\Scripts\activate`

# install the package with dependencies:
python -m pip install -e .
```
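As a quick sanity check of the installation, you can try importing the package (the module name follows the package name `reflection_classification` mentioned above):

```shell
python -c "import reflection_classification"
```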
As the functionality of the simple, shallow classifiers is very distinct from the neural ones, we provide separate tooling for experimenting with each.

You can train and evaluate a selected shallow classifier using `train_eval_shallow_classifier.py` as a standalone Python application:

```shell
python scripts/train_eval_shallow_classifier.py [--args]
```

with the following arguments:
```
usage: train_eval_shallow_classifier.py [-h] [--classifier CLASSIFIER] --sentences_dir SENTENCES_DIR
                                        [--train_confidence_threshold TRAIN_CONFIDENCE_THRESHOLD]
                                        [--test_confidence_threshold TEST_CONFIDENCE_THRESHOLD]
                                        [--use_context USE_CONTEXT] [--vocabulary_size VOCABULARY_SIZE]

optional arguments:
  -h, --help            show this help message and exit
  --classifier CLASSIFIER
                        Classifier to use. One of: {random_forrest, logistic_regression, naive_bayes, support_vector_classifier}
  --sentences_dir SENTENCES_DIR
                        Directory with {split}/sentence.tsv of annotated sentences
  --train_confidence_threshold TRAIN_CONFIDENCE_THRESHOLD
                        Minimal confidence threshold for sentences to train on.
  --test_confidence_threshold TEST_CONFIDENCE_THRESHOLD
                        Minimal confidence threshold for sentences to test on.
  --use_context USE_CONTEXT
                        Whether the model was trained using context.
  --vocabulary_size VOCABULARY_SIZE
                        Number of top-n most-occurring words used to create the Bag-of-Words representation for classification
```
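For illustration, a complete run might look as follows. The data directory and the concrete values below are placeholders, not prescribed defaults; substitute the location of your `{split}/sentence.tsv` files and the thresholds you want to experiment with:

```shell
# Illustrative values only: adjust --sentences_dir to where your
# annotated {split}/sentence.tsv files live.
python scripts/train_eval_shallow_classifier.py \
    --classifier random_forrest \
    --sentences_dir data \
    --train_confidence_threshold 4 \
    --test_confidence_threshold 4 \
    --vocabulary_size 1000
```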
Our trained models are available for download: see the instructions in `classifiers/models`. You should be able to reproduce the published results without any new training.

Note that to reproduce the training of a neural classifier, you might need access to at least one GPU with at least 15 GB of GPU RAM. We used an Nvidia Tesla T4 for training the referenced models. Compared to CPU training (40 cores), this will cut the training time from 3-4 days to 6-10 hours, depending on the configuration.
You can train neural classifiers using `scripts/train_neural_classifier.py` as a standalone Python application:

```shell
python scripts/train_neural_classifier.py [--args]
```

with the following arguments:
```
usage: train_neural_classifier.py [-h] --model_name MODEL_NAME --sentences_dir SENTENCES_DIR
                                  [--train_confidence_threshold TRAIN_CONFIDENCE_THRESHOLD]
                                  --trained_model_dir TRAINED_MODEL_DIR --device DEVICE
                                  [--eval_on_test_set EVAL_ON_TEST_SET] [--use_context USE_CONTEXT]

optional arguments:
  -h, --help            show this help message and exit
  --model_name MODEL_NAME
                        Model name, or local path to finetune.
  --sentences_dir SENTENCES_DIR
                        Directory with .tsvs of annotated sentences
  --train_confidence_threshold TRAIN_CONFIDENCE_THRESHOLD
                        Minimal confidence threshold for sentences to train on.
  --trained_model_dir TRAINED_MODEL_DIR
                        Directory to be filled with the trained model
  --device DEVICE       Device used for training. One of {cpu, cuda, cuda:[idx]}
  --eval_on_test_set EVAL_ON_TEST_SET
                        Whether to evaluate the model (having the lowest eval loss) on the test set
  --use_context USE_CONTEXT
                        Whether the model will be trained using context.
```
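As an example, the following sketch shows a training run roughly matching the setup suggested by the name of our published model (XLM-RoBERTa large, confidence threshold 4). The data and output paths are placeholders:

```shell
# Illustrative paths; xlm-roberta-large is the HuggingFace model id.
python scripts/train_neural_classifier.py \
    --model_name xlm-roberta-large \
    --sentences_dir data \
    --train_confidence_threshold 4 \
    --trained_model_dir trained_models/xlmr-conf4 \
    --device cuda \
    --eval_on_test_set True
```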
Note that the training process produces training logs with evaluations on the validation set, which is used for picking the best model on output. These logs are saved in the `runs` directory and can be accessed using TensorBoard:

```shell
tensorboard --logdir=runs
```
The following command evaluates the accuracy of your model. The model can be:

- downloaded automatically from HuggingFace by setting `--trained_model_dir MU-NLPC/XLM-R-large-reflective-conf4` (use this option to reproduce our results),
- downloaded manually (see the models directory), or
- trained using the script above and picked from the `--trained_model_dir` you chose before.

```shell
python scripts/eval_neural_classifier.py [--args]
```

with the following arguments:
```
usage: eval_neural_classifier.py [-h] --trained_model_dir TRAINED_MODEL_DIR --sentences_dir SENTENCES_DIR
                                 [--device DEVICE] [--test_confidence_threshold TEST_CONFIDENCE_THRESHOLD]
                                 [--use_context USE_CONTEXT]

optional arguments:
  -h, --help            show this help message and exit
  --trained_model_dir TRAINED_MODEL_DIR
                        Local path containing the pre-trained model, filled on training, or downloaded separately
  --sentences_dir SENTENCES_DIR
                        Directory with {split}/sentence.tsv of annotated sentences
  --device DEVICE       Device used to infer. One of {cpu, cuda, cuda:[idx]}
  --test_confidence_threshold TEST_CONFIDENCE_THRESHOLD
                        Minimal confidence threshold for sentences to test on.
  --use_context USE_CONTEXT
                        Whether the model was trained using context.
```
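For example, to evaluate the model downloaded automatically from HuggingFace (the data path is a placeholder; the threshold corresponds to one column of the tables below):

```shell
python scripts/eval_neural_classifier.py \
    --trained_model_dir MU-NLPC/XLM-R-large-reflective-conf4 \
    --sentences_dir data \
    --device cuda \
    --test_confidence_threshold 4
```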
The tables below report the accuracy of the best-performing shallow (Random Forest) and neural (XLM-RoBERTa) models, trained and tested on sentences having a mean category confidence over the given threshold. See the manuscript for details.

XLM-RoBERTa | Test >= 3 | Test >= 4 | Test >= 5 | Test >= 6 |
---|---|---|---|---|
Train >= 3 | 76.562% | 80.608% | 85.906% | 92.682% |
Train >= 4 | 75.937% | 79.467% | 89.261% | 97.560% |
Train >= 5 | 74.062% | 77.566% | 85.906% | 95.121% |
Train >= 6 | 63.437% | 68.061% | 83.892% | 92.682% |

Random Forest | Test >= 3 | Test >= 4 | Test >= 5 | Test >= 6 |
---|---|---|---|---|
Train >= 3 | 73.154% | 72.483% | 73.154% | 74.496% |
Train >= 4 | 71.812% | 71.812% | 72.483% | 72.483% |
Train >= 5 | 73.825% | 73.825% | 72.483% | 71.812% |
Train >= 6 | 73.825% | 73.154% | 73.154% | 73.825% |
Baseline* | 39.597% | 28.137% | 39.597% | 48.780% |

*proportion of most-common category in test dataset

\ | Test >= 3 | Test >= 4 | Test >= 5 | Test >= 6 |
---|---|---|---|---|
Train >= 3 | 79.375% | 82.706% | 90.506% | 95.454% |
Train >= 4 | 75.937% | 79.323% | 87.341% | 93.181% |
Train >= 5 | 79.375% | 82.331% | 92.405% | 100.00% |
Train >= 6 | 67.812% | 73.684% | 87.974% | 97.727% |
Baseline* | 39.597% | 28.137% | 39.597% | 48.780% |
*proportion of most-common category in test dataset

\ | Test >= 3 | Test >= 4 | Test >= 5 | Test >= 6 |
---|---|---|---|---|
Train >= 3 | 75.936% | 79.087% | 85.235% | 90.244% |
Train >= 4 | 73.125% | 77.566% | 83.221% | 90.244% |
Train >= 5 | 68.438% | 73.004% | 84.563% | 95.122% |
Train >= 6 | 59.063% | 64.638% | 81.208% | 95.122% |
Baseline* | 39.597% | 28.137% | 39.597% | 48.780% |
*proportion of most-common category in test dataset

The notebooks in the `analyses` folder contain reproducible evaluations of the hypotheses introduced in the manuscript. Each of them utilizes our pre-trained `NeuralClassifier` to identify reflectivity in the original, anonymized reflective diaries.

The notebooks also demonstrate how the classifier can be used in practice in your own research.
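As a minimal sketch of such use (the import path, constructor arguments, and the `predict` method below are illustrative assumptions, not the confirmed API; see the notebooks in `analyses` for the authoritative usage):

```python
# Hypothetical sketch: the exact import path, constructor arguments and
# method names may differ; the notebooks in `analyses` show the real usage.
from reflection_classification import NeuralClassifier

classifier = NeuralClassifier(
    model_path="MU-NLPC/XLM-R-large-reflective-conf4",  # our published model
    device="cpu",
)
sentences = [
    "I realized that my explanation confused the students.",
    "The lesson started at nine o'clock.",
]
# classify each sentence and print its predicted reflective category
for sentence, label in zip(sentences, classifier.predict(sentences)):
    print(label, sentence, sep="\t")
```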
If you have read the evaluation scripts and hypotheses notebooks but still need support with reproduction, or with using the classifier in your own research, please create an issue or contact us at stefanik.m@mail.muni.cz.

If you use or extend our results or software in your research, we would be grateful if you cited us as follows:
```bibtex
@Article{Nehyba2022applications,
  author={Nehyba, Jan and {\v{S}}tef{\'a}nik, Michal},
  title={Applications of deep language models for reflective writings},
  journal={Education and Information Technologies},
  year={2022},
  month={Sep},
  day={05},
  issn={1573-7608},
  doi={10.1007/s10639-022-11254-7},
  url={https://doi.org/10.1007/s10639-022-11254-7}
}
```
If you use the CEReD dataset, please use the following citation:
```bibtex
@misc{Stefanik2021CEReD,
  title = {Czech and English Reflective Dataset ({CEReD})},
  author = {{\v S}tef{\'a}nik, Michal and Nehyba, Jan},
  url = {http://hdl.handle.net/11372/LRT-3573},
  copyright = {Creative Commons - Attribution 4.0 International ({CC} {BY} 4.0)},
  year = {2021}
}
```