k4black / uds-2024-nlp-for-lr

UdS 2024: Final project for NLP for Low-Resource Languages course

Data Augmentation limits in Low-Resource environment: A Case Study with Serbian

How to Install

To run the training script, you need to have Python 3.12 and the required packages installed.

pip install -r requirements.txt

Additionally, to log the experiments to Neptune.ai, fill in the .env file with your NEPTUNE_PROJECT and NEPTUNE_API_TOKEN values.
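Before launching a long training run, it can help to confirm that both Neptune variables are actually set. The snippet below is a minimal sketch, not part of this repository: the helper names `parse_env_text`, `load_env_file`, and `missing_neptune_keys` are hypothetical, and it assumes the .env file uses plain KEY=VALUE lines.

```python
from pathlib import Path

# The two variables named in this README.
REQUIRED_KEYS = ("NEPTUNE_PROJECT", "NEPTUNE_API_TOKEN")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env_file(path=".env"):
    """Read a .env file from disk (hypothetical helper, assumes UTF-8)."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


def missing_neptune_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Calling `missing_neptune_keys(load_env_file())` from the repository root returns an empty list when both values are filled in.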

Main Libraries used

  • transformers for obtaining the checkpoints, the training loop, and evaluation
  • datasets for loading the SuperGLUE datasets
  • fast-aug, our custom library for random data augmentation, written in Rust with Python bindings
  • neptune for logging the experiments (runs available)
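To give a rough idea of what random word substitution means as an augmentation, here is a pure-Python illustration. It is not the fast-aug API and makes no claim about how fast-aug is implemented: the function `substitute_words`, its parameters, and the toy vocabulary are all assumptions made for this sketch.

```python
import random


def substitute_words(text, vocabulary, prob=0.1, seed=None):
    """Replace each whitespace-separated word with a random vocabulary
    entry with probability `prob` (illustrative only, not fast-aug)."""
    rng = random.Random(seed)  # local RNG so runs are reproducible per seed
    words = text.split()
    augmented = [
        rng.choice(vocabulary) if rng.random() < prob else word
        for word in words
    ]
    return " ".join(augmented)


# Toy inputs; a real setup would draw replacements from the training vocabulary.
sentence = "the model was trained on a small dataset"
toy_vocabulary = ["alpha", "beta", "gamma"]
print(substitute_words(sentence, toy_vocabulary, prob=0.3, seed=0))
```

With `prob=0.0` the sentence comes back unchanged, and with `prob=1.0` every word is replaced, which makes the two extremes easy to sanity-check.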

Run

To get all the available options, run:

python main.py --help

For example, to train the roberta-base model on the CB task with word-substitution augmentation, run:

python main.py --task_name super_glue/cb --model_name roberta-base --augmentation words-sub
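When launching several runs from a script, the same command can be assembled programmatically. The sketch below uses only the flags shown in the example above; the helper `build_command` is hypothetical and not part of the repository, and other valid flag values should be taken from `python main.py --help`.

```python
import shlex
import sys


def build_command(task_name, model_name, augmentation, python=sys.executable):
    """Build the argument list for one training run (hypothetical helper)."""
    return [
        python, "main.py",
        "--task_name", task_name,
        "--model_name", model_name,
        "--augmentation", augmentation,
    ]


# Reproduces the example command from this README.
cmd = build_command("super_glue/cb", "roberta-base", "words-sub", python="python")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` from the repository root would start the run and raise an error if the script exits with a non-zero status.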
