STIF-Indonesia
Implementation of "Semi-Supervised Low-Resource Style Transfer of Indonesian Informal to Formal Language with Iterative Forward-Translation".
The data in this repository differs from the data published in the paper, so expect your results to differ from the reported ones.
To be announced, please wait!
Paper
Semi-Supervised Low-Resource Style Transfer of Indonesian Informal to Formal Language with Iterative Forward-Translation (IALP 2020)
Requirements
We use the Moses binaries built for Ubuntu 17.04+, which only work on that OS.
If you use a different Moses build, change this line in scripts/download_moses.sh
curl http://www.statmt.org/moses/RELEASE-4.0/binaries/ubuntu-17.04.tgz -o moses.tgz
to
curl [OTHER MOSES URL] -o moses.tgz
In this experiment, we wrap the Moses code using Python's subprocess module, so a Python installation is required. The system is tested on Python 3.9. We recommend installing Python with miniconda, which you can get by following this link: https://docs.conda.io/en/latest/miniconda.html
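As a rough illustration of what wrapping a command-line tool with subprocess looks like, here is a minimal sketch. The helper name run_tool is hypothetical and this is not the repository's actual wrapper; the Python interpreter stands in for a Moses binary so the example runs anywhere.

```python
import subprocess
import sys
from typing import Sequence


def run_tool(command: Sequence[str], stdin_text: str = "") -> str:
    """Run an external command, feed it text on stdin, and return its stdout."""
    result = subprocess.run(
        list(command),
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in command: the Python interpreter, echoing stdin in upper case.
    # A real call would pass the path of a Moses binary and its arguments instead.
    echo_upper = [
        sys.executable,
        "-c",
        "import sys; print(sys.stdin.read().upper(), end='')",
    ]
    print(run_tool(echo_upper, "halo dunia"))
```

With check=True, a failing tool raises an exception instead of silently returning empty output, which tends to make pipeline errors easier to spot.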
How To Run
First, clone the repository
git clone https://github.com/haryoa/stif-indonesia.git
Then run the Moses downloader. It is a .sh script, so use a shell that can execute it. From the project root directory, run:
bash scripts/download_moses.sh
The script downloads the Moses toolkit and extracts it automatically.
Run Supervised Experiments
To run the supervised one, do:
python -m stif_indonesia --exp-scenario supervised
It will read the experiment config in experiment-config/00001_default_supervised_config.json
Run Semi-Supervised Experiments
To run the semi-supervised one, do:
python -m stif_indonesia --exp-scenario semi-supervised
It will read the experiment config in experiment-config/00002_default_semi_supervised_config.json
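To look at a config before starting a run, something like the following sketch may help. The two paths are the ones named in this README; the helpers config_path_for and load_config are hypothetical, not part of the stif_indonesia package, and the sketch assumes each config is a JSON object at the top level.

```python
import json
from pathlib import Path

# Config files named in this README, keyed by the --exp-scenario value.
DEFAULT_CONFIGS = {
    "supervised": "experiment-config/00001_default_supervised_config.json",
    "semi-supervised": "experiment-config/00002_default_semi_supervised_config.json",
}


def config_path_for(scenario: str, project_root: str = ".") -> Path:
    """Return the default config path for a scenario (hypothetical helper)."""
    if scenario not in DEFAULT_CONFIGS:
        raise ValueError(f"unknown scenario: {scenario!r}")
    return Path(project_root) / DEFAULT_CONFIGS[scenario]


def load_config(path: Path) -> dict:
    """Parse a JSON experiment config into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    path = config_path_for("supervised")
    if path.exists():
        # Print only the top-level keys; their meaning is defined by the repository.
        print(sorted(load_config(path)))
    else:
        print(f"{path} not found; run this from the project root")
```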
Output
- The training process writes the experiment log to log.log
- The model output is written to the output folder
Supervised output
It will output evaluation, lm, and train. evaluation is the result of prediction on the test set, lm is the output of the trained LM, and train is the model produced by the Moses toolkit.
Semi-supervised output
It will output agg_data, best_model_dir, and produced_tgt_data. agg_data is the result of the forward-iteration data synthesis, best_model_dir is the best model produced by the training process, and produced_tgt_data is the prediction output on the test set.
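A quick way to confirm that a run produced what is described above is to check for the expected entries. The sketch below is only an illustration: the entry names come from this README, the helper missing_outputs is hypothetical, and it assumes the entries sit directly inside the output folder, which may not match the actual layout.

```python
from pathlib import Path
from typing import List

# Entry names taken from this README; whether each is a file or a folder is not assumed.
EXPECTED_OUTPUTS = {
    "supervised": ["evaluation", "lm", "train"],
    "semi-supervised": ["agg_data", "best_model_dir", "produced_tgt_data"],
}


def missing_outputs(output_dir: str, scenario: str) -> List[str]:
    """Return the expected entries that do not exist under output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_OUTPUTS[scenario] if not (root / name).exists()]


if __name__ == "__main__":
    for scenario in EXPECTED_OUTPUTS:
        missing = missing_outputs("output", scenario)
        status = "complete" if not missing else "missing: " + ", ".join(missing)
        print(f"{scenario}: {status}")
```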
Score
Please check the log.log file, which is the output of the process.
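Since the log can be long, filtering it for score lines may be convenient. The sketch below does a plain keyword search; the helper find_log_lines is hypothetical, and the keyword "BLEU" is only a guess at how scores are labelled, since the log format is not described here.

```python
from pathlib import Path
from typing import List


def find_log_lines(log_path: str, keyword: str) -> List[str]:
    """Return the lines of a log file that contain keyword (case-insensitive)."""
    needle = keyword.lower()
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in handle if needle in line.lower()]


if __name__ == "__main__":
    if Path("log.log").exists():
        # "BLEU" is an assumed label; adjust it to whatever the log actually uses.
        for line in find_log_lines("log.log", "BLEU"):
            print(line)
    else:
        print("log.log not found; run an experiment first")
```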
TODO Write
- Link to arxiv + short description
- Acknowledgement
- Team