Low-Resource Machine Translation through Retrieval-Augmented LLM Prompting: A Study on the Mambai Language
- Training data used for prompting is compiled in `mambai_parallel_eng_mgm.csv`, with data taken from the Mambai Language Manual. The `split` column is either `train` or `test`.
- Additional test data, compiled from a native Mambai speaker, is in `test_leo.json`.
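As a minimal sketch of how the `split` column can be used, the function below reads the parallel CSV with Python's standard `csv` module and keeps only the rows for one split. Only the `split` column and its `train`/`test` values come from this README; the function name `load_split` is hypothetical, and all other columns are returned under whatever names the CSV header uses.

```python
import csv
from pathlib import Path


def load_split(path, split="train"):
    """Return the rows of the parallel CSV whose `split` column equals `split`.

    Hypothetical helper: only the `split` column (values `train` or `test`)
    is documented; every other column name is taken from the CSV header as-is.
    """
    if split not in ("train", "test"):
        raise ValueError(f"split must be 'train' or 'test', got {split!r}")
    with Path(path).open(newline="", encoding="utf-8") as f:
        # DictReader maps each row to {header_name: value}
        return [row for row in csv.DictReader(f) if row["split"] == split]
```

For example, `load_split("mambai_parallel_eng_mgm.csv", "train")` would give the pairs available as prompt examples, and `load_split(..., "test")` the held-out pairs.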
Dictionaries have also been extracted from the Mambai Language Manual with `extract_mambai_dict.ipynb` and stored in `eng_mgm.json` and `mgm_eng.json`. These dictionaries were used to mine bitext from the Mambai Language Manual with hunalign; see `extract_mambai_parallel.ipynb`.
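hunalign takes a plain-text dictionary in which each line holds a target-language phrase and a source-language phrase separated by ` @ `. The sketch below converts a JSON dictionary into that format. It assumes, without confirmation from this README, that `eng_mgm.json` maps each English headword to a list of Mambai translations (or a single string) and that English is the source side; the helper names `to_hunalign_dict` and `write_hunalign_dict` are hypothetical and may not match what `extract_mambai_parallel.ipynb` actually does.

```python
import json
from pathlib import Path


def to_hunalign_dict(entries):
    """Convert {source_word: [target_word, ...]} into hunalign dictionary lines.

    Assumed JSON shape: a source headword mapped to a list of translations,
    or to a single translation string. Each output line is "target @ source".
    """
    lines = []
    for source, targets in entries.items():
        if isinstance(targets, str):  # tolerate a single translation stored as a string
            targets = [targets]
        for target in targets:
            if source.strip() and target.strip():  # skip empty entries
                lines.append(f"{target.strip()} @ {source.strip()}")
    return lines


def write_hunalign_dict(json_path, dic_path):
    """Read a JSON dictionary and write it as a hunalign .dic file; return the line count."""
    entries = json.loads(Path(json_path).read_text(encoding="utf-8"))
    lines = to_hunalign_dict(entries)
    Path(dic_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

The resulting file is then passed to hunalign as its first positional argument, before the source and target text files.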
- Set up the Python requirements:
  `python3 -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt`
- Extract text from the Mambai Language Guide: see `extract_mambai_parallel.ipynb` and `extract_mambai_dict.ipynb`.
- Run the translation experiments: see below.
See the notebook `Mambai_prompt_for_MT.ipynb`. Tweak the parameters in its `config` dict to adjust the hyperparameters.
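To illustrate the retrieval-augmented prompting idea in isolation, here is a minimal sketch that picks the training pairs most similar to a query sentence and formats them as few-shot examples. It is not the notebook's code: the `config` keys are hypothetical, and the retriever is a simple word-overlap (Jaccard) score used as a stand-in for whatever retrieval method the notebook actually uses. The sketch only builds the prompt string; sending it to an LLM is left out.

```python
import re

# Hypothetical config: the notebook's real `config` keys are not documented here.
config = {
    "num_examples": 3,  # number of retrieved pairs to include in the prompt
    "src_lang": "English",
    "tgt_lang": "Mambai",
}


def tokens(text):
    """Lowercased word set used for a simple lexical-overlap similarity."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of the word sets of two sentences."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(query, train_pairs, k):
    """Return the k (source, target) pairs whose source side overlaps most with the query."""
    ranked = sorted(train_pairs, key=lambda p: jaccard(query, p[0]), reverse=True)
    return ranked[:k]


def build_prompt(query, train_pairs, cfg):
    """Format retrieved pairs as few-shot examples followed by the query to translate."""
    src, tgt = cfg["src_lang"], cfg["tgt_lang"]
    lines = [f"Translate from {src} to {tgt}.", ""]
    for s, t in retrieve(query, train_pairs, cfg["num_examples"]):
        lines += [f"{src}: {s}", f"{tgt}: {t}", ""]
    lines += [f"{src}: {query}", f"{tgt}:"]
    return "\n".join(lines)
```

Here the most similar example is placed first; putting it last, closest to the query, is an equally reasonable choice, and the lexical retriever could be swapped for BM25 or embedding similarity without changing `build_prompt`.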