OpenNMT-test

Before Starting

If you are working on a fresh installation, install these first:

apt install git make python3-pip python3-virtualenv python3-pdfminer
NVIDIA CUDA: https://developer.nvidia.com/cuda-zone (do not use the packaged version nvidia-cuda-framework, as it is outdated)

The MEB dataset (training-material/meb/) was created as follows:

Downloading the PDF documents and naming the Finnish and Swedish versions accordingly. Filenames after this step: 1-fi.pdf, 1-sv.pdf, 2-fi.pdf, 2-sv.pdf, ...
Running meb-pdf-to-text.py, which extracts text from the PDF files, removes unwanted characters, and tries to put each sentence on a single line. Filenames after this step: 1-fi.raw-txt, 1-sv.raw-txt, 2-fi.raw-txt, 2-sv.raw-txt, ...
Going through the raw text files and making sure that corresponding lines in the Finnish and Swedish files have the same meaning. This is manual work and the most time-consuming part of the process. Filenames after this step: 1-fi.txt, 1-sv.txt, 2-fi.txt, 2-sv.txt, ...
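
The steps above can be roughly illustrated with a small sketch. This is not the code of meb-pdf-to-text.py. The function names split_sentences, check_pair and find_mismatches are hypothetical, and the sentence-splitting rule is a naive assumption. The line-count check only catches pairs whose lengths differ and does not replace the manual comparison of meaning.

```python
import re
from pathlib import Path


def split_sentences(text):
    """Collapse whitespace and return one sentence per list item.

    Assumption: a sentence ends with '.', '!' or '?' and the next one
    starts with an uppercase letter. Real documents will need more care.
    """
    flat = " ".join(text.split())
    parts = re.split(r"(?<=[.!?])\s+(?=[A-ZÅÄÖ])", flat)
    return [p for p in parts if p]


def check_pair(fi_path, sv_path):
    """Return (fi_count, sv_count): line counts of one Finnish/Swedish pair."""
    fi_lines = Path(fi_path).read_text(encoding="utf-8").splitlines()
    sv_lines = Path(sv_path).read_text(encoding="utf-8").splitlines()
    return len(fi_lines), len(sv_lines)


def find_mismatches(directory):
    """List aligned pairs (N-fi.txt, N-sv.txt) whose line counts differ.

    Each entry is (fi_filename, fi_count, sv_count). If the Swedish file
    is missing, both counts are None.
    """
    mismatches = []
    for fi_path in sorted(Path(directory).glob("*-fi.txt")):
        sv_path = fi_path.with_name(fi_path.name.replace("-fi.txt", "-sv.txt"))
        if not sv_path.exists():
            mismatches.append((fi_path.name, None, None))
            continue
        fi_n, sv_n = check_pair(fi_path, sv_path)
        if fi_n != sv_n:
            mismatches.append((fi_path.name, fi_n, sv_n))
    return mismatches


if __name__ == "__main__":
    # Directory name taken from the description above.
    for name, fi_n, sv_n in find_mismatches("training-material/meb"):
        print(name, fi_n, sv_n)
```

Running the script from the repository root prints every pair that still needs attention. An empty output only means the line counts match.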