hplt-project / OpusCleaner

OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.

Home Page: https://pypi.org/project/opuscleaner/

Add instructions for running it without installing it

XapaJIaMnu opened this issue

For easier filter development and in general for power users, one should be able to build the code and run it from the local directory, as opposed to installing it via pip.

Could you please add instructions to do that?

Is what is currently there not adequate?

OpusCleaner/README.md

Lines 33 to 43 in 36b0097

### Installation for development
```sh
python3 -m venv .env
bash --init-file .env/bin/activate
pip install -e .
cd frontend
npm clean-install
npm run build
cd ..
```

You need pip to install the dependencies anyway; the `-e` flag makes pip link the package to your source code instead of copying it. Commands like `opuscleaner-server` will work, but you can still mess around with the code.

If you want to avoid `pip install -e .` entirely, I've just pushed 8e1d9c0, which should be the missing bit for using `python3 -m opuscleaner.server` to run the server from the project root directory. I've been using this method internally to run sampling and filters.

I get that you sometimes want to call `python3 ../whatever/path/to/opuscleaner/server.py`, but that's hard to implement because of how server.py imports files from elsewhere. I'd need to add some hacky `import site; site.add_package(os.path.dirname(__file__))`-style path setup to each script you could call directly. Incidentally, it does work if you install opuscleaner with pip (with or without `-e`, it doesn't matter).
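The difference between the two invocation styles can be reproduced with a toy package. This is only a sketch: `demo_pkg`, `helper.py` and the stand-in `server.py` are hypothetical, and it assumes the script imports its siblings by absolute package name, which may not match how OpusCleaner's server.py is actually written.

```sh
# Build a throwaway package (all names here are hypothetical stand-ins).
workdir=$(mktemp -d)
mkdir -p "$workdir/demo_pkg"
touch "$workdir/demo_pkg/__init__.py"
echo 'GREETING = "hello"' > "$workdir/demo_pkg/helper.py"
printf 'from demo_pkg.helper import GREETING\nprint(GREETING)\n' > "$workdir/demo_pkg/server.py"

# Module form from the project root: the current directory is on sys.path,
# so the demo_pkg package can be resolved.
module_out=$(cd "$workdir" && python3 -m demo_pkg.server)

# Script form from elsewhere: sys.path starts at the script's own directory
# (demo_pkg/), so the package itself is not importable.
if script_out=$(cd / && python3 "$workdir/demo_pkg/server.py" 2>&1); then
    script_status=ok
else
    script_status=failed
fi

echo "module form: $module_out"
echo "script form: $script_status"
rm -rf "$workdir"
```

Installing the package with pip, editable or not, puts it on the import path regardless of the working directory, which is why the script form works in that case.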

If you just want to do filter development, you can actually already run OpusCleaner without checking out the source:

```sh
pip install opuscleaner

# Setup some starting directories
mkdir -p filters data/train-parts

tee filters/tail.json <<'EOF'
{
    "description": "Last N entries",
    "command": "tail -n $N",
    "parameters":
    {
        "N":
        {
            "type": "int"
        }
    },
    "type": "bilingual"
}
EOF

opuscleaner-server
```

This will show you all the installed filters, plus the ones from the local filters/ directory.
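Before loading a filter in the interface, it can help to try its command by hand. The sketch below assumes that a bilingual filter reads tab-separated sentence pairs on stdin and that parameters reach the command as shell variables, as the `$N` in the definition suggests; the sample sentences are made up, and this is not necessarily how OpusCleaner invokes the filter internally.

```sh
# Hypothetical sample: three tab-separated sentence pairs.
sample=$(printf 'one\teins\ntwo\tzwei\nthree\tdrei\n')

# Run the command string from tail.json with N set in its environment.
result=$(printf '%s\n' "$sample" | N=2 sh -c 'tail -n $N')

# Show what the filter kept.
printf '%s\n' "$result"
```

With `N=2`, only the last two pairs should remain.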