Cost is NaN when adding guided alignment
mahmoudaymo opened this issue · comments
Bug description
I trained a model for 5 epochs without guided alignment, then continued training for 5 more epochs with guided alignment. Training without guided alignment went fine. However, once guided alignment was added (the second 5 epochs), the training cost was NaN at every update.
How to reproduce
I have run this script:
```bash
#!/bin/bash
set -e

exp_dir=path_to_experiment_dir

# Stage 1: train the base model for 5 epochs without guided alignment.
exp=$exp_dir/basemodel
config=$exp/config.yml
/marian/build/marian -c "$config" \
    --valid-log "$exp/valid.log" \
    --log "$exp/train.log" \
    --model "$exp/model.npz" \
    --after 5e

# Stage 2: continue to 10 epochs with guided alignment enabled.
# Note: pretrained_model_path is not defined anywhere in the script as posted.
exp=$exp_dir/finetuned
config=$exp/config.yml # This config is similar to the above except I unset --all-caps-every and --english-title-case-every params
/marian/build/marian -c "$config" \
    --pretrained-model "$pretrained_model_path" \
    --valid-log "$exp/valid.log" \
    --log "$exp/train.log" \
    --model "$exp/model.npz" \
    --after 10e \
    --guided-alignment /Engines/MAS/ENUSDEDE/alignment/corpus.align \
    --guided-alignment-cost ce
```
marian.logs.txt
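To confirm from the training log where the cost first turns NaN, a small helper along these lines may be useful. This is only a sketch: the function names `first_nan_update` and `count_nan_updates` are hypothetical, and the pattern assumes that affected update lines contain the word `Cost` followed by `nan`, which should be checked against the attached log before relying on it.

```bash
#!/bin/bash
# Hypothetical helpers, not part of Marian.
# Assumption: affected update lines contain "Cost nan" (any letter case).

# Print the first matching line, prefixed with its line number in the log.
first_nan_update() {
    local log_file=$1
    grep -n -i -m 1 -E 'Cost[[:space:]]+-?nan' "$log_file"
}

# Print how many lines report a nan cost (prints 0 if there are none).
count_nan_updates() {
    local log_file=$1
    grep -c -i -E 'Cost[[:space:]]+-?nan' "$log_file" || true
}
```

For the script above, calling `first_nan_update "$exp/train.log"` after the second stage would show whether the very first update with guided alignment is already NaN or whether the cost degrades later.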
Context
- Marian version: 1.12.0
We are experiencing this issue too, even when training with guided alignment from the start. Could it be related to --guided-alignment-cost? We used to use mse and switched to ce when mse was no longer supported. The issue started for us after that change.
It also means that to restart training in an existing directory, you need to edit the cost in model.npz.progress.yml; otherwise Marian throws an error.
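Before hand-editing the progress file, it may help to take a backup and list the lines that mention a cost. The sketch below does only that. The helper name `show_cost_entries` is hypothetical, and no particular key name inside model.npz.progress.yml is assumed, so the actual edit is left to be done manually.

```bash
#!/bin/bash
# Hypothetical helper, not part of Marian.
# Copies the progress file to <file>.bak, then lists every line that
# mentions "cost" (any letter case) together with its line number.
show_cost_entries() {
    local progress_file=$1
    cp -p "$progress_file" "$progress_file.bak"
    grep -n -i 'cost' "$progress_file" \
        || echo "no cost entries found in $progress_file"
}
```

For example, `show_cost_entries "$exp/model.npz.progress.yml"` leaves the original file untouched, and the `.bak` copy can be restored if the manual edit goes wrong.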