marian-nmt / marian

Fast Neural Machine Translation in C++

Home Page: https://marian-nmt.github.io

Cost is nan When adding guided alignment

mahmoudaymo opened this issue

Bug description

I trained a model for 5 epochs without guided alignment, then continued training for 5 more epochs with guided alignment. Training without guided alignment went fine. However, once guided alignment was added (the second 5 epochs), the training cost was NaN at every update.

How to reproduce

I have run this script:

```shell
#!/bin/bash

set -e

exp_dir=path_to_experiment_dir

# Stage 1: train the base model for 5 epochs without guided alignment
exp=$exp_dir/basemodel
config=$exp/config.yml

/marian/build/marian -c $config \
    --valid-log $exp/valid.log \
    --log $exp/train.log \
    --model $exp/model.npz \
    --after 5e

# Stage 2: continue up to epoch 10 with guided alignment
exp=$exp_dir/finetuned
config=$exp/config.yml # This config is similar to the above except I unset --all-caps-every and --english-title-case-every params

# Note: pretrained_model_path is not defined anywhere in the script as posted
/marian/build/marian -c $config \
    --pretrained-model $pretrained_model_path \
    --valid-log $exp/valid.log \
    --log $exp/train.log \
    --model $exp/model.npz \
    --after 10e \
    --guided-alignment /Engines/MAS/ENUSDEDE/alignment/corpus.align \
    --guided-alignment-cost ce
```
marian.logs.txt
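
To confirm from a training log how many updates report a NaN cost, a quick check along these lines may help. It is only a sketch: the function name `count_nan_updates` is hypothetical, and the pattern assumes each update line contains the word `Cost` followed by its value, which may not match the exact log format of your Marian version.

```shell
#!/bin/bash
# Hypothetical helper: count log lines whose reported cost is NaN.
# Assumption: the log prints "Cost <value>" on each update line.
count_nan_updates() {
  local log_file=$1
  # -E: extended regex, -i: case-insensitive, -c: count matching lines.
  # grep exits with status 1 when the count is 0, so "|| true" keeps
  # scripts that use "set -e" from aborting.
  grep -Eic 'cost[[:space:]]+-?nan' "$log_file" || true
}

# Demo on a made-up log (illustrative lines, not real Marian output).
demo_log=$(mktemp)
printf '%s\n' \
  'Ep. 1 : Up. 100 : Cost 4.21' \
  'Ep. 6 : Up. 200 : Cost nan' \
  'Ep. 6 : Up. 300 : Cost nan' > "$demo_log"

nan_count=$(count_nan_updates "$demo_log")
echo "updates with NaN cost: $nan_count"
rm -f "$demo_log"
```

On a real run this would be called as `count_nan_updates "$exp/train.log"`. Comparing the count against the total number of update lines shows whether the cost is NaN from the first update or only diverges later.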

Context

  • Marian version: 1.12.0


We are experiencing this issue too, even when training with guided alignment from the start. Could it be related to `--guided-alignment-cost`? We used to use `mse` and switched to `ce` when `mse` was no longer supported; the issue started for us after that change.
The change also means that to restart training in an existing directory you need to edit the cost in `model.npz.progress.yml`, otherwise it throws an error.
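
The manual edit mentioned here could be scripted roughly as follows. This is a sketch under assumptions: it assumes the stored setting appears as a line of the form `guided-alignment-cost: mse` (the key name simply mirrors the command-line flag and is not confirmed for the progress file), and the function name `switch_alignment_cost` is hypothetical. A `.bak` copy is written first so the change can be reverted.

```shell
#!/bin/bash
# Hypothetical helper: rewrite a stored "mse" alignment cost to "ce".
# Assumption: the file has a line "guided-alignment-cost: mse".
switch_alignment_cost() {
  local yml_file=$1
  # Keep a backup, then rewrite the file from that backup.
  cp "$yml_file" "$yml_file.bak"
  # Only the value on the assumed key line changes; other lines pass through.
  sed -E 's/^([[:space:]]*guided-alignment-cost:[[:space:]]*)mse[[:space:]]*$/\1ce/' \
    "$yml_file.bak" > "$yml_file"
}

# Demo on a made-up file (the keys are illustrative, not a real progress file).
demo_yml=$(mktemp)
printf '%s\n' 'epochs: 5' 'guided-alignment-cost: mse' > "$demo_yml"

switch_alignment_cost "$demo_yml"
new_value=$(grep 'guided-alignment-cost' "$demo_yml")
echo "$new_value"
rm -f "$demo_yml" "$demo_yml.bak"
```

For a real experiment directory the call would be `switch_alignment_cost "$exp/model.npz.progress.yml"`. Running `grep -n mse` on the file beforehand is a cheap way to check which key actually holds the value before changing anything.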