marian-nmt / marian-dev

Fast Neural Machine Translation in C++ - development repository

Home Page: https://marian-nmt.github.io

Per-factor embedding dimensions when concatenating

eltorre opened this issue · comments

Feature description

Right now, factors-dim-emb takes a single integer. Layers::Embedding then creates a single matrix in which every factor embedding has the same dimension:

      FactorEmbMatrix_
          = graph_->param("factor_" + name, {numberOfFactors, dimFactorEmb}, initFunc, fixed);

embedWithConcat (and maybe data::factored_vocab?) then takes this into account.
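A minimal sketch of what per-factor dimensions could look like at this point, reusing graph_->param, initFunc, and fixed from the snippet above; the member factorEmbMatrices_, the list dimsFactorEmb, and the helper factorGroupVocabSize() are hypothetical names for illustration, not existing code:

      // Hypothetical sketch: one parameter matrix per factor group, each with
      // its own embedding dimension, instead of a single FactorEmbMatrix_.
      std::vector<Expr> factorEmbMatrices_;  // assumed new member of Layers::Embedding
      std::vector<int> dimsFactorEmb;        // assumed per-factor values of factors-dim-emb

      for(size_t g = 0; g < dimsFactorEmb.size(); ++g) {
        factorEmbMatrices_.push_back(
            graph_->param("factor_" + name + "_" + std::to_string(g),
                          {factorGroupVocabSize(g), dimsFactorEmb[g]},
                          initFunc,
                          fixed));
      }
      // embedWithConcat would then look up each factor in its own matrix and
      // concatenate the lookups, for a total factor width of sum(dimsFactorEmb).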

I feel this is suboptimal when dealing with factors whose vocabulary sizes differ widely, for example capitalization of a word (vocab size 3) vs. word inflection (vocab size ~100 for some languages). A single shared dimension forces either a too-small embedding for the second factor or a wastefully large one for the first.

Example

factors-dim-emb should behave like dim-vocabs when --factors-combine=concat, i.e. accept a list of embedding dimensions, one per factor.
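For illustration, assuming the proposed list form (today --factors-dim-emb accepts only a single value, and the dimensions and factor ordering here are made up), a training command could then contain:

    --factors-combine=concat --factors-dim-emb 4 64

which, if capitalization is the first factor group, would give it a 4-dimensional embedding and the inflection factor a 64-dimensional one, so the concatenated factor block would be 4 + 64 = 68 wide.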

Comments

This seems easy enough to implement
Famous last words

I'd appreciate it if somebody with good knowledge of the codebase would gauge the size of the footgun beforehand.