marian-nmt / marian-dev

Fast Neural Machine Translation in C++ - development repository

Home Page: https://marian-nmt.github.io

Documentation of BERT command-line options

tomsbergmanis opened this issue

Marian's command-line options include:
--bert-mask-symbol TEXT=[MASK]         Masking symbol for BERT masked-LM training
--bert-sep-symbol TEXT=[SEP]           Sentence separator symbol for BERT next-sentence-prediction training
--bert-class-symbol TEXT=[CLS]         Class symbol for BERT classifier training
--bert-masking-fraction FLOAT=0.15     Fraction of tokens masked out during training
--bert-train-type-embeddings=true      Train BERT type embeddings; set to false to use static sinusoidal embeddings
--bert-type-vocab-size INT=2           Size of BERT type vocab (sentences A and B)
Yet I cannot find any examples of how and for what these options can be used.
It would be good to have a simple use-case example showing how to do BERT pretraining with them.

Even a code example in a comment below would be really good!
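
For what it's worth, here is my best guess at what an invocation might look like, pieced together only from the option names above. This is an untested sketch: the model type "bert", all file names, and the data layout are assumptions on my part, and the expected training-data format is exactly the part that is undocumented.

# Untested sketch: "bert" appears as a model type in marian-dev's model
# factory, but how the training data must be formatted (e.g. how sentence
# pairs for next-sentence prediction are supplied) is the open question.
./build/marian \
    --type bert \
    --model model.npz \
    --train-sets corpus.tok \
    --vocabs vocab.yml \
    --bert-mask-symbol "[MASK]" \
    --bert-sep-symbol "[SEP]" \
    --bert-class-symbol "[CLS]" \
    --bert-masking-fraction 0.15 \
    --bert-type-vocab-size 2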

Hi Toms, I've never used it and don't know what it actually does. Pinging @emjotde, who may know. It may have been an experimental feature.

@tomsbergmanis Did you ever figure out how to use these?

(I need a CPU-decoding classifier for Norwegian text, and Marian seems promising,
cf. https://groups.google.com/g/marian-nmt/c/iVq-jGa3N8M, but I don't know if it's possible to use it that way.)
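
In case it helps anyone who lands here later, this is the kind of invocation I would try first, based only on the option names in --help and the thread above. Everything here is unverified: the model type "bert-classifier", the use of --input-types to mark the label file as class labels, and all file names are my assumptions.

# Unverified sketch: "bert-classifier" shows up as a model type in
# marian-dev, and --input-types can reportedly mark an input as a class
# label rather than a token sequence. Whether this trains as expected,
# and whether the resulting classifier can then be run on CPU
# (marian has a --cpu-threads option), I don't know.
./build/marian \
    --type bert-classifier \
    --model classifier.npz \
    --train-sets text.no labels.txt \
    --vocabs vocab.no.yml labels.yml \
    --input-types sequence class \
    --bert-class-symbol "[CLS]"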