meganndare / cantonese-nlp

cantonese-mandarin unsupervised neural translation for sw project


Hello, I have some questions about tokenization and embedding

DeerEyre opened this issue · comments

Hello, I am running experiments with XLM on Cantonese-Mandarin translation. I would like to ask how the char-based pivot-private embedding method used in your paper was trained to obtain the embeddings. Is it fastText?
I was reading Ka Ming Wong and Richard Tzong-Han Tsai. 2022. Mixed Embedding of XLM for Unsupervised Cantonese-Chinese Neural Machine Translation. After reading the paper, it is still not clear to me which method the authors use to obtain the embeddings: is it char-based BPE, or loading char-based pre-trained fastText vectors?
If you see this issue, I would really appreciate your help.