laurieburchell / open-lid-dataset

Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023)


Could you open your specific training parameters for fasttext?

karrynest opened this issue

Could you share your specific training parameters (e.g. lr, epoch, wordNgrams, bucket, dim, loss) so that novices can reproduce your outstanding work? Thanks so much!

FYI, any fastText model has its training parameters saved inside the model file. You can dump them like this:

fasttext dump lid.218a.bin args
dim 256
ws 5
epoch 2
minCount 1000
neg 5
wordNgrams 1
loss softmax
model sup
bucket 1000000
minn 2
maxn 5
lrUpdateRate 100
t 0.0001
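
For reference, the same arguments can also be read from Python with the official fasttext bindings. This is just a sketch: it goes through the wrapper's internal f.getArgs() accessor (the same one its quantize() method uses), which is an assumption about your bindings version, and the filename is simply the one from the dump above.

import fasttext

# Load the released model (same filename as in the dump above).
model = fasttext.load_model("lid.218a.bin")

# model.f is the underlying C++ object; getArgs() returns the stored training
# arguments (assumption: exposed in your version of the Python bindings).
args = model.f.getArgs()
for name in ("dim", "ws", "epoch", "minCount", "neg", "wordNgrams",
             "bucket", "minn", "maxn", "lrUpdateRate", "t"):
    print(name, getattr(args, name))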

Thanks Jaume! The specific training parameters are in the paper. I will add the command I ran to train the model to the repo and make the link to the paper more obvious.
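
Until that command lands in the repo, here is a rough sketch of an equivalent training call reconstructed from the dumped arguments above, using the Python API. The input and output filenames are placeholders, and the learning rate is not listed in the dump, so it is left at the library default rather than guessed; the value actually used is in the paper.

import fasttext

# Hypothetical input file: training data in fastText supervised format,
# i.e. one sentence per line prefixed with its __label__<language> tag.
TRAIN_FILE = "openlid-train.txt"

# Hyperparameters copied from the `fasttext dump ... args` output above.
# lr is not shown there, so the library default is used here.
model = fasttext.train_supervised(
    input=TRAIN_FILE,
    dim=256,
    ws=5,
    epoch=2,
    minCount=1000,
    neg=5,
    wordNgrams=1,
    loss="softmax",
    bucket=1000000,
    minn=2,
    maxn=5,
    lrUpdateRate=100,
    t=0.0001,
)

model.save_model("my-lid-model.bin")  # hypothetical output filename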