base
huseinzol05 opened this issue
HUSEIN ZOLKEPLI commented
Trained with a 2048 context length, 0.15 masking probability, and 3.0 mean span length. The run does not converge; not sure why.
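For reference, here is a rough sketch of what these settings imply per example under T5-style span corruption. It assumes the context length is the raw token count before corruption, one sentinel token per masked span, and one EOS token on each side. The function name `span_corruption_stats` is hypothetical and is not taken from nanoT5 or this repo. It only works out the arithmetic and does not explain the non-convergence.

```python
def span_corruption_stats(raw_length, noise_density=0.15, mean_span_length=3.0):
    """Estimate span-corruption sizes for one example (illustrative sketch only)."""
    # Tokens selected for masking, kept strictly between 1 and raw_length - 1.
    num_noise_tokens = int(round(raw_length * noise_density))
    num_noise_tokens = min(max(num_noise_tokens, 1), raw_length - 1)

    # Number of masked spans, at least one.
    num_noise_spans = max(int(round(num_noise_tokens / mean_span_length)), 1)

    num_kept_tokens = raw_length - num_noise_tokens

    # Assumption: each span becomes one sentinel in the encoder input,
    # and the target holds each sentinel plus the tokens it replaced.
    encoder_length = num_kept_tokens + num_noise_spans + 1  # +1 for EOS
    target_length = num_noise_tokens + num_noise_spans + 1  # +1 for EOS

    return {
        "noise_tokens": num_noise_tokens,
        "noise_spans": num_noise_spans,
        "encoder_length": encoder_length,
        "target_length": target_length,
    }


if __name__ == "__main__":
    # Compare the 2048 setting from this run with a 512 context length.
    for length in (2048, 512):
        print(length, span_corruption_stats(length))
```

Under these assumptions, the 2048 setting masks about 307 tokens in about 102 spans per example, against about 77 tokens in about 26 spans at 512.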
HUSEIN ZOLKEPLI commented
Going to use the default settings: 512 context length and the same batch size as nanoT5.
HUSEIN ZOLKEPLI commented
done