lucidrains / electra-pytorch

A simple and working implementation of Electra, the fastest way to pretrain language models from scratch, in Pytorch

Custom Dataset

appledora opened this issue

I am trying to use this repo to train ELECTRA from scratch for Bangla. My dataset is a CSV file in which each row is a document.
Would the default openwebtext/preprocess.py file help here? Where else might I need to modify the code? Thanks!
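
For illustration, here is a minimal sketch of one way to turn a CSV like this into one plain-text file per document, which a preprocessing script could then be adapted to read. It is not taken from the repo: the function name `csv_to_text_files`, the column name `text`, and the output file naming are all assumptions, and it does not show what openwebtext/preprocess.py itself expects as input.

```python
import csv
from pathlib import Path


def csv_to_text_files(csv_path, out_dir, text_column="text", encoding="utf-8"):
    """Write each non-empty document in the CSV to its own .txt file.

    `text_column` is a hypothetical column name; change it to match the CSV header.
    Returns the number of files written.
    """
    # Long documents can exceed the csv module's default field size limit.
    csv.field_size_limit(2**31 - 1)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    count = 0
    with open(csv_path, newline="", encoding=encoding) as f:
        for row in csv.DictReader(f):
            doc = (row.get(text_column) or "").strip()
            if not doc:
                continue  # skip empty rows
            # Zero-padded names keep the files in the original row order.
            (out / f"doc_{count:08d}.txt").write_text(doc, encoding=encoding)
            count += 1
    return count
```

A call such as `csv_to_text_files("bangla.csv", "bangla_txt")` (both paths are placeholders) would write `doc_00000000.txt`, `doc_00000001.txt`, and so on. The tokenizer and vocabulary used during preprocessing would likely also need to cover Bangla, which this sketch does not address.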