JayYip / m3tl

BERT for Multitask Learning

Home Page: https://jayyip.github.io/m3tl/


question for training process

chiyuzhang94 opened this issue · comments

commented

I have a few questions about the training process. My task is a "cls1&cls2&cls3" task.

1. For the classification task, the model uses pre-trained BERT to obtain a sentence representation of each input. How is this representation generated (how is it pooled)?
2. What is the loss function of the classification task?
3. Is the loss used for backpropagation the mean of these three classification losses?
4. During backpropagation, does the model update the entire model (including BERT) or only the top layers?

  1. For the classification task, the model uses pre-trained BERT to obtain a sentence representation of each input. How is this representation generated (how is it pooled)?

A special token, [CLS], is added before every input, and its final hidden state is used as the sentence representation for classification tasks. For more details, please refer to the original BERT paper.
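
For illustration, here is a minimal sketch of [CLS] pooling using the huggingface `transformers` API (an assumption made for clarity; this repo wraps the equivalent logic internally):

```python
# Minimal sketch of [CLS] pooling with huggingface transformers
# (illustrative only; m3tl handles this internally).
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("An example sentence.", return_tensors="tf")
outputs = model(**inputs)

# The hidden state of the first token ([CLS]) serves as the
# sentence representation fed to the classification head.
cls_repr = outputs.last_hidden_state[:, 0, :]  # shape: (1, 768)
```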

  2. What is the loss function of the classification task?

Cross Entropy.

  3. Is the loss used for backpropagation the mean of these three classification losses?

The sum of the three losses.
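
In other words, each task head computes its own cross-entropy loss, and the heads' losses are added together. A minimal sketch in TensorFlow (variable names are illustrative, not m3tl's internals):

```python
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Illustrative logits/labels for three binary classification heads.
loss_cls1 = loss_fn(tf.constant([0]), tf.constant([[1.5, -0.3]]))
loss_cls2 = loss_fn(tf.constant([1]), tf.constant([[0.2, 0.9]]))
loss_cls3 = loss_fn(tf.constant([0]), tf.constant([[2.1, 0.4]]))

# The joint training loss is the sum (not the mean) of the task losses.
total_loss = loss_cls1 + loss_cls2 + loss_cls3
```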

  4. During backpropagation, does the model update the entire model (including BERT) or only the top layers?

By default, training updates the whole model, but you can set params.freeze_step to freeze the BERT weights for a specified number of steps.
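
For example (a hedged sketch; the import path and class name below are assumptions based on older releases of this library, so check the docs for your version):

```python
# Assumed import path; verify against your installed version.
from bert_multitask_learning import DynamicBatchSizeParams

params = DynamicBatchSizeParams()
# Keep BERT's weights frozen for the first 1000 steps,
# then fine-tune the whole model.
params.freeze_step = 1000
```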

commented

Thanks very much!

commented

Hi,
I have a question about training multitask models on different data sources. If I have 'cls1|cls2' problems, must the two datasets contain exactly the same number of instances?

Nope. The problems are sampled based on their number of training instances during training.

commented

Thanks. In that case, I am wondering how to calculate the training epochs. For example, if the size of DB1 is ten times that of DB2 and I use the same batch size, the model will have been trained on DB2 ten times by the time it finishes its first epoch on DB1. Is that correct? Could you please give some detail?

Yes, your understanding is correct. You can set params.multitask_balance_type to control the sampling probability. For more details, you can take a look here.
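
As a rough conceptual sketch (not m3tl's actual code, and the option names 'data_balanced' and 'problem_balanced' are assumptions to verify against the source), the balance type changes the per-problem sampling probability like this:

```python
# Conceptual sketch of sampling probabilities, not m3tl's actual code.
sizes = {'cls1': 100_000, 'cls2': 10_000}  # DB1 is 10x the size of DB2

# 'data_balanced' (assumed name): sample proportionally to dataset size.
total = sum(sizes.values())
data_balanced = {k: v / total for k, v in sizes.items()}
# -> {'cls1': 0.909..., 'cls2': 0.0909...}

# 'problem_balanced' (assumed name): sample each problem equally often.
problem_balanced = {k: 1 / len(sizes) for k in sizes}
# -> {'cls1': 0.5, 'cls2': 0.5}
```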

Hi @chiyuzhang94 and @JayYip, I have a question about the data format for classification tasks. I have two tasks: one is single-sentence classification and the other is sentence-pair classification. From your experience, how should I prepare the data for training, evaluation, and testing? Thanks a lot.

commented

Hi @OYE93, I think it is easy to import your data.

```python
# From the Keras IMDB example; shows the expected shape of the inputs.
from tensorflow import keras

(train_data, train_labels), (test_data, test_labels) = keras.datasets.imdb.load_data(num_words=10000)
```

Change these lists to fit your own data; each variable on the left-hand side is a list.
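
For the two tasks you describe, the lists might look like this (a hedged sketch; whether sentence pairs are passed as tuples is an assumption to check against the library's problem-type docs):

```python
# Illustrative data shapes; verify the pair format against the docs.
# Single-sentence classification (t1): one string per example.
t1_data = ["the plot was gripping", "a dull and slow film"]
t1_labels = ["positive", "negative"]

# Sentence-pair classification (t2): assumed to be a tuple of
# two strings per example.
t2_data = [("sentence a", "sentence b"), ("premise", "hypothesis")]
t2_labels = ["entailment", "neutral"]
```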

Hope this helps.

Hi @chiyuzhang94, thanks for your quick response; I will check the data format. Another question is about the training mechanism. As I mentioned before, I have two Chinese text classification tasks (t1, t2). How should I train this mt-dnn model? Should I input both tasks, fine-tune the model, and then use it for t1 prediction? Or should I fine-tune on both tasks first, then fine-tune again on t1 data only, and after that use the model for t1 prediction? Could you share some experience on this? Thanks.