JayYip / m3tl

BERT for Multitask Learning

Home Page: https://jayyip.github.io/m3tl/


question for training process

chiyuzhang94 opened this issue · comments

commented

I have a few questions about the training process. My task is a "cls1&cls2&cls3" task.

1. For the classification task, the model uses pre-trained BERT to obtain a sentence representation of each input. How is this representation generated (how is it pooled)?
2. What is the loss function of the classification task?
3. Is the loss used for backpropagation the mean of these three classification losses?
4. During backpropagation, does the model update the entire model (including BERT) or only the top layers?

  1. For the classification task, the model uses pre-trained BERT to obtain a sentence representation of each input. How is this representation generated (how is it pooled)?

A special token, [CLS], is added before every input, and its final hidden state is used as the sentence representation for classification tasks. For more details, please refer to the original BERT paper.
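
For illustration, here is a minimal sketch of [CLS] pooling using the huggingface `transformers` API (an assumption made for clarity; this repo wraps the equivalent logic internally):

```python
# Minimal sketch of [CLS] pooling with huggingface transformers
# (illustrative only; m3tl handles this internally).
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("An example sentence.", return_tensors="tf")
outputs = model(**inputs)

# The hidden state of the first token ([CLS]) serves as the
# sentence representation fed to the classification head.
cls_repr = outputs.last_hidden_state[:, 0, :]  # shape: (1, 768)
```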

  2. What is the loss function of the classification task?

Cross Entropy.

  3. Is the loss used for backpropagation the mean of these three classification losses?

The sum of the three losses.
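
In other words, each task head computes its own cross-entropy loss, and the heads' losses are added together. A minimal sketch in TensorFlow (variable names are illustrative, not m3tl's internals):

```python
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Illustrative logits/labels for three binary classification heads.
loss_cls1 = loss_fn(tf.constant([0]), tf.constant([[1.5, -0.3]]))
loss_cls2 = loss_fn(tf.constant([1]), tf.constant([[0.2, 0.9]]))
loss_cls3 = loss_fn(tf.constant([0]), tf.constant([[2.1, 0.4]]))

# The joint training loss is the sum (not the mean) of the task losses.
total_loss = loss_cls1 + loss_cls2 + loss_cls3
```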

  4. During backpropagation, does the model update the entire model (including BERT) or only the top layers?

By default, training updates the whole model, but you can set params.freeze_step to freeze the BERT weights for a specified number of steps.
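
For example (a hedged sketch; the import path and class name below are assumptions based on older releases of this library, so check the docs for your version):

```python
# Assumed import path; verify against your installed version.
from bert_multitask_learning import DynamicBatchSizeParams

params = DynamicBatchSizeParams()
# Keep BERT's weights frozen for the first 1000 steps,
# then fine-tune the whole model.
params.freeze_step = 1000
```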

commented

Thanks very much!

commented

Hi,
I have a question about training multitask models on different data sources. If I have 'cls1|cls2' problems, must the two datasets contain exactly the same number of instances?

Nope. The problems are sampled based on their number of training instances during training.

commented

Thanks. In that case, I am wondering how to calculate the training epochs. For example, if the size of DB1 is ten times that of DB2 and I use the same batch size, the model will have been trained on DB2 ten times by the time it finishes its first epoch on DB1. Is that correct? Could you please give some detail?

Yes, your understanding is correct. You can set params.multitask_balance_type to control the sampling probability. For more details, you can take a look here.
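
As a rough conceptual sketch (not m3tl's actual code, and the option names 'data_balanced' and 'problem_balanced' are assumptions to verify against the source), the balance type changes the per-problem sampling probability like this:

```python
# Conceptual sketch of sampling probabilities, not m3tl's actual code.
sizes = {'cls1': 100_000, 'cls2': 10_000}  # DB1 is 10x the size of DB2

# 'data_balanced' (assumed name): sample proportionally to dataset size.
total = sum(sizes.values())
data_balanced = {k: v / total for k, v in sizes.items()}
# -> {'cls1': 0.909..., 'cls2': 0.0909...}

# 'problem_balanced' (assumed name): sample each problem equally often.
problem_balanced = {k: 1 / len(sizes) for k in sizes}
# -> {'cls1': 0.5, 'cls2': 0.5}
```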

Hi @chiyuzhang94 and @JayYip, I have a question about the data format for classification tasks. I have two tasks: one is single-sentence classification and the other is sentence-pair classification. From your experience, how should I prepare the data for training, evaluation, and testing? Thanks a lot.

commented

Hi @OYE93, I think it is easy to import your data.

```python
# From the Keras IMDB example; shows the expected shape of the inputs.
from tensorflow import keras

(train_data, train_labels), (test_data, test_labels) = keras.datasets.imdb.load_data(num_words=10000)
```

Change these lists to fit your own data; each variable on the left-hand side is a list.
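
For the two tasks you describe, the lists might look like this (a hedged sketch; whether sentence pairs are passed as tuples is an assumption to check against the library's problem-type docs):

```python
# Illustrative data shapes; verify the pair format against the docs.
# Single-sentence classification (t1): one string per example.
t1_data = ["the plot was gripping", "a dull and slow film"]
t1_labels = ["positive", "negative"]

# Sentence-pair classification (t2): assumed to be a tuple of
# two strings per example.
t2_data = [("sentence a", "sentence b"), ("premise", "hypothesis")]
t2_labels = ["entailment", "neutral"]
```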

Hope this helps.

Hi @chiyuzhang94, thanks for your quick response; I will check the data format. Another question is about the training mechanism. As I mentioned before, I have two Chinese text classification tasks (t1, t2). How should I train this mt-dnn model? Should I input both tasks, fine-tune the model, and then use it for t1 prediction? Or should I fine-tune on both tasks first, then fine-tune again on t1 data only, and after that use the model for t1 prediction? Could you share some experience on this? Thanks.