Question about the training process
chiyuzhang94 opened this issue · comments
I have a few questions about the training process. My task is a "cls1&cls2&cls3" task. 1. For the classification task, the model uses pre-trained BERT to obtain a sentence representation of each input. How is this representation generated (how is it pooled)? 2. What is the loss function of the classification task? 3. Is the loss used for backpropagation the mean of these three classification losses? 4. During backpropagation, does the model update the entire model (including BERT) or only the top layer?
- For the classification task, the model uses pre-trained BERT to obtain a sentence representation of each input. How is this representation generated (how is it pooled)?
A special token [CLS] is added before every input, and its final hidden state is used as the sentence representation for classification tasks. For more details, please refer to the original paper.
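To make the "[CLS] pooling" concrete, here is a minimal numpy sketch (not the library's actual code) of taking the first token's hidden state as the sentence representation and feeding it to a task-specific classification head; the shapes and weights are made up for illustration:

```python
import numpy as np

# Toy stand-in for BERT's final hidden states: one sequence of 6 tokens
# (position 0 is the [CLS] token), hidden size 8. Values are random here;
# in the real model they come from the transformer encoder.
rng = np.random.default_rng(0)
seq_len, hidden_size = 6, 8
hidden_states = rng.normal(size=(seq_len, hidden_size))

# "Pooling" for classification simply means taking the [CLS] hidden state:
cls_representation = hidden_states[0]

# A task-specific linear head then maps it to logits over the classes.
num_classes = 3
W = rng.normal(size=(hidden_size, num_classes))
b = np.zeros(num_classes)
logits = cls_representation @ W + b
print(logits.shape)  # (3,)
```

In a multitask setup like "cls1&cls2&cls3", each task gets its own head (its own `W` and `b`), but all tasks share the same [CLS] representation from the shared encoder.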
- What is the loss function of the classification task?
Cross Entropy.
- Is the loss used for backpropagation the mean of these three classification losses?
No, it is the sum of the three losses.
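As a small sketch of the two answers above, here is cross entropy computed per task and then summed into the loss that would be backpropagated. The logits and labels are invented for illustration; only the formula and the sum-not-mean combination are the point:

```python
import numpy as np

def cross_entropy(logits, label):
    """Cross entropy of a single example: -log softmax(logits)[label]."""
    shifted = logits - logits.max()          # subtract max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

# One made-up example per task, with its gold label.
task_losses = [
    cross_entropy(np.array([2.0, 0.5, -1.0]), 0),       # cls1
    cross_entropy(np.array([0.1, 1.5]), 1),             # cls2
    cross_entropy(np.array([-0.3, 0.2, 0.9, 0.0]), 2),  # cls3
]

# The loss used for backpropagation is the sum of the task losses, not the mean.
total_loss = sum(task_losses)
```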
- During backpropagation, does the model update the entire model (including BERT) or only the top layer?
By default, training updates the whole model, but you can set params.freeze_step
to freeze the BERT model for a specified number of steps.
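A hypothetical sketch of how a freeze_step-style schedule works: until the global step passes the freeze threshold, only the task-specific top layers are updated, after which the whole model (including BERT) is. The names `freeze_step`, `is_bert_param`, and the parameter records here are illustrative, not the library's API:

```python
# Each parameter record marks whether it belongs to the frozen BERT encoder
# or to an always-trainable task-specific head.
def trainable_params(global_step, freeze_step, params):
    """Return the parameters the optimizer may update at this step."""
    if global_step < freeze_step:
        return [p for p in params if not p["is_bert_param"]]
    return params

params = [
    {"name": "bert/encoder/layer_0", "is_bert_param": True},
    {"name": "cls1/logits", "is_bert_param": False},
]

early = trainable_params(0, 1000, params)     # top layer only
late = trainable_params(2000, 1000, params)   # whole model
print([p["name"] for p in early])
print([p["name"] for p in late])
```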
Thanks very much!
Hi,
I have a question about training multitask models on different data sources. If I have a 'cls1|cls2' problem, do the two datasets need to contain exactly the same number of instances?
Nope. Tasks are sampled based on their number of training instances during training.
Thanks. In that case, I am wondering how the training epochs are calculated. For example, if DB1 is ten times the size of DB2 and I use the same batch size, the model will have been trained on DB2 ten times by the time it finishes one pass over DB1. Is that correct? Could you please give some detail?
Yes, your understanding is right. You can set params.multitask_balance_type
to control the sampling probability. For more details, you can take a look here.
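The general idea behind balancing options can be sketched as follows. This compares size-proportional sampling against a temperature-smoothed (square-root) scheme for the DB1/DB2 example above; the actual behavior of params.multitask_balance_type depends on the library, so treat this only as an illustration of the concept:

```python
import numpy as np

sizes = np.array([10000.0, 1000.0])  # DB1 is ten times the size of DB2

# Proportional sampling: tasks are drawn according to their data size,
# so DB1 dominates (~91% of batches).
proportional = sizes / sizes.sum()

# Square-root smoothing: the rarer task is sampled more often than its
# raw share, reducing the imbalance (~24% instead of ~9% for DB2).
smoothed = np.sqrt(sizes) / np.sqrt(sizes).sum()

print(proportional)  # ~[0.909, 0.091]
print(smoothed)      # ~[0.760, 0.240]
```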
Hi @chiyuzhang94 and @JayYip, I have a question about the data format for classification tasks. I have two tasks: one is sentence classification and the other is sentence-pair classification. From your experience, how should I prepare the data for training, evaluation, and testing? Thanks a lot.
Hi @OYE93, I think it is easy to import your data.
(train_data, train_labels), (test_data, test_labels) = keras.datasets.imdb.load_data(num_words=10000)
You can change this call to fit your data. Each variable on the left side is a list.
Hope this helps.
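For the two-task case above, the lists might be assembled roughly like this. This is a hypothetical sketch: the exact format the library's preprocessing expects (field names, how sentence pairs are encoded) may differ, so treat the pairing convention and labels here as assumptions:

```python
# Task 1: single-sentence classification; one string and one label per example.
t1_train_data = ["this movie is great", "terrible plot and bad acting"]
t1_train_labels = ["pos", "neg"]

# Task 2: sentence-pair classification; each input is a pair of sentences.
t2_train_data = [
    ("a man is eating food", "a person eats"),
    ("a dog runs in the park", "the stock market fell today"),
]
t2_train_labels = ["entailment", "neutral"]

# Data and labels for each task must stay aligned one-to-one.
assert len(t1_train_data) == len(t1_train_labels)
assert len(t2_train_data) == len(t2_train_labels)
```

Evaluation and test splits would follow the same shape, just built from held-out examples.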
Hi @chiyuzhang94, thanks for your quick response; I will check the data format. Another question is about the training mechanism. As I mentioned before, I have two tasks (t1, t2) for Chinese text classification. How should I train this mt-dnn model? Should I input the two tasks, fine-tune the model, and then use it for t1 prediction? Or should I fine-tune with both tasks first, then fine-tune again using only the t1 data, and use that model for t1 prediction? Could you share some experience on this? Thanks.