kermitt2 / delft

a Deep Learning Framework for Text


Feature-based approach with BERT for seq. labelling is super slow

kermitt2 opened this issue · comments

We are currently using keras-bert for the feature-based approach with BERT for seq. labelling, and it is super slow: 56 tokens/s (using the concatenation of the top four hidden layers of the pre-trained transformer, as in the original paper). Compare this to ~300 tokens/s with ELMo and, more relevantly, around 1000 tokens/s when using the fine-tuned BERT model.
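For reference, a minimal sketch of what the feature-based extraction computes, written against the Hugging Face `transformers` API purely for illustration (this is not delft's keras-bert code path; the model name and example sentence are placeholders): the frozen pre-trained transformer is run once and the hidden states of its top four layers are concatenated to form the per-token features.

```python
# Minimal sketch of the feature-based extraction, using the Hugging Face
# `transformers` library purely for illustration (not delft's keras-bert path).
import tensorflow as tf
from transformers import BertTokenizerFast, TFBertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
model = TFBertModel.from_pretrained("bert-base-cased", output_hidden_states=True)

inputs = tokenizer("Sequence labelling with frozen BERT features.", return_tensors="tf")
outputs = model(inputs, training=False)  # frozen transformer, no fine-tuning

# hidden_states holds the embedding output plus one tensor per layer
# (13 for bert-base), each of shape (batch, seq_len, 768); concatenating
# the top four gives the per-token features fed to the downstream tagger.
features = tf.concat(outputs.hidden_states[-4:], axis=-1)  # (batch, seq_len, 3072)
```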

I think there is no reason for the pre-trained transformer to be so much slower than the fine-tuned model: the forward pass through the transformer is the same in both cases, only what sits on top of it differs. So we should use our own BERT integration for the feature-based approach too, rather than keras-bert (as a bonus, it would remove the keras-bert dependency).
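To make the feature-based setup concrete, a hypothetical downstream tagger in Keras could look like the sketch below: the transformer stays frozen and only a small recurrent head is trained on the precomputed 3072-dimensional features. Layer sizes and label count are placeholders, not delft's actual architecture.

```python
# Hypothetical downstream tagger over the precomputed features: the
# transformer is frozen, only this small head is trained. Layer sizes
# and num_labels are placeholders, not delft's actual architecture.
import tensorflow as tf

num_labels = 10  # placeholder label set size
inputs = tf.keras.Input(shape=(None, 4 * 768))  # concatenated top-4 features
x = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(100, return_sequences=True))(inputs)
outputs = tf.keras.layers.Dense(num_labels, activation="softmax")(x)

tagger = tf.keras.Model(inputs, outputs)
tagger.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```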