Google QUEST Q&A Labeling
Improving automated understanding of complex question answer content
In order to run the code install 'A lightweight python library that helps to keep track of numerical experiments'.
You can find competition data here.
Example of default bert-base training command from master
branch:
run.py --epochs=5 --max_sequence_length=500 --max_title_length=26 --max_question_length=260 --max_answer_length=210 --batch_accumulation=1 --batch_size=8 --warmup=300 --lr=1e-5 --bert_model=bert-base-uncased
Example of BART training command from bart
branch:
run.py --epochs=4 --max_sequence_length=500 --max_title_length=26 --max_question_length=260 --max_answer_length=210 --batch_accumulation=4 --batch_size=2 --warmup=250 --lr=2e-5 --bert_model=./bart.large
After you've added a pseudo labels set (we used a 100k subset from archive):
run.py --epochs=4 --max_sequence_length=500 --max_title_length=26 --max_question_length=260 --max_answer_length=210 --batch_accumulation=4 --batch_size=2 --warmup=250 --lr=2e-5 --bert_model=./bart.large --pseudo_file ../input/leak-free-pseudo-100k/pseudo-100k-4x-blend-no-leak-fold-{}.csv.gz --split_pseudo --leak_free_pseudo
In monty
branch you can find code for LM pretraining on stackexchange data
Read our solution and explanation here.
To be done.