tks0123456789 / quest_qa_labeling

Google QUEST Q&A Labeling. Improving automated understanding of complex question answer content



To run the code, first install the experiment-tracking dependency the repository uses ('a lightweight python library that helps to keep track of numerical experiments').
The competition data is available from the Kaggle Google QUEST Q&A Labeling competition page.

Example of a default bert-base training command from the master branch:

    python run.py --epochs=5 --max_sequence_length=500 --max_title_length=26 --max_question_length=260 --max_answer_length=210 --batch_accumulation=1 --batch_size=8 --warmup=300 --lr=1e-5 --bert_model=bert-base-uncased
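The exact flag definitions live in `run.py`; a minimal `argparse` sketch consistent with the command above (flag names are taken from the command, but types and defaults are assumptions) might look like:

```python
import argparse

def build_parser():
    # Hypothetical sketch of the CLI implied by the training command above;
    # the real run.py may define these flags differently.
    p = argparse.ArgumentParser(description="QUEST Q&A Labeling training")
    p.add_argument("--epochs", type=int, default=5)
    p.add_argument("--max_sequence_length", type=int, default=500)
    p.add_argument("--max_title_length", type=int, default=26)
    p.add_argument("--max_question_length", type=int, default=260)
    p.add_argument("--max_answer_length", type=int, default=210)
    p.add_argument("--batch_accumulation", type=int, default=1)
    p.add_argument("--batch_size", type=int, default=8)
    p.add_argument("--warmup", type=int, default=300)
    p.add_argument("--lr", type=float, default=1e-5)
    p.add_argument("--bert_model", type=str, default="bert-base-uncased")
    return p

# Flags not given on the command line fall back to the defaults above.
args = build_parser().parse_args(
    "--epochs=5 --batch_size=8 --lr=1e-5 --bert_model=bert-base-uncased".split()
)
print(args.epochs, args.lr)  # → 5 1e-05
```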

Example of a BART training command from the bart branch:

    python run.py --epochs=4 --max_sequence_length=500 --max_title_length=26 --max_question_length=260 --max_answer_length=210 --batch_accumulation=4 --batch_size=2 --warmup=250 --lr=2e-5 --bert_model=./bart.large
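Note that `--batch_accumulation=4` with `--batch_size=2` keeps the effective batch size at 8, the same as the bert-base command. A framework-agnostic sketch of gradient accumulation (plain Python on a toy loss, not the repo's training loop):

```python
# Illustrative gradient-accumulation loop (not the repo's code): gradients
# from `accum_steps` micro-batches are summed before one optimizer step,
# so the effective batch size is batch_size * accum_steps.
def train_step(micro_batches, accum_steps, lr, w):
    grad = 0.0
    for i, batch in enumerate(micro_batches, start=1):
        # Toy gradient of mean squared error for the model y = w * x.
        g = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        grad += g / accum_steps  # scale so the sum averages the micro-batches
        if i % accum_steps == 0:
            w -= lr * grad       # one optimizer step per accumulation window
            grad = 0.0
    return w

# Two micro-batches of size 2 behave like one batch of size 4.
data = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
w = train_step(data, accum_steps=2, lr=0.01, w=0.0)
```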

After you've added a pseudo-label set (we used a 100k subset from the archive):

    python run.py --epochs=4 --max_sequence_length=500 --max_title_length=26 --max_question_length=260 --max_answer_length=210 --batch_accumulation=4 --batch_size=2 --warmup=250 --lr=2e-5 --bert_model=./bart.large --pseudo_file ../input/leak-free-pseudo-100k/pseudo-100k-4x-blend-no-leak-fold-{}.csv.gz --split_pseudo --leak_free_pseudo
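The `--pseudo_file` argument contains a `{}` placeholder that is filled with a fold index, so each cross-validation fold can train on pseudo labels produced by models that never saw that fold's validation data. A sketch of how such a template could be resolved (the actual loading logic lives in `run.py`; the helper name here is an assumption):

```python
# Sketch of resolving a per-fold pseudo-label path template.
# Each fold gets pseudo labels generated without that fold's data,
# which is what makes the pseudo-labeling leak-free.
PSEUDO_TEMPLATE = "pseudo-100k-4x-blend-no-leak-fold-{}.csv.gz"

def pseudo_path_for_fold(template: str, fold: int) -> str:
    return template.format(fold)

paths = [pseudo_path_for_fold(PSEUDO_TEMPLATE, f) for f in range(5)]
```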

In the monty branch you can find the code for LM pretraining on Stack Exchange data.
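Masked-LM pretraining of this kind corrupts a fraction of the input tokens and trains the model to reconstruct them. A toy BERT-style masking step (pure Python, illustrative only; this is not the monty branch code, and real BERT also sometimes substitutes random tokens instead of always masking):

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    # Simplified masked-LM corruption: each token is replaced by [MASK]
    # with probability mask_prob; the originals become the labels.
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            labels.append(tok)   # the model must predict this token
        else:
            corrupted.append(tok)
            labels.append(None)  # this position is not scored
    return corrupted, labels

tokens = "how do i merge two dictionaries in python".split()
corrupted, labels = mask_tokens(tokens, mask_prob=0.15, seed=0)
```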

Solution write-up and explanation: to be done.
