TPU-Ready TF 2.1 Solution to Google QUEST Q&A Labeling using Siamese RoBERTa Encoder Model

The 5-fold models can be trained in about an hour on a Colab TPU. Model performance after post-processing the predictions (to optimize the Spearman correlation with the targets):

This would rank at around 65th place on the private leaderboard. The post-processing (which, unfortunately, I did not use in the competition) boosts the score by almost 0.03.
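To illustrate why post-processing can move a Spearman-based score, here is a minimal sketch using only NumPy. It computes the mean column-wise Spearman correlation and applies a hypothetical `snap_to_grid` step that rounds predictions to a few discrete levels. The function names, the grid size, and the toy data are all assumptions made for illustration; this is not necessarily the post-processing used in this repository.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    values = np.asarray(values, dtype=float)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)      # last 1-based sorted position of each unique value
    starts = ends - counts + 1    # first 1-based sorted position of each unique value
    return ((starts + ends) / 2.0)[inverse]


def spearman(y_true, y_pred):
    """Spearman correlation, computed as the Pearson correlation of the ranks.

    Returns NaN if either input is constant (its ranks have zero variance).
    """
    return float(np.corrcoef(average_ranks(y_true), average_ranks(y_pred))[0, 1])


def mean_spearman(y_true, y_pred):
    """Average the per-column Spearman correlation over all target columns."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = [spearman(y_true[:, j], y_pred[:, j]) for j in range(y_true.shape[1])]
    return float(np.mean(scores))


def snap_to_grid(y_pred, n_bins):
    """Hypothetical post-processing: round predictions in [0, 1] to n_bins + 1 levels."""
    return np.round(np.clip(y_pred, 0.0, 1.0) * n_bins) / n_bins


# Toy data (made up): one target column whose labels contain many ties.
y_true = np.array([[0.0], [0.0], [0.0], [0.5], [0.5], [1.0]])
y_pred = np.array([[0.10], [0.12], [0.08], [0.48], [0.55], [0.93]])

raw_score = mean_spearman(y_true, y_pred)
snapped_score = mean_spearman(y_true, snap_to_grid(y_pred, n_bins=2))
print(raw_score, snapped_score)
```

When the targets contain many tied values, continuous predictions break those ties arbitrarily and lose rank correlation; snapping predictions to a coarse grid restores the ties. In practice the grid would have to be chosen per target column on validation data, and a column that collapses to a single value yields an undefined (NaN) correlation.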

Inference Kernel on Kaggle

Train on Colab TPU

The notebook used to generate the above submission is on GitHub Gist and can be opened in Colab.

Preparation

Build the wheels

Run this command in the project root directory and in the tf-helper-bot subdirectory:

python setup.py sdist bdist_wheel

Then upload the .whl files from the resulting dist directories to Google Cloud Storage.

Create the TFRecord files

Run this command and then upload the content in cache/tfrecords to Google Cloud Storage:

python -m quest.prepare_tfrecords --model-name roberta-base -n-folds 5

(Note: check requirements.txt for missing dependencies.)

Acknowledgements

Some of the TPU resources used in this project were generously sponsored by TensorFlow Research Cloud.

MIT License
