rbiswasfc / kaggle-nbme-3rd-place-solution

3rd place solution for the NBME - Score Clinical Patient Notes Kaggle competition

Home Page: https://www.kaggle.com/competitions/nbme-score-clinical-patient-notes

Please refer to the following documentation to reproduce my solution for the NBME - Score Clinical Patient Notes competition.

If you run into any trouble with the setup or code, or have any questions, please contact me at saun.walker.150892@gmail.com

HARDWARE

Colab Pro+ (High-RAM + GPU)

The following specs were used to create the original solution:

Ubuntu 18.04.5 LTS (Bionic Beaver) with 200 GB disk
8 vCPUs, 56 GB memory
1 x NVIDIA Tesla P100

SOFTWARE

Python packages are detailed separately in requirements.txt.

Python 3.7
CUDA 11.2

It is assumed that the Kaggle API is installed.
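Before running anything, it can help to check the prerequisites listed above. The following is a minimal sketch, not part of the repository. It checks only the Python version (not CUDA), and it assumes the Kaggle API is installed as the kaggle command-line tool and authenticates either through the KAGGLE_USERNAME/KAGGLE_KEY environment variables or through a kaggle.json token in ~/.kaggle (or in the folder named by KAGGLE_CONFIG_DIR). The helper names are hypothetical.

```python
import os
import shutil
import sys
from pathlib import Path

# Python version listed in the SOFTWARE section of this README.
REQUIRED_PYTHON = (3, 7)


def python_version_ok(version_info=sys.version_info, required=REQUIRED_PYTHON) -> bool:
    """True if the interpreter's major.minor version matches the required one."""
    return tuple(version_info[:2]) == tuple(required)


def kaggle_api_ready(home=None) -> bool:
    """Hypothetical check that the Kaggle CLI is on PATH and has credentials.

    Assumes credentials come either from the KAGGLE_USERNAME/KAGGLE_KEY
    environment variables or from a kaggle.json token file in
    KAGGLE_CONFIG_DIR (if set) or ~/.kaggle.
    """
    if shutil.which("kaggle") is None:
        return False
    # Credentials supplied through environment variables.
    if os.environ.get("KAGGLE_USERNAME") and os.environ.get("KAGGLE_KEY"):
        return True
    # Otherwise look for the token file in the config folder.
    config_dir = os.environ.get("KAGGLE_CONFIG_DIR") or Path(home or Path.home()) / ".kaggle"
    return (Path(config_dir) / "kaggle.json").is_file()


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} matches 3.7: {python_version_ok()}")
    print(f"Kaggle API ready: {kaggle_api_ready()}")
```

A mismatch is only a warning sign: the solution was built with Python 3.7, and other versions may or may not work with the pinned packages.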

Please execute the following command from the top-level directory, i.e. the folder containing this file:

python convert_deberta_v2_v3_tokenizer.py --python_path <path_to_python_env>

where <path_to_python_env> is the path to the folder containing the site-packages folder, e.g. /Users/rajabiswas/opt/anaconda3/envs/nbme_env/lib/python3.7/. This converts the slow tokenizer of the DeBERTa V2/V3 models to a fast tokenizer.
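If you are unsure which folder to pass, the interpreter can report it. This is a minimal sketch, not part of the repository: it assumes a conda/venv-style layout where site-packages sits directly inside the folder you need (as in the example path above), and it quotes the path for a POSIX shell. Run it with the same Python environment you will use for the solution; the helper names are hypothetical.

```python
import os
import shlex
import sysconfig


def python_env_path() -> str:
    """Folder that contains this interpreter's site-packages directory.

    Assumes a layout like .../envs/nbme_env/lib/python3.7/site-packages, so the
    parent is .../envs/nbme_env/lib/python3.7/. A trailing separator is added
    to match the example path in this README.
    """
    site_packages = sysconfig.get_paths()["purelib"]
    return os.path.dirname(os.path.normpath(site_packages)) + os.sep


def convert_command(env_path: str) -> str:
    """Build the tokenizer-conversion command for a given environment path."""
    return f"python convert_deberta_v2_v3_tokenizer.py --python_path {shlex.quote(env_path)}"


if __name__ == "__main__":
    print(convert_command(python_env_path()))
```

On system Python installs (e.g. Debian's dist-packages layout) the reported folder may differ from what the conversion script expects, so compare it against the example path before running the command.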

MODEL BUILD:

There are two options to produce the solution.

  1. Ordinary prediction
     a) uses the binary models in the prod-models folder (~8 hours)
  2. Retrain models
     a) expect this to run for around two weeks
     b) trains all models from scratch
     c) follow this with option 1 to produce the entire solution from scratch

For option 1:

Please follow the 5 steps detailed in "Section B: NBME Predictions" of entry_points.md (this overwrites files in the outputs folder).

For option 2:

Please follow the 5 steps detailed in "Section A: NBME Training" of entry_points.md (this overwrites files in the prod-models folder).
