clips / fewshot-biomedical-names

Code for the BioNLP 2021 paper "Scalable Few-Shot Learning of Robust Biomedical Name Representations"

Results

ashishkishor opened this issue · comments

Using ICD-10:
I used FastText embeddings and 15 shots, but I didn't get exactly the same results as reported in the paper, as you can see in the screenshot:
[screenshot: 15 shots_icd10]
The 5-shot results for UMNSRS and EHR-RelB are better, as you can see in the following screenshot:
[screenshot: 5 shots_icd10]

Also, you didn't mention in the paper how to use the BNE and BioBERT pretrained embeddings, or how to do continual learning from SNOMED-CT to ICD-10.
I would like instructions for reproducing the results reported in your paper.

Hi Ashish,

As the paper says on page 26 of the BioNLP proceedings, "We average all test results over 5 different random training samples." The main.py script we provide as a demo performs one run over a single random training sample. You can loop over this, average the scores, and see if the trends hold.
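The loop-and-average step could look something like the sketch below. Note that `run_experiment` is a hypothetical stand-in for one invocation of main.py's train/eval cycle with a given random seed; here it returns deterministic dummy scores purely for illustration.

```python
import statistics

def run_experiment(seed):
    # Placeholder: in practice, call main.py's training and evaluation
    # with this seed and return the correlation scores per benchmark.
    # Dummy deterministic values are used here for illustration only.
    return {"UMNSRS-sim": 0.50 + 0.01 * seed, "EHR-RelB": 0.40 + 0.02 * seed}

# Run over 5 different random training samples, as in the paper.
seeds = range(5)
results = [run_experiment(s) for s in seeds]

# Average each benchmark's score over the 5 runs.
averaged = {k: statistics.mean(r[k] for r in results) for k in results[0]}
print(averaged)
```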

With regards to the representations:

"As a second baseline, we average the 768-dimensional context-specific token activations of a name extracted from the publicly released BioBERT model (Lee et al., 2019)."

"As state-of-the-art reference, we extract 200-dimensional name representations using the publicly released pretrained BNE model with skipgram word embeddings, BNE + SGw, which was trained on approximately 16K synonym sets of disease concepts in the UMLS, containing 156K disease names." https://github.com/minhcp/BNE
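The BioBERT baseline quoted above is just a mean over the name's per-token activations. A minimal sketch, using random vectors in place of the real last-layer BioBERT activations (which you would extract with the released model for each tokenized name):

```python
import numpy as np

# Hypothetical stand-in: activations for a 3-token name. In practice these
# would be the context-specific token activations from BioBERT's last
# hidden layer (768-dimensional per token).
hidden_size = 768
token_activations = np.random.default_rng(0).normal(size=(3, hidden_size))

# The baseline name representation is the mean over the name's tokens.
name_repr = token_activations.mean(axis=0)
print(name_repr.shape)
```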

With regards to the continual learning:

"Lastly, we also look at continual learning from SNOMED-CT to ICD-10 (S → I) or vice versa (I → S), where we use the output of the first model as input representations to train the second model."
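In other words, the second model is trained on top of the first model's encoded names rather than on the raw embeddings. A toy sketch of that wiring, with linear maps standing in for the paper's trained encoders (all matrices here are hypothetical placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the two trained encoders; in practice these would be the
# models trained on SNOMED-CT and ICD-10 respectively.
W_snomed = rng.normal(size=(768, 200)) / np.sqrt(768)  # first model (S)
W_icd10 = rng.normal(size=(200, 200)) / np.sqrt(200)   # second model (I)

names = rng.normal(size=(10, 768))  # base name embeddings for 10 names

# S -> I: the first model's output becomes the second model's input.
snomed_out = names @ W_snomed
continual_repr = snomed_out @ W_icd10
print(continual_repr.shape)
```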

Please read the paper more thoroughly.