Xiaowen-JI / Semi-automation-of-systematic-review-of-clinical-trials-in-medical-psychology-with-BERT-models

We applied pre-trained BERT models (DistilBERT, BioBERT, and SciBERT) to classify the titles and abstracts of clinical trials in medical psychology. The average AUC was 0.92. A stacked model was then built using the probability predicted by DistilBERT and keywords of the search domains as features. The AUC improved to 0.96, with F1, precision, and recall increasing to 0.95, 0.94, and 0.96, respectively. A training sample size of 100 gave the most cost-effective performance.

Methodology

We used the Bio.Entrez package in Python 3 to query, search, and fetch the metadata of RCT studies in PubMed (search period from 2010 to February 2020; the protocol of the systematic review has been published at https://www.sciencedirect.com/science/article/abs/pii/S1087079221000307). Three BERT models (DistilBERT, BioBERT, and SciBERT) were used to classify the titles and abstracts via PyTorch. We labelled the texts manually by reading the abstracts. After diagnosing the wrong predictions, we built a stacked model using the probability predicted by DistilBERT and keywords of the search domains (complementary and alternative medicine) as features. For the studies labelled as 1 (positive) based on the abstract, the full texts in PDF format were fetched from PubMed Central when available. A Haystack question-answering pipeline (https://github.com/deepset-ai/haystack/#tutorials) was then fine-tuned and applied to the preprocessed full text to extract key information for further article screening.
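The PubMed retrieval step could look roughly like the sketch below. It is not the repository's code: the function names `build_query` and `fetch_pubmed_records`, the example domain terms, the study-design filter, and the exact start and end dates are assumptions made for illustration. Only the use of Bio.Entrez and the 2010 to February 2020 search period come from the description above.

```python
from datetime import date


def build_query(domain_terms, design_filter="randomized controlled trial[pt]"):
    """Combine (hypothetical) domain keywords with a study-design filter."""
    domains = " OR ".join(f'"{term}"[Title/Abstract]' for term in domain_terms)
    return f"({domains}) AND {design_filter}"


def fetch_pubmed_records(query, email, start=date(2010, 1, 1),
                         end=date(2020, 2, 29), retmax=100):
    """Search PubMed and return the parsed article records (needs network access)."""
    # Biopython is imported lazily so build_query can be used without it.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks callers to identify themselves
    handle = Entrez.esearch(
        db="pubmed",
        term=query,
        retmax=retmax,
        datetype="pdat",  # filter on publication date
        mindate=start.strftime("%Y/%m/%d"),
        maxdate=end.strftime("%Y/%m/%d"),
    )
    ids = Entrez.read(handle)["IdList"]
    handle.close()
    if not ids:
        return []

    handle = Entrez.efetch(db="pubmed", id=",".join(ids), retmode="xml")
    records = Entrez.read(handle)
    handle.close()
    return records["PubmedArticle"]


# Example terms are placeholders, not the review's actual search strategy.
query = build_query(["acupuncture", "mindfulness"])
print(query)
```

Calling `fetch_pubmed_records(query, email="you@example.org")` would then return the records whose titles and abstracts feed the classifiers.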

[Figure: pipeline flowchart]

Stacked Model Design (by Salash)
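As a rough illustration of the stacking idea, the sketch below builds one feature row per abstract from the base model's predicted probability plus keyword indicators, and computes precision, recall, and F1 from predictions. The keyword list, the function names, and the example values are hypothetical. The choice of meta-classifier trained on these rows (for example a logistic regression) is also an assumption, since the description does not name one.

```python
import re

# Hypothetical keyword list for the search domain; not the repository's actual list.
DOMAIN_KEYWORDS = ["acupuncture", "yoga", "mindfulness", "herbal"]


def keyword_features(text, keywords=DOMAIN_KEYWORDS):
    """Return one 0/1 indicator per keyword (whole word, case-insensitive)."""
    lowered = text.lower()
    return [
        int(re.search(rf"\b{re.escape(keyword)}\b", lowered) is not None)
        for keyword in keywords
    ]


def stacked_features(bert_probability, text):
    """Concatenate the base model's probability with the keyword indicators."""
    return [float(bert_probability)] + keyword_features(text)


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = include)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# 0.83 stands in for a probability predicted by the base classifier.
row = stacked_features(0.83, "Effects of yoga and mindfulness on insomnia: a randomized trial")
print(row)  # [0.83, 0, 1, 1, 0]
```

Rows built this way can be passed to any standard classifier as the second-stage model, which lets explicit domain keywords correct cases where the base probability alone is misleading.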

Languages

Language:Jupyter Notebook 100.0%