tiru-patel / Medical_Text_Summarization_using_LLMs

Performance analysis of medical text summarization using fine-tuned T5, BART, Pegasus and GPT-3 and GPT-4 models.

About the Project

Due to the rapid expansion of medical literature, keeping pace with the latest research and clinical guidelines has become more challenging for healthcare professionals. Effective text summarization helps address this challenge by improving access to knowledge, enhancing clinical decision-making, and ultimately benefiting patient outcomes. In this study, a medical text summarization system based on large language models (LLMs) is fine-tuned and evaluated, with the objective of generating precise, logical, and brief summaries of medical literature that emphasize clinical relevance and ease of understanding. The project focuses on evaluating the performance of GPT-3, GPT-4, and the fine-tuned T5, BART, and Pegasus models trained on the standard PubMed dataset, using standard evaluation metrics.

Evaluation Metrics

The evaluation uses standard summarization metrics: ROUGE scores, METEOR, and BERTScore. Together, these metrics capture the structural and semantic similarity between the generated and reference summaries. BLEU and SacreBLEU scores are also used for evaluation.
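To make the overlap-based metrics concrete, here is a minimal sketch of ROUGE-N and ROUGE-L F1 scores in plain Python. It is an illustration only, not the code used in the notebooks: the function names `rouge_n` and `rouge_l` are hypothetical, and it assumes lowercase whitespace tokenization with no stemming, so its numbers will generally differ from those of established ROUGE packages.

```python
from collections import Counter


def _tokens(text):
    # Simplifying assumption: lowercase whitespace tokenization, no stemming.
    return text.lower().split()


def _f1(overlap, n_candidate, n_reference):
    # Harmonic mean of precision and recall; 0.0 when nothing overlaps.
    if overlap == 0:
        return 0.0
    precision = overlap / n_candidate
    recall = overlap / n_reference
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=1):
    """F1 over clipped n-gram overlap between candidate and reference."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(_tokens(candidate)), ngrams(_tokens(reference))
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    return _f1(overlap, sum(cand.values()), sum(ref.values()))


def rouge_l(candidate, reference):
    """F1 based on the longest common subsequence (LCS) of tokens."""
    c, r = _tokens(candidate), _tokens(reference)
    # Classic dynamic-programming LCS table.
    table = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i, c_tok in enumerate(c, 1):
        for j, r_tok in enumerate(r, 1):
            if c_tok == r_tok:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return _f1(table[-1][-1], len(c), len(r))


if __name__ == "__main__":
    # Made-up sentences for illustration, not taken from the PubMed dataset.
    generated = "the drug reduced blood pressure"
    reference = "the drug lowered blood pressure in patients"
    print("ROUGE-1:", round(rouge_n(generated, reference, 1), 3))
    print("ROUGE-2:", round(rouge_n(generated, reference, 2), 3))
    print("ROUGE-L:", round(rouge_l(generated, reference), 3))
```

ROUGE-N rewards shared word sequences of a fixed length, while ROUGE-L rewards words that appear in the same order even when other words intervene. Neither captures paraphrase, which is why embedding-based metrics such as BERTScore are reported alongside them.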

Results Obtained

From the results, it is observed that GPT-4 outperforms the dedicated summarization models on their niche task, even without fine-tuning. Medical summarization is highly technical, and the fact that zero-shot prediction yields competitive results reaffirms belief in the generalized nature of these models' learning. The trend is similar to GPT-4's performance on competitive medical exams such as the USMLE, again highlighting the complex data analysis capabilities of such LLMs. As far as the comparative study in this work is concerned, the results indicate that very large models acting as zero-shot learners give comparable or better performance than the fine-tuned models. Further, the smaller models perform marginally worse than the much larger ones. These observations point to future work: analyzing computational cost and accuracy as a function of the number of parameters, comparing the models on these attributes after fine-tuning them on larger datasets, and thereby examining the marginal utility of spending extra computational cost.

File Structure

1. BART_BASE_Medical_Text_Summarization.ipynb: Fine-tuning of BART-Base Model
2. T5_BASE_Medical_Text_Summarization.ipynb: Fine-tuning of T5-Base Model
3. PEGASUS_LARGE_Medical_Text_Summarization.ipynb: Fine-tuning of Pegasus-Large Model
4. LLMs_evaluation.ipynb: Testing and inference on the test set for the fine-tuned and GPT models




Language: Jupyter Notebook 100.0%