MatthewJansen / Medical-Abstract-Segmentation

A Natural Language Processing (NLP) model with TensorFlow to segment text lines of abstracts from medical research papers in order to improve readability.

Home Page: https://www.kaggle.com/code/matthewjansen/nlp-medical-abstract-segmentation/notebook

Medical-Abstract-Segmentation

Overview

Abstracts from medical research papers can be hard to read at a glance because they pack complex wording into a single dense paragraph. What if these abstracts could be segmented so that they are easy to skim?

The purpose of this notebook is to explore building a Natural Language Processing (NLP) model with TensorFlow that segments the text lines of abstracts from medical research papers to improve their readability, while keeping compute cost and implementation complexity low (CPU-only and a simple implementation).
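To make the goal concrete, here is a minimal sketch of the output side only, assuming some model has already assigned one label to each sentence. The function name `render_segmented_abstract` and the example sentences are hypothetical and are not taken from the notebook; the label names follow the section labels used in the PubMed RCT dataset.

```python
from itertools import groupby


def render_segmented_abstract(sentences, labels):
    """Group consecutive sentences that share a label under one heading."""
    if len(sentences) != len(labels):
        raise ValueError("sentences and labels must have the same length")
    blocks = []
    # groupby merges only adjacent items with the same label,
    # so the original reading order of the abstract is preserved.
    for label, group in groupby(zip(labels, sentences), key=lambda pair: pair[0]):
        text = " ".join(sentence for _, sentence in group)
        blocks.append(f"{label}:\n{text}")
    return "\n\n".join(blocks)


# Hypothetical input; in practice the labels would come from the model.
sentences = [
    "Chronic pain is common in older adults.",
    "Patients were randomised to exercise or usual care.",
    "Pain scores were recorded at follow-up.",
    "Exercise reduced pain scores compared with usual care.",
]
labels = ["BACKGROUND", "METHODS", "METHODS", "RESULTS"]
print(render_segmented_abstract(sentences, labels))
```

The two METHODS sentences end up under a single heading, which is the skimmable layout the notebook is aiming for.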

The dataset used to train the NLP model comes from the paper "PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts", published in October 2017.

The NLP model architecture used in this notebook is inspired by the paper "Neural Networks for Joint Sentence Classification in Medical Paper Abstracts" (also cited in the dataset paper), published in December 2016. Note that the model implemented in this notebook aims to reproduce results similar to those reported in that paper.

Dataset: PubMed 200k RCT

The dataset is available on GitHub at the link below.

GitHub Source: https://github.com/Franck-Dernoncourt/pubmed-rct
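The raw text files in that repository store each abstract as an ID line starting with `###`, followed by one `LABEL<TAB>sentence` line per sentence, with a blank line between abstracts. The sketch below parses that layout into one record per sentence, including its position within the abstract. It assumes this layout holds for the file being read; the function name `parse_abstracts`, the field names, and the sample text are illustrative and are not taken from the notebook.

```python
def parse_abstracts(raw_text):
    """Parse PubMed RCT style text into one dict per sentence."""
    records = []
    current_id = None
    current_lines = []

    def flush():
        # Emit the buffered sentences of one abstract, then reset the buffer.
        total = len(current_lines)
        for index, (label, text) in enumerate(current_lines):
            records.append({
                "abstract_id": current_id,
                "target": label,
                "text": text,
                "line_number": index,
                "total_lines": total,
            })
        current_lines.clear()

    for line in raw_text.splitlines():
        line = line.strip()
        if line.startswith("###"):
            # A new ID line closes the previous abstract, if any.
            flush()
            current_id = line[3:]
        elif line:
            label, _, text = line.partition("\t")
            current_lines.append((label, text))
    flush()  # emit the last abstract
    return records


# Placeholder sample in the assumed layout (not real dataset content).
sample = (
    "###0001\n"
    "OBJECTIVE\tTo test a placeholder hypothesis .\n"
    "METHODS\tParticipants were assigned to two groups .\n"
    "RESULTS\tThe outcome differed between groups .\n"
    "\n"
    "###0002\n"
    "BACKGROUND\tA second placeholder abstract .\n"
)
for record in parse_abstracts(sample):
    print(record["abstract_id"], record["line_number"],
          record["total_lines"], record["target"])
```

Keeping `line_number` and `total_lines` alongside the text is a design choice of this sketch: section labels tend to follow a typical order within an abstract, so sentence position can be a useful extra feature for a classifier.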

Using the Kaggle version of the dataset

I have uploaded the dataset here on Kaggle to make it more accessible for notebook usage.

Here's the link to the Kaggle dataset: PubMed 200k RCT

Note that the Kaggle version also includes the original dataset files converted to .csv.

Requirements

  • Recommended Python version: Python 3.8.10+ (64-bit)

[Note: This section will be updated in due course.]

Project Structure

.
├── LICENSE
├── README.md
└── nlp_medical_abstract_segmentation.ipynb
  • LICENSE | project license (MIT)
  • README.md | project readme file
  • nlp_medical_abstract_segmentation.ipynb | project notebook

Usage

See nlp_medical_abstract_segmentation.ipynb

License

This project is licensed under the terms and conditions of the MIT license.
