Abstracts from medical research papers can be challenging to read at a glance because they pack complex wording densely into a single paragraph. What if these abstracts could be segmented so that they are optimized for speed reading (skimmable)?
The purpose of this notebook is to explore building a Natural Language Processing (NLP) model with TensorFlow that segments the text lines of medical research paper abstracts, improving their readability while staying within compute-efficiency and implementation-complexity constraints (CPU-only and a simple implementation).
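To make the end goal concrete, here is a small, framework-free sketch of what "skimmable" output could look like. It assumes a model has already assigned one of the dataset's labels (BACKGROUND, OBJECTIVE, METHODS, RESULTS, CONCLUSIONS) to each sentence, and it simply groups consecutive sentences that share a label under one heading. The function name `to_skimmable` and the "LABEL: text" layout are hypothetical choices for illustration, not taken from the notebook.

```python
from itertools import groupby
from typing import List, Tuple


def to_skimmable(labelled: List[Tuple[str, str]]) -> str:
    """Hypothetical helper: group consecutive same-label sentences into one line.

    `labelled` is a list of (predicted_label, sentence) pairs in abstract order.
    """
    blocks = []
    # groupby only merges *consecutive* items with the same key, so the
    # original sentence order of the abstract is preserved.
    for label, group in groupby(labelled, key=lambda pair: pair[0]):
        sentences = " ".join(sentence for _, sentence in group)
        blocks.append(f"{label}: {sentences}")
    return "\n".join(blocks)


# Illustrative placeholder predictions (not real model output).
predictions = [
    ("BACKGROUND", "Sentence A."),
    ("METHODS", "Sentence B."),
    ("METHODS", "Sentence C."),
    ("RESULTS", "Sentence D."),
]
print(to_skimmable(predictions))
```

Grouping only consecutive sentences (rather than collecting every sentence per label) keeps the reading order intact if a label reappears later in the abstract.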
The dataset used to train the NLP model comes from the paper titled "PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts", published in October 2017.
The NLP model architecture used in this notebook is inspired by the paper titled "Neural Networks for Joint Sentence Classification in Medical Paper Abstracts" (also cited in the dataset paper), published in December 2016. The model implemented in this notebook aims to reproduce results similar to those reported in that paper.
- Dataset paper: PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts
- Model architecture paper: Neural Networks for Joint Sentence Classification in Medical Paper Abstracts
The dataset is available on GitHub; see the link below.
GitHub Source: https://github.com/Franck-Dernoncourt/pubmed-rct
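The raw files in that repository are plain text. As a minimal sketch, assuming the layout where each abstract starts with a `###<id>` line, each sentence line is `LABEL<TAB>sentence`, and abstracts are separated by blank lines, the parser below turns a file's contents into one record per sentence. The function name `parse_pubmed_rct`, the record keys (including the positional `line_number` and `total_lines` fields), and the sample text are hypothetical illustrations, not the notebook's actual preprocessing.

```python
from typing import Dict, List, Optional, Tuple


def parse_pubmed_rct(text: str) -> List[Dict[str, object]]:
    """Hypothetical parser: one dict per sentence from the raw .txt layout."""
    records: List[Dict[str, object]] = []
    abstract_id: Optional[str] = None
    buffered: List[Tuple[str, str]] = []  # (label, sentence) pairs of the current abstract

    def flush() -> None:
        # Emit the buffered abstract, attaching each sentence's position in it.
        total = len(buffered)
        for i, (label, sentence) in enumerate(buffered):
            records.append({
                "abstract_id": abstract_id,
                "label": label,
                "text": sentence,
                "line_number": i,
                "total_lines": total,
            })
        buffered.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("###"):      # start of a new abstract
            flush()
            abstract_id = line[3:]
        elif line:                      # "LABEL<TAB>sentence"
            label, sentence = line.split("\t", 1)
            buffered.append((label, sentence))
        else:                           # blank line ends the abstract
            flush()
    flush()  # handle a file that does not end with a blank line
    return records


# Synthetic two-abstract sample (placeholder IDs and sentences).
sample = """###0001
OBJECTIVE\tSynthetic objective sentence.
METHODS\tSynthetic methods sentence.
RESULTS\tSynthetic results sentence.

###0002
BACKGROUND\tSynthetic background sentence.
"""
records = parse_pubmed_rct(sample)
print(len(records), records[0]["label"], records[-1]["abstract_id"])
```

Keeping each sentence's position within its abstract is one simple way to expose the sequential context that the task relies on; whether and how the notebook uses such features is not stated here.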
I have also uploaded the dataset to Kaggle to make it more accessible for notebook usage.
Here's the link to the Kaggle dataset: PubMed 200k RCT
Note that this version includes .csv versions of the original dataset.
- Recommended Python version: Python 3.8.10+ (64-bit)
[Note: This section will be updated in due course.]
.
├── LICENSE
├── README.md
└── nlp_medical_abstract_segmentation.ipynb
- LICENSE | project license (MIT)
- README.md | project readme file
- nlp_medical_abstract_segmentation.ipynb | project notebook
See nlp_medical_abstract_segmentation.ipynb
This project is licensed under the terms and conditions of the MIT license.