ermahechap / PubMed-Sum

PubMed article summarization project for an NLP course

PubMed-Sum

PubMed article summarization project for the 2020 NLP graduate course at the National University of Colombia.

By: Edwin Mahecha & Jimmmy Pulido

Notebooks

The project comprises 3 notebooks:

  1. xml_data_preprocessing.ipynb: Connects to the PubMed OA/OAI API and performs basic XML preprocessing, removing the parts of each article that are not relevant for text analysis, such as diagrams and pictures.
  2. litcovid_data_preprocessing.ipynb: Similar to the notebook above, but it processes the text database that PubMed provides for the COVID-19 emergency (LitCovid).
  3. summarization.ipynb: Performs summarization using the Google T5 model available on Hugging Face.

Additional Documents

We provide a paper (more of a technical document summarizing the project scope) and a set of slides. Both are in Spanish.


