🎯 Goal: Applying text mining techniques to make a literature review more productive.
🙋 Author: Inês G. Gonçalves
📅 Date: March 2021
☑️ Requisites:
- APIs (see the requests library)
- The BioC file type
- Text mining/Natural Language Processing (see the spaCy library)
- Libraries: numpy, pandas, matplotlib, seaborn, spacy, requests, biopython, bioc
- Using the Entrez module from BioPython to automatically gather relevant articles on neuroblastoma from PubMed;
- Processing the articles with [PubTator](https://www.ncbi.nlm.nih.gov/research/pubtator/api.html) to get data on species, diseases, genes, etc.;
- Using the bioc package to parse the PubTator data (BioCJSON files);
- Transforming the data into a Pandas DataFrame and doing some data exploration and visualisation (e.g. which cell line or gene is referenced the most? Are other diseases associated with neuroblastoma?);
- Extras: Using tools like Entrez/Cellosaurus to get additional data on genes, species and cell lines.
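
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the exact code used in this tutorial. The helper names (`search_pubmed`, `fetch_pubtator`, `flatten_annotations`, `most_common_mentions`) are hypothetical, the PubTator export URL is left as a parameter to be taken from the API page linked above, and the `SAMPLE` document is hand-written to mimic the BioC JSON layout (documents containing passages, which contain annotations) so that the parsing part runs offline. The parsing here uses plain dictionaries; the bioc package could be used instead.

```python
import json
from collections import Counter


def search_pubmed(term, email, retmax=20):
    """Return PubMed IDs matching `term` (needs network access and biopython)."""
    from Bio import Entrez  # imported lazily so the offline part runs without it

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(db="pubmed", term=term, retmax=retmax)
    record = Entrez.read(handle)
    handle.close()
    return list(record["IdList"])


def fetch_pubtator(export_url, pmids):
    """Download annotations as text (needs network access and requests).

    `export_url` is an assumption: take the BioC JSON export endpoint from
    the PubTator API documentation, since it may change over time.
    """
    import requests

    response = requests.get(export_url, params={"pmids": ",".join(pmids)}, timeout=30)
    response.raise_for_status()
    return response.text


def flatten_annotations(document):
    """Turn one BioC JSON document (a dict) into one row per annotation."""
    rows = []
    for passage in document.get("passages", []):
        for annotation in passage.get("annotations", []):
            infons = annotation.get("infons", {})
            rows.append({
                "pmid": document.get("id"),
                "type": infons.get("type"),
                "identifier": infons.get("identifier"),
                "text": annotation.get("text"),
            })
    return rows


def most_common_mentions(rows, entity_type, n=3):
    """Count how often each mention of a given entity type appears."""
    counts = Counter(row["text"] for row in rows if row["type"] == entity_type)
    return counts.most_common(n)


# Hand-written sample for illustration only (not real PubTator output).
SAMPLE = json.loads("""
{
  "id": "0000000",
  "passages": [
    {"infons": {"type": "title"}, "offset": 0,
     "text": "MYCN in neuroblastoma",
     "annotations": [
       {"infons": {"type": "Gene", "identifier": "4613"}, "text": "MYCN"},
       {"infons": {"type": "Disease", "identifier": "MESH:D009447"}, "text": "neuroblastoma"}
     ]},
    {"infons": {"type": "abstract"}, "offset": 22,
     "text": "MYCN was studied in SH-SY5Y cells.",
     "annotations": [
       {"infons": {"type": "Gene", "identifier": "4613"}, "text": "MYCN"},
       {"infons": {"type": "CellLine", "identifier": "CVCL_0019"}, "text": "SH-SY5Y"}
     ]}
  ]
}
""")

rows = flatten_annotations(SAMPLE)
top_genes = most_common_mentions(rows, "Gene")
print(top_genes)
```

The list of row dictionaries can be passed directly to `pandas.DataFrame(rows)`, after which something like `df[df["type"] == "Gene"]["text"].value_counts()` gives the same counts and can be plotted with matplotlib or seaborn.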