GeneStat / Literature-Review

Literature review repository of GeneStat team.

Literature Review

This repository is licensed under CC-BY-SA-4.0.

Motivation

We started this repository to keep track of cutting-edge papers in related fields and to identify the topics and scientific questions to work on.

There are countless things worth investigating, almost everywhere and at any time. We need to stay curious and keep asking why, and how to answer the why. Every question could be a good question, and there is no stupid question. What shapes our decision is only the strategy or method for answering the question.

Therefore, for our (virtual) team, the best question should be like this:

  1. We like the field, and we believe in the value of answering the question.
  2. We can get the data (public data, or new data we could generate by ourselves) that are required to answer the question.
  3. The research can be carried out with our skills and experience (programming and statistics, so far).

Paper List

This is a continuously updated paper list. We select at least ten papers to study every month. Unless a paper is crucial and necessary, only open access papers are added to this list. The reason for each suggestion and related comments follow each paper.

December 2021

  1. Abdelkader, W. et al. A Deep Learning Approach to Refine the Identification of High-Quality Clinical Research Articles From the Biomedical Literature: Protocol for Algorithm Development and Validation. JMIR Res. Protoc. 10, e29398 (2021). doi: 10.2196/29398

    This paper is added for Linlin's personal interest. So many new papers are published every day. Sooner or later, we will need AI to help us select papers, or even read and understand them.

  2. Beck, T. et al. Auto-CORPus: A Natural Language Processing Tool for Standardising and Reusing Biomedical Literature. bioRxiv 2021.01.08.425887 (2021) doi: 10.1101/2021.01.08.425887

    This paper is also about applying NLP technology to biomedical papers.

  3. Davies, A. et al. Advancing mathematics by guiding human intuition with AI. Nature 600, 70–74 (2021). doi: 10.1038/s41586-021-04086-x

    I believe that in the not-so-distant future, most scientific research work, even creative work, including in the biomedical field, could be performed by AI.

  4. Murray, B. et al. Accessible data curation and analytics for international-scale citizen science datasets. Sci. Data 8, 297 (2021). doi: 10.1038/s41597-021-01071-x

    This paper is about handling massive, TB-scale public data. The authors developed a software tool to meet this need.

  5. Yang, H.-T. et al. Literature-based discovery of new candidates for drug repurposing. Brief. Bioinform. 18, 488–497 (2017). doi: 10.1093/bib/bbw030

    This example shows how to discover new uses for a drug by mining knowledge from the literature. It could be one of our research projects in the future, since the literature is one of the most accessible and valuable data sources.

  6. Chopard, D. et al. Text Mining of Adverse Events in Clinical Trials: Deep Learning Approach. JMIR Med. Informatics 9, e28632 (2021). doi: 10.2196/28632

    Adverse events (AEs) are very important in clinical trials for drug development. This paper provides an example of AE research by text mining. In addition, we can learn about clinical trials from this research.

  7. Skidmore, Z. L. et al. Genomic and transcriptomic somatic alterations of hepatocellular carcinoma in non-cirrhotic livers. bioRxiv 2021.12.14.472689 (2021) doi: 10.1101/2021.12.14.472689

    This paper is a typical genomic and transcriptomic landscape study. We can learn the basic protocols for investigating such -omics data.

  8. Löchel, H. F. et al. Fractal construction of constrained code words for DNA storage systems. Nucleic Acids Res. (2021) doi: 10.1093/nar/gkab1209

    Using DNA as a new material for data storage, especially for long-term persistence, is a cutting-edge field. This paper presents one of the algorithms for encoding data into, and decoding data from, DNA sequences.

  9. Shao, D. et al. Artificial intelligence in clinical research of cancers. Brief. Bioinform. (2021) doi: 10.1093/bib/bbab523

    This is a review of how AI techniques are used in medical research. From this paper, we can learn what kinds of data are available and which tools and protocols are appropriate for those data.

  10. Liu, S.-Y. et al. Genomic signatures define three subtypes of EGFR-mutant stage II-III non-small-cell lung cancer with distinct adjuvant therapy outcomes. Nat. Commun. 12, 6450 (2021). doi: 10.1038/s41467-021-26806-7

    A genomic signature is usually a numeric indicator calculated from DNA sequencing results that can distinguish different groups of patients. Such information is the key to providing new laboratory tests for precision clinical practice. This paper shows a typical procedure for creating a genomic signature for lung cancer.

November 2021

  1. Masum, H. et al. Ten Simple Rules for Cultivating Open Science and Collaborative R&D. PLoS Comput. Biol. 9, 7–10 (2013). doi: 10.1371/journal.pcbi.1003244

    Since our goal is to do scientific research openly and transparently, this paper should be a proper kick-off for this journey.

  2. Rollin, G. et al. Wikipedia network analysis of cancer interactions and world influence. PLoS One. 14, e0222508 (2019). doi: 10.1371/journal.pone.0222508

    Wikipedia provides a large, easy-to-access data source that lets us learn a new field quickly and mine it for knowledge. This paper is an example of how to launch an analysis of Wikipedia data.

  3. Pantziarka, P. et al. An Open Access Database of Licensed Cancer Drugs. Front. Pharmacol. 0, 236 (2021). doi: 10.3389/fphar.2021.627574

    In public data mining, we usually generate an open-access database as the final product. The database could support other investigators' research.

  4. Kampers, L. F. C. et al. From Innovation to Application: Bridging the Valley of Death in Industrial Biotechnology. Trends Biotechnol. 39, 1240–1242 (2021). doi: 10.1016/j.tibtech.2021.04.010

    Although this is not a research article, I added it here because I think it is crucial for us, a loosely connected virtual team. We want to do something innovative and to do it in an open way. Sooner or later, financial support will become a critical problem that could affect some of our decisions. So we should stay cautious, keep thinking about what value we could create and transfer, and then go further.

  5. Kleanthous, S. et al. Perception of fairness in algorithmic decisions: Future developers’ perspective. Patterns. 0, 100380 (2021). doi: 10.1016/j.patter.2021.100380

    In this first month of our literature review journey, I tend to add papers that may help us build up our principles and core values. This research article is an interesting example of judging algorithms through scientific research. Staying good and not becoming evil is not easy.

  6. Paullada, A. et al. Data and its (dis)contents: A survey of dataset development and use in machine learning research. Patterns. 2, 100336 (2021). doi: 10.1016/j.patter.2021.100336

    Collecting and tidying up data is the first thing to do before we try statistical methods. This review provides information about dataset development and applications.

  7. The Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nature Genetics. 45, 1113-1120 (2013). doi: 10.1038/ng.2764

    Since we are all interested in tumor-related research, TCGA, the largest public tumor data source, is worth studying carefully. This paper is one of the earliest official papers about TCGA.

  8. Ozhan, A. et al. SmulTCan: A Shiny application for multivariable survival analysis of TCGA data with gene sets. Comput. Biol. Med. 137, 104793 (2021). doi: 10.1016/j.compbiomed.2021.104793

    This is a typical example of software development based on TCGA data.

  9. Jia, D. et al. LINC02678 as a Novel Prognostic Marker Promotes Aggressive Non-small-cell Lung Cancer. Front Cell Dev Biol. 9, 686975 (2021). doi: 10.3389/fcell.2021.686975

    This is a typical example of a scientific research (biomarker discovery) project based on TCGA and GEO (Gene Expression Omnibus) data.

  10. Fuhrman, J. et al. A review of explainable and interpretable AI with applications in COVID-19 imaging. Med Phys. 2021 Nov 18. Online ahead of print. doi: 10.1002/mp.15359

    We mentioned image processing for diagnosis. This review paper could provide some related knowledge.

  11. Lee, C. M. et al. UCSC Genome Browser enters 20th year. Nucleic Acids Res. 48, D756–D761 (2020). doi: 10.1093/nar/gkz1012

    This is a long-running (20 years) software project, and it is currently one of the most widely used websites. It integrates multi-omics data (annotations) ordered by genomic coordinates and provides many useful command-line tools. This is not the original paper on the software (the UCSC Genome Browser), but an update published after 20 years.

  12. Song, L. et al. CINdex: A Bioconductor Package for Analysis of Chromosome Instability in DNA Copy Number Data. Cancer Inform. 16, (2017). doi: 10.1177/1176935117746637

    This paper is about an R package that provides an algorithm for calculating an indicator of chromosome instability, which may be useful for tumor research and diagnosis.

  13. Staedtke, V. et al. Actionable molecular biomarkers in primary brain tumors. Trends Cancer 2, 338–349 (2016). doi: 10.1016/j.trecan.2016.06.003

    This is a review paper that introduces biomarkers (including genomic features such as chromosome instability) in brain tumors. This paper and the one above were both provided in a precisionFDA challenge.

February 2022

  1. https://www.medrxiv.org/content/10.1101/2022.01.19.22269566v1.full.pdf doi: 10.1101/2022.01.19.22269566

  2. https://towardsdatascience.com/simple-3d-mri-classification-ranked-bronze-on-kaggle-87edfdef018a

    This is an example of MRI classification with the MONAI image processing framework.

  3. https://clincancerres.aacrjournals.org/content/27/20/5586.long (Minimal Residual Disease Detection using a Plasma-only Circulating Tumor DNA Assay in Patients with Colorectal Cancer)
