bindi-nagda / promSEMBLE

Promoter region classification in DNA sequences

promSEMBLE: Hard Pattern Mining and Ensemble Learning for Detecting DNA Promoter Sequences

Bindi M. Nagda, Van Minh Nguyen, Ryan T. White

Motivation:

Accurate identification of DNA promoter sequences is crucial to unraveling the mechanisms that regulate gene transcription. Initiation of transcription is controlled by regulatory transcription factors binding to promoter core regions in the DNA sequence. Detecting promoter regions is necessary if we are to build genetic regulatory networks for biomedical and clinical applications. We propose a novel ensemble learning technique that uses deep recurrent neural networks with convolutional feature extraction and hard negative pattern mining to detect several types of promoter sequences, including those with and without the TATA-box, within DNA sequences of both humans and mice. Comparisons with previously published results and extensive independent tests demonstrate that our method sets a new state of the art in all four categories for accurately and precisely recognizing the stretch of base pairs that code for the promoter region within the DNA sequences.
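The two ideas named above, ensemble averaging and hard negative mining, can be illustrated with a minimal sketch. This is not the repository's implementation: the function names `ensemble_average` and `select_hard_negatives`, the simple mean over member outputs, and the top-k selection rule are all assumptions made for illustration, and the scores are made-up numbers.

```python
def ensemble_average(member_probs):
    """Average per-sequence promoter probabilities over ensemble members.

    member_probs: one list of probabilities per model (hypothetical values).
    Assumption: members are combined by a plain mean.
    """
    n_members = len(member_probs)
    return [sum(column) / n_members for column in zip(*member_probs)]


def select_hard_negatives(neg_scores, k):
    """Return indices of the k non-promoter sequences scored most promoter-like.

    Assumption: "hard" negatives are those the current model is most
    confident are promoters, so they are fed back into training.
    """
    ranked = sorted(range(len(neg_scores)),
                    key=lambda i: neg_scores[i], reverse=True)
    return ranked[:k]


# Hypothetical scores from two models on three known non-promoter sequences.
scores = ensemble_average([[0.9, 0.2, 0.6],
                           [0.7, 0.4, 0.8]])
hard = select_hard_negatives(scores, k=2)
print(hard)  # indices of the two negatives the ensemble finds hardest
```

In a setup like this, the selected negatives would be added to the next training round so that the models see more of the examples they currently misclassify.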

Data

EPDNew Database

Results

Our method outperforms 4 other state-of-the-art models because it minimizes the rates of both false positives and false negatives. It leads on multiple measures of performance, including Matthews Correlation Coefficient (MCC), precision, sensitivity, and specificity. Our model yields the best MCC values across all organisms, exceeding 99% for every organism except fruit fly (with and without TATA), where it achieves 98.1%. It also achieves $\geq$ 98% on all 4 performance metrics for all 8 organisms.
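For reference, the four metrics named above can be computed from a binary confusion matrix using their standard definitions. The sketch below is illustrative only: the function name `classification_metrics` is hypothetical, and the counts are invented, not results from this work.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute precision, sensitivity, specificity, and MCC from counts.

    tp/fp/tn/fn: true/false positives and negatives, where "positive"
    means the sequence is classified as a promoter.
    Ratios with a zero denominator are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Invented counts for a balanced test set of 100 sequences.
metrics = classification_metrics(tp=45, fp=5, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCC uses all four cells of the confusion matrix, so it falls whenever either false positives or false negatives rise, which is why it is a useful single summary alongside the other three measures.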

Contact

Go to contact information

Languages

Jupyter Notebook 99.6%, Python 0.4%