LncADeep: An ab initio lncRNA identification and functional annotation tool based on deep learning

Introduction
Requirements
- LncADeep.py
Version
Installation
Usage
- LncADeep.py
Examples
- For lncRNA identification.
- For lncRNA functional annotaion.
Citation
Contact

Introduction

Long noncoding RNAs (lncRNAs) play important biological roles and have been implicated in human diseases. To characterize lncRNAs, identifying and annotating lncRNAs is necessary. Here, we propose a novel lncRNA identification and functional annotation tool named LncADeep. First, LncADeep identifies lncRNAs by integrating sequence intrinsic and homology features based on deep belief networks. Second, LncADeep predicts lncRNA-protein interactions using sequence and structure features based on deep neural networks. Third, since accurate lncRNA-protein interactions can help to infer the functions of lncRNAs, LncADeep conducts KEGG and Reactome pathway enrichment analysis and functional module detection with the predicted interacting proteins of lncRNAs. Case studies show that LncADeep's annotations for lncRNAs comply with their known functions. As a tool for lncRNA identification and functional annotation based on deep learning, LncADeep has outperformed state-of-the-art tools on predicting lncRNAs and lncRNA-protein interactions, and can automatically provide informative functional annotations for lncRNAs.

LncADeep is freely available for non-commercial use at http://cqb.pku.edu.cn/ZhuLab/lncadeep or https://github.com/cyang235/LncADeep.

Requirements

LncADeep.py

For lncRNA identification.

For predicting lncRNA-protein interactions and annotating lncRNA functions.

Version

LncADeep 1.0 (Tested on Linux_64, including CentOS 6.5 and Ubuntu 16.04)

Installation

Prerequisites

Please install numpy, theano, pandas, Keras, h5py, and iGraph according to their manuals. The following are examples for installing these prerequisites.

numpy, theano, pandas, and h5py are python packages, which can be installed with pip, for example:

# we use python v2.7.13
# our machine is implemented with the following versions

pip install numpy       # numpy v1.13.1
pip install Theano      # Theano v0.9.0
pip install pandas      # pandas v0.20.3
pip install h5py        # h5py v2.7.0

iGraph is an R package, which can be installed with:

# we use R v3.3.2
# Download and install the package

install.packages("igraph")

Keras is a Python Deep Learning library. For Keras, please be noted that we use Keras v1.2.2, and we use theano as its backend. Please edit the file ~/.keras/keras.json and change the backend. For example,

# Download Keras v1.2.2
wget https://github.com/fchollet/keras/archive/1.2.2.tar.gz

# unpack the zipped file
tar xzvf 1.2.2.tar.gz

# install Keras v1.2.2
cd keras-1.2.2
python setup.py install

# edit ~/.keras/keras.json and change the backend
{
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano"
}

HMMER and MCL package have been included in LncADeep package. You don't need to install them yourself. After installing the above prerequisites, you can now install LncADeep.

Install LncADeep from zipped file

download the zipped file

wget http://cqb.pku.edu.cn/ZhuLab/LncADeep/LncADeep_v1.0.tgz

unpack the zipped file

tar xzvf LncADeep_v1.0.tgz

change directory to LncADeep

cd LncADeep_v1.0

configure and add directory to the PATH, and you are done!

chmod +x configure
./configure

source $HOME/.bash_profile

Install LncADeep using git

clone LncADeep package

git clone https://github.com/cyang235/LncADeep.git

change directory to LncADeep

cd LncADeep

download Pfam 29.0 database

wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam29.0/Pfam-A.hmm.gz
gzip -d Pfam-A.hmm.gz
mv Pfam-A.hmm ./LncADeep_lncRNA/src/

# the Pfam-A.hmm need be put in directory /path to LncADeep/LncADeep_lncRNA/src/

configure and add directory to the PATH, and you are done!

chmod +x configure
./configure

source $HOME/.bash_profile

Usage

LncADeep.py

An ab initio lncRNA identification and functional annotation tool based on deep learning

usage: LncADeep.py [options]

An ab initio lncRNA identification and functional annotation tool based on
deep learning

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -MODE {lncRNA,anno}, --MODE {lncRNA,anno}
						(Required) The mode used for lncRNA identification or
						functional annotation. If "lncRNA" is chosen, LncADeep
						will identify lncRNAs. If "anno" is chosen, LncADeep
						will predict lncRNA-protein interactions and annotate
						lncRNA functions. Default is "lncRNA"
  -o OUT_PREFIX, --out OUT_PREFIX
						(Required) The output prefix of results
  -f FASTA_FILE, --fasta FASTA_FILE
						(Required for lncRNA identification) Sequence file in
						FASTA format to be predicted
  -m {full,partial}, --model {full,partial}
						(Optional for lncRNA identification) The model used
						for lncRNA identification, default is "partial"
  -s {human,mouse}, --species {human,mouse}
						(Optional for lncRNA identification) The species used
						for lncRNA identification, default is "human"
  -th THREAD, --thread THREAD
						(Optional for lncRNA identification) Use multi-thread
						for predicting, default is 1
  -HMM HMMTHREAD, --HMMthread HMMTHREAD
						(Optional for lncRNA identification) The thread number
						of using HMMER, default is 8
  -l RNA_FILE, --lncRNA RNA_FILE
						(Required for functional annotation) lncRNA sequence
						file in FASTA format
  -p PROTEIN_FILE, --protein PROTEIN_FILE
						(Optional for functional annotation) protein sequence
						file in FASTA format
  -a {1,0}, --annotation {1,0}
						(Optional for functional annotation) To annotate
						lncRNA functions. If "1" is selected, LncADeep will
						annotate the functions for lncRNAs, otherwise LncADeep
						will only give the interacting proteins for lncRNAs.
						The default is "1".
  -r PAIR_FILE, --pair PAIR_FILE
						(Optional for functional annotation) The lncRNA-
						protein pairs to be predicted. If this option is
						selected, LncADeep will only output interacting
						proteins.

Examples

The files for example are stored at directory /path to LncADeep/data

For lncRNA identification.

Identify lncRNAs using model for transcripts including full- and partial-length
```
 python LncADeep.py -MODE lncRNA -f ./data/LncADeep_lncRNA/lncRNA_mRNA_test.fa -o test
```
The output files will be generated at directory test_LncADeep_lncRNA_results.

To use multi-processes (e.g., 4) for lncRNA identification

 python LncADeep.py -MODE lncRNA -f ./data/LncADeep_lncRNA/lncRNA_mRNA_test.fa -o test \
                         -th 4

To use the model for full-length transcripts, please use the following command

 python LncADeep.py -MODE lncRNA -f ./data/LncADeep_lncRNA/lncRNA_mRNA_test.fa -o test \
                         -m full

LncADeep has been trained on the datasets of two species, including "human" and "mouse", the default model is "human". To use the model trained on mouse full- and partial-length transcripts, please use the following command.
```
 python LncADeep.py -MODE lncRNA -f ./data/LncADeep_lncRNA/lncRNA_mRNA_test.fa -o test \
                         -s mouse
```

To use the model trained on mouse full-length transcripts, please use the following command.

 python LncADeep.py -MODE lncRNA -f ./data/LncADeep_lncRNA/lncRNA_mRNA_test.fa -o test \
                         -m full -s mouse

LncADeep accepts nucleotide FASTA sequence as input, e.g.:

 >RNA_id_1
 GGAAACGGCCGTGGGCATTTTGGTGTATTTTTATTCAACTTTGAAAGACATATTTTATTTTTACACATTTTATTTTATACAGTA
 TAGACATACATATGCATACACGCCTCCTCTCATGACATTAAACTTTTGCACAACTTCACAATTGTAAATGATCACAGAAAAATG
 CCTCAAAATGAATGTATCATATCCTAGCCCCACCACTTAACCTCTCTGTGCCTCAGTTTTCTCCTCTGTAAAACGGGGATAATA
 ATAGTATCTACTTTATAAGTTGCTTGTAAGGGTTCAATGTGATTATGGTGTGAATGTGGGAAGCGCTCAGAAAGTATCATTTTC
 ATTATTATTAGAACTATTATTCCTTAATTGCAAACATTTAAATTCTAATTTTAT
 
 >RNA_id_2
 CATCTCTTTCCTTCTCAGGAAATTTTATACATTGTCAATTATTCCTTCTCTCTAACTTCAACCTCGCCTTCTTTGCTGAGTCTG
 ACCCATCAACAGTTAAACATGATCAAGTCTTCCGATTTAAAAGTCCCTCTTTCTTGACACAGCTCATTTATAGCCAAACTTCTT
 TCTGAAGAGTAGTCTACATTCATTTTCTTTTTCTCCCTCACTTCTGATAATATTGAACCAACTCCATTTTAGTTTCTGTCCCTA
 TCATTCCTCTAAATTGATTAAGGTCTCCAGAATATTCCTCTGTATTTACGGGCATTATTCACTGCTCTTCTTATTTGACTACTC
 AGCAAGCATTTAACTTTTGATCAGTTTTTCCTTAAAATACTTTACTTGGCTTCCTTGACATCATGGTTTTTGTTCAGATCTCTG
 TGGTTATTTCTGTCTCCTTTGCTGCCTTCTCCTCTTGGTCCTTG
 
 # LncADeep will predict whether `RNA_id_1` and `RNA_id_2` are lncRNAs. 
 # More example input can be found at directory `/path to LncADeep/data/LncADeep_lncRNA`

For lncRNA functional annotaion.

To predict lncRNA-protein interactions and annotate the functions of lncRNAs.

 python LncADeep.py -MODE anno -l ./data/LncADeep_anno/ENST00000424518.5.fa -o test

 # Here, LncADeep will predict the interactions between given lncRNAs and 20,121 reviewed proteins
 # and then annotate the functions of lncRNAs with their predicted interacting proteins.
 # The output files will be generated at directory `test_LncADeep_anno_results`

To predict lncRNA-protein interactions.

 python LncADeep.py -MODE anno -l ./data/LncADeep_anno/ENST00000424518.5.fa -o test -a 0

 # Here, LncADeep will predict the interactions between given lncRNAs and 20,121 reviewed proteins

To predict lncRNA-protein interactions for given pairs.

 python LncADeep.py -MODE anno -l ./data/LncADeep_anno/ENST00000424518.5.fa -o test \
             -r ./data/LncADeep_anno/pair.dat -p ./data/LncADeep_anno/protein.fa

 # Here, LncADeep will predict the interactions between given lncRNAs and proteins for given pairs
 # Users are required to provide the lncRNA and protein sequences in FASTA format 
 # and lncRNA-protein pairs in text format, see below

lncRNA-protein pairs in text format as input, e.g.:

 ENST00000424518.5|ENSG00000228630.5|OTTHUMG00000152934.1|OTTHUMT00000328662.1|HOTAIR-001|HOTAIR|2421| sp|P27361|MK03_HUMAN
 ENST00000424518.5|ENSG00000228630.5|OTTHUMG00000152934.1|OTTHUMT00000328662.1|HOTAIR-001|HOTAIR|2421| sp|P53779|MK10_HUMAN
 ENST00000424518.5|ENSG00000228630.5|OTTHUMG00000152934.1|OTTHUMT00000328662.1|HOTAIR-001|HOTAIR|2421| sp|Q15049|MLC1_HUMAN
 ENST00000424518.5|ENSG00000228630.5|OTTHUMG00000152934.1|OTTHUMT00000328662.1|HOTAIR-001|HOTAIR|2421| sp|Q9UHC1|MLH3_HUMAN
 ENST00000424518.5|ENSG00000228630.5|OTTHUMG00000152934.1|OTTHUMT00000328662.1|HOTAIR-001|HOTAIR|2421| sp|P0DMT0|MLN_HUMAN

 # LncADeep will predict the interactions for the above lncRNA-protein pairs. 
 # Users are also required to provide the lncRNA and protein FASTA sequence files.
 # More example input can be found at directory `/path to LncADeep/data/LncADeep_anno`

Citation

Cheng Yang, Longshu Yang, Man Zhou, Haoling Xie, Chengjiu Zhang, May D Wang, Huaiqiu Zhu; LncADeep: An ab initio lncRNA identification and functional annotation tool based on deep learning, Bioinformatics, 2018, bty428

Contact

Please direct your questions to: Dr. Huaiqiu Zhu, hqzhu@pku.edu.cn

cyang235 / LncADeep

LncADeep: An ab initio lncRNA identification and functional annotation tool based on deep learning

Introduction

Requirements

LncADeep.py

Version

Installation

Prerequisites

Install LncADeep from zipped file

Install LncADeep using git

Usage

LncADeep.py

Examples

For lncRNA identification.

For lncRNA functional annotaion.

Citation

Contact

About

Languages