We developed PEA-m5C, an accurate transcriptome-wide m5C modification predictor built on a machine learning framework with the random forest algorithm. PEA-m5C was trained with features extracted from the flanking sequences of m5C modifications. We also deposited all candidate m5C modification sites in the Ara-m5C database (http://bioinfo.nwafu.edu.cn/software/Ara-m5C.html) for follow-up research on functional mechanisms. Finally, to make PEA-m5C as widely usable as possible, we implemented it as a cross-platform, user-friendly, interactive interface and as an R package named “PEA-m5C”, based on the R statistical language and the Java programming language, which may advance functional research on m5C.
- Version 0.11 (R): first version, released on January 6th, 2018
- Version 0.1 (Java): first version, released on January 6th, 2018
- R (>= 3.3.1)
- randomForest (>= 0.6)
- seqinr (>= 3.4-5)
- stringr (>= 1.2.0)
- pROC (>= 1.10.0)
- ggplot2 (>= 2.2.1)
- FSelector (>= 0.21)
- Java 1.8 (required runtime environment)
## Install rJava
sudo apt-get update
sudo apt-get install r-cran-rjava r-cran-rweka
## Install R dependencies
dependency.packages <- c("randomForest", "seqinr", "stringr", "FSelector", "bigmemory", "ggplot2", "PRROC", "pROC")
install.packages(dependency.packages)
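## Install PEAm5C from the downloaded source tarball (replace "Download path" with the actual directory)
install.packages("Download path/PEAm5C_0.11.tar.gz", repos = NULL, type = "source")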
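Before loading PEAm5C, it can help to confirm that the dependencies actually installed, since `FSelector` in particular relies on a working Java setup. The following is a minimal sketch using only base R; `missing_packages` is a hypothetical helper written for this example and is not part of PEAm5C.

```r
# Packages named in the dependency list and install command above.
required <- c("randomForest", "seqinr", "stringr", "FSelector",
              "bigmemory", "ggplot2", "PRROC", "pROC")

# Hypothetical helper (not part of PEAm5C): return the packages whose
# namespaces cannot be loaded from the current library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing <- missing_packages(required)
if (length(missing) > 0) {
  message("Still missing: ", paste(missing, collapse = ", "))
} else {
  message("All dependencies are available.")
}
```

Any package reported as missing can be passed to `install.packages()` again before installing the PEAm5C tarball.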
- Read FASTA file and motif scanning
- Feature encoding of sequences
- m5C prediction using Random Forest models
- Provide positive and negative sample information
- Automatic verification of the training process
- Prediction using user-defined models
The basic data set can be found in data.
More details can be found in the user manual.
- 1.1 Read FASTA file and motif scanning
seq <- extra_motif_seq(input_seq_dir = paste0(system.file(package = "PEAm5c"),"/data/cdna.fa"),up = 5)
seq <- lapply(seq, c2s)
- 1.2 Feature encoding of sequences
seq_feature <- FeatureExtract(seq)
- 1.3 m5C prediction using Random Forest models
res <- predict_m5c(seq_feature)
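To make the idea behind steps 1.1 and 1.2 more concrete, here is a small base R sketch of scanning a sequence for cytosines, cutting out their flanking windows, and turning each window into a numeric feature vector. It is an illustration only: `flank_windows` and `one_hot` are hypothetical helpers, the symmetric window of `up` nucleotides on each side is an assumption about what `up = 5` means, and one-hot encoding is just one simple scheme, not necessarily the encoding that `FeatureExtract` applies.

```r
# Hypothetical helper: for every cytosine with a complete flank, return the
# window of `up` nucleotides on each side (window length 2 * up + 1).
flank_windows <- function(sequence, up = 5) {
  bases <- strsplit(toupper(sequence), "")[[1]]
  c_pos <- which(bases == "C")
  # Drop cytosines too close to either end to have a full window.
  c_pos <- c_pos[c_pos > up & c_pos <= length(bases) - up]
  vapply(c_pos,
         function(p) paste(bases[(p - up):(p + up)], collapse = ""),
         character(1))
}

# Hypothetical helper: one-hot encode a window, four indicator values per
# base in the order A, C, G, T (any other symbol becomes four zeros).
one_hot <- function(window) {
  alphabet <- c("A", "C", "G", "T")
  bases <- strsplit(window, "")[[1]]
  as.vector(vapply(bases, function(b) as.numeric(alphabet == b), numeric(4)))
}

# Toy sequence (made up for this example), window of 2 nt on each side.
windows <- flank_windows("AGTCAGGCTAACGT", up = 2)
features <- t(vapply(windows, one_hot, numeric(4 * 5)))
dim(features)  # one row per candidate site, 4 columns per window position
```

A matrix of this shape, one row per candidate cytosine, is the kind of input a classifier such as a random forest expects.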
- 2.1 Provide positive and negative sample information
load(paste0(system.file(package = "PEAm5c"),"/data/samples.Rds"))
### Positive and negative sequences can be read and scanned with extra_motif_seq, then encoded with FeatureExtract
- 2.2 Automatic verification of the training process
seq <- PEA_ml(pos_sample = pos_sample,neg_sample = neg_sample)
model <- extra_model(res = seq)
model
- 2.3 Prediction using user-defined models
res <- predict_self_model(models = model,sequence_dir = paste0(system.file(package = "PEAm5c"),"/data/cdna.fa"))
table(res[,4])
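For readers unfamiliar with the underlying classifier, the sketch below trains a random forest on synthetic data with the `randomForest` package and tabulates its predictions on held-out samples. It does not use PEAm5C functions and makes no claim about how `PEA_ml` or `predict_self_model` work internally; the toy features, labels, split and 0.5 cutoff are all assumptions made for illustration.

```r
library(randomForest)

set.seed(1)
# Synthetic data: 40 samples x 8 binary features, labels assigned arbitrarily.
x <- matrix(rbinom(40 * 8, 1, 0.5), nrow = 40,
            dimnames = list(NULL, paste0("f", 1:8)))
y <- factor(rep(c("pos", "neg"), each = 20))

# Hold out 5 positive and 5 negative samples for evaluation.
train_idx <- c(1:15, 21:35)
toy_model <- randomForest(x = x[train_idx, ], y = y[train_idx], ntree = 100)

# Class probabilities for the held-out samples, thresholded at 0.5.
prob <- predict(toy_model, newdata = x[-train_idx, ], type = "prob")
pred <- ifelse(prob[, "pos"] >= 0.5, "pos", "neg")
table(predicted = pred, actual = y[-train_idx])
```

Because the labels here are random with respect to the features, the resulting table is only meant to show the mechanics, not meaningful accuracy.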
Please use PEAm5C/issues for questions about using PEAm5C and for reporting bugs.