PEAm5C: An integrated R toolkit for plant m5C analysis.

We developed PEA-m5C, an accurate transcriptome-wide m5C modification predictor built within a machine learning framework using the random forest algorithm. PEA-m5C was trained on features extracted from the flanking sequences of m5C modifications. We also deposited all candidate m5C modification sites in the Ara-m5C database (http://bioinfo.nwafu.edu.cn/software/Ara-m5C.html) for follow-up studies of functional mechanisms. Finally, to make PEA-m5C as widely usable as possible, we implemented it both as a cross-platform, user-friendly, interactive interface and as an R package named “PEA-m5C”, based on the R statistical language and the Java programming language, which may advance functional research on m5C.

Version and download

Version 0.11 (R): first version released on January 6th, 2018
Version 0.1 (Java): first version released on January 6th, 2018

Depends

R environment

R (>= 3.3.1)
randomForest (>= 0.6)
seqinr (>= 3.4-5)
stringr (>= 1.2.0)
pROC (>= 1.10.0)
ggplot2 (>= 2.2.1)
FSelector (>= 0.21)

Global software environment

Java 1.8 (required)

Dependency installation

## Install rJava and RWeka (Debian/Ubuntu)
sudo apt-get update
sudo apt-get install r-cran-rjava r-cran-rweka

## Install R dependencies
dependency.packages <- c("randomForest", "seqinr", "stringr", "FSelector", "bigmemory", "ggplot2", "PRROC", "pROC")
install.packages(dependency.packages)
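Before installing, it can be useful to see which of these packages are already present. The following is a minimal sketch in base R, not part of PEAm5C; the helper name missing_packages is hypothetical and introduced only for illustration.

```r
# Hypothetical helper (not part of PEAm5C): return the subset of `pkgs`
# that is not currently installed in any library on .libPaths().
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

dependency.packages <- c("randomForest", "seqinr", "stringr", "FSelector",
                         "bigmemory", "ggplot2", "PRROC", "pROC")

to_install <- missing_packages(dependency.packages)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install only what is missing
} else {
  message("All R dependencies are already installed.")
}
```

Installing only the missing packages avoids re-downloading packages that are already available.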

Installation

## Replace "Download path" with the directory that contains the downloaded tarball
install.packages("Download path/PEAm5C_0.11.tar.gz", repos = NULL, type = "source")

Predicting m5C sites

Read FASTA file and motif scanning
Feature encoding of sequences
m5C prediction using Random Forest models

User-defined model

Provide positive and negative sample information
Automatic verification of the training process
Prediction using user-defined models

Quick start

The basic data sets can be found in the data directory.
More details are available in the user manual.

1. Predicting m5C sites

1.1 Read FASTA file and motif scanning

library(seqinr)  # provides c2s(), which collapses a character vector into one string
seq <- extra_motif_seq(input_seq_dir = paste0(system.file(package = "PEAm5c"), "/data/cdna.fa"), up = 5)
seq <- lapply(seq, c2s)
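To illustrate what motif scanning around candidate sites can look like, here is a minimal base-R sketch. It is not the implementation of extra_motif_seq; the function name scan_c_windows is hypothetical, and the sketch assumes that up means the number of flanking nucleotides kept on each side of a cytosine, which the package may define differently.

```r
# Hypothetical illustration (not PEAm5C's extra_motif_seq): collect the
# window of `up` nucleotides on each side of every cytosine in a sequence.
# Assumption: cytosines too close to either end, where a full window
# cannot be formed, are skipped.
scan_c_windows <- function(sequence, up = 5) {
  chars <- strsplit(toupper(sequence), "")[[1]]
  n <- length(chars)
  centers <- which(chars == "C")
  # keep only cytosines that have a complete window on both sides
  centers <- centers[centers > up & centers <= n - up]
  vapply(centers,
         function(i) paste(chars[(i - up):(i + up)], collapse = ""),
         character(1))
}

windows <- scan_c_windows("AAGTCGATTACGGA", up = 2)
print(windows)
```

Each returned string has length 2 * up + 1 with the candidate cytosine at its center.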

1.2 Feature encoding of sequences

seq_feature <- FeatureExtract(seq)
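Feature encoding turns each flanking-sequence window into a numeric vector that a random forest can consume. The sketch below shows one common scheme, binary (one-hot) encoding, purely as an illustration; it is an assumption and not necessarily among the features computed by FeatureExtract, and the function name one_hot_encode is hypothetical.

```r
# Hypothetical illustration (not PEAm5C's FeatureExtract): encode a
# nucleotide window as a 0/1 vector with four entries per position,
# in the order A, C, G, T. Any other character yields four zeros.
one_hot_encode <- function(window) {
  alphabet <- c("A", "C", "G", "T")
  chars <- strsplit(toupper(window), "")[[1]]
  # outer() gives a positions x alphabet logical matrix; transposing
  # before flattening keeps the four entries of each position together
  as.integer(t(outer(chars, alphabet, "==")))
}

encoded <- one_hot_encode("ACG")
print(encoded)
```

A window of length n produces a vector of length 4 * n, so all windows of the same size share the same feature layout.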

1.3 m5C prediction using Random Forest models

res <- predict_m5c(seq_feature)

2. User-defined model

2.1 Provide positive and negative sample information

load(paste0(system.file(package = "PEAm5c"),"/data/samples.Rds"))
### Positive and negative sequences can be read and scanned with extra_motif_seq, then encoded with FeatureExtract

2.2 Automatic verification of the training process

seq <- PEA_ml(pos_sample = pos_sample,neg_sample = neg_sample)
model <- extra_model(res = seq)
model

2.3 Prediction using user-defined models

res <- predict_self_model(models = model,sequence_dir = paste0(system.file(package = "PEAm5c"),"/data/cdna.fa"))
table(res[,4])

Ask questions

Please use PEAm5C/issues to ask how to use PEAm5C and to report bugs.

lovivi / PEA-m5C