lovivi / PEA-m5C

A machine learning-based m5C predictor

PEAm5C: An integrated R toolkit for plant m5C analysis.


We developed PEA-m5C, an accurate transcriptome-wide m5C modification predictor built on a machine learning framework using the random forest algorithm. PEA-m5C was trained on features derived from the sequences flanking m5C modifications. In addition, we deposited all candidate m5C modification sites in the Ara-m5C database (http://bioinfo.nwafu.edu.cn/software/Ara-m5C.html) for follow-up studies of functional mechanisms. Finally, to make PEA-m5C widely usable, we implemented it both as a cross-platform, user-friendly, interactive interface and as an R package named “PEA-m5C”, built on the R statistical language and the Java programming language, which may advance functional research on m5C.

Version and download

Depends

R environment

Global software environment

  • Java 1.8 (required runtime environment)

Dependency installation

## Install rJava and RWeka (run in a shell on Debian/Ubuntu)
sudo apt-get update
sudo apt-get install r-cran-rjava r-cran-rweka
## Install R dependencies (run in an R session)
dependency.packages <- c("randomForest", "seqinr", "stringr", "FSelector", "bigmemory", "ggplot2", "PRROC", "pROC")
install.packages(dependency.packages)
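
Before installing, it can help to check which of these dependencies are already present. The following is a minimal sketch in base R; the helper name missing_packages is hypothetical and is not part of PEAm5C.

```r
# Hypothetical helper (not part of PEAm5C): return the packages from `pkgs`
# that are not installed in the current R library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(
    pkgs,
    function(p) nzchar(system.file(package = p)),
    logical(1)
  )
  pkgs[!installed]
}

dependency.packages <- c("randomForest", "seqinr", "stringr", "FSelector",
                         "bigmemory", "ggplot2", "PRROC", "pROC")
to_install <- missing_packages(dependency.packages)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install only what is missing
}
```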

Installation

install.packages("Download path/PEAm5C_0.11.tar.gz",repos = NULL, type = "source")

Contents

Predicting m5C sites

  • Read FASTA file and motif scanning
  • Feature encoding of sequences
  • m5C prediction using Random Forest models
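
To illustrate what motif scanning around candidate sites can look like, the sketch below collects the window of flanking nucleotides around each cytosine in a sequence. It is written in base R and does not reproduce the internals of extra_motif_seq. The helper name flank_windows is hypothetical, and reading the up argument as the number of flanking nucleotides kept on each side is an assumption.

```r
# Hypothetical helper (not the code behind extra_motif_seq()):
# collect the (2 * up + 1)-nt window centred on every "C" in `sequence`.
flank_windows <- function(sequence, up = 5) {
  bases <- strsplit(toupper(sequence), "")[[1]]
  n <- length(bases)
  centres <- which(bases == "C")
  # Keep only cytosines that have a full window on both sides.
  centres <- centres[centres > up & centres <= n - up]
  windows <- vapply(
    centres,
    function(i) paste(bases[(i - up):(i + up)], collapse = ""),
    character(1)
  )
  # Name each window by the 1-based position of its central cytosine.
  names(windows) <- as.character(centres)
  windows
}

example_seq <- "ATGGCTAGCTAGGATCCATG"
flank_windows(example_seq, up = 5)
```

Cytosines too close to either end of the sequence are skipped here. How the package treats such sites is not stated in this README.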

User-defined model

  • Provide positive and negative sample information
  • Automatic verification of the training process
  • Prediction using user-defined models

Quick start

The basic data set can be found in data.
More details are available in the user manual.

1. Predicting m5C sites

  • 1.1 Read FASTA file and motif scanning
seq <- extra_motif_seq(input_seq_dir = paste0(system.file(package = "PEAm5c"),"/data/cdna.fa"),up = 5)
seq <- lapply(seq, c2s)
  • 1.2 Feature encoding of sequences
seq_feature <- FeatureExtract(seq)
  • 1.3 m5C prediction using Random Forest models
res <- predict_m5c(seq_feature)
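
Step 1.2 turns each sequence into a numeric feature vector. As a rough illustration of the idea, the sketch below applies a one-hot (binary) encoding to equal-length nucleotide windows in base R. Whether FeatureExtract uses this scheme, alone or alongside other encodings, is an assumption; the helper name one_hot_encode is hypothetical, and the user manual remains the reference for the actual features.

```r
# Hypothetical helper (not the code behind FeatureExtract()):
# one-hot encode equal-length nucleotide windows, one row per window.
one_hot_encode <- function(windows, alphabet = c("A", "C", "G", "T")) {
  stopifnot(length(unique(nchar(windows))) == 1)
  width <- nchar(windows[1])
  encoded <- t(vapply(windows, function(w) {
    bases <- strsplit(toupper(w), "")[[1]]
    # For each position, emit one 0/1 indicator per alphabet letter.
    as.numeric(unlist(lapply(bases, function(b) alphabet == b)))
  }, numeric(width * length(alphabet))))
  colnames(encoded) <- paste0(
    "pos", rep(seq_len(width), each = length(alphabet)), "_", alphabet
  )
  encoded
}

one_hot_encode(c("AC", "GT"))
```

Each window of width w yields 4 * w columns, and every row sums to w when the window contains only A, C, G and T.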

2. User-defined model

  • 2.1 Provide positive and negative sample information
load(paste0(system.file(package = "PEAm5c"),"/data/samples.Rds"))
### Positive and negative sequences can be read and scanned with extra_motif_seq, then encoded with FeatureExtract
  • 2.2 Automatic verification of the training process
seq <- PEA_ml(pos_sample = pos_sample,neg_sample = neg_sample)
model <- extra_model(res = seq)
model
  • 2.3 Prediction using user-defined models
res <- predict_self_model(models = model,sequence_dir = paste0(system.file(package = "PEAm5c"),"/data/cdna.fa"))
table(res[,4])
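
Verifying a trained classifier usually means scoring held-out samples and summarising the result with a metric such as the area under the ROC curve (AUC). The dependency list includes pROC and PRROC, so PEA_ml may report something similar, but that is an assumption; this README does not say which metrics are computed. The base R sketch below shows how AUC can be derived from prediction scores with the rank-sum formula. The helper name auc_from_scores is hypothetical.

```r
# Hypothetical helper (not part of PEAm5C): AUC via the rank-sum
# (Mann-Whitney) formula. `labels` uses 1 for positive, 0 for negative.
auc_from_scores <- function(scores, labels) {
  pos <- labels == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  stopifnot(n_pos > 0, n_neg > 0)
  r <- rank(scores)  # tied scores receive average ranks
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy scores: three of the four positive/negative pairs are ranked correctly.
auc_from_scores(scores = c(0.9, 0.2, 0.8, 0.1), labels = c(1, 1, 0, 0))
```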

Ask questions

Please use PEAm5C/issues to ask questions about using PEAm5C and to report bugs.
