HAN-Siyu / LION

An Integrated R Package for Effective Prediction of ncRNA- and lncRNA-protein Interaction

Home Page: https://doi.org/10.1093/bib/bbac420

LION: An Integrated R Package for Effective Prediction of LncRNA- and ncRNA-ProteIn InteractiON

Understanding ncRNA-protein interaction is critical for unveiling the functions of ncRNAs. Many computational tools have now been developed to facilitate research on ncRNA-protein interaction. Nonetheless, most of these tools show unstable results and lack the flexibility required for dataset-specific prediction. Here we propose LION, an integrated package that comprises a new method for predicting ncRNA/lncRNA-protein interaction as well as a comprehensive strategy to meet the requirements of customisable prediction. As an integrated tool for predicting ncRNA-protein interaction, LION can be used to build adaptable models for species- and tissue-specific prediction and can considerably enhance the performance of several widely used tools. Experimental results also demonstrate that our method outperforms its competitors on multiple benchmark datasets. We expect that LION will be a powerful and efficient tool for the prediction and analysis of ncRNA- and lncRNA-protein interaction.

Try our LncFinder if you want to identify lncRNAs!

For any questions regarding LION, please send an email to siyu.han@tum.de or post them to the issues page.

Install LION

Using devtools

# Enter the following commands in R:

if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("HAN-Siyu/LION")

Alternatively, download the source package and install it manually.

Supporting Files

[PDF Manual] [Datasets and Raw Results]

Dependencies

Almost all dependencies are installed automatically together with LION. However, secondary structure features are computed using two standalone programmes, RNAsubopt (from the ViennaRNA package) and Predator. You need to install these two programmes if you would like to use the lncPro method or extract structural features.
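Before computing structural features, it can help to confirm that both programmes are reachable from R. The sketch below uses base R's `Sys.which()`; the helper name `check_structure_tools()` is hypothetical, and the executable names `RNAsubopt` and `predator` are assumptions (your Predator build may use a different name or live outside the `PATH`, in which case adjust the names or your `PATH`).

```r
# Check whether the standalone programmes used for structural features are on the PATH.
# Assumption: the executables are named "RNAsubopt" and "predator"; adjust if yours differ.
check_structure_tools <- function(tools = c("RNAsubopt", "predator")) {
  paths <- Sys.which(tools)   # returns "" for every tool that cannot be found
  found <- nzchar(paths)      # TRUE where a path was returned
  names(found) <- tools
  found
}

status <- check_structure_tools()
print(status)
if (!all(status)) {
  message("Not found on PATH: ", paste(names(status)[!status], collapse = ", "))
}
```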

Basic Guideline

We expect LION to be a powerful package for predicting RNA-protein interaction in a uniform R environment. Its functions can be grouped into several categories covering feature extraction, interaction prediction and model tuning. Below we provide a basic summary of LION's functions. Detailed examples and explanations of parameters can be found in our manual.

Functions for feature extraction

  • computeFreq(): compute k-mer frequencies of RNA/protein sequences. Supports three amino acid representations, entropy density profile (EDP) computation and data normalization.
  • computeMLC(): compute the most-like coding (MLC) region of RNA sequences. Supports two strategies: longest open reading frame (ORF) and maximum subarray sum (MSS).
  • computeMotifs(): compute the number of motifs in RNA/protein sequences. User-defined motifs are also supported.
  • computePhysChem(): compute physicochemical features of RNA/protein sequences. See the manual for details.
  • computePhysChem_AAindex(): compute various physicochemical features of protein sequences using AAindex.
  • computeStructure(): compute secondary structural features of RNA/protein sequences using ViennaRNA/Predator (both programmes are required).
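To illustrate the MSS strategy mentioned for computeMLC(), here is a minimal sketch of the maximum subarray sum (Kadane's algorithm) over a vector of per-codon scores. How LION scores each codon is not described in this README, so the scores in the example are made up; the function name `max_subarray()` is hypothetical and not part of LION.

```r
# Maximum subarray sum (Kadane's algorithm) over a numeric score vector.
# Returns the 1-based start/end indices and the sum of the best contiguous segment.
max_subarray <- function(scores) {
  stopifnot(is.numeric(scores), length(scores) > 0)
  best_sum <- scores[1]; best_start <- 1; best_end <- 1
  cur_sum  <- scores[1]; cur_start  <- 1
  if (length(scores) > 1) {
    for (i in 2:length(scores)) {
      if (cur_sum < 0) {            # a negative running sum can only hurt: restart here
        cur_sum <- scores[i]; cur_start <- i
      } else {
        cur_sum <- cur_sum + scores[i]
      }
      if (cur_sum > best_sum) {     # record the best segment seen so far
        best_sum <- cur_sum; best_start <- cur_start; best_end <- i
      }
    }
  }
  list(start = best_start, end = best_end, sum = best_sum)
}

codon_scores <- c(-1, 2, 3, -4, 5, -9, 1)   # hypothetical per-codon scores
res <- max_subarray(codon_scores)           # start = 2, end = 5, sum = 6
# Codon i covers nucleotides 3i-2..3i (reading frame assumed to start at position 1)
nt_range <- c(3 * res$start - 2, 3 * res$end)   # 4 15
```

Note that the segment may cross a negative codon (here -4) when the codons on either side outweigh it, which is what distinguishes MSS from simply taking the longest run of positive scores.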

Functions for feature set construction

  • featureFreq(): calculate k-mer frequencies and construct a feature set from them.
  • featureMotifs(): calculate motif patterns and construct a feature set from them.
  • featurePhysChem(): calculate physicochemical properties and construct a feature set from them.
  • featureStructure(): calculate secondary structural information and construct a feature set from it.
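As a rough illustration of what a frequency-based feature set looks like, the sketch below counts overlapping k-mers in an RNA and a protein sequence and concatenates the two normalised vectors into one feature row per RNA-protein pair. It is not LION's implementation: featureFreq() additionally supports reduced amino acid representations and EDP, whereas this sketch uses the plain 20-letter amino acid alphabet. The helper names `kmer_freq()` and `pair_features()` and the default k values are hypothetical.

```r
# Normalised frequencies of overlapping k-mers over a fixed alphabet.
kmer_freq <- function(seq, k, alphabet) {
  chars <- strsplit(toupper(seq), "")[[1]]
  # Enumerate all possible k-mers in lexicographic order so every sequence gets the same columns
  grid <- expand.grid(rep(list(alphabet), k), stringsAsFactors = FALSE)
  all_kmers <- do.call(paste0, unname(as.list(grid[rev(seq_len(k))])))
  counts <- setNames(numeric(length(all_kmers)), all_kmers)
  n <- length(chars) - k + 1
  if (n >= 1) {
    for (i in seq_len(n)) {
      kmer <- paste(chars[i:(i + k - 1)], collapse = "")
      # k-mers with symbols outside the alphabet (e.g. "N") are skipped
      if (kmer %in% all_kmers) counts[kmer] <- counts[kmer] + 1
    }
  }
  total <- sum(counts)
  if (total > 0) counts / total else counts
}

# One feature row per RNA-protein pair: RNA k-mer frequencies followed by protein k-mer frequencies.
pair_features <- function(rna, protein, k_rna = 2, k_pro = 1) {
  rna_f <- kmer_freq(gsub("T", "U", toupper(rna)), k_rna, c("A", "C", "G", "U"))
  pro_f <- kmer_freq(protein, k_pro, strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]])
  names(rna_f) <- paste0("RNA_", names(rna_f))
  names(pro_f) <- paste0("Pro_", names(pro_f))
  c(rna_f, pro_f)
}

feats <- pair_features("ACGUAC", "MKV")
length(feats)     # 16 RNA 2-mers + 20 amino acids = 36 features
feats["RNA_AC"]   # 2 of the 5 overlapping 2-mers are "AC", so 0.4
```

Rows for many pairs can then be stacked with `rbind()` into a matrix suitable for a classifier such as a random forest.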

Functions for random forest model training

  • randomForest_CV(): perform stratified k-fold cross-validation.
  • randomForest_RFE(): perform stratified feature selection using recursive feature elimination (RFE).
  • randomForest_tune(): tune the mtry parameter of a random forest model.
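The idea behind stratified k-fold cross-validation can be sketched in a few lines of base R: samples of each class are shuffled and dealt out round-robin to the k folds, so every fold keeps roughly the class ratio of the whole dataset. This is a generic sketch, not the code behind randomForest_CV(); the function name `stratified_folds()` and the example labels are hypothetical.

```r
# Assign each sample to one of k folds while preserving class proportions.
stratified_folds <- function(labels, k = 5, seed = 1) {
  set.seed(seed)                           # note: this resets the global RNG state
  labels <- as.factor(labels)
  folds <- integer(length(labels))
  for (cls in levels(labels)) {
    idx <- which(labels == cls)
    idx <- idx[sample.int(length(idx))]    # shuffle samples within the class
    folds[idx] <- rep_len(seq_len(k), length(idx))  # deal them out round-robin
  }
  folds
}

labels <- rep(c("Interact", "NonInteract"), times = c(50, 30))
folds  <- stratified_folds(labels, k = 5)
table(labels, folds)   # every fold holds 10 "Interact" and 6 "NonInteract" samples
```

Cross-validation then trains a model on the samples with `folds != i` and evaluates it on those with `folds == i`, once for each i in 1..k.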

Functions for RNA-protein interaction prediction with different methods

  • run_LION(): predict interactions, construct feature sets, or retrain models using the LION method (this work).
  • run_LncADeep(): predict interactions (retrained random forest model), construct feature sets, or retrain models using the LncADeep method. If you would like to use the original deep neural network-based model, please refer to the original repository.
  • run_lncPro(): predict interactions (supports both the original algorithm and a retrained random forest model), construct feature sets, or retrain models using the lncPro method. The original repository was not available when this README was written.
  • run_rpiCOOL(): predict interactions (retrained random forest model), construct feature sets, or retrain models using the rpiCOOL method. The original repository was not available when this README was written.
  • run_RPISeq(): predict interactions (supports both the original web-based algorithm and a retrained random forest model), construct feature sets, or retrain models using the RPISeq method.
  • run_confidentPrediction(): perform confident prediction by employing all available methods. Users can further compute the intersection/union of the results or build new models from the output of this function.
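A minimal sketch of how predictions from several methods might be combined afterwards by intersection, union or majority vote. The input layout (one 0/1 column per method, one row per RNA-protein pair), the column names and the function name `combine_predictions()` are assumptions for illustration; the actual output format of run_confidentPrediction() is described in the manual.

```r
# pred: one column per method, one row per RNA-protein pair, 1 = predicted interaction.
combine_predictions <- function(pred, rule = c("intersection", "union", "majority")) {
  rule <- match.arg(rule)
  votes <- rowSums(as.matrix(pred) == 1)   # number of methods predicting an interaction
  switch(rule,
         intersection = votes == ncol(pred),    # all methods agree
         union        = votes > 0,              # at least one method
         majority     = votes > ncol(pred) / 2) # more than half of the methods
}

# Hypothetical predictions for four RNA-protein pairs
pred <- data.frame(LION    = c(1, 1, 0, 0),
                   RPISeq  = c(1, 0, 0, 1),
                   rpiCOOL = c(1, 1, 0, 0))
combine_predictions(pred, "intersection")  # TRUE FALSE FALSE FALSE
combine_predictions(pred, "union")         # TRUE  TRUE FALSE  TRUE
combine_predictions(pred, "majority")      # TRUE  TRUE FALSE FALSE
```

The intersection trades recall for precision, the union does the opposite, and a majority vote sits in between.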

Other Utilities

  • formatSeq(): generate sequence pairs for feature extraction or prediction.
  • evaluatePrediction(): compute metrics, including TP, TN, FP, FN, Sensitivity, Specificity, Accuracy, F1-Score, MCC (Matthews Correlation Coefficient) and Cohen’s Kappa, to evaluate prediction results.
  • runPredator(): call Predator to process protein sequences (Predator is required).
  • runRNAsubopt(): call RNAsubopt to process RNA sequences (the ViennaRNA package is required).
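The metrics listed for evaluatePrediction() follow standard definitions. The sketch below computes them from 0/1 labels with the usual textbook formulas; it is an independent illustration (the function name `eval_metrics()` is hypothetical), and evaluatePrediction() may differ in argument names, output layout and handling of zero denominators.

```r
# Confusion-matrix counts and common metrics for binary predictions.
eval_metrics <- function(truth, pred, positive = 1) {
  truth <- truth == positive
  pred  <- pred == positive
  # as.numeric avoids integer overflow in the MCC denominator for large datasets
  TP <- as.numeric(sum(truth & pred));  TN <- as.numeric(sum(!truth & !pred))
  FP <- as.numeric(sum(!truth & pred)); FN <- as.numeric(sum(truth & !pred))
  n    <- TP + TN + FP + FN
  sens <- TP / (TP + FN)                 # NaN if there are no positives
  spec <- TN / (TN + FP)
  acc  <- (TP + TN) / n
  f1   <- 2 * TP / (2 * TP + FP + FN)
  mcc  <- (TP * TN - FP * FN) /
          sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  # Cohen's kappa: observed agreement corrected for agreement expected by chance
  p_e   <- ((TP + FP) * (TP + FN) + (TN + FN) * (TN + FP)) / n^2
  kappa <- (acc - p_e) / (1 - p_e)
  c(TP = TP, TN = TN, FP = FP, FN = FN, Sensitivity = sens, Specificity = spec,
    Accuracy = acc, F1 = f1, MCC = mcc, Kappa = kappa)
}

truth <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
pred  <- c(1, 1, 1, 0, 0, 0, 0, 0, 1, 1)
m <- eval_metrics(truth, pred)
# TP = 3, TN = 4, FP = 2, FN = 1, Accuracy = 0.7, MCC ≈ 0.408, Kappa = 0.4
```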

Citation

To cite LION in publications, please use:

Han, Siyu, et al. "LION: an integrated R package for effective prediction of ncRNA–protein interaction." Briefings in Bioinformatics 23.6 (2022): bbac420.

The authors would be glad to hear how LION is used in your study. You are kindly encouraged to notify us (siyu.han@tum.de) about any work you publish!

About

License: GNU General Public License v3.0

