gogleva / metablastr

Perform and analyze large-scale BLAST searches with R

Home Page: https://hajkd.github.io/metablastr/

metablastr

An easy way to perform local large-scale BLAST searches with R

The Basic Local Alignment Search Tool (BLAST) finds regions of sequence similarity between a query and a subject sequence or sequence database.

The metablastr package provides interface functions between R and the standalone (command-line) version of BLAST. The search report generated by BLAST can then be imported into the R session either as a data.frame/tibble or through a PostgreSQL database connection, and then analysed, filtered, or processed with common R data science tools such as dplyr and ggplot2.

The metablastr::read_blast() function has a PostgreSQL database backend, which allows users to generate and store very large BLAST reports and still process them in R via the dplyr database notation.

Hence, in database mode, metablastr::read_blast() lets users analyze large-scale BLAST reports with the familiar dplyr notation. Additional functions such as blast_nr(), blast_pdb_protein(), and blast_refseq_protein() are designed to make BLAST searches on a metagenomic scale easy to run. Metagenomic data retrieval can be performed via the biomartr package; biomartr and metablastr are designed to work together seamlessly. The corresponding BLAST output can then be analyzed using the specialized metablastr::filter_blast_* functions.
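As a rough sketch of the kind of dplyr analysis described above, the following filters and summarises a small hand-made BLAST table. The toy values and the column names (query_id, subject_id, perc_identity, alig_length, evalue, bit_score) are assumptions made for illustration, not output produced by metablastr; check the columns of your own report before reusing it.

```r
library(dplyr)

# Toy stand-in for an imported BLAST report (all values are made up).
# Column names are assumptions; inspect names() of your own report first.
blast_tbl <- data.frame(
  query_id      = c("q1", "q1", "q2", "q3"),
  subject_id    = c("s1", "s2", "s3", "s4"),
  perc_identity = c(98.5, 72.0, 91.2, 60.3),
  alig_length   = c(250L, 90L, 310L, 45L),
  evalue        = c(1e-80, 1e-4, 1e-95, 0.5),
  bit_score     = c(450, 60, 520, 25),
  stringsAsFactors = FALSE
)

# Keep confident hits, then count them and record the best score per query
confident_hits <- blast_tbl %>%
  filter(evalue < 1e-5, perc_identity >= 90) %>%
  group_by(query_id) %>%
  summarise(n_hits = n(), top_bit_score = max(bit_score), .groups = "drop")

confident_hits
```

The thresholds here are arbitrary examples. The same verbs should carry over to a database-backed table, where dplyr translates them to SQL instead of loading the full report into memory.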

Install metablastr

For Linux Users:

Please install the libpq-dev library on your Linux machine (the command below assumes a Debian- or Ubuntu-based system) by typing the following into the terminal:

sudo apt-get install libpq-dev

On all systems, install metablastr by running:

# install.packages("devtools")
# install the current version of metablastr on your system
library(devtools)
install_github("HajkD/metablastr", build_vignettes = TRUE, dependencies = TRUE)

Quick start

library(metablastr)
# run blastn (nucleotide to nucleotide search) between example query and subject sequences
blast_test <- blast_nucleotide_to_nucleotide(
  query       = system.file('seqs/qry_nn.fa', package = 'metablastr'),
  subject     = system.file('seqs/sbj_nn.fa', package = 'metablastr'),
  output.path = tempdir(),
  db.import   = FALSE)

# look at BLAST results
blast_test

Discussions and Bug Reports

I would be very happy to hear about potential improvements to the concepts and functions provided in this package.

Furthermore, if you find bugs, need additional or more flexible functionality in parts of this package, or want to contribute to this project, please let me know:

https://github.com/HajkD/metablastr/issues

Interfaces implemented in metablastr:

Perform BLAST searches

  • blast_protein_to_protein(): Perform Protein to Protein BLAST Searches (BLASTP)
  • blast_nucleotide_to_nucleotide(): Perform Nucleotide to Nucleotide BLAST Searches (BLASTN)
  • blast_nucleotide_to_protein(): Perform Nucleotide to Protein BLAST Searches (BLASTX)
  • blast_protein_to_nucleotide(): Perform Protein to Nucleotide BLAST Searches (TBLASTN)
  • blast_best_hit(): Retrieve only the best BLAST hit for each query
  • blast_best_reciprocal_hit(): Retrieve only the best reciprocal BLAST hit for each query
  • read_blast(): Import BLAST output into the R session (in memory) or via a PostgreSQL database connection
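
To make the idea behind best hits and reciprocal best hits concrete, here is a minimal base-R sketch on toy data. It is not the implementation used by blast_best_hit() or blast_best_reciprocal_hit(); the helper best_hit(), the column names, and the use of the highest bit score as the ranking criterion are all assumptions for illustration.

```r
# Toy hit tables (made-up values): set A searched against B, and B against A.
a_vs_b <- data.frame(
  query_id   = c("a1", "a1", "a2"),
  subject_id = c("b1", "b2", "b2"),
  bit_score  = c(500, 120, 300),
  stringsAsFactors = FALSE
)
b_vs_a <- data.frame(
  query_id   = c("b1", "b2", "b2"),
  subject_id = c("a1", "a1", "a2"),
  bit_score  = c(480, 350, 290),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the highest-scoring hit for each query.
best_hit <- function(hits) {
  ranked <- hits[order(hits$query_id, -hits$bit_score), ]
  ranked[!duplicated(ranked$query_id), ]
}

best_ab <- best_hit(a_vs_b)
best_ba <- best_hit(b_vs_a)

# A pair is reciprocal when each sequence is the other's best hit.
reciprocal <- merge(
  best_ab, best_ba,
  by.x = c("query_id", "subject_id"),
  by.y = c("subject_id", "query_id"),
  suffixes = c("_ab", "_ba")
)
reciprocal
```

In this toy example a2's best hit is b2, but b2's best hit is a1, so only the a1/b1 pair survives the reciprocal filter. Other ranking criteria, such as the lowest e-value, could be swapped into best_hit().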

BLAST against common NCBI databases

  • blast_nr(): Perform Protein to Protein BLAST Searches against the NCBI non-redundant protein database (nr)
  • blast_nt(): Perform Nucleotide to Nucleotide BLAST Searches against the NCBI nucleotide collection database (nt)
  • blast_est(): Perform Nucleotide to Nucleotide BLAST Searches against the NCBI expressed sequence tags database
  • blast_pdb_protein():
  • blast_pdb_nucleotide():
  • blast_swissprot():
  • blast_delta():
  • blast_refseq_rna():
  • blast_refseq_gene():
  • blast_refseq_protein():

Analyze BLAST Report

  • filter_blast_:

Navigation functions

  • list_outformats(): List available BLAST output formats
