metablastr
An easy way to perform local large-scale BLAST searches with R
The Basic Local Alignment Search Tool (BLAST) finds regions of sequence similarity between a query and a subject sequence or sequence database.
The metablastr package provides interface functions between R and the standalone (command-line) version of BLAST. The search report generated by BLAST can then be imported into the R session either as a data.frame/tibble or as a PostgreSQL database connection, and can be analysed, filtered, or processed with common R data science tools such as dplyr, ggplot2, etc.
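As a minimal sketch of that dplyr workflow, the snippet below filters a small hypothetical hit table that stands in for an imported BLAST report. The column names (query_id, subject_id, perc_identity, evalue, bit_score) and the thresholds are assumptions chosen for illustration; they are not necessarily the exact columns returned by read_blast().

```r
library(dplyr)

# Hypothetical BLAST hit table (column names are illustrative assumptions,
# not necessarily the columns produced by metablastr::read_blast())
hits <- tibble(
  query_id      = c("q1", "q1", "q2", "q2", "q3"),
  subject_id    = c("s1", "s2", "s1", "s3", "s4"),
  perc_identity = c(98.5, 71.2, 88.0, 95.1, 60.3),
  evalue        = c(1e-50, 1e-5, 1e-30, 1e-40, 0.5),
  bit_score     = c(300, 45, 180, 250, 20)
)

# Keep only confident hits (example thresholds) and sort them by
# query, strongest hit first
strong_hits <- hits %>%
  filter(evalue <= 1e-10, perc_identity >= 80) %>%
  arrange(query_id, desc(bit_score))

# Count the remaining hits per query
hits_per_query <- strong_hits %>%
  count(query_id, name = "n_hits")

print(strong_hits)
print(hits_per_query)
```

The same verbs apply whether the report is an in-memory tibble or a database-backed table, which is what makes the dplyr notation convenient for large reports.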
The metablastr::read_blast() function has a PostgreSQL database backend, which allows users to generate and store very large BLAST reports and still analyze them in R with the familiar dplyr database notation.
Additional functions such as blast_nr(), blast_pdb_protein(), blast_refseq_protein(), etc. are designed to perform easy-to-run BLAST searches on a metagenomic scale. Metagenomic data retrieval can be performed via the biomartr package. The biomartr and metablastr packages are designed to work together seamlessly. The corresponding BLAST output can then be analyzed using the specialized metablastr::filter_blast_* functions.
Install metablastr
For Linux Users:
Please install the libpq-dev library on your Linux machine by typing into the terminal:
sudo apt-get install libpq-dev
For all systems, install metablastr by typing:
# install.packages("devtools")
# install the current version of metablastr on your system
library(devtools)
install_github("HajkD/metablastr", build_vignettes = TRUE, dependencies = TRUE)
Quick start
library(metablastr)
# run blastn (nucleotide to nucleotide search) between example query and subject sequences
blast_test <- blast_nucleotide_to_nucleotide(
query = system.file('seqs/qry_nn.fa', package = 'metablastr'),
subject = system.file('seqs/sbj_nn.fa', package = 'metablastr'),
output.path = tempdir(),
db.import = FALSE)
# look at BLAST results
blast_test
Discussions and Bug Reports
I would be very happy to learn more about potential improvements of the concepts and functions provided in this package.
Furthermore, if you find bugs, need additional (more flexible) functionality in parts of this package, or want to contribute to this project, please let me know:
https://github.com/HajkD/metablastr/issues
Interfaces implemented in metablastr:
Perform BLAST searches
blast_protein_to_protein(): Perform Protein to Protein BLAST Searches (BLASTP)
blast_nucleotide_to_nucleotide(): Perform Nucleotide to Nucleotide BLAST Searches (BLASTN)
blast_nucleotide_to_protein(): Perform Nucleotide to Protein BLAST Searches (BLASTX)
blast_protein_to_nucleotide(): Perform Protein to Nucleotide BLAST Searches (TBLASTN)
blast_best_hit(): Retrieve only the best BLAST hit for each query
blast_best_reciprocal_hit(): Retrieve only the best reciprocal BLAST hit for each query
read_blast(): Import BLAST output into the R session (in memory) or via a PostgreSQL database connection
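To illustrate what "best hit" and "best reciprocal hit" mean, here is a base R sketch on two small hypothetical hit tables (genome A searched against B, and B against A). The helper best_hit() is hypothetical, and ranking hits by bit score is an assumption; this is not the implementation of blast_best_hit() or blast_best_reciprocal_hit(), whose selection criteria and arguments may differ.

```r
# Hypothetical forward (A vs B) and reverse (B vs A) hit tables
fwd_hits <- data.frame(
  query_id   = c("a1", "a1", "a2", "a3"),
  subject_id = c("b1", "b2", "b2", "b3"),
  bit_score  = c(300, 120, 200, 90),
  stringsAsFactors = FALSE
)
rev_hits <- data.frame(
  query_id   = c("b1", "b2", "b2", "b3"),
  subject_id = c("a1", "a2", "a1", "a1"),
  bit_score  = c(310, 210, 100, 95),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the highest-scoring hit per query
# (rows are sorted by query, then by decreasing bit score, and the
# first row of each query is retained)
best_hit <- function(hits) {
  hits <- hits[order(hits$query_id, -hits$bit_score), ]
  hits[!duplicated(hits$query_id), ]
}

best_fwd <- best_hit(fwd_hits)
best_rev <- best_hit(rev_hits)

# Reciprocal best hit: a's best hit is b AND b's best hit is a
pairs_fwd <- paste(best_fwd$query_id, best_fwd$subject_id)
pairs_rev <- paste(best_rev$subject_id, best_rev$query_id)
rbh <- best_fwd[pairs_fwd %in% pairs_rev, c("query_id", "subject_id")]

print(rbh)
```

Here a3 is dropped from the reciprocal set because its best hit, b3, has a1 rather than a3 as its own best hit.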
BLAST against common NCBI databases
blast_nr(): Perform Protein to Protein BLAST Searches against the NCBI non-redundant database
blast_nt(): Perform Nucleotide to Nucleotide BLAST Searches against the NCBI non-redundant database
blast_est(): Perform Nucleotide to Nucleotide BLAST Searches against the NCBI expressed sequence tags database
blast_pdb_protein()
blast_pdb_nucleotide()
blast_swissprot()
blast_delta()
blast_refseq_rna()
blast_refseq_gene()
blast_refseq_protein()
Analyze BLAST Report
filter_blast_*()
Navigation functions
list_outformats(): List available BLAST output formats