a-slide / blastpy3

Simple and lightweight Python 3 wrapper module for NCBI BLAST+

Blastpy3


Lightweight, high-level Python 3 API for NCBI BLAST+ blastn


Blastn

This class contains the wrapper for Blastn and requires NCBI BLAST+ 2.2.28 or later to be installed.

Setup Blastn object: Create subject database

Upon instantiation, a database is created from the user-provided subject sequence. The database files are written to a temporary directory. The following parameters can be customized when a Blastn object is instantiated:

  • ref_path: Path to the reference fasta file (not gzipped). Mandatory
  • makeblastdb_exec: Path of the makeblastdb executable. Default = "makeblastdb"
  • makeblastdb_opt: makeblastdb command line options as a string. Default = ""

To make sure the database files are deleted at the end of the execution, the object can be used in a with statement. Alternatively, call the rm_db method once you have finished using the Blastn object.

Code

with Blastn(ref_path="./subject.fa") as blastn:
    print(blastn)

Output

CREATE DATABASE: makeblastdb  -dbtype nucl -input_type fasta -in subject.fa -out temp_dir

MAKEBLASTDB CLASS	Parameters list
	db_dir	/tmp/tmplbkdwzm2
	db_path	/tmp/tmplbkdwzm2/Yeast
	makeblastdb_exec	makeblastdb
	makeblastdb_opt
	ref_path	./data/Yeast.fa
	verbose	False

Cleaning up blast DB files for "subject"
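
The two cleanup routes described above (the with statement, or an explicit rm_db call) follow the usual context-manager pattern. The sketch below illustrates that pattern with a hypothetical stand-in class, TempDbSketch. It is not the Blastn implementation and never calls makeblastdb; it only shows how a temporary database directory can be tied to the lifetime of an object.

```python
import os
import shutil
import tempfile


class TempDbSketch:
    """Hypothetical stand-in for an object that owns a temporary DB directory."""

    def __init__(self):
        # Create the temporary directory that would hold the database files
        self.db_dir = tempfile.mkdtemp()

    def rm_db(self):
        # Remove the directory; safe to call more than once
        shutil.rmtree(self.db_dir, ignore_errors=True)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the with block exits, even after an exception
        self.rm_db()
        return False  # do not swallow exceptions


# Route 1: the with statement cleans up automatically
with TempDbSketch() as db:
    dir_in_with = db.db_dir
    existed_in_with = os.path.isdir(dir_in_with)

# Route 2: explicit cleanup, protected by try/finally
db2 = TempDbSketch()
try:
    existed_before_rm = os.path.isdir(db2.db_dir)
finally:
    db2.rm_db()
```

With the second route, wrapping the work in try/finally keeps the cleanup from being skipped if an error occurs before rm_db is reached.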

Calling Blastn object: Perform Blastn and return a list of hits

The "align" method of a Blastn object can then be called with a query fasta file (query_path) or directly with a sequence string (query_seq). The following parameters can be customized when calling a Blastn object:

  • query_path: Path to a fasta file containing the query sequences (not gzipped). Mandatory unless query_seq is provided
  • query_seq: Query sequence given directly as a string
  • blast_exec: Path of the blast executable. Default = "blastn"
  • blastn_opt: Blastn command line options as a string. Default = ""
  • task: Type of blast to be performed ('blastn' 'blastn-short' 'dc-megablast' 'megablast' 'rmblastn'). Default = "dc-megablast"
  • evalue: E value cutoff to retain alignments. Default = 1
  • best_query_hit: Find and return only the best hit per query. Default = False

A list containing one BlastHit object for each query hit found in the subject is returned. If no hit is found, None is returned instead. If best_query_hit is set to True, only the best hit for each sequence in the query file is returned.
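
Because None is returned when nothing is found, looping directly over the result would raise a TypeError in that case. A minimal guard, using a hypothetical helper named hits_or_empty (not part of the module), might look like this:

```python
def hits_or_empty(result):
    """Return the hit list unchanged, or an empty list if no hit was found."""
    return [] if result is None else result


# Stand-in values: plain strings replace BlastHit objects for illustration only
no_hit = hits_or_empty(None)
two_hits = hits_or_empty(["hit0", "hit1"])

for hit in no_hit:   # the loop body never runs, and no TypeError is raised
    print(hit)
for hit in two_hits:
    print(hit)
```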

Code

with Blastn(ref_path="./subject.fa") as blastn:
    hit_list = blastn(query_path="./query.fa")
    for hit in hit_list:
        print(hit)

Output

CREATE DATABASE: makeblastdb  -dbtype nucl -input_type fasta -in ./subject.fa -out /tmp/tmp1ZBlfT/subject

MAKE BLAST: blastn  -num_threads 4 -task dc-megablast -evalue 1 -outfmt "6 std qseq" -dust no -query ./query.fa -db /tmp/tmp1ZBlfT/subject

	2 hits found
HIT 0	Query	query1:0-48(+)
	Subject	subject:19-67(+)
	Lenght : 48	Identity : 100.0%	Evalue : 2e-23	Bit score : 87.8
	Aligned query seq : GCATGCTCGATCAGTAGCTCTCAGTACGCATACGCTAGCATCACGACT

HIT 1	Query	query2:0-48(+)
	Subject	subject:89-137(+)
	Lenght : 48	Identity : 100.0%	Evalue : 2e-23	Bit score : 87.8
	Aligned query seq : CGCATCGACTCGATCTGATCAGCTCACAGTCAGCATCAGCTACGATCA

Cleaning up blast DB files for "subject"
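
The MAKE BLAST line in the output above shows the blastn command that ends up being run. As a rough sketch of how the call parameters could map onto such a command, the hypothetical build_blastn_cmd helper below assembles an argument list from them. It is an illustration only, not the module's code. Where -num_threads and -dust come from in the real module is not stated here, so the sketch simply passes them in through blastn_opt.

```python
import shlex


def build_blastn_cmd(query_path, db_path, blast_exec="blastn", blastn_opt="",
                     task="dc-megablast", evalue=1):
    """Assemble a blastn argument list (hypothetical helper, for illustration)."""
    cmd = [blast_exec]
    # Extra user options arrive as one string, so split them shell-style
    cmd += shlex.split(blastn_opt)
    cmd += ["-task", task, "-evalue", str(evalue),
            "-outfmt", "6 std qseq",
            "-query", query_path, "-db", db_path]
    return cmd


cmd = build_blastn_cmd("./query.fa", "/tmp/tmp1ZBlfT/subject",
                       blastn_opt="-num_threads 4 -dust no")
# Quote each argument so the printed line could be pasted into a shell
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the command as a list means it could be handed to subprocess.run without a shell, which avoids quoting problems with the multi-word -outfmt value.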

BlastHit

Python object representing a hit found by blastn. The object contains the following public fields:

  • id: Auto incremented unique identifier [INT]
  • q_id: Query sequence name [STR]
  • s_id: Subject sequence name [STR]
  • identity: % of identity in the hit [FLOAT 0:100]
  • length: length of the hit [INT >=0]
  • mis: Number of mismatches in the hit [INT >=0]
  • gap: Number of gaps in the hit [INT >=0]
  • q_start: Hit start position of the query sequence [INT >=0]
  • q_end: Hit end position of the query sequence [INT >=0]
  • s_start: Hit start position of the subject sequence [INT >=0]
  • s_end: Hit end position of the subject sequence [INT >=0]
  • evalue: E value of the alignment [FLOAT >=0]
  • bscore: Bit score of the alignment [FLOAT >=0]
  • q_seq: Sequence of the query aligned on the subject sequence [STR]
  • q_orient: Orientation of the query sequence [+ or -]
  • s_orient: Orientation of the subject sequence [+ or -]
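
The blastn command logged earlier requests -outfmt "6 std qseq". In BLAST+ the std keyword expands to twelve tab-separated columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), and qseq is appended as a thirteenth. These line up with most of the fields listed above. The sketch below shows that mapping with a hypothetical parse_tabular_line helper, which is not part of the module. It leaves out id and the two orientation fields, and keeps coordinates exactly as they appear in the input line.

```python
# Column order of BLAST+ tabular output requested with -outfmt "6 std qseq"
COLUMNS = ["q_id", "s_id", "identity", "length", "mis", "gap",
           "q_start", "q_end", "s_start", "s_end", "evalue", "bscore", "q_seq"]
INT_FIELDS = {"length", "mis", "gap", "q_start", "q_end", "s_start", "s_end"}
FLOAT_FIELDS = {"identity", "evalue", "bscore"}


def parse_tabular_line(line):
    """Turn one tab-separated BLAST line into a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(COLUMNS):
        raise ValueError("expected %d columns, got %d" % (len(COLUMNS), len(values)))
    record = {}
    for name, value in zip(COLUMNS, values):
        if name in INT_FIELDS:
            record[name] = int(value)
        elif name in FLOAT_FIELDS:
            record[name] = float(value)
        else:
            record[name] = value
    return record


# Made-up line for illustration, not real BLAST output
line = "query1\tsubject\t100.000\t4\t0\t0\t1\t4\t20\t23\t0.5\t8.5\tACGT\n"
record = parse_tabular_line(line)
```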

The validity of numeric values is checked upon instantiation. Invalid values raise an AssertionError.
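
To illustrate this kind of check, here is a minimal sketch using a hypothetical HitSketch class with only four of the numeric fields. It is a reduced stand-in written for this example, not the BlastHit source, and its default values are assumptions.

```python
class HitSketch:
    """Hypothetical, reduced stand-in for BlastHit with four numeric fields."""

    def __init__(self, identity=100.0, length=10, evalue=0.0, bscore=0.0):
        # Range checks mirror the bracketed constraints in the field list
        assert 0 <= identity <= 100, "identity must be within 0:100"
        assert length >= 0, "length must be >= 0"
        assert evalue >= 0, "evalue must be >= 0"
        assert bscore >= 0, "bscore must be >= 0"
        self.identity = float(identity)
        self.length = int(length)
        self.evalue = float(evalue)
        self.bscore = float(bscore)


valid = HitSketch(identity=98.5, length=48)

# An out-of-range value is rejected before the object is created
try:
    HitSketch(identity=-1)
    rejected = False
except AssertionError:
    rejected = True
```

Note that Python skips assert statements when run with the -O flag, so checks written this way disappear in optimized mode.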

BlastHit objects can return a comprehensive report of themselves in the form of an ordered dictionary:

Code

# Interactive import
from BlastHit import BlastHit

# Create a default BlastHit object
h = BlastHit()

# Call the report method
h.get_report(full=True)

Output

OrderedDict([('Query', 'query:0-10(+)'), ('Subject', 'subject:0-10(+)'), ('Identity', 100.0), ('Evalue', 0.0), ('Bit Score', 0.0), ('Hit length', 10), ('Number of gap', 0), ('Number of mismatch', 0)])

Testing the pyBlast module

The module can be tested with pytest:

  • Install pytest with pip: pip install pytest
  • Run the tests with py.test-2.7 -v

Example of output for a successful run. Please note that some tests might fail because DNA sequences are sampled at random and the Blastn algorithm is heuristic.

========================================== test session starts ===========================================
platform linux2 -- Python 2.7.5 -- py-1.4.27 -- pytest-2.7.0 -- /usr/bin/python
rootdir: /home/adrien/Programming/Python/pyBlast, inifile:
collected 21 items

test_pyBlast.py::test_BlastHit[4.16866907958-57-98-69-88-12-100-43-1.40452897105-47.3666242716] PASSED
test_pyBlast.py::test_BlastHit[-1-7-10-20-73-54-25-45-98.7921480151-45.2397166228] xfail
test_pyBlast.py::test_BlastHit[8.92741377413--1-100-36-34-33-14-71-18.8547135761-97.6604693294] xfail
test_pyBlast.py::test_BlastHit[10.5987790458-46--1-45-78-81-86-86-73.8740266727-56.887410005] xfail
test_pyBlast.py::test_BlastHit[66.8213911219-62-48--1-91-10-60-20-88.7850139735-81.7901609219] xfail
test_pyBlast.py::test_BlastHit[86.6626174287-29-83-34--1-53-57-68-17.9799756069-7.83036609495] xfail
test_pyBlast.py::test_BlastHit[5.23985331666-43-85-33-7--1-14-3-74.2130782704-88.9289495285] xfail
test_pyBlast.py::test_BlastHit[75.6935977321-8-78-68-10-39--1-74-44.1447867052-22.5203082483] xfail
test_pyBlast.py::test_BlastHit[39.8692596061-60-5-49-77-9-31--1-2.59963139531-46.3133849683] xfail
test_pyBlast.py::test_BlastHit[15.7192632366-24-92-1-64-82-83-90--1-75.5540618409] xfail
test_pyBlast.py::test_BlastHit[18.6627439886-34-57-60-5-45-26-40-77.7840842678--1] xfail
test_pyBlast.py::test_Blastn[blastn-Queries from Subject] PASSED
test_pyBlast.py::test_Blastn[blastn-Random queries] xfail
test_pyBlast.py::test_Blastn[blastn-short-Queries from Subject] PASSED
test_pyBlast.py::test_Blastn[blastn-short-Random queries] xfail
test_pyBlast.py::test_Blastn[dc-megablast-Queries from Subject] PASSED
test_pyBlast.py::test_Blastn[dc-megablast-Random queries] xfail
test_pyBlast.py::test_Blastn[megablast-Queries from Subject] PASSED
test_pyBlast.py::test_Blastn[megablast-Random queries] xfail
test_pyBlast.py::test_Blastn[rmblastn-Queries from Subject] PASSED
test_pyBlast.py::test_Blastn[rmblastn-Random queries] xfail

================================== 6 passed, 15 xfailed in 5.91 seconds ==================================
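
Because the queries are sampled at random, results can differ between runs. How the test suite draws its sequences is not shown here. The sketch below only illustrates one common way to make such sampling repeatable, using a hypothetical random_dna helper with a fixed seed.

```python
import random


def random_dna(length, seed=None):
    """Return a random DNA string; the same seed always gives the same string."""
    rng = random.Random(seed)  # private generator, leaves the global state alone
    return "".join(rng.choice("ACGT") for _ in range(length))


seq_a = random_dna(50, seed=42)
seq_b = random_dna(50, seed=42)  # identical to seq_a because the seed matches
```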

Dependencies

Authors and Contact

Adrien Leger - 2015

License: GNU General Public License v3.0