giuferrero / galaxy_blast

Galaxy wrappers for NCBI BLAST+ and related BLAST tools.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Linux testing with TravisCI Landscape Code Metrics Code style: black

Introduction

Galaxy is a web-based platform for biological data analysis, supporting extension with additional tools (often wrappers for existing command line tools) and datatypes. See http://www.galaxyproject.org/ and the public server at http://usegalaxy.org for an example.

The NCBI BLAST suite is a widely used set of tools for biological sequence comparison. It is available as standalone binaries for use at the command line, and via the NCBI website for smaller searches. For more details see http://blast.ncbi.nlm.nih.gov/Blast.cgi

This repository is for the development of the main Galaxy wrappers for the NCBI BLAST+ suite, associated datatype definitions for use within Galaxy, and dependency handling within the Galaxy Tool Shed framework.

It also contains additional related Galaxy tools for working with BLAST.

Each of these Galaxy wrappers, tools, datatypes, etc has its own README file.

Galaxy wrappers for NCBI BLAST+

The main focus of this work is the development of the NCBI BLAST+ command line tool wrappers and datatype definitions for Galaxy, published on the Galaxy Tool Shed here:

Development test releases are on the Test Tool Shed here:

The associated development is on GitHub at:

Galaxy Packages

The NCBI BLAST+ binaries were initially included within the Galaxy wrappers Tool Shed package (ncbi_blast_plus), but were then handled as individual Tool Shed packages:

However, in line with current Galaxy policy, hereafter we assume Galaxy will be fetching packages from Conda, specifically the BioConda channel:

Note all of these packages target the NCBI's C++ rewrite of BLAST called BLAST+, available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/ -- we do not support the now deprecated "legacy" BLAST suite written in C, still available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/

History

These Galaxy BLAST+ wrappers were originally written by Peter Cock and were incorporated into https://bitbucket.org/galaxy/galaxy-central/ the main Galaxy repository on BitBucket in September 2009, where they were maintained via pull requests and patches from Peter's repository fork: https://bitbucket.org/galaxy/galaxy-central/

In August 2012 Dannon Baker from the Galaxy Team migrated the BLAST+ tools and datatypes to the Galaxy Tool Shed (links above). This was part of a long term plan to move most tools out of the main Galaxy repository. Development of the wrappers continued on the 'tools' branch of Peter's Galaxy fork on BitBucket with additional contributions via patches and pull requests.

Recognising the growing number of potential contributors, an informal "Birds of a Feather" (BoF) http://wiki.galaxyproject.org/Events/GCC2013/BoF/GalaxyBlast meeting was held in July 2013 during the annual Galaxy Community Conference. It was agreed to move the code into a dedicated Git repository on GitHub, with the goal of giving the project a clearer identity and making it easier for Peter to manage.

Other Galaxy BLAST tools

This repository also contains other BLAST related Galaxy tools, some already available on the Galaxy Tool Shed:

Any development test releases are on the Test Tool Shed, for example:

Folder Structure

Within the tools folder is one folder for each Tool or Tool Suite released on the Galaxy Tool Shed (these child folder names match the Tool Shed names).

Similarly, packages contains packages for Galaxy Tool Shed dependency definitions, datatypes contains packages for Galaxy Tool Shed datatype definitions, and data_managers contains Galaxy Data Managers for tasks like setting up local copies of NCBI BLAST databases (currently unfinished).

All of these child folders contain additional README files, which cover things like how to install each tool manually or via the Galaxy Tool Shed.

Additionally there is a shared test-data folder used for functional test sample data, and a shared tool-data folder used for configuration files.

Installation

The individual Galaxy tools (under the tools/ folder as described above) must be installed into a Galaxy instance for use. In general the easiest and recommended way to do this is via the Galaxy Tool Shed which should handle the dependencies for you. However, manual installation is possible as described in the README file of each tool.

Binary dependencies like NCBI BLAST+ have been packaged for the Galaxy Tool Shed (links given above), and installing via the Tool Shed allow multiple versions to be available under Galaxy's control for full reproducibility. If you opt to install the NCBI BLAST+ simply on the system $PATH outside of Galaxy's control, you are giving up full reproducibility as Galaxy has no control over which version of BLAST+ will be run.

If you wish to use pre-existing BLAST databases, either local to your institute or from the NCBI BLAST databases FTP site, they must currently be managed by the Galaxy Administrator manually via the blastdb*.loc configuration files. In many cases, your system administrator may already have automatically updated NCBI BLAST database available centrally. In this case, telling Galaxy to use these is a simple solution, but gives up full reproducibilty as there is only a single "live" version of each database.

Note that individual Galaxy users may also create their own databases within Galaxy from FASTA files using the makeblastdb wrapper.

Testing

Most of these Galaxy tools include a <tests> section in the tool XML files, which defines one or more functional tests - listing sample input files and user parameters, along with the expected output. If you install the tools, you can run these tests via Galaxy's run_tests.sh script - and/or do this automatically if installing the tools via the Tool Shed. See the README file for each tool for more details.

The Galaxy team run regular tests on all the tools which have been uploaded to the main Tool Shed and the Test Tool Shed, simulating how they would behave in a local Galaxy instance once installed via the Tool Shed.

In addition we are running the same functional tests via TravisCI whenever this GitHub repository is updated:

Current status of TravisCI build for master branch

This TravisCI integration simulates a manual install of these Galaxy Tools and their dependencies. See the special .travis.yml file for more technical details.

Bug Reports

You can file an issue here https://github.com/peterjc/galaxy_blast/issues or ask us on the Galaxy development list http://lists.bx.psu.edu/listinfo/galaxy-dev

Citation

There should be more specific guidance in the README file of each folder, and in the user-facing help text within the each Galaxy tool. In general, please cite the following paper:

NCBI BLAST+ integrated into Galaxy. P.J.A. Cock, J.M. Chilton, B. Gruening, J.E. Johnson, N. Soranzo. GigaScience 2015, 4:39 https://doi.org/10.1186/s13742-015-0080-7

In most cases, you should also cite the NCBI BLAST+ tools:

BLAST+: architecture and applications. C. Camacho et al. BMC Bioinformatics 2009, 10:421. https://doi.org/10.1186/1471-2105-10-421

License

Please see the README file in each folder, but by default the MIT license is being used.

About

Galaxy wrappers for NCBI BLAST+ and related BLAST tools.


Languages

Language:Python 75.8%Language:Shell 12.7%Language:HTML 11.3%Language:TeX 0.1%