futianfan / pyscreener

pythonic interface to virtual screening software

pyscreener

A pythonic interface to high-throughput virtual screening software

Overview

This repository contains the source of pyscreener, both a library and software for conducting high-throughput virtual screening (HTVS) via Python calls.

Requirements

  • python>=3.6
  • all virtual screening software and openbabel located on your PATH
  • (if using DOCK6-based HTVS) the location of the DOCK6 parent directory in your environment variables

adding an executable to your PATH

pyscreener is not virtual screening (VS) software in itself. Rather, it is a wrapper around common VS software that provides a simple, common interface to all of them, so that you do not have to learn the ins and outs of the preparation and simulation pipeline for each program. With that in mind, it is up to the user to install the appropriate virtual screening software and place it on their PATH.

After you have downloaded the appropriate software relevant to preparing inputs for and actually running your VS software of choice, you have two options:

  1. append the directory containing the software to your PATH via the command export PATH=$PATH:<dir>, where <dir> is the directory containing the software in question.
  2. alternatively, and perhaps more simply, copy the software to a directory that is already on your PATH. Typically this will be either ~/bin or ~/.local/bin. To see which directories are currently on your PATH, type echo $PATH. There will typically be many directories listed, and it is best to avoid creating files in any directory outside your home directory ($HOME on most *nix-based systems).
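
Whichever option you choose, it can help to confirm that the executables actually resolve before starting a screen. The following is a minimal sketch using only the Python standard library. The helper name find_missing_executables and the example executable names (obabel, vina) are assumptions for illustration; substitute the names that your installed software provides.

```python
import shutil


def find_missing_executables(names, path=None):
    """Return the names that cannot be resolved to an executable on PATH.

    `path` is an optional search path string; when it is None,
    shutil.which falls back to the PATH environment variable.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    # Assumed executable names; replace them with the ones you installed.
    required = ["obabel", "vina"]
    missing = find_missing_executables(required)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required executables were found.")
```

If a name is reported as missing, revisit the two options above and check the output of echo $PATH.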

specifying an environment variable

Due to some quirks in getting DOCK6 to work with pyscreener, the DOCK6 environment variable must be set to the location of the DOCK6 parent folder (the folder that is unpacked after downloading the original zip file and that contains both the bin and parameters subdirectories). To set the environment variable, enter the following command: export DOCK6=<path/to/dock6>. (Note: this environment variable must always be set before running pyscreener, so it is probably best to place this command inside your .bashrc or .bash_profile.)
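
A quick sanity check of this setting might look like the sketch below. It is not part of pyscreener; the function name check_dock6_root is hypothetical, and the only facts it relies on are the ones stated above: the variable is named DOCK6 and the parent folder contains bin and parameters subdirectories.

```python
import os
from pathlib import Path


def check_dock6_root(env=None):
    """Return the DOCK6 parent directory, or raise if it is unset or incomplete.

    `env` is any mapping of environment variables; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    root = env.get("DOCK6")
    if not root:
        raise RuntimeError("The DOCK6 environment variable is not set.")
    root = Path(root)
    # The parent folder is expected to contain 'bin' and 'parameters'.
    missing = [sub for sub in ("bin", "parameters") if not (root / sub).is_dir()]
    if missing:
        raise RuntimeError(f"{root} is missing subdirectories: {', '.join(missing)}")
    return root
```

Calling check_dock6_root() with no arguments inspects the current environment and raises a RuntimeError that describes what is wrong, if anything.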

Note: both of the above steps must be satisfied before using pyscreener.

To avoid having to do this every time you start a new shell, you can add the commands you typed to your shell's startup file (e.g., .bash_profile for a bash shell). You can also add them to the non-login shell startup file, but it is not a good idea to recursively edit your PATH in these files.

Installation

The first step in installing pyscreener is to clone this repository: git clone <this_repo>

The easiest way to install all dependencies is to use conda along with the supplied environment.yml file, but you may also install them manually if desired. All libraries listed in that file must be installed before using pyscreener.

virtual environment setup via conda

  1. (if necessary) install conda
  2. cd /path/to/pyscreener
  3. conda env create -f environment.yml

Before running pyscreener, be sure to first activate the environment: conda activate pyscreener

external software

  • vina-type software
    1. install ADFR Suite for receptor preparation
    2. install any of the following docking software: vina, qvina2, smina, psovina and ensure the desired software is located on your path
  • DOCK6
    1. install DOCK6 and specify the DOCK6 environment variable as the path of the parent folder (the one containing bin, install, etc.) as detailed above
    2. install sphgen_cpp and place the executable inside the bin subdirectory of the DOCK6 parent directory
    3. install chimera and place the executable on your PATH (either by moving the executable to a folder already on your PATH or by adding the folder containing the executable to your PATH)

Running pyscreener as a software

pyscreener was designed to have a minimal interface, on the principle that a high-throughput virtual screen is intended to be a broad-strokes technique for gauging ligand favorability. With that in mind, all one really needs to get going are the following:

  • the PDB id of your receptor of interest or a PDB format file of the specific structure
  • a file containing the ligands you would like to dock, in SDF, SMI, or CSV format
  • the coordinates of your docking box (center + size), a PDB format file containing the coordinates of a previously bound ligand, or a numbered list of residues from which to construct the docking box (e.g., [42, 64, 117, 169, 191])
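
To make the "center + size" description concrete, here is one simple way a docking box could be derived from a set of atomic coordinates, such as those of a previously bound ligand or of selected residues. This is an illustrative sketch only and is not necessarily how pyscreener constructs its box; the function name docking_box and the default buffer value are assumptions.

```python
def docking_box(coords, buffer=10.0):
    """Return (center, size) of an axis-aligned box enclosing `coords`.

    coords: iterable of (x, y, z) tuples, in angstroms as in PDB files.
    buffer: padding added on each side of the box along every axis
            (an arbitrary default chosen for illustration).
    """
    coords = list(coords)
    if not coords:
        raise ValueError("at least one coordinate is required")
    # Per-axis extremes of the supplied coordinates.
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2 * buffer for lo, hi in zip(mins, maxs))
    return center, size
```

The returned center and size are the two quantities referred to in the last bullet above.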

There are a variety of other options you can specify as well (including how to score a ligand given that multiple scored conformations are output, how many times to repeatedly dock a given ligand, etc.) To see all of these options and what they do, use the following command: python run.py --help
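
As an illustration of what "how to score a ligand given multiple scored conformations" can mean, the sketch below collapses several conformation scores into a single value. The function name reduce_scores and the mode names "best" and "avg" are hypothetical and may not match the options that python run.py --help lists. It assumes the usual convention for AutoDock Vina-style scores, where lower (more negative) is better.

```python
import statistics


def reduce_scores(scores, mode="best"):
    """Collapse the scores of several conformations into one value.

    Lower (more negative) scores are treated as better.
    Returns None when no scores are available.
    """
    scores = list(scores)
    if not scores:
        return None
    if mode == "best":
        # Keep only the most favorable conformation.
        return min(scores)
    if mode == "avg":
        # Average over all conformations.
        return statistics.mean(scores)
    raise ValueError(f"unknown mode: {mode!r}")
```

The same reduction could be applied a second time across repeated docking runs of the same ligand.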

All of these options may be specified on the command line, but they may also be placed in a configuration file that accepts YAML, INI, and argparse syntaxes. Example configuration files are located in test_configs. Assuming everything is working and installed properly, you can run any of these files via the following command: python run.py --config test_configs/<config>

Using pyscreener as a library

At the core of the pyscreener software is the pyscreener library, which handles the running of docking software from input preparation all the way to output file parsing. The workhorse class is the Screener ABC, which handles all of this for the user. To initialize a screener object, use either of the derived classes: Vina or DOCK. Vina is the Screener class for performing docking simulations using any software derived from AutoDock Vina, and it accepts the software keyword argument in its initializer. Currently, the supported Vina-type software is AutoDock Vina, Smina, QVina2, and PSOVina. DOCK is the Screener class for performing docking using the DOCK software from UCSF. The input preparation pipeline for this software is a little more involved, so we encourage readers to look at the file to see what these additional parameters are.
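
The abstract-base-class pattern described here can be pictured with a toy example. Everything in the sketch below (ToyScreener, ToyVina, and their methods) is hypothetical and is not pyscreener's actual code or API; only the idea of a base class with derived classes and a software keyword argument comes from the text above. Consult the library source for the real signatures.

```python
from abc import ABC, abstractmethod


class ToyScreener(ABC):
    """Hypothetical stand-in for a Screener-like base class."""

    def screen(self, ligands):
        # Shared pipeline: prepare each input, dock it, collect the scores.
        return {ligand: self.dock(self.prepare(ligand)) for ligand in ligands}

    @abstractmethod
    def prepare(self, ligand):
        """Convert a ligand into the input the docking program expects."""

    @abstractmethod
    def dock(self, prepared):
        """Run the docking program and return a score."""


class ToyVina(ToyScreener):
    """Hypothetical derived class that selects a Vina-type program by name."""

    SUPPORTED = ("vina", "smina", "qvina2", "psovina")

    def __init__(self, software="vina"):
        if software not in self.SUPPORTED:
            raise ValueError(f"unsupported software: {software!r}")
        self.software = software

    def prepare(self, ligand):
        # Placeholder for real input preparation.
        return ligand.strip()

    def dock(self, prepared):
        # Placeholder: a real implementation would invoke self.software here.
        return None
```

The base class cannot be instantiated directly, which mirrors the statement that a screener object is created through one of the derived classes.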

Copyright

Copyright (c) 2020, david graff

Acknowledgements

Project based on the Computational Molecular Science Python Cookiecutter version 1.5.

License:MIT License

