Requirements

OS: Linux or Mac

This is a standalone package, however it requires python3 and java to run.

Python3 check:

python3 --version

Should return something like:

Python 3.x.y

(otherwise you will need to install it)

Java check:

java --version

Should return something like:

openjdk 11.0.18 2023-01-17
OpenJDK Runtime Environment (build 11.0.18+10-post-Ubuntu-0ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.18+10-post-Ubuntu-0ubuntu122.04, mixed mode, sharing)

(otherwise you will need to install it)

Installation of LignadMapper.py

Once you have python3 and java on your machine, clone the repository or download it on your computer.

Change directory to the folder where you downloaded the zip:

cd {folder}

Unzip file and enter folder:

unzip LigandMapper.py-main.zip

cd LigandMapper.py-main

And run:

python3 installMe.py

During the installation you will be prompted to enter the password for sudo because the file needs to create a link in the /usr/bin/ folder.

After a successful installation you can remove the zip and the LigandMapper.py-main.zip directory.

Manual installation ( in case instalMe.py fails )

In case the installation fails:

You can run the program from the folder itself because it's already an executable with a python3 shebang.
And if you want it accessible to the command line as a command, then:
- choose a directory from your $PATH (echo $PATH) - ex. /usr/bin/
- create a symbolic link there (sudo ln -s ./LigandMapper/LigandMapper.py /usr/bin/)
- then add add permissions to the whole package (sudo chmod -R 777 ./)
- do not move or delete the folder!
```
dir_path=$(find ~ -name 'LigandMapper' -type d)
fileLM=$dir_path"/LigandMapper.py"
sudo ln -s $fileLM /usr/bin/
sudo chmod -R 777 ./
```

Commands

Note: The ouput is to the standard error.

usage: LigandMapper.py [-h] [-l {file}.pdb] [-o {pdb_name}] [-d {direcory_name}] [--local_many [{file1}.pdb {file2}.pdb ...]] [--online_many [{pdb_name1} {pdb_name2} ...]] [-v] [-ch]
                       [-pm]

LigandMapper.py is a python script build to predict ligand binding sites of proteins from their .pdb files.

optional arguments:
  -h, --help            show this help message and exit
  -l {file}.pdb, --local {file}.pdb
                        One pdb local file.
  -o {pdb_name}, --online {pdb_name}
                        Get a pdb file from the pdb server and analyse that.
  -d {direcory_name}, --directory {direcory_name}
                        Analyse all files located in one local directory.
  --local_many [{file1}.pdb {file2}.pdb ...]
                        Analyse many local pdb files.
  --online_many [{pdb_name1} {pdb_name2} ...]
                        Get many pdb files from the pdb server and analyse them.
  -v, --verbose         Get more detailed output of the process to the standard error.
  -ch, --chimera        Open chimera immediately when file is ready to be visualised. Only applies to local and online SINGLE file.
  -pm, --pymol          Open pymol immediately when file is ready to be visualised. Only applies to local and online SINGLE file.

Tutorials

Tutorials can be found here.

Examples

Local

If you have one protein structure pdb file (ex. 1gln.pdb) on your computer and you want to predict the ligand binding pockets, run the --local (-l) method:

LigandMapper.py -l 1gln.pdb

Online

If you know whats the pdb name but you don't have the pdb file downloaded, you can automatically download it and run the analysis with the --online (-o) method:

LigandMapper.py -o 1gln

Many local

If you have many local files that you want to analyse, but they're not exclusively gropued in one directory, you can just list them with the --local_many method:

LigandMapper.py --local_many 1gln.pdb 2ew2.pdb subfol1/1gln.pdb

Many online

If you want to download from the pdb server and analyse many pdb files, list them with the --online_many method:

LigandMapper.py --online_many 1gn3 2ew2 1gln

Visualisation:

The output can directly visualised with the --chimera (-ch) and --pymol (-pm) switch, given that you have them installed on your computer, by including the switch when running the comand. Note: this only woks for the single-file methods (-l (--local) and -o (--online)); for the --directory, --online_many and --local_many you need to open the visualisation cmd files manually (see: Output below).

Ex.

LigandMapper.py -l 1gln.pdb -ch
LigandMapper.py -o 1gln -pm

General structure of the code

Output

A prediction for a file {pdb}.pdb will create the following structure in the folder in which LigandMapper.py was executed.

predict_{pdb}/
├── {pdb}.pdb_predictions.tsv
└── visualizations/
    ├── chimera_{pdb}.cmd
    ├── {pdb}.pdb.pml
    └── data/
        ├── {pdb}.pdb_points.pdb.gz
        └── {pdb}.pdb

TSV file:

The tsv file lists the predicted pockets in order of their score (probability, see theoretical background ). Each pocket has the following attributes:

rank
score
probability
sas_points - (int) number of solvent accessible surface points
surf_atoms - (int) integer of the number of surface atoms
center_x - (float) the predicted pockets x center
center_y - (float) the predicted pockets y center
center_z - (float) the predicted pockets z center
residue_ids - (py dict) the residue sequence numbers that create the pocket { Chain : [ residue sequence numbers ] }
residue_names - (py dict) the residue names that create the pocket { Chain : [ residue names ] }
residue_types - (py dict) the character of the residues that create the pocket { Chain : [ characters ] }
- 'N' represents non-polar amino acids
- 'P' represents polar amino acids
- '+' represents positively charged amino acids
- '-' represents negatively charged amino acids
- '0' a specific residue for which there is no info in the program
surf_atom_ids - (py list) the atom serial number of all the atoms that are on the surface of the pocket

Information from the PDB is taken in this fashion:

Columns	Data	Justification	Data Type
1-4	"ATOM"	left	character
7-11	Atom serial number	right	integer
13-16	Atom name	left*	character
17	Alternate location indicator	-	character
18-20	Residue name	right	character
22	Chain identifier	-	character
23-26	Residue sequence number	right	integer
27	Code for insertions of residues	-	character
31-38	X orthogonal Angstrom coordinate	right	floating
39-46	Y orthogonal Angstrom coordinate	right	floating
47-54	Z orthogonal Angstrom coordinate	right	floating
55-60	Occupancy	right	floating
61-66	Temperature factor	right	floating
73-76	Segment identifier (optional)	left	character
77-78	Element symbol	right	character
79-80	Charge (optional)	-	character

Chimera CMD file

Has neccessary information to create the visualisations in chimera. The pockets are saved as selections titled "Pocket{NUM}" and colored untill the 18th pockets.

The colors are ranked the same in all output, so it can be a visual aid for quick understanding of the pockets' rankings. This is true only for the Chimera file, not for the PyMol, because in PyMol the pockets and their colors can be easily viewed in the side panel.

Color	Rank
red	1
orange	2
yellow	3
green	4
cyan	5
blue	6
medium blue	7
purple	8
hot pink	9
magenta	10
white	11
gray	12
black	13
tan	14
slate gray	15
dark khaki	16
plum	17
rosy brown	18

run :

chimera {path}/predict_{pdb}/visualizations/chimera_{pdb}.cmd

PyMol CMD file

In PyMol the pockets and their colors are conveniently displayed in the side panel. run:

pymol {path}/predict_{pdb}/visualizations/{pdb}.pdb.pml

Theoretical background

The theorethical background can be found here.

Analysis

Analysis of the perfomance can be found here.

References

This software is a lightweight version of p2rank.

Krivak R, Hoksza D. P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure. Journal of Cheminformatics. 2018 Aug.
Capra JA, Laskowski RA, Thornton JM, Singh M, Funkhouser TA. Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PLoS
Connolly M. Solvent-accessible surfaces of proteins and nucleic acids. Science. 1983;221(4612):709–13.
Huang B, Schroeder M. LIGSITEcsc: predicting ligand binding sites using the Connolly surface and degree of conservation. BMC Struct Biol. 2006 Sep 24;6:19.
Krivák R, Hoksza D. Improving protein-ligand binding site prediction accuracy by classification of inner pocket points using local features. J Cheminform. 2015 Apr 1;7:12.
Le Guilloux V, Schmidtke P, Tuffery P. Fpocket: an open source platform for ligand pocket detection. BMC Bioinformatics. 2009 Jun 2;10:168.
RICHARDS, E M. (1977). Ann. Rev. Biophys. Bioeng. 6, 151-176.
Zheng X, Gan L, Wang E, Wang J. Pocket-based drug design: exploring pocket space. AAPS J. 2013 Jan;15(1):228-41.

niqolla / LigandMapper.py