This notebook is a collection of examples for processing the virtual screening follow-up data from the MPro fragment screen.
It is not intended to be the definitive approach, just a collection of suggestions.
See these links for background on the project:
- https://www.diamond.ac.uk/covid-19/for-scientists/Main-protease-structure-and-XChem.html
- https://covid19.galaxyproject.org/cheminformatics
If you are running a local Jupyter environment, first create a conda environment:
conda env create -f environment.yml
Then activate the environment:
conda activate jupyter-xchem
To get NGLViewer working you might need to run this:
jupyter-nbextension enable nglview --py --sys-prefix
Then start Jupyter:
jupyter notebook
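
Once Jupyter is running, a quick check from a notebook cell can confirm that the environment is the one you expect. The sketch below only uses the standard library; the package names `nglview` and `rdkit` are assumptions about what environment.yml installs, so adjust the list to match your own setup.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed package names; edit to match the contents of environment.yml.
required = ["nglview", "rdkit"]
missing = missing_packages(required)
if missing:
    print("Not found in this environment:", ", ".join(missing))
else:
    print("All listed packages were found.")
```

If a package is reported as not found, the notebook is probably running on a kernel outside the jupyter-xchem environment.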
The following notebooks may be of interest:
- Score_distrbutions.ipynb - initial playground with a hotchpotch of approaches that are used in the main notebooks.
- 1_DataPrep.ipynb - initial data merging and preparation.
- 2_InititalDataAnalysis.ipynb - basic analysis of the results.
- 3_AugmentationAndFiltering.ipynb - augmentation and filtering of the results.
See also:
- FeatureStein - scoring overlap of poses with original fragment screening hits using RDKit feature maps.
The following datasets may be of interest:
- Mpro_16_data.sdf.gz - SD file containing the output of the 1_DataPrep notebook
- Mpro_16_data.smi.gz - file with the SMILES from the output from the 1_DataPrep notebook
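
As noted further down in this README, the title line of each SD file record holds the SMILES string. A minimal sketch of pulling those title lines out of the gzipped SD file with only the standard library might look like the following. The function name `iter_sdf_titles` is hypothetical, and the sketch assumes a standard SD file layout in which each record ends with a `$$$$` line and the title is the first line of the record.

```python
import gzip


def iter_sdf_titles(path):
    """Yield the title line (first line) of each record in a gzipped SD file."""
    with gzip.open(path, "rt") as handle:
        expect_title = True
        for line in handle:
            line = line.rstrip("\r\n")
            if expect_title:
                # The first line of every record is its title.
                yield line
                expect_title = False
            elif line.strip() == "$$$$":
                # Record delimiter: the next line starts a new record.
                expect_title = True


# Example usage (assumes the file is in the current directory):
# for smiles in iter_sdf_titles("Mpro_16_data.sdf.gz"):
#     print(smiles)
```

For anything beyond reading titles, such as accessing the structures or the data fields, a cheminformatics toolkit like RDKit is the better choice.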
The following datasets have been provided to supplement the data with ADMET data:
- data/enalos/16/Enalos_data.csv.gz - data provided by NovaMechanics Ltd through Enalos Suite (analysis here)
- data/prosilico/16/predictions.csv.gz - data provided by Prosilico (analysis here)
- data/marionegri/16/EPA_tox_class_1_to_20000.txt.gz - EPA tox class predictions generated by The Mario Negri IRCCS Institute (analysis here)
These are intended to be used in the 3_AugmentationAndFiltering notebook.
If you want to generate data that can be used in the process of selecting compounds (see the datasets above for examples), you should start from Mpro_16_data.sdf.gz or Mpro_16_data.smi.gz. PLEASE make sure the SMILES string (the title line of each SDF record) is included in your data so that it can be merged into the main data.
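
To illustrate why the SMILES string matters, here is a minimal sketch of merging supplementary data into the main data using the SMILES as the join key. It is not the code used in the 3_AugmentationAndFiltering notebook. The function `merge_on_smiles`, the column names `SMILES`, `score` and `predicted_value`, and all the values are hypothetical and exist only for illustration.

```python
import csv
import io


def merge_on_smiles(main_rows, extra_rows, key="SMILES"):
    """Left-join extra_rows onto main_rows using the SMILES string as the key."""
    extra_by_smiles = {row[key]: row for row in extra_rows}
    merged = []
    for row in main_rows:
        combined = dict(row)
        # Rows without a match are kept unchanged.
        extra = extra_by_smiles.get(row[key], {})
        for name, value in extra.items():
            if name != key:
                combined[name] = value
        merged.append(combined)
    return merged


# Tiny in-memory stand-ins for the real files (made-up values).
main_csv = "SMILES,score\nCCO,1.5\nc1ccccc1,2.0\n"
extra_csv = "SMILES,predicted_value\nCCO,0.12\n"

main_rows = list(csv.DictReader(io.StringIO(main_csv)))
extra_rows = list(csv.DictReader(io.StringIO(extra_csv)))
merged = merge_on_smiles(main_rows, extra_rows)
for row in merged:
    print(row)
```

Note that this join compares the strings exactly, so a re-canonicalised or otherwise rewritten SMILES for the same molecule would not match. Copying the SMILES unchanged from the source dataset avoids that problem.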
This data is continually being updated. We try to keep this README up to date.