jtd1g16 / HLC_Prediction

Prediction of Henry’s Law Constants using descriptors calculated from simple molecular representations

HLC_Predictor

Computational Chemistry Masters Project, University of Southampton. A set of Jupyter notebooks illustrating a predictive model for Henry's law constants (HLCs), starting from a species' SMILES string.

The compilation of HLCs used in this project was created by R. Sander; the published paper is available here.

The CAS reference numbers in the compilation were used to create SMILES strings (via cirpy). These were in turn passed through DRAGON or a series of RDKit functions to calculate molecular descriptors.
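The identifier-to-descriptor step can be sketched roughly as follows. This is a minimal illustration, not the notebooks' own code: the helper names `cas_to_smiles` and `smiles_to_descriptors` are hypothetical, the cirpy lookup needs network access (so it is imported lazily and not called here), and computing RDKit's full built-in descriptor list is an assumption rather than the subset the project actually used. DRAGON is external software and is not covered.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors


def cas_to_smiles(cas_number):
    """Resolve a CAS number to a SMILES string via cirpy (hypothetical helper).

    cirpy queries a web service, so this needs a network connection.
    cirpy.resolve returns None when the identifier cannot be resolved.
    """
    import cirpy  # imported lazily so the rest of the sketch runs offline

    return cirpy.resolve(cas_number, "smiles")


def smiles_to_descriptors(smiles):
    """Return a dict of every built-in RDKit descriptor for one molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    # Descriptors.descList is a list of (name, function) pairs.
    return {name: fn(mol) for name, fn in Descriptors.descList}


# Example with a SMILES string written by hand (ethanol), no lookup needed.
ethanol = smiles_to_descriptors("CCO")
print(len(ethanol), "descriptors calculated")
print("MolWt:", round(ethanol["MolWt"], 2))
```

Collecting one such dict per molecule into a pandas DataFrame would give a descriptor table with one row per species, which is the shape the models need.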

Supervised machine learning algorithms were trained (using the calculated descriptors labelled with their molecules' HLCs) to predict the constants.

  • 7 ML algorithms
  • 4 feature selection methods
  • 6 sets of descriptors
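One combination of feature selection and regression might look like the sketch below. The README does not name the seven algorithms or four selection methods, so `SelectKBest` with `f_regression` and a `Ridge` regressor are assumptions chosen for illustration. The data are synthetic placeholders generated by scikit-learn, not descriptors or HLC values from the project.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a descriptor matrix X (rows = molecules,
# columns = descriptors) and a target y (the quantity to predict).
X, y = make_regression(
    n_samples=200, n_features=30, n_informative=5, noise=10.0, random_state=0
)

# Chaining the steps in a Pipeline means scaling and feature selection
# are refitted on each training fold, so no information leaks from the
# held-out fold into the selection step.
model = Pipeline(
    [
        ("scale", StandardScaler()),
        ("select", SelectKBest(score_func=f_regression, k=5)),
        ("regress", Ridge(alpha=1.0)),
    ]
)

# 5-fold cross-validated R^2 scores.
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("Mean cross-validated R^2:", scores.mean())

# Fit on all data and inspect which columns the selector kept.
model.fit(X, y)
selected = model.named_steps["select"].get_support()
print("Selected feature indices:", selected.nonzero()[0])
```

Swapping the `"select"` or `"regress"` step for another scikit-learn estimator is enough to compare different method combinations under the same cross-validation scheme.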

Dependencies

  • Jupyter notebooks, with the following Python packages installed:
    • pandas (data structures)
    • numpy (maths)
    • statsmodels.api (stats)
    • cirpy (conversion between chemical identifiers)
    • ipywidgets and IPython.display (widgets and nicer outputs)
    • RDKit (descriptors)
    • matplotlib.pyplot (visualisation)
    • scikit-learn (models, feature selection, PCA)
    • joblib (saving Python objects)
    • mpld3 (hover-over labels for plots)
  • DRAGON 6 (external software for descriptor calculation, not a Python package)

Languages

  • Fortran 88.8%
  • Jupyter Notebook 7.4%
  • Gnuplot 1.6%
  • TeX 1.0%
  • HTML 1.0%
  • Perl 0.1%
  • Makefile 0.0%