RekerLab / YokedLearning

:woman_teacher: :arrow_right: :woman_student: Active learning teaching other models

YokedLearning4

In yoked machine learning, a teacher model guides a student model: the teacher selects the training data through active learning, and the student learns from those selections. We provide example pipelines for evaluating yoked learning performance on both classical (Part 1) and deep (Part 2) machine learning models.

Benchmarking Datasets

  • Therapeutics Data Commons

    • ADME: Pharmaco-kinetics (from tdc.single_pred import ADME)
      • CYP2C9 Substrate, Carbon-Mangels et al.
        • data = ADME(name = 'CYP2C9_Substrate_CarbonMangels')
      • CYP2D6 Substrate, Carbon-Mangels et al.
        • data = ADME(name = 'CYP2D6_Substrate_CarbonMangels')
      • CYP3A4 Substrate, Carbon-Mangels et al.
        • data = ADME(name = 'CYP3A4_Substrate_CarbonMangels')
      • HIA (Human Intestinal Absorption), Hou et al.
        • data = ADME(name = 'HIA_Hou')
      • Pgp (P-glycoprotein) Inhibition, Broccatelli et al.
        • data = ADME(name = 'Pgp_Broccatelli')
      • Bioavailability, Ma et al.
        • data = ADME(name = 'Bioavailability_Ma')
    • Tox: Toxicity (from tdc.single_pred import Tox)
      • hERG blockers, Wang et al.
        • data = Tox(name = 'hERG')
      • DILI (Drug Induced Liver Injury), Xu et al.
        • data = Tox(name = 'DILI')
      • Skin Reaction, Alves et al.
        • data = Tox(name = 'Skin Reaction')
      • Carcinogens, Lagunin et al.
        • data = Tox(name = 'Carcinogens_Lagunin')
      • ClinTox, Gayvert et al.
        • data = Tox(name = 'ClinTox')
    • HTS: High-Throughput Screening (from tdc.single_pred import HTS)
  • MoleculeNet

    • BACE: Quantitative (IC50) and qualitative (binary label) binding results for a set of inhibitors of human β-secretase 1 (BACE-1)
    • BBBP: Binary labels of blood-brain barrier penetration (permeability)
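
The Therapeutics Data Commons loaders listed above can be wrapped in a small helper. The following is a minimal sketch, not part of this repository: the names TDC_DATASETS and load_tdc_dataset are hypothetical, and it assumes the PyTDC package is installed and that the dataset can be downloaded on first use.

```python
# Dataset names copied from the list above, grouped by their TDC loader class.
TDC_DATASETS = {
    "ADME": [
        "CYP2C9_Substrate_CarbonMangels",
        "CYP2D6_Substrate_CarbonMangels",
        "CYP3A4_Substrate_CarbonMangels",
        "HIA_Hou",
        "Pgp_Broccatelli",
        "Bioavailability_Ma",
    ],
    "Tox": ["hERG", "DILI", "Skin Reaction", "Carcinogens_Lagunin", "ClinTox"],
}


def load_tdc_dataset(group, name):
    """Return one benchmark dataset as a pandas DataFrame (hypothetical helper)."""
    if name not in TDC_DATASETS.get(group, []):
        raise ValueError(f"Unknown dataset {name!r} for group {group!r}")
    # Imported lazily so the name table is usable without PyTDC installed.
    import tdc.single_pred as single_pred

    loader_cls = getattr(single_pred, group)  # e.g. single_pred.ADME
    return loader_cls(name=name).get_data()
```

For example, load_tdc_dataset("ADME", "HIA_Hou") is equivalent to calling ADME(name = 'HIA_Hou').get_data() directly.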

Dependencies

Files

Part 1: Classic Yoked Learning

  • Code and functions to evaluate yoked learning with classical machine learning models (random forest, naive Bayes, and logistic regression).
    • yoked_machine_learning_pipeline.py contains functions for evaluating yoked learning
    • yoked_learning_main.py contains the main function to run yoked learning
    • example boxplot/lineplot.ipynb contains an example notebook that visualizes comparisons between yoked learning, active learning, and passive learning
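
The teacher/student idea behind Part 1 can be illustrated with a short loop. This is a minimal sketch under stated assumptions, not the code in yoked_machine_learning_pipeline.py: it assumes scikit-learn, synthetic data, a random forest teacher, a naive Bayes student, and uncertainty sampling as the teacher's selection rule. The function name yoked_learning and all of its parameters are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def yoked_learning(teacher, student, X_pool, y_pool, X_test, y_test,
                   n_initial=10, n_iterations=20, seed=0):
    """Teacher picks samples by uncertainty; student trains on the same picks."""
    rng = np.random.default_rng(seed)
    # Start with an equal number of random samples from each class.
    labeled = []
    for cls in np.unique(y_pool):
        cls_idx = np.flatnonzero(y_pool == cls)
        picks = rng.choice(cls_idx, size=n_initial // 2, replace=False)
        labeled.extend(int(i) for i in picks)
    start = set(labeled)
    unlabeled = [i for i in range(len(y_pool)) if i not in start]

    student_auc = []
    for _ in range(n_iterations):
        # Both models are fit on the data selected so far.
        teacher.fit(X_pool[labeled], y_pool[labeled])
        student.fit(X_pool[labeled], y_pool[labeled])
        scores = student.predict_proba(X_test)[:, 1]
        student_auc.append(roc_auc_score(y_test, scores))
        # Only the teacher decides what to label next: the pool sample
        # whose predicted probability is closest to 0.5.
        proba = teacher.predict_proba(X_pool[unlabeled])[:, 1]
        pick = unlabeled[int(np.argmin(np.abs(proba - 0.5)))]
        labeled.append(pick)
        unlabeled.remove(pick)
    return labeled, student_auc


# Synthetic stand-in for a molecular dataset (assumption for illustration).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

labeled, student_auc = yoked_learning(
    RandomForestClassifier(n_estimators=50, random_state=0), GaussianNB(),
    X_pool, y_pool, X_test, y_test)
print(f"Student ROC-AUC after {len(student_auc)} rounds: {student_auc[-1]:.3f}")
```

Passing the same model type as both teacher and student turns this loop into ordinary active learning, and replacing the teacher's pick with a random choice gives a passive learning baseline for comparison.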

Part 2: Deep Yoked Learning

  • Code and functions to evaluate yoked learning with deep learning models (MLP).
    • Implemented with MolALKit, using either default MLP parameters (ffn_num_layers: 2, ffn_hidden_size: 300, dropout: 0, batch_size: 50) or optimized MLP parameters (chemprop)
    • Single_dataset_comparison.ipynb contains an example notebook that visualizes the output comparisons
  • Please refer to MolALKit for details about Deep Yoked Learning implementations
  • Example implementation after data split:
molalkit_run --data_public bace --metrics roc-auc --learning_type explorative --model_config_selector RandomForest_RDKitNorm_Config \
    --split_type scaffold_order --split_sizes 0.5 0.5 --evaluate_stride 100 --seed 0 --save_dir bace_rf_yoked_mlp --n_jobs 4 \
    --model_config_evaluators MLP_RDKitNorm_BinaryClassification_Config

