Using test-time augmentation to investigate explainable AI: inconsistencies between method, model and human intuition
Stakeholders of machine learning models desire explainable artificial intelligence (XAI) to produce human-understandable and consistent interpretations. In computational toxicity, augmentation of text-based molecular representations has been used successfully for transfer learning on downstream tasks. Augmentations of molecular representations can also be used at inference time to compare differences between multiple representations of the same ground truth. In this study, we investigate the robustness of eight XAI methods using test-time augmentation for a molecular-representation model in the field of computational toxicity prediction. We report significant differences between explanations for different representations of the same ground truth, and show that randomized models exhibit similar variance. We hypothesize that text-based molecular representations in this and past research reflect tokenization more than learned parameters. Furthermore, we observe greater variance between in-domain predictions than out-of-domain predictions, indicating that XAI measures something other than learned parameters. Finally, we investigate the relative importance given to expert-derived structural alerts and find similar importance given regardless of applicability domain, randomization, and varying training procedures. We therefore caution future research against validating their methods through a similar comparison to human intuition without further investigation.
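The core idea behind the test-time augmentation is that a single molecule admits many equivalent SMILES strings, so explanations computed on each string can be compared for consistency. Below is a minimal sketch of generating such augmentations with RDKit; it is an illustration, not the repository's exact pipeline, and the augment_smiles helper is hypothetical.

```python
# Minimal sketch of test-time augmentation for SMILES using RDKit.
# This is an illustration, not the repository's exact pipeline.
from rdkit import Chem

def augment_smiles(smiles, n=10):
    """Generate n randomized (but chemically equivalent) SMILES strings."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    # doRandom=True uses a random atom ordering, producing a different
    # string for the same ground-truth molecule on each call.
    return [Chem.MolToSmiles(mol, doRandom=True) for _ in range(n)]

# Ten text representations of the same molecule (aniline):
for s in augment_smiles("Nc1ccccc1", n=10):
    print(s)
```

Each augmented string can then be passed through the same trained model and XAI method, and the resulting attributions compared across representations of the same ground truth.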
Install PyTorch 2.1.1 according to your system:
pip install torch torchvision torchaudio
Install registry-factory, aidd-codebase, and the remaining requirements:
pip install -r requirements.txt
Quick start using the same parameters as the original publication:
python smiles_cleaning.py
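For reference, a hedged sketch of what a SMILES-cleaning step of this kind typically involves (parse, strip salts, canonicalize) is shown below; smiles_cleaning.py in this repository may differ in detail, and the clean_smiles helper is hypothetical.

```python
# Hedged sketch of typical SMILES cleaning; the repo's smiles_cleaning.py
# may differ in detail. Uses RDKit.
from rdkit import Chem
from rdkit.Chem.SaltRemover import SaltRemover

_remover = SaltRemover()

def clean_smiles(smiles):
    """Return a canonical, salt-stripped SMILES, or None if unusable."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None                # drop unparsable entries
    mol = _remover.StripMol(mol)
    if mol.GetNumAtoms() == 0:
        return None                # entry consisted only of salts/solvents
    return Chem.MolToSmiles(mol)   # canonical SMILES by default
```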
To download and clean the Ames dataset from Therapeutics Data Commons with other parameters:
python representation/prepare_data.py experiment=data_cleaning/ames
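As a point of reference, the Ames dataset can also be fetched directly with the PyTDC package (pip install PyTDC); the sketch below is an assumption about what a download-and-split step looks like, not necessarily what prepare_data.py does.

```python
# Hedged sketch using the PyTDC package; the repo's prepare_data.py
# presumably wraps a similar download-and-split step.
from tdc.single_pred import Tox

data = Tox(name="AMES")       # downloads and caches the Ames mutagenicity set
split = data.get_split()      # dict of train/valid/test pandas DataFrames
print(split["train"].head())  # columns: Drug_ID, Drug (SMILES), Y (label)
```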
This creates an ames_cleaned.csv file. The various data files are created in the ames_data_analysis notebook; run that notebook if you wish to remake the original data files.
You can then train a model for a specific experiment with the following command:
python representation/train.py experiment=xxx
In the experiments you can train various models, including pre-training on ChEMBL with experiment=pretrain/enc_dec/ME2C or transfer learning with experiment=ames_training/NN/enc_dec/ME2C. The available model styles and their variants are listed below (example commands follow the list):
- BERT-style
  - C2C
  - R2C
  - E2C
  - MC2C
  - MR2C
  - ME2C
- BART-style
  - C2C
  - R2C
  - E2C
  - MC2C
  - MR2C
  - ME2C
- Transformer CNN
  - enc_dec
  - enc_only
- Transformer NN
  - enc_dec
  - enc_only
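For example, to run the two experiments mentioned above (assuming the config paths are unchanged):

python representation/train.py experiment=pretrain/enc_dec/ME2C
python representation/train.py experiment=ames_training/NN/enc_dec/ME2C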
Awaiting Review...
[1] Davies, M., Nowotka, M., Papadatos, G., Dedman, N., Gaulton, A., Atkinson, F., Bellis, L., Overington, J.P.: ChEMBL web services: streamlining access to drug discovery data and utilities. Nucleic Acids Research 43(W1), 612–620 (2015)
[2] Mendez, D., Gaulton, A., Bento, A.P., Chambers, J., De Veij, M., Félix, E., Magariños, M.P., Mosquera, J.F., Mutowo, P., Nowotka, M., et al.: ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Research 47(D1), 930–940 (2019)
[3] Xu, C., Cheng, F., Chen, L., Du, Z., Li, W., Liu, G., Lee, P.W., Tang, Y.: In silico prediction of chemical Ames mutagenicity. Journal of Chemical Information and Modeling 52(11), 2840–2847 (2012)