Lee-CBG / ActiveTCR

Official TensorFlow implementation for ActiveTCR: Active Learning Framework for Cost-Effective TCR-Epitope Binding Affinity Prediction

ActiveTCR: An Active Learning Framework for Cost-Effective TCR-Epitope Binding Affinity Prediction

ActiveTCR is a unified framework designed to minimize the annotation cost and maximize the predictive performance of T-cell receptor (TCR) and epitope binding affinity prediction models. It incorporates active learning techniques to iteratively search for the most informative unlabeled TCR-epitope pairs, reducing annotation costs and redundancy. By leveraging four query strategies and comparing them to a random sampling baseline, ActiveTCR demonstrates significant cost reduction and improved performance in TCR-epitope binding affinity prediction. ActiveTCR is the first systematic investigation of data optimization in the context of TCR-epitope binding affinity prediction.
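The four query strategies are not described in this README; entropy sampling is the one named in the example commands further down. As a rough illustration of the idea only, and not the repository's implementation, the sketch below ranks unlabeled pairs by the Shannon entropy of a predicted binding probability and returns the most uncertain ones. The function names `binary_entropy` and `entropy_query` and the toy probabilities are hypothetical.

```python
import math

def binary_entropy(p):
    """Shannon entropy (in nats) of a binary prediction with P(bind) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully confident prediction carries no uncertainty
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def entropy_query(probabilities, k):
    """Return the indices of the k pool items with the highest entropy."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: binary_entropy(probabilities[i]),
                    reverse=True)
    return ranked[:k]

# Toy predicted binding probabilities for five unlabeled pairs (made up).
pool_probs = [0.98, 0.52, 0.10, 0.45, 0.80]
print(entropy_query(pool_probs, k=2))  # indices of the two most uncertain pairs
```

Predictions close to 0.5 rank first, so the pairs at indices 1 and 3 would be sent for annotation before the confidently scored ones.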

Publication

An Active Learning Framework for Cost-Effective TCR-Epitope Binding Affinity Prediction
Pengfei Zhang [1,2], Seojin Bang [3], Heewook Lee [1,2,*]
[1] School of Computing and Augmented Intelligence, Arizona State University; [2] Biodesign Institute, Arizona State University; [3] Google DeepMind
Accepted for publication: IEEE BIBM 2023

Paper | Code | Poster | Slides | Presentation (YouTube)

Major results of ActiveTCR

  1. Use case a: reducing annotation cost for unlabeled TCR-epitope pools by more than 40%.

  2. Use case b: reducing redundancy among already annotated TCR-epitope pairs by more than 40%.


Dependencies

  • Linux
  • Python 3.6.13
  • Keras 2.6.0
  • TensorFlow 2.6.0

Steps to train a Binding Affinity Prediction model for TCR-epitope pairs.

1. Clone the repository and set up the environment

git clone https://github.com/Lee-CBG/ActiveTCR
cd ActiveTCR/
conda create --name bap python=3.6.13
source activate bap
pip install -r requirements.txt

2. Prepare TCR-epitope pairs for training and testing

  • Download the training and testing data from the datasets folder.
  • Obtain embeddings for TCRs and epitopes by following the catELMo instructions, or download the embeddings directly from Dropbox.

3. Train and test models

An example for use case a of ActiveTCR: reducing annotation cost for unlabeled TCR-epitope pools.

python -W ignore main.py \
                --split epi \
                --active_learning True \
                --query_strategy entropy_sampling \
                --train_strategy retrain \
                --query_balanced unbalanced \
                --gpu 0 \
                --run 0
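For readers new to pool-based active learning, the loop behind use case a can be sketched in a few lines. This is a toy under stated assumptions, not the code in `main.py`: a 1-D logistic regression stands in for the TensorFlow model, `oracle` stands in for wet-lab annotation, and `retrain` is assumed to mean refitting from scratch on all labels collected so far (the README does not define it). All function names are hypothetical.

```python
import math

def train(labeled, epochs=200, lr=0.5):
    """Fit a 1-D logistic regression from scratch on (x, y) pairs."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in labeled:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += (p - y)
        w -= lr * gw / len(labeled)
        b -= lr * gb / len(labeled)
    return w, b

def predict(model, x):
    w, b = model
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def uncertainty(p):
    """1 at p = 0.5, 0 at p = 0 or 1; ranks binary predictions like entropy."""
    return 1.0 - abs(2.0 * p - 1.0)

def active_learning_loop(labeled, unlabeled, oracle, query_size=4, rounds=5):
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)  # assumed meaning of "retrain": fit from scratch
        unlabeled.sort(key=lambda x: uncertainty(predict(model, x)), reverse=True)
        queried, unlabeled = unlabeled[:query_size], unlabeled[query_size:]
        labeled.extend((x, oracle(x)) for x in queried)  # pay for these labels
    return train(labeled), labeled, unlabeled

def oracle(x):
    """Toy ground truth: 'binds' when x > 0.5."""
    return 1 if x > 0.5 else 0

pool = [i / 10 for i in range(-29, 30)]   # 59 unlabeled toy points
seed = [(-3.0, 0), (3.0, 1)]              # two labeled points to start from
model, labeled, unlabeled = active_learning_loop(seed, pool, oracle)
print(len(labeled), len(unlabeled))
```

Only the queried points are ever labeled, which is where the annotation savings come from: the remaining pool is never sent to the oracle.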

An example for use case b of ActiveTCR: minimizing redundancy among labeled TCR-epitope pairs.

python -W ignore main.py \
                --split epi \
                --query_strategy entropy_sampling \
                --train_strategy retrain \
                --query_balanced unbalanced \
                --gpu 1 \
                --run 0
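When repeating an experiment several times, it can be convenient to assemble the command programmatically. The helper below is a hypothetical convenience wrapper, not part of the repository; it uses only the flags that appear in the two examples above. Treating `--run` as a repetition index and looping over the values 0 to 2 are assumptions, since the README does not document the flag.

```python
import shlex

def build_command(split="epi", query_strategy="entropy_sampling",
                  train_strategy="retrain", query_balanced="unbalanced",
                  gpu=0, run=0, active_learning=None):
    """Assemble the argv list for main.py from flags shown in this README."""
    argv = ["python", "-W", "ignore", "main.py", "--split", split]
    if active_learning is not None:
        # Only emitted when requested, mirroring the use case a example.
        argv += ["--active_learning", str(active_learning)]
    argv += ["--query_strategy", query_strategy,
             "--train_strategy", train_strategy,
             "--query_balanced", query_balanced,
             "--gpu", str(gpu),
             "--run", str(run)]
    return argv

if __name__ == "__main__":
    for run in range(3):  # assumed: --run indexes repeated experiments
        cmd = build_command(active_learning=True, run=run)
        print(" ".join(shlex.quote(part) for part in cmd))
        # To launch for real: import subprocess; subprocess.run(cmd, check=True)
```

Printing the commands first makes it easy to check them against the examples above before anything is launched.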

Citation

If you use this code or our catELMo in your research, please cite our paper (listed under Publication above).


License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.