This is the official repository containing the code to reproduce the results of the research paper "MHCSeqNet2 - Improved Peptide-Class I MHC Binding Prediction for Alleles with Low Data".
- How to Prepare the Environment
- How to Run Inference
- How to Train Prediction Model
- How to Train Pre-Training Model
- Data Preparation
- Dataset References
Warning: the training labels were accidentally set in reverse order. As a result, a prediction of 0 means bind, while 1 means not bind. The isGenerated column is in reversed order as well.
I sincerely apologize for this. If a new version resolves the issue, I will announce it.
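Until the labels are fixed, downstream code may find it convenient to flip the raw output so that a higher value means stronger predicted binding. The sketch below assumes that the raw prediction is a score in [0, 1] and that it is stored in a column named Prediction. The helper names to_binding_score and add_binding_score, the BindingScore field, and the example values are illustrative assumptions, not part of the tool. The isGenerated column is left untouched.

```python
import csv
import io


def to_binding_score(raw_prediction: float) -> float:
    """Flip a raw model output so that higher means more likely to bind.

    Assumes the raw output lies in [0, 1], where 0 means bind and 1 means
    not bind (the reversed convention described above).
    """
    if not 0.0 <= raw_prediction <= 1.0:
        raise ValueError(f"expected a value in [0, 1], got {raw_prediction}")
    return 1.0 - raw_prediction


def add_binding_score(rows, prediction_column="Prediction"):
    """Return copies of the rows with an extra 'BindingScore' field.

    `prediction_column` is an assumed column name; change it to match the
    header of your own result file.
    """
    scored = []
    for row in rows:
        new_row = dict(row)
        new_row["BindingScore"] = to_binding_score(float(row[prediction_column]))
        scored.append(new_row)
    return scored


# Tiny in-memory example; the peptide, allele, and score are made up.
example_csv = io.StringIO("Peptide,Allele,Prediction\nAAAAAAAAA,X,0.25\n")
example_rows = add_binding_score(csv.DictReader(example_csv))
```

Keeping the original Prediction column next to the flipped one makes it easy to check which convention a given file uses.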
- Clone this repository
git clone https://github.com/cmb-chula/MHCSeqNet2.git
- Edit docker-compose.yaml to change the volume mount location to suit your setup.
- Use this command to create the container
docker-compose up -d --build
- Then you are free to access the container using exec
docker exec -it mhcseqnet2_dev-mhcseqnet2_1 bash
- First you must obtain/train the prediction model weight
mkdir -p resources/trained_weight/
wget -c https://github.com/cmb-chula/MHCSeqNet2/releases/download/v1.0/final_model.tar.gz -O - | tar -xz -C resources/trained_weight/
# By default, mhctool.py uses final_model_with_smsnetdata as its weight
wget -c https://github.com/cmb-chula/MHCSeqNet2/releases/download/v1.0/final_model_with_smsnetdata.tar.gz -O - | tar -xz -C resources/trained_weight/
- Run mhctool.py with --help to view the usage and available options.
**For real applications, it is recommended to always include the --USE_ENSEMBLE flag.**
$ python mhctool.py --help
usage: mhctool.py [-h] [--MODE {CSV,CROSS}] [--CSV_PATH CSV_PATH] [--PEPTIDE_COLUMN_NAME PEPTIDE_COLUMN_NAME] [--ALLELE_COLUMN_NAME ALLELE_COLUMN_NAME] [--PEPTIDE_PATH PEPTIDE_PATH]
[--ALLELE_PATH ALLELE_PATH] [--IGNORE_UNKNOW] [--LOG_UNKNOW] [--LOG_UNKNOW_PATH LOG_UNKNOW_PATH] [--GPU_ID GPU_ID] [--USE_ENSEMBLE]
[--MODEL_TYPE {MHCSeqNet2,MHCSeqNet2_GRUPeptide,GloVeFastText,MultiHeadGloVeFastTextSplit,MultiHeadGloVeFastTextJointed}] [--ALLELE_MAPPER_PATH ALLELE_MAPPER_PATH]
[--OUTPUT_DIRECTORY OUTPUT_DIRECTORY] [--TEMP_FILE_PATH TEMP_FILE_PATH] [--SUPPRESS_LOG]
MHCTool
optional arguments:
-h, --help show this help message and exit
--MODE {CSV,CROSS} Mode `CSV` or `CROSS` Select the mode to run, the tool will execute based on current selection • csv mode allow you to choose a csv/tsv file which must contain the column
for peptide and allele • cross mode allow you to choose two files one containing peptides, and the other containing alleles which will be crossed together
--CSV_PATH CSV_PATH path directory to input csv when use `--MODE CSV`
--PEPTIDE_COLUMN_NAME PEPTIDE_COLUMN_NAME
the column name which containing the peptide
--ALLELE_COLUMN_NAME ALLELE_COLUMN_NAME
the column name which containing the allele
--PEPTIDE_PATH PEPTIDE_PATH
path directory to input peptide when use `--MODE CROSS`
--ALLELE_PATH ALLELE_PATH
path directory to input allele when use `--MODE CROSS`
--IGNORE_UNKNOW if setted it will skip the unknown
--LOG_UNKNOW if setted it will log the unknown that was skipped
--LOG_UNKNOW_PATH LOG_UNKNOW_PATH
the file which the unknow will be logged to
--MODEL_KF MODEL_KF specify model weight to use, if not using ensemble
--GPU_ID GPU_ID default GPU, you can specify a GPU to be used by given a number i.e, `--GPU_ID 0`
--USE_ENSEMBLE Run the result multiple times on multiple models and use the average as the score
--MODEL_TYPE {MHCSeqNet2,MHCSeqNet2_GRUPeptide,GloVeFastText,MultiHeadGloVeFastTextSplit,MultiHeadGloVeFastTextJointed}
specify model to use
--ALLELE_MAPPER_PATH ALLELE_MAPPER_PATH
path to the folder that contain yaml file needed for the tool. You can use this to add a new allele, please visit readme for more
--OUTPUT_DIRECTORY OUTPUT_DIRECTORY
where to save the final result to (only .csv or .tsv)
--TEMP_FILE_PATH TEMP_FILE_PATH
path to intermediate result file to maintain system compatibility and stability, the program need to store intermediate result.
--SUPPRESS_LOG use to suppress log, useful only for running from gui
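As a minimal sketch of how these options fit together, the helper below assembles the argument list for a CSV-mode run with --USE_ENSEMBLE. Only flags listed in the help text are used, and the default for --ALLELE_MAPPER_PATH is the value used elsewhere in this README. The helper name build_csv_command, the input and output paths, and the default column names are illustrative assumptions.

```python
import shlex


def build_csv_command(
    csv_path,
    output_path,
    temp_path,
    peptide_column="Peptide",
    allele_column="Allele",
    allele_mapper_path="resources/allele_mapper",
    use_ensemble=True,
):
    """Assemble an mhctool.py CSV-mode command as a list of arguments."""
    command = [
        "python", "mhctool.py",
        "--MODE", "CSV",
        "--CSV_PATH", csv_path,
        "--PEPTIDE_COLUMN_NAME", peptide_column,
        "--ALLELE_COLUMN_NAME", allele_column,
        "--ALLELE_MAPPER_PATH", allele_mapper_path,
        "--OUTPUT_DIRECTORY", output_path,
        "--TEMP_FILE_PATH", temp_path,
    ]
    if use_ensemble:
        # Average the score over multiple models, as recommended above.
        command.append("--USE_ENSEMBLE")
    return command


# Hypothetical paths, shown only to illustrate the call.
command = build_csv_command(
    csv_path="my_pairs.csv",
    output_path="/tmp/prediction_result/my_pairs_scored.csv",
    temp_path="/tmp/prediction_result/_tmp_my_pairs_scored.csv",
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) from the repository root inside the container.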
Normally, once training has completed, train.py predicts the result of each cross-validation fold on its test set.
To reproduce the results yourself, use the following steps.
- Edit the model weight in mhctool.py to limit it to publicly available data
- Use the following commands
python mhctool.py \
--MODE CSV \
--CSV_PATH "resources/datasets/MSI011320/HLA_classI_MS_dataset_011320_processed_kf-1_test.csv" \
--IGNORE_UNKNOW \
--MODEL_KF 0 \
--PEPTIDE_COLUMN_NAME Peptide \
--ALLELE_COLUMN_NAME Allele \
--GPU_ID 0 \
--ALLELE_MAPPER_PATH resources/allele_mapper \
--OUTPUT_DIRECTORY "/tmp/prediction_result/HLA_classI_MS_dataset_011320_processed_kf-1_test_raw.csv" \
--TEMP_FILE_PATH "/tmp/prediction_result/_tmp_HLA_classI_MS_dataset_011320_processed_kf-1_test_raw.csv"
python mhctool.py \
--MODE CSV \
--CSV_PATH "resources/datasets/MSI011320/HLA_classI_MS_dataset_011320_processed_kf-2_test.csv" \
--IGNORE_UNKNOW \
--MODEL_KF 1 \
--PEPTIDE_COLUMN_NAME Peptide \
--ALLELE_COLUMN_NAME Allele \
--GPU_ID 0 \
--ALLELE_MAPPER_PATH resources/allele_mapper \
--OUTPUT_DIRECTORY "/tmp/prediction_result/HLA_classI_MS_dataset_011320_processed_kf-2_test_raw.csv" \
--TEMP_FILE_PATH "/tmp/prediction_result/_tmp_HLA_classI_MS_dataset_011320_processed_kf-2_test_raw.csv"
python mhctool.py \
--MODE CSV \
--CSV_PATH "resources/datasets/MSI011320/HLA_classI_MS_dataset_011320_processed_kf-3_test.csv" \
--IGNORE_UNKNOW \
--MODEL_KF 2 \
--PEPTIDE_COLUMN_NAME Peptide \
--ALLELE_COLUMN_NAME Allele \
--GPU_ID 0 \
--ALLELE_MAPPER_PATH resources/allele_mapper \
--OUTPUT_DIRECTORY "/tmp/prediction_result/HLA_classI_MS_dataset_011320_processed_kf-3_test_raw.csv" \
--TEMP_FILE_PATH "/tmp/prediction_result/_tmp_HLA_classI_MS_dataset_011320_processed_kf-3_test_raw.csv"
python mhctool.py \
--MODE CSV \
--CSV_PATH "resources/datasets/MSI011320/HLA_classI_MS_dataset_011320_processed_kf-4_test.csv" \
--IGNORE_UNKNOW \
--MODEL_KF 3 \
--PEPTIDE_COLUMN_NAME Peptide \
--ALLELE_COLUMN_NAME Allele \
--GPU_ID 0 \
--ALLELE_MAPPER_PATH resources/allele_mapper \
--OUTPUT_DIRECTORY "/tmp/prediction_result/HLA_classI_MS_dataset_011320_processed_kf-4_test_raw.csv" \
--TEMP_FILE_PATH "/tmp/prediction_result/_tmp_HLA_classI_MS_dataset_011320_processed_kf-4_test_raw.csv"
python mhctool.py \
--MODE CSV \
--CSV_PATH "resources/datasets/MSI011320/HLA_classI_MS_dataset_011320_processed_kf-5_test.csv" \
--IGNORE_UNKNOW \
--MODEL_KF 4 \
--PEPTIDE_COLUMN_NAME Peptide \
--ALLELE_COLUMN_NAME Allele \
--GPU_ID 0 \
--ALLELE_MAPPER_PATH resources/allele_mapper \
--OUTPUT_DIRECTORY "/tmp/prediction_result/HLA_classI_MS_dataset_011320_processed_kf-5_test_raw.csv" \
--TEMP_FILE_PATH "/tmp/prediction_result/_tmp_HLA_classI_MS_dataset_011320_processed_kf-5_test_raw.csv"
- Edit the model weight in mhctool.py to limit it to SMSNet data (if you have not edited anything yet, there is nothing to change)
- Use the following commands
python mhctool.py \
--MODE CSV \
--CSV_PATH "resources/datasets/MSI011320_ANTI051821Z_COMBINE/HLA_classI_MS_dataset_011320_antigen_information_051821_rev1_processed_kf-1_test.csv" \
--IGNORE_UNKNOW \
--MODEL_KF 0 \
--PEPTIDE_COLUMN_NAME Peptide \
--ALLELE_COLUMN_NAME Allele \
--GPU_ID 0 \
--ALLELE_MAPPER_PATH resources/allele_mapper \
--OUTPUT_DIRECTORY "/tmp/prediction_result/HLA_classI_MS_dataset_011320_antigen_information_051821_rev1_processed_kf-1_test.csv" \
--TEMP_FILE_PATH "/tmp/prediction_result/_tmp_HLA_classI_MS_dataset_011320_antigen_information_051821_rev1_processed_kf-1_test.csv"
python mhctool.py \
--MODE CSV \
--CSV_PATH "resources/datasets/MSI011320_ANTI051821Z_COMBINE/HLA_classI_MS_dataset_011320_antigen_information_051821_rev1_processed_kf-2_test.csv" \
--IGNORE_UNKNOW \
--MODEL_KF 1 \
--PEPTIDE_COLUMN_NAME Peptide \
--ALLELE_COLUMN_NAME Allele \
--GPU_ID 0 \
--ALLELE_MAPPER_PATH resources/allele_mapper \
--OUTPUT_DIRECTORY "/tmp/prediction_result/HLA_classI_MS_dataset_011320_antigen_information_051821_rev1_processed_kf-2_test.csv" \
--TEMP_FILE_PATH "/tmp/prediction_result/_tmp_HLA_classI_MS_dataset_011320_antigen_information_051821_rev1_processed_kf-2_test.csv"
python mhctool.py \
--MODE CSV \
--CSV_PATH "resources/datasets/MSI011320_ANTI051821Z_COMBINE/HLA_classI_MS_dataset_011320_antigen_information_051821_rev1_processed_kf-3_test.csv" \
--IGNORE_UNKNOW \
--MODEL_KF 2 \
--PEPTIDE_COLUMN_NAME Peptide \
--ALLELE_COLUMN_NAME Allele \
--GPU_ID 0 \
--ALLELE_MAPPER_PATH resources/allele_mapper \
--OUTPUT_DIRECTORY "/tmp/prediction_result/HLA_classI_MS_dataset_011320_antigen_information_051821_rev1_processed_kf-3_test.csv" \
--TEMP_FILE_PATH "/tmp/prediction_result/_tmp_HLA_classI_MS_dataset_011320_antigen_information_051821_rev1_processed_kf-3_test.csv"
python mhctool.py \
--MODE CSV \
--CSV_PATH "resources/datasets/MSI011320_ANTI051821Z_COMBINE/HLA_classI_MS_dataset_011320_antigen_information_051821_rev1_processed_kf-4_test.csv" \
--IGNORE_UNKNOW \
--MODEL_KF 3 \
--PEPTIDE_COLUMN_NAME Peptide \
--ALLELE_COLUMN_NAME Allele \
--GPU_ID 0 \
--ALLELE_MAPPER_PATH resources/allele_mapper \
--OUTPUT_DIRECTORY "/tmp/prediction_result/HLA_classI_MS_dataset_011320_antigen_information_051821_rev1_processed_kf-4_test.csv" \
--TEMP_FILE_PATH "/tmp/prediction_result/_tmp_HLA_classI_MS_dataset_011320_antigen_information_051821_rev1_processed_kf-4_test.csv"
python mhctool.py \
--MODE CSV \
--CSV_PATH "resources/datasets/MSI011320_ANTI051821Z_COMBINE/HLA_classI_MS_dataset_011320_antigen_information_051821_rev1_processed_kf-5_test.csv" \
--IGNORE_UNKNOW \
--MODEL_KF 4 \
--PEPTIDE_COLUMN_NAME Peptide \
--ALLELE_COLUMN_NAME Allele \
--GPU_ID 0 \
--ALLELE_MAPPER_PATH resources/allele_mapper \
--OUTPUT_DIRECTORY "/tmp/prediction_result/HLA_classI_MS_dataset_011320_antigen_information_051821_rev1_processed_kf-5_test.csv" \
--TEMP_FILE_PATH "/tmp/prediction_result/_tmp_HLA_classI_MS_dataset_011320_antigen_information_051821_rev1_processed_kf-5_test.csv"
- Edit the KFOLD_RESULT_PATH section of scripts/make_figure_auc_full_vs_few_zoom.py to match your model path
KFOLD_RESULT_PATH: typing.List[typing.Tuple[str, str, str, str, bool, bool, str]] = [
('ExperimentalResult', 'this work', 'Prediction', 'isGenerated', True, True, 'resources/trained_weight/final_model'),
]
- Run make figure script
python scripts/make_figure_auc_full_vs_few_zoom.py
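The ten per-fold inference commands above differ only in the dataset, the fold number, and the output suffix, and each test file kf-N is paired with --MODEL_KF N-1. The sketch below generates the same argument lists, assuming the file-name pattern shown above holds for every fold. The function name fold_inference_command and its parameters are illustrative, not part of the repository.

```python
def fold_inference_command(dataset_dir, file_stem, fold, output_suffix="",
                           output_root="/tmp/prediction_result"):
    """Build the mhctool.py command for one cross-validation fold.

    `fold` is the 1-based fold number used in the test file name; the
    matching model weight index passed to --MODEL_KF is `fold - 1`.
    """
    if not 1 <= fold <= 5:
        raise ValueError("the commands above cover folds 1 to 5")
    test_name = f"{file_stem}_processed_kf-{fold}_test"
    output_name = f"{test_name}{output_suffix}.csv"
    return [
        "python", "mhctool.py",
        "--MODE", "CSV",
        "--CSV_PATH", f"resources/datasets/{dataset_dir}/{test_name}.csv",
        "--IGNORE_UNKNOW",
        "--MODEL_KF", str(fold - 1),
        "--PEPTIDE_COLUMN_NAME", "Peptide",
        "--ALLELE_COLUMN_NAME", "Allele",
        "--GPU_ID", "0",
        "--ALLELE_MAPPER_PATH", "resources/allele_mapper",
        "--OUTPUT_DIRECTORY", f"{output_root}/{output_name}",
        "--TEMP_FILE_PATH", f"{output_root}/_tmp_{output_name}",
    ]


# Public-data model: outputs carry the "_raw" suffix used above.
public_commands = [
    fold_inference_command("MSI011320", "HLA_classI_MS_dataset_011320",
                           fold, output_suffix="_raw")
    for fold in range(1, 6)
]

# Model trained with SMSNet data: outputs have no extra suffix.
smsnet_commands = [
    fold_inference_command(
        "MSI011320_ANTI051821Z_COMBINE",
        "HLA_classI_MS_dataset_011320_antigen_information_051821_rev1",
        fold,
    )
    for fold in range(1, 6)
]
```

The model weight in mhctool.py still has to be edited between the two sets of commands, as described in the steps above.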
- Visit Data Preparation
- Obtain the pre-trained weights or train the pre-training model
- Run the following commands to start training
Note that the folds can be trained simultaneously.
python train.py \
--dataset=MSI011320 \
--root_dir=resources/datasets \
--run_kfold 1 \
--load_embedding_peptide \
--load_embedding_allele \
--embedding_allele_path=resources/trained_weight/embedding-3d/central_embeddings_matrix.npy \
--save_path=resources/trained_weight/final_model \
--experiment_name=final_model \
--epoch 420 \
--early_stop_patience 150 \
--batch_size_train=256 \
--batch_size_test=256
python train.py \
--dataset=MSI011320 \
--root_dir=resources/datasets \
--run_kfold 2 \
--load_embedding_peptide \
--load_embedding_allele \
--embedding_allele_path=resources/trained_weight/embedding-3d/central_embeddings_matrix.npy \
--save_path=resources/trained_weight/final_model \
--experiment_name=final_model \
--epoch 420 \
--early_stop_patience 150 \
--batch_size_train=256 \
--batch_size_test=256
python train.py \
--dataset=MSI011320 \
--root_dir=resources/datasets \
--run_kfold 3 \
--load_embedding_peptide \
--load_embedding_allele \
--embedding_allele_path=resources/trained_weight/embedding-3d/central_embeddings_matrix.npy \
--save_path=resources/trained_weight/final_model \
--experiment_name=final_model \
--epoch 420 \
--early_stop_patience 150 \
--batch_size_train=256 \
--batch_size_test=256
python train.py \
--dataset=MSI011320 \
--root_dir=resources/datasets \
--run_kfold 4 \
--load_embedding_peptide \
--load_embedding_allele \
--embedding_allele_path=resources/trained_weight/embedding-3d/central_embeddings_matrix.npy \
--save_path=resources/trained_weight/final_model \
--experiment_name=final_model \
--epoch 420 \
--early_stop_patience 150 \
--batch_size_train=256 \
--batch_size_test=256
python train.py \
--dataset=MSI011320 \
--root_dir=resources/datasets \
--run_kfold 5 \
--load_embedding_peptide \
--load_embedding_allele \
--embedding_allele_path=resources/trained_weight/embedding-3d/central_embeddings_matrix.npy \
--save_path=resources/trained_weight/final_model \
--experiment_name=final_model \
--epoch 420 \
--early_stop_patience 150 \
--batch_size_train=256 \
--batch_size_test=256
- Or train the model with SMSNet data using the following commands
python train.py \
--dataset=MSI011320_ANTI051821Z_COMBINE \
--root_dir=resources/datasets \
--run_kfold 1 \
--load_embedding_peptide \
--load_embedding_allele \
--embedding_allele_path=resources/trained_weight/embedding-3d/central_embeddings_matrix.npy \
--save_path=resources/trained_weight/final_model_with_smsnetdata \
--experiment_name=final_model_with_smsnetdata \
--epoch 420 \
--early_stop_patience 150 \
--batch_size_train=256 \
--batch_size_test=256
python train.py \
--dataset=MSI011320_ANTI051821Z_COMBINE \
--root_dir=resources/datasets \
--run_kfold 2 \
--load_embedding_peptide \
--load_embedding_allele \
--embedding_allele_path=resources/trained_weight/embedding-3d/central_embeddings_matrix.npy \
--save_path=resources/trained_weight/final_model_with_smsnetdata \
--experiment_name=final_model_with_smsnetdata \
--epoch 420 \
--early_stop_patience 150 \
--batch_size_train=256 \
--batch_size_test=256
python train.py \
--dataset=MSI011320_ANTI051821Z_COMBINE \
--root_dir=resources/datasets \
--run_kfold 3 \
--load_embedding_peptide \
--load_embedding_allele \
--embedding_allele_path=resources/trained_weight/embedding-3d/central_embeddings_matrix.npy \
--save_path=resources/trained_weight/final_model_with_smsnetdata \
--experiment_name=final_model_with_smsnetdata \
--epoch 420 \
--early_stop_patience 150 \
--batch_size_train=256 \
--batch_size_test=256
python train.py \
--dataset=MSI011320_ANTI051821Z_COMBINE \
--root_dir=resources/datasets \
--run_kfold 4 \
--load_embedding_peptide \
--load_embedding_allele \
--embedding_allele_path=resources/trained_weight/embedding-3d/central_embeddings_matrix.npy \
--save_path=resources/trained_weight/final_model_with_smsnetdata \
--experiment_name=final_model_with_smsnetdata \
--epoch 420 \
--early_stop_patience 150 \
--batch_size_train=256 \
--batch_size_test=256
python train.py \
--dataset=MSI011320_ANTI051821Z_COMBINE \
--root_dir=resources/datasets \
--run_kfold 5 \
--load_embedding_peptide \
--load_embedding_allele \
--embedding_allele_path=resources/trained_weight/embedding-3d/central_embeddings_matrix.npy \
--save_path=resources/trained_weight/final_model_with_smsnetdata \
--experiment_name=final_model_with_smsnetdata \
--epoch 420 \
--early_stop_patience 150 \
--batch_size_train=256 \
--batch_size_test=256
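Since the training commands above differ only in the dataset, the experiment name, and the fold number, they can be generated rather than copied. The sketch below only builds the argument lists from the values shown above. The helper name train_command is illustrative, and launching the folds concurrently is left to the reader because it assumes enough GPU memory for parallel runs.

```python
EMBEDDING_PATH = (
    "resources/trained_weight/embedding-3d/central_embeddings_matrix.npy"
)


def train_command(dataset, experiment_name, fold):
    """Build the train.py command for one fold (1-based, 1 to 5)."""
    if not 1 <= fold <= 5:
        raise ValueError("the commands above cover folds 1 to 5")
    return [
        "python", "train.py",
        f"--dataset={dataset}",
        "--root_dir=resources/datasets",
        "--run_kfold", str(fold),
        "--load_embedding_peptide",
        "--load_embedding_allele",
        f"--embedding_allele_path={EMBEDDING_PATH}",
        f"--save_path=resources/trained_weight/{experiment_name}",
        f"--experiment_name={experiment_name}",
        "--epoch", "420",
        "--early_stop_patience", "150",
        "--batch_size_train=256",
        "--batch_size_test=256",
    ]


# Public data only.
final_model_commands = [
    train_command("MSI011320", "final_model", fold) for fold in range(1, 6)
]

# Public data combined with SMSNet data.
smsnet_model_commands = [
    train_command("MSI011320_ANTI051821Z_COMBINE",
                  "final_model_with_smsnetdata", fold)
    for fold in range(1, 6)
]
```

Each list can be run with subprocess.run(command, check=True), or started with subprocess.Popen(command) to train several folds at once.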
Instructions for training the peptide pre-training model are coming soon.
For now, you can obtain the pre-trained embedding from the release page.
- Visit Data Preparation
- Train with the command below
python train.py \
--MODEL_TYPE=GloVeFastText \
--dataset=PRETRAIN_3D \
--save_path=resources/trained_weight/ \
--experiment_name=embedding-3d \
--central2context_path="resources/datasets/PRETRAIN_3D/dist-avg-distance_threshold_45/central2context.yaml" \
--pair_map_counter_path="resources/datasets/PRETRAIN_3D/dist-avg-distance_threshold_45/pair_map_counter.yaml" \
--batch_size_train=256 \
--epoch=50 \
--checkpoint_monitor='acc' \
--reduce_lr_monitor='loss' \
--reduce_lr_patience=2 \
--early_stop_monitor='acc' \
--early_stop_patience=3
- Set the model weight path inside scripts/extract_embedding.py to match your path
- Extract the embedding weight
# python scripts/extract_embedding.py
echo "After the training is completed, central and context embedding weight will be available in the saved model folder"
mkdir -p resources/intermediate_netmhc2/
mkdir -p resources/trained_weight/embedding-3d/
wget -c https://github.com/cmb-chula/MHCSeqNet2/releases/download/v1.0/peptide_central_embedding.tar.gz -O - | tar -xz -C resources/intermediate_netmhc2/
wget -c https://github.com/cmb-chula/MHCSeqNet2/releases/download/v1.0/central_embeddings_matrix.tar.gz -O - | tar -xz -C resources/trained_weight/embedding-3d/
- Obtain raw 3D allele and peptide dataset from release page
mkdir -p resources/datasets/PRETRAIN_HUMAN_PROTEIN/
mkdir -p resources/datasets/PRETRAIN_3D/
wget -c https://github.com/cmb-chula/MHCSeqNet2/releases/download/v1.0/humanProtein_peptide.tar.gz -O - | tar -xz -C resources/datasets/PRETRAIN_HUMAN_PROTEIN/
wget -c https://github.com/cmb-chula/MHCSeqNet2/releases/download/v1.0/raw_3d_dataset.tar.gz -O - | tar -xz -C resources/datasets/PRETRAIN_3D/
- Run prepare script to create dataset
python scripts/prepare_pretraining_human_protein.py
python scripts/prepare_pretraining_3d_allele.py
The first HLA binding dataset (HLA_classI_MS_dataset_011320) comes from combining several mass spectrometry-based mono-allelic HLA peptidomics studies [1][2][3][4][5] with peptide-HLA pairs curated by the Immune Epitope Database (IEDB [6]). Duplicated peptide-HLA pairs and peptides with modifications were removed. In total, there were 514,928 peptide-HLA pairs across 164 alleles.
The second HLA binding dataset (antigen_information_051821_rev1) was derived by applying SMSNet, a de novo peptide sequencing tool, to re-analyze two large mono-allelic HLA peptidomics datasets [3][4]. This new dataset was recently explored [7] but has not yet been utilized for HLA binding prediction. In total, 43,190 new peptide-HLA pairs across 89 alleles with peptide lengths within 8-15 amino acids were identified.
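The cleaning rules mentioned above (dropping duplicated peptide-HLA pairs, dropping modified peptides, and keeping peptides of 8-15 amino acids) can be sketched as follows. This is not the authors' preprocessing code. It assumes that a modified peptide is any sequence containing a character outside the 20 standard amino-acid letters, and it applies the 8-15 length window, which the text states only for the second dataset, through adjustable parameters. The function name clean_pairs and the example pairs are made up for illustration.

```python
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_pairs(pairs, min_length=8, max_length=15):
    """Filter (peptide, allele) pairs, keeping first occurrences in order.

    A pair is dropped if the peptide contains anything other than the 20
    standard amino-acid letters (an assumed proxy for "modified"), if its
    length is outside [min_length, max_length], or if the same pair has
    already been seen.
    """
    seen = set()
    cleaned = []
    for peptide, allele in pairs:
        if not set(peptide) <= STANDARD_AMINO_ACIDS:
            continue
        if not min_length <= len(peptide) <= max_length:
            continue
        if (peptide, allele) in seen:
            continue
        seen.add((peptide, allele))
        cleaned.append((peptide, allele))
    return cleaned


# Placeholder allele names and arbitrary sequences, for illustration only.
example_pairs = [
    ("ACDEFGHIK", "ALLELE_1"),
    ("ACDEFGHIK", "ALLELE_1"),      # duplicate pair
    ("ACDEFGHIK", "ALLELE_2"),      # same peptide, different allele: kept
    ("ACDEFGH", "ALLELE_1"),        # too short (7 residues)
    ("ACDEFGM(ox)K", "ALLELE_1"),   # contains a modification marker
]
cleaned_pairs = clean_pairs(example_pairs)
```

Deduplication is done on the (peptide, allele) pair rather than on the peptide alone, because the same peptide can be presented by more than one allele.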
- Obtain dataset from release page
mkdir -p resources/datasets/raw_datasets/
wget -c https://github.com/cmb-chula/MHCSeqNet2/releases/download/v1.0/HLA_classI_MS_dataset_011320.tar.gz -O - | tar -xz -C resources/datasets/raw_datasets/
wget -c https://github.com/cmb-chula/MHCSeqNet2/releases/download/v1.0/antigen_information_051821_rev1.tar.gz -O - | tar -xz -C resources/datasets/raw_datasets/
- Run prepare script to create dataset
python scripts/prepare.py
[1] M. Di Marco, H. Schuster, L. Backert, M. Ghosh, H.-G. Rammensee, and S. Stevanović, “Unveiling the Peptide Motifs of HLA-C and HLA-G from Naturally Presented Peptides and Generation of Binding Prediction Matrices,” J Immunol, vol. 199, DOI 10.4049/jimmunol.1700938, no. 8, pp. 2639–2651, Sep. 2017. [Online]. Available: https://doi.org/10.4049/jimmunol.1700938
[2] M. Solleder, P. Guillaume, J. Racle, J. Michaux, H.-S. Pak, M. Müller, G. Coukos, M. Bassani-Sternberg, and D. Gfeller, “Mass Spectrometry Based Immunopeptidomics Leads to Robust Predictions of Phosphorylated HLA Class I Ligands,” Mol Cell Proteomics, vol. 19, DOI 10.1074/mcp.TIR119.001641, no. 2, pp. 390–404, Dec. 2019. [Online]. Available: https://doi.org/10.1074/mcp.TIR119.001641
[3] J. G. Abelin, D. B. Keskin, S. Sarkizova, C. R. Hartigan, W. Zhang, J. Sidney, J. Stevens, W. Lane, G. L. Zhang, T. M. Eisenhaure, K. R. Clauser, N. Hacohen, M. S. Rooney, S. A. Carr, and C. J. Wu, “Mass Spectrometry Profiling of HLA-Associated Peptidomes in Mono-allelic Cells Enables More Accurate Epitope Prediction,” Immunity, vol. 46, DOI 10.1016/j.immuni.2017.02.007, no. 2, pp. 315–326, Feb. 2017. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1074761317300420
[4] S. Sarkizova, S. Klaeger, P. M. Le, L. W. Li, G. Oliveira, H. Keshishian, C. R. Hartigan, W. Zhang, D. A. Braun, K. L. Ligon, P. Bachireddy, I. K. Zervantonakis, J. M. Rosenbluth, T. Ouspenskaia, T. Law, S. Justesen, J. Stevens, W. J. Lane, T. Eisenhaure, G. Lan Zhang, K. R. Clauser, N. Hacohen, S. A. Carr, C. J. Wu, and D. B. Keskin, “A large peptidome dataset improves HLA class I epitope prediction across most of the human population,” Nature Biotechnology, vol. 38, DOI 10.1038/s41587-019-0322-9, no. 2, pp. 199–209, Feb. 2020. [Online]. Available: https://doi.org/10.1038/s41587-019-0322-9
[5] J. G. Abelin, D. Harjanto, M. Malloy, P. Suri, T. Colson, S. P. Goulding, A. L. Creech, L. R. Serrano, G. Nasir, Y. Nasrullah, C. D. McGann, D. Velez, Y. S. Ting, A. Poran, D. A. Rothenberg, S. Chhangawala, A. Rubinsteyn, J. Hammerbacher, R. B. Gaynor, E. F. Fritsch, J. Greshock, R. C. Oslund, D. Barthelme, T. A. Addona, C. M. Arieta, and M. S. Rooney, “Defining HLA-II Ligand Processing and Binding Rules with Mass Spectrometry Enhances Cancer Epitope Prediction,” Immunity, vol. 51, DOI 10.1016/j.immuni.2019.08.012, no. 4, pp. 766–779.e17, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1074761319303632
[6] R. Vita, S. Mahajan, J. A. Overton, S. K. Dhanda, S. Martini, J. R. Cantrell, D. K. Wheeler, A. Sette, and B. Peters, “The Immune Epitope Database (IEDB): 2018 update,” Nucleic Acids Research, vol. 47, DOI 10.1093/nar/gky1006, no. D1, pp. D339–D343, Oct. 2018. [Online]. Available: https://doi.org/10.1093/nar/gky1006
[7] B. Reynisson, B. Alvarez, S. Paul, B. Peters, and M. Nielsen, “NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data,” Nucleic Acids Research, vol. 48, DOI 10.1093/nar/gkaa379, no. W1, pp. W449–W454, May 2020. [Online]. Available: https://doi.org/10.1093/nar/gkaa379