lujiarui / Str2Str

Codebase of the paper "Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling" (ICLR 2024)

tica plot is not same as Figure S4

Paulie-ai opened this issue

Hello Jiarui,
I recently used mdtraj to extract 1,000 frames as a reference for each of the 12 fast-folding proteins, and I sampled 1,000 frames for each of them as well. Specifically, I subsampled the microsecond-scale MD trajectories at a fixed interval to get 1,000 frames. But the resulting tICA plots are not even close to Figure S4. I used the default eval.py and metrics.py, and I am confused about what causes this. Could you offer some help? Thanks.
metrics_dev_0318-05-27.csv
tica_1FME_dev_0318-05-27
tica_2F4K_dev_0318-05-27
tica_2JOF_dev_0318-05-27
tica_2WAV_dev_0318-05-27
tica_A3D_dev_0318-05-27
tica_CLN025_dev_0318-05-27
tica_GTT_dev_0318-05-27
tica_lambda_dev_0318-05-27
tica_NTL9_dev_0318-05-27
tica_NuG2_dev_0318-05-27
tica_PRB_dev_0318-05-27
tica_UVF_dev_0318-05-27

Hi @Paulie-ai, for the tICA analysis we select a trajectory with more samples from D. E. Shaw's trajectories (Science 2011 and Science 2010) to ensure correctness. For the fast-folding proteins, we set stride=50 to get a trajectory with more than 10,000 samples. You can use the following command for your extraction and analysis:

mdconvert -o [output_pdb] -t [topology file] -s 50 [trajectories]

1,000 samples are not enough for the tICA method, and a lag time of 20 could be too high when only 1,000 samples are used.
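
To make the point about sample count and lag time concrete, here is a small arithmetic sketch. It is not code from the Str2Str repository. The raw trajectory length of 500,000 frames is a hypothetical number chosen only so that stride=50 yields 10,000 frames, and the helper names are made up for illustration. It shows how many frames survive a given stride, how many time-lagged pairs remain for estimating the lagged covariance, and how many raw frames a lag of 20 strided frames actually spans.

```python
def strided_frame_count(n_raw_frames: int, stride: int) -> int:
    """Frames kept when taking every `stride`-th frame, starting at frame 0."""
    return len(range(0, n_raw_frames, stride))


def lagged_pair_count(n_frames: int, lag: int) -> int:
    """Number of (x_t, x_{t+lag}) pairs available in a single trajectory."""
    return max(n_frames - lag, 0)


def lag_in_raw_frames(lag: int, stride: int) -> int:
    """Span of a lag, measured in frames of the original (unstrided) trajectory."""
    return lag * stride


# Hypothetical raw trajectory length (an assumption, not taken from the thread).
N_RAW = 500_000
LAG = 20

for stride in (50, 500):
    n = strided_frame_count(N_RAW, stride)
    print(
        f"stride={stride}: {n} frames, "
        f"{lagged_pair_count(n, LAG)} lagged pairs, "
        f"lag of {LAG} spans {lag_in_raw_frames(LAG, stride)} raw frames"
    )
```

Under these assumed numbers, a stride of 500 leaves 1,000 frames, and a lag of 20 then covers ten times more simulation time than the same lag at stride=50, with roughly ten times fewer pairs to estimate the covariances from.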