tICA plot is not the same as Figure S4

Question

Paulie-ai opened this issue 5 months ago · comments

Hello Jiarui,
Recently I have been using mdtraj to extract 1,000 frames as a reference, and sampling 1,000 frames for each of the 12 fast-folding proteins. Specifically, I subsample the microsecond-scale MD data at a fixed interval to get 1,000 frames. But the tICA plot is not even close. I used the default eval.py and metrics.py, and I am very confused about the reason for these results. Can you offer some help with this? Thanks.
metrics_dev_0318-05-27.csv

Bozitao Zhong · Answer 1 · Tue Mar 19 2024 22:51:21 GMT+0800 (China Standard Time)

Hi @Paulie-ai, for the tICA analysis we use a reference trajectory with more samples, taken from D. E. Shaw's trajectories (Science 2011 and Science 2010), to make sure the analysis is correct. For the fast-folding proteins, we set stride=50, which gives a trajectory with more than 10,000 samples. You can use the following command for your extraction and analysis:

mdconvert -o [output_pdb] -t [topology file] -s 50 [trajectories]
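To see how the stride translates into a sample count, here is a minimal sketch in plain Python. It assumes striding keeps frame 0 and then every stride-th frame, and it uses a hypothetical total of 600,000 frames, since the real trajectory lengths are not given in this thread. The function names are made up for illustration.

```python
def frames_after_stride(n_frames: int, stride: int) -> int:
    """Number of frames kept when taking frame 0 and every `stride`-th frame after it."""
    return len(range(0, n_frames, stride))


def safe_stride(n_frames: int, min_samples: int) -> int:
    """A stride that is guaranteed to leave at least `min_samples` frames.

    This is a conservative choice (floor division), not necessarily the largest possible stride.
    """
    if n_frames < min_samples:
        raise ValueError("trajectory has fewer frames than the requested sample count")
    return n_frames // min_samples


# Hypothetical trajectory length, for illustration only.
N_FRAMES = 600_000

print(frames_after_stride(N_FRAMES, 50))   # samples kept with stride=50
print(frames_after_stride(N_FRAMES, 600))  # samples kept with a much larger stride
print(safe_stride(N_FRAMES, 10_000))       # a stride that still leaves >= 10,000 samples
```

With these assumed numbers, stride=50 keeps 12,000 frames, while getting down to 1,000 frames requires a stride of 600.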

1,000 samples are not enough for the tICA method, and a lag time of 20 could be too high (when using only 1,000 samples).
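One way to read this point: tICA is estimated from pairs of frames separated by the lag, and the lag is counted in subsampled frames, so a larger stride makes the same lag cover more of the original trajectory. The sketch below is only arithmetic under stated assumptions: a hypothetical 600,000-frame trajectory, striding that keeps frame 0, and a made-up helper name `lag_summary`. It is not the code in eval.py or metrics.py.

```python
def lag_summary(n_total_frames: int, stride: int, lag: int) -> dict:
    """Summarise how a frame-based lag behaves after subsampling with `stride`."""
    n_samples = len(range(0, n_total_frames, stride))
    return {
        # frames left after subsampling
        "samples": n_samples,
        # number of (x_t, x_{t+lag}) pairs available for the lagged covariance
        "lagged_pairs": max(n_samples - lag, 0),
        # how far apart the paired frames are in the original trajectory
        "lag_in_original_frames": lag * stride,
    }


# Hypothetical trajectory length, for illustration only.
N_FRAMES = 600_000

print(lag_summary(N_FRAMES, stride=600, lag=20))  # about 1,000 samples
print(lag_summary(N_FRAMES, stride=50, lag=20))   # more than 10,000 samples
```

Under these assumptions, a lag of 20 on the 1,000-frame subsample spans 12,000 original frames and leaves only 980 lagged pairs, while the same lag with stride=50 spans 1,000 original frames and leaves 11,980 pairs.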