Overhaul perses writing of topology information

Question

jchodera opened this issue 2 years ago · comments

John Chodera commented 2 years ago

Currently, perses generates the following topology information on setup:

out-complex.pdb : a single system (old? new?) in complex
out-solvent.pdb : a single system (old? new?) in solvent
out-hybrid_factory.npy.npz : a serialized object that is very slow to deserialize and holds far more information than we need to write a trajectory

Additionally, we have two other issues:

the HybridTopologyFactory contains an htf._hybrid_topology object that does not contain all bonds for the new atoms
the analysis_particle_indices used to slice out only non-water atoms for the NetCDF file lists atom indices out of order relative to hybrid_topology:
  analysis_particle_indices : | environment, core, and unique old atoms | unique new atoms | counterions |
  hybrid_topology : | environment, core, and unique old atoms | water | counterions | unique new atoms |
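The ordering mismatch can be illustrated with a toy example. This is only a sketch: the group sizes and variable names below are hypothetical and are not taken from perses. It shows that stacking the groups in the order described yields indices that are not monotonically increasing, whereas sorting them recovers the order in which those atoms appear in the hybrid topology. Whether sorting is the right fix inside perses is an assumption.

```python
import numpy as np

# Hypothetical toy layout of the hybrid topology, in topology order:
# | environment, core, unique old | water | counterions | unique new |
solute_old = np.arange(0, 5)     # environment, core, and unique old atoms
water = np.arange(5, 11)         # water atoms (excluded from analysis)
counterions = np.arange(11, 13)
unique_new = np.arange(13, 16)

# Stacking order described in the issue:
# | environment, core, unique old | unique new | counterions |
analysis_particle_indices = np.concatenate([solute_old, unique_new, counterions])

# Sorting restores the order in which these atoms appear in the topology,
# so a topology subset and the sliced coordinates would agree atom by atom.
sorted_indices = np.sort(analysis_particle_indices)

# True if the indices are strictly increasing, i.e. already in topology order
stacked_in_order = bool(np.all(np.diff(analysis_particle_indices) > 0))
sorted_in_order = bool(np.all(np.diff(sorted_indices) > 0))

print("stacked:", analysis_particle_indices.tolist(), "in order:", stacked_in_order)
print("sorted: ", sorted_indices.tolist(), "in order:", sorted_in_order)
```

In this toy case the counterions (11, 12) land after the unique new atoms (13 to 15) in the stacked array, which is the mismatch described above.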

I propose we restructure this so we have:

models/ : organize all PDB files here
  {complex,solvent}-{old,new}.pdb : all atoms 
  {complex,solvent}-solute-{old,new}.pdb : solute atoms

and ensure the atoms in the NetCDF trajectories (checkpoint, standard) are written in the same order as the atoms in the PDB files.
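To make the proposed layout concrete, here is a minimal sketch that enumerates the file names implied by the two patterns above. The helper name `proposed_model_paths` and the default root directory are hypothetical; nothing here is part of the perses API, and no files are written.

```python
from itertools import product
from pathlib import Path


def proposed_model_paths(root="models"):
    """Enumerate the PDB paths implied by the proposed layout (hypothetical helper)."""
    root = Path(root)
    paths = []
    for phase, state in product(("complex", "solvent"), ("old", "new")):
        paths.append(root / f"{phase}-{state}.pdb")         # all atoms
        paths.append(root / f"{phase}-solute-{state}.pdb")  # solute atoms only
    return paths


paths = proposed_model_paths()
for path in paths:
    print(path.as_posix())
```

The two phases, two end states, and two atom selections give eight files in total.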
Ideally, we could later write replica trajectories as XTC files directly instead of using the NetCDF file, though extracting coordinates doesn't take a huge amount of time.

We can do this for the new Protocol version, where we hopefully have a way to package these files in a more sane way.

Iván Pulido · Answer 1 · Tue Jul 11 2023 03:40:37 GMT+0800 (China Standard Time)

Currently, the serialized .pdb files are for the old system. We could overhaul the serialization as you mention, but I think we could get clashes with the solvent when serializing a solvated version of the new systems (in both the complex and solvent phases). Should we re-solvate the new systems? I don't know whether that would defeat the purpose of serializing these objects.

Maybe we only need the solute versions for the new systems? And keep both the solvated and solute for the old ones.

Iván Pulido · Answer 2 · Tue Aug 01 2023 22:21:10 GMT+0800 (China Standard Time)

This should be solved via #1210