choderalab / perses

Experiments with expanded ensembles to explore chemical space

Home Page: http://perses.readthedocs.io

Overhaul perses writing of topology information

jchodera opened this issue

Currently, perses generates the following topology information on setup:

  • out-complex.pdb : a single system (old? new?) in complex
  • out-solvent.pdb : a single system (old? new?) in solvent
  • out-hybrid_factory.npy.npz : very slow to deserialize, and contains far more information than we need to write a trajectory

Additionally, we have two other issues:

  • the HybridTopologyFactory contains an htf._hybrid_topology object that does not contain all bonds for the new atoms
  • the analysis_particle_indices used to select only non-water atoms for the NetCDF file lists atom indices out of order: it stacks them as | environment, core, and unique old atoms | unique new atoms | counterions |, whereas hybrid_topology orders them as | environment, core, and unique old atoms | water | counterions | unique new atoms |
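To make the ordering problem concrete, here is a minimal Python sketch with made-up block sizes. It assumes the hybrid-topology layout described above (environment/core/unique old, then water, then counterions, then unique new); the helper names are hypothetical and are not part of the perses API. Stacking the blocks reproduces the out-of-order indices, while sorting them restores hybrid-topology order, so a topology subset taken in index order lines up atom-for-atom with the coordinates written to the NetCDF file.

```python
def hybrid_segments(n_env_core_old, n_water, n_ions, n_new):
    """Return the atom indices of each block, in (assumed) hybrid-topology order."""
    start = 0
    segments = {}
    for name, size in [
        ("env_core_old", n_env_core_old),
        ("water", n_water),
        ("ions", n_ions),
        ("unique_new", n_new),
    ]:
        segments[name] = list(range(start, start + size))
        start += size
    return segments


def stacked_analysis_indices(seg):
    # Reproduces the problematic stacking: | env/core/old | unique new | counterions |
    return seg["env_core_old"] + seg["unique_new"] + seg["ions"]


def ordered_analysis_indices(seg):
    # Same non-water atoms, sorted back into hybrid-topology order.
    return sorted(seg["env_core_old"] + seg["ions"] + seg["unique_new"])


segments = hybrid_segments(n_env_core_old=3, n_water=2, n_ions=1, n_new=2)
print(stacked_analysis_indices(segments))  # [0, 1, 2, 6, 7, 5] (out of order)
print(ordered_analysis_indices(segments))  # [0, 1, 2, 5, 6, 7]
```

Sorting is the simplest fix under this assumption; the alternative would be to reorder the topology to match the stacked indices, which is harder to keep consistent across files.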

I propose we restructure this so we have:

models/ : organize all PDB files here
  {complex,solvent}-{old,new}.pdb : all atoms 
  {complex,solvent}-solute-{old,new}.pdb : solute atoms 

and ensure the atoms in the NetCDF trajectories (checkpoint, standard) are written in the same order as the atoms in the PDB files.
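The proposed layout can be enumerated with a short Python sketch. The helper names below are hypothetical (not existing perses functions), and the atom-order check simply compares the atom index sequences used for a PDB file and for the matching trajectory; how those sequences are obtained from perses objects is left open here.

```python
from itertools import product
from pathlib import Path


def proposed_model_paths(root="models"):
    """List the PDB files proposed for the models/ directory (hypothetical helper)."""
    base = Path(root)
    paths = []
    for phase, state in product(("complex", "solvent"), ("old", "new")):
        paths.append(base / f"{phase}-{state}.pdb")         # all atoms
        paths.append(base / f"{phase}-solute-{state}.pdb")  # solute atoms only
    return paths


def assert_same_atom_order(pdb_atom_indices, trajectory_atom_indices):
    """Raise if the trajectory writes atoms in a different order than the PDB lists them."""
    if list(pdb_atom_indices) != list(trajectory_atom_indices):
        raise ValueError("trajectory atom order does not match PDB atom order")


for path in proposed_model_paths():
    print(path)
```

Running such a check once at setup time would catch ordering mismatches before any trajectory is written.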
Ideally, we could later write replica trajectories directly as XTC files instead of going through the NetCDF file, although extracting coordinates from it does not take much time.

We can do this for the new Protocol version, where we will hopefully have a saner way to package these files.

Currently, the serialized .pdb files are for the old system. We could overhaul the serialization as you suggest, but I think serializing a solvated version of the new systems (in both the complex and solvent phases) could produce clashes with the solvent. Should we re-solvate the new systems? I'm not sure whether that would defeat the purpose of serializing these objects.

Maybe we only need the solute versions for the new systems? And keep both the solvated and solute for the old ones.

This should be solved via #1210