douweschulte / pdbtbx

A library to open/edit/save (crystallographic) Protein Data Bank (PDB) and mmCIF files in Rust.

Home Page: https://crates.io/crates/pdbtbx

Bad Models in PDB

OWissett opened this issue:

Generally speaking, PDB files are of relatively poor quality, and it isn't uncommon for them to raise errors during parsing.

Notably, I have seen quite a few PDB files with issues relating to models, such as different models having different numbers of atoms or residues (which is simply a consequence of the experiments used to obtain these files; NMR, I guess?).

I propose implementing some sort of fallback strategy for loading PDB files with bad models.

Most of the time, I only care about the first model in the file, so for a file that would otherwise fail to parse, we could simply ignore the other models and load it anyway.
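Until such an option exists, one possible workaround is to pre-filter the file text so that only the first model remains before it reaches the parser. The sketch below is a minimal, hypothetical helper (`keep_first_model` is not part of pdbtbx). It assumes models are delimited by `MODEL`/`ENDMDL` records, as in the PDB format, and passes every record outside a model block (header, `CONECT`, `END`, and so on) through unchanged.

```rust
/// Hypothetical helper, not part of pdbtbx: keep only the first
/// MODEL ... ENDMDL block of a PDB file, plus all records outside model blocks.
fn keep_first_model(pdb_text: &str) -> String {
    let mut out = String::new();
    let mut models_seen = 0usize; // number of MODEL records encountered so far
    let mut in_model = false; // true between a MODEL record and its ENDMDL

    for line in pdb_text.lines() {
        let keep = if line.starts_with("MODEL") {
            models_seen += 1;
            in_model = true;
            models_seen == 1 // keep only the first MODEL record
        } else if line.starts_with("ENDMDL") {
            in_model = false;
            models_seen == 1 // keep only the first ENDMDL record
        } else {
            // Drop any record that sits inside the second model onwards.
            !(in_model && models_seen > 1)
        };
        if keep {
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}

fn main() {
    // Toy input: two models with a different number of atoms.
    let pdb = "HEADER    TEST\n\
               MODEL        1\n\
               ATOM      1  N   ALA A   1\n\
               ENDMDL\n\
               MODEL        2\n\
               ATOM      1  N   ALA A   1\n\
               ATOM      2  CA  ALA A   1\n\
               ENDMDL\n\
               END\n";
    print!("{}", keep_first_model(pdb));
}
```

The filtered text could then be written to a temporary file and opened as usual. A file without any `MODEL` records passes through unchanged, so the helper is harmless for single-model structures.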

To maintain backwards compatibility, we could add a new struct, PdbLoadOptions or something similar, that uses the builder pattern to set PDB loading options.
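A builder along those lines might look roughly like the sketch below. Everything here is hypothetical: `PdbLoadOptions` is only the name floated in this issue, the `only_first_model` flag is an assumed option, and `Strictness` is a local stand-in for pdbtbx's `StrictnessLevel` so the example compiles on its own. The chosen defaults are also an assumption.

```rust
/// Local stand-in for pdbtbx's `StrictnessLevel`, so this sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strictness {
    Strict,
    Medium,
    Loose,
}

/// Hypothetical options struct as proposed in this issue; not an existing pdbtbx API.
#[derive(Debug, Clone, PartialEq, Eq)]
struct PdbLoadOptions {
    strictness: Strictness,
    only_first_model: bool, // assumed option: ignore every model after the first
}

impl Default for PdbLoadOptions {
    fn default() -> Self {
        // Assumed defaults: strict parsing, all models loaded.
        Self {
            strictness: Strictness::Strict,
            only_first_model: false,
        }
    }
}

impl PdbLoadOptions {
    fn new() -> Self {
        Self::default()
    }

    /// Consuming builder method: set the strictness level.
    fn strictness(mut self, level: Strictness) -> Self {
        self.strictness = level;
        self
    }

    /// Consuming builder method: load only the first model.
    fn only_first_model(mut self, yes: bool) -> Self {
        self.only_first_model = yes;
        self
    }
}

fn main() {
    let opts = PdbLoadOptions::new()
        .strictness(Strictness::Loose)
        .only_first_model(true);
    println!("{:?}", opts);
    let _ = Strictness::Medium; // the middle level is available too
}
```

Because new fields can be added with a default value, existing entry points could keep their current signatures and forward to an options-taking variant using `PdbLoadOptions::default()`, which is what keeps this approach backwards compatible.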

Sounds like a very good plan. I initially set out to make this library fully compliant with the spec, but as the rest of the world does not seem to hold the spec as dearly, it is better to make this library able to handle the structures found in the rest of the world. I tried to give users the option of strict or loose parsing with the StrictnessLevel. Extending this to something like what you described could be very useful.