Possible bug parsing H atoms in a nucleic acid structure
brianjimenez opened this issue
When parsing a nucleic acid structure in PDB format, the number of atoms found seems to be larger than the actual number of atoms in the PDB file. From a few tests, it looks like hydrogen atoms are added multiple times to the internal PDB object.
Here is code that reproduces the problem:
```rust
use std::env;

fn main() {
    // Fall back to the current directory when not run through Cargo.
    let cargo_path = match env::var("CARGO_MANIFEST_DIR") {
        Ok(val) => val,
        Err(_) => String::from("."),
    };
    let test_path: String = format!("{}/tests", cargo_path);
    let structure_filename: String = format!("{}/nucleic.pdb", test_path);
    println!("Reading input structure: {}", structure_filename);

    let (structure, _errors) =
        pdbtbx::open(&structure_filename, pdbtbx::StrictnessLevel::Medium).unwrap();

    // Total number of atoms in the parsed structure.
    println!("{}", structure.atom_count());
    // Print every atom; duplicated atoms show up as repeated lines.
    for atom in structure.atoms() {
        println!("{}", atom);
    }
}
```
And these are the last 10 lines of the output:
```text
ATOM ID: H41, Number: 58, Element: H, X: 14.552, Y: 16.481, Z: 2.862, OCC: 1, B: 0, ANISOU: false
ATOM ID: H5'2, Number: 63, Element: H, X: 16.762, Y: 8.967, Z: -2.135, OCC: 0.03333333333333333, B: 0, ANISOU: false
ATOM ID: H42, Number: 59, Element: H, X: 14.094, Y: 15.743, Z: 4.072, OCC: 1, B: 0, ANISOU: false
ATOM ID: H5'2, Number: 63, Element: H, X: 16.762, Y: 8.967, Z: -2.135, OCC: 0.03333333333333333, B: 0, ANISOU: false
ATOM ID: H2'1, Number: 60, Element: H, X: 13.942, Y: 11.912, Z: -1.539, OCC: 1, B: 0, ANISOU: false
ATOM ID: H5'2, Number: 63, Element: H, X: 16.762, Y: 8.967, Z: -2.135, OCC: 0.03333333333333333, B: 0, ANISOU: false
ATOM ID: H2'2, Number: 61, Element: H, X: 12.539, Y: 11.338, Z: -1.169, OCC: 1, B: 0, ANISOU: false
ATOM ID: H5'2, Number: 63, Element: H, X: 16.762, Y: 8.967, Z: -2.135, OCC: 0.03333333333333333, B: 0, ANISOU: false
ATOM ID: H5'1, Number: 62, Element: H, X: 17.258, Y: 10.053, Z: -1.131, OCC: 1, B: 0, ANISOU: false
ATOM ID: H5'2, Number: 63, Element: H, X: 16.762, Y: 8.967, Z: -2.135, OCC: 0.03333333333333333, B: 0, ANISOU: false
```
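To confirm the symptom independently of the parser, one could count the ATOM/HETATM records in the file and compare that number with `atom_count()`, and check the serial numbers of the parsed atoms (the `Number` field above) for repeats. This is a minimal, std-only sketch: `count_atom_records` and `duplicate_serials` are hypothetical helpers, not part of pdbtbx, and the record counter assumes a single-model file with standard fixed-column records.

```rust
use std::collections::HashMap;

/// Hypothetical helper: counts ATOM/HETATM records in PDB-formatted text.
/// The record name occupies columns 1-6, so "ATOM" is padded with two spaces.
/// Note: in a multi-model file this counts the atoms of every model.
fn count_atom_records(pdb_text: &str) -> usize {
    pdb_text
        .lines()
        .filter(|line| line.starts_with("ATOM  ") || line.starts_with("HETATM"))
        .count()
}

/// Hypothetical helper: returns (sorted) serial numbers that occur more than once.
fn duplicate_serials(serials: &[usize]) -> Vec<usize> {
    let mut counts: HashMap<usize, usize> = HashMap::new();
    for &serial in serials {
        *counts.entry(serial).or_insert(0) += 1;
    }
    let mut dups: Vec<usize> = counts
        .into_iter()
        .filter(|&(_, n)| n > 1)
        .map(|(serial, _)| serial)
        .collect();
    dups.sort_unstable();
    dups
}

fn main() {
    // Minimal, truncated records for illustration only; real records also
    // carry residue, chain, coordinates, occupancy, etc.
    let pdb_text = "ATOM     58  H41\nATOM     59  H42\nHETATM   60  O\nEND\n";
    println!("records in file: {}", count_atom_records(pdb_text)); // prints 3

    // Serial numbers as printed in the last 10 lines of the output above.
    let parsed = [58, 63, 59, 63, 60, 63, 61, 63, 62, 63];
    println!("duplicated serials: {:?}", duplicate_serials(&parsed)); // prints [63]
}
```

With the serial numbers from the output above, `duplicate_serials` reports `[63]`, i.e. atom H5'2 is the one being repeated.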
I've prepared the full test, ready to be executed: test_pdbtbx.tar.gz
Thank you in advance for your support, congratulations on the great work coding this library!
This also happens when using `pdbtbx::StrictnessLevel::Strict`.
That is very interesting, I will take a look. Thanks for raising the issue!
Okay, I found it: in the parsing code the conformer ID (residue name) was not properly trimmed. As a result, each atom in the residue ended up in its own conformer, and the library then duplicated the last atom so that it would be present in all of those conformers.
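The effect of an untrimmed key can be illustrated with a small sketch. This is not pdbtbx's actual parsing code: `group_into_conformers` and the input names are hypothetical. It groups atoms by a name field and shows that names differing only in surrounding whitespace produce one group per atom unless the field is trimmed first.

```rust
/// Hypothetical grouping step: each record is (raw conformer name, atom name).
/// Atoms are collected into conformers keyed by the name, in first-seen order.
/// With `trim == false` the raw text is compared as-is, so " DA", "DA " and
/// "DA" are treated as three different conformers.
fn group_into_conformers<'a>(
    records: &[(&str, &'a str)],
    trim: bool,
) -> Vec<(String, Vec<&'a str>)> {
    let mut groups: Vec<(String, Vec<&'a str>)> = Vec::new();
    for &(raw_name, atom) in records {
        let key = if trim { raw_name.trim() } else { raw_name };
        if let Some(i) = groups.iter().position(|(name, _)| name == key) {
            groups[i].1.push(atom);
        } else {
            groups.push((key.to_string(), vec![atom]));
        }
    }
    groups
}

fn main() {
    // Hypothetical name fields for three atoms of the same residue that
    // differ only in whitespace.
    let records = [(" DA", "P"), ("DA ", "OP1"), ("DA", "OP2")];
    let untrimmed = group_into_conformers(&records, false);
    let trimmed = group_into_conformers(&records, true);
    println!("untrimmed: {} conformers", untrimmed.len()); // prints 3
    println!("trimmed: {} conformer", trimmed.len()); // prints 1
}
```

Trimming before comparing makes the key canonical, so all atoms of the residue land in a single conformer and nothing needs to be copied between conformers.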
I added your PDB and example code to the tests to make sure this bug cannot ever surface again.
Thanks again for raising the issue.
That was fast! Thank you @douweschulte for your quick reply and fix. Any plans to release a new version on crates.io any time soon?
I will publish a new patch version later today. If you need the fix in a Rust project before then, you can also depend on the git repository directly instead of the crates.io package.
Awesome, just seen it on crates.io 📦 Thanks a lot! 🍻