douweschulte / pdbtbx

A library to open/edit/save (crystallographic) Protein Data Bank (PDB) and mmCIF files in Rust.

Home Page: https://crates.io/crates/pdbtbx


Get information across hierarchies

DocKDE opened this issue

Hello!
I recently started working on a project involving reading and editing PDB files for a very specific purpose. Somewhere along the way I became aware of this crate and am considering switching to using it because it's much more feature-complete than what I have written.
However, a question came to mind: Is there a way to access information across different hierarchies in the Structure? E.g. get a list of atoms from a given residue or determine which residue a given atom belongs to? I have not seen this functionality in the docs but I may have missed it.
Great work by the way!

For your first example ('get a list of atoms from a given residue') the answer is straightforward: residue.atoms(). Any Structure can generate iterators over all of its children and their children in the hierarchy. So from a PDB you can get (shared or mutable references to) all models, chains, residues, conformers, and atoms. For documentation, look at the .models() function and the functions that follow it.

For your second example ('determine which residue a given atom belongs to') there was initially a feature for this, which made it possible to access the parent of each Structure. But this gave rise to a whole host of issues regarding the ownership of the Structures, so it was decided that this feature should be removed (#28). The idiomatic way to do this now is to loop over all levels in the hierarchy, which gives access to all parents. Arguably this does not look very pretty, but often a couple of levels can be skipped. If mutability is needed, all iterators can be changed to mutable iterators in a safe way, for example by changing .atoms() to .atoms_mut().

for model in pdb.models() {
    for chain in model.chains() {
        for residue in chain.residues() {
            for conformer in residue.conformers() {
                for atom in conformer.atoms() {
                    // Do the calculations
                }
            }
        }
    }
}

I have not encountered a situation where this approach fell short, but if you have a problem where having access to the parents would be beneficial, we can discuss adding it back in.
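To illustrate how nested loops recover the parents of an atom, here is a minimal, self-contained sketch. The types `Atom`, `Residue`, and `Chain`, and the functions `find_parents` and `sample_chains`, are simplified stand-ins invented for this example; they are not pdbtbx's actual types or API, and the conformer level is left out for brevity.

```rust
// Toy stand-ins for a structure hierarchy. These are NOT pdbtbx's real types.
struct Atom {
    serial: usize,
}

struct Residue {
    name: String,
    atoms: Vec<Atom>,
}

struct Chain {
    id: String,
    residues: Vec<Residue>,
}

/// Walk the hierarchy top-down and return the chain and residue that own
/// the atom with the given serial number, or `None` if no atom matches.
fn find_parents(chains: &[Chain], serial: usize) -> Option<(&Chain, &Residue)> {
    for chain in chains {
        for residue in &chain.residues {
            // The loop variables of the outer levels are the parents.
            if residue.atoms.iter().any(|atom| atom.serial == serial) {
                return Some((chain, residue));
            }
        }
    }
    None
}

/// Build one chain with two residues of two atoms each (hypothetical data).
fn sample_chains() -> Vec<Chain> {
    vec![Chain {
        id: "A".to_string(),
        residues: vec![
            Residue {
                name: "ALA".to_string(),
                atoms: vec![Atom { serial: 1 }, Atom { serial: 2 }],
            },
            Residue {
                name: "GLY".to_string(),
                atoms: vec![Atom { serial: 3 }, Atom { serial: 4 }],
            },
        ],
    }]
}
```

With this toy data, `find_parents(&sample_chains(), 3)` returns chain "A" together with residue "GLY". Because the children never store a reference to their parent, the ownership stays a plain tree, which is the trade-off described above.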

I am honored that you find the library interesting. If you have any more questions or suggestions, feel free to ask. I will try to give a satisfactory answer to all questions, and if something is missing I will add it in (or happily accept PRs).

Thank you for your reply. I hadn't realized that the conformer vector contains the atoms.
Looking at my code, I have actually implemented it in a similar (although simpler) way, so there are a lot of such nested loops. However, I chose to include fields for Residue information in the Atom structs, so this information can be obtained from an Atom alone and the parent residue can be retrieved with a separate function. I think this is rarely used in practice, though. I can definitely understand that you want to circumvent ownership problems :)
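A rough sketch of that alternative layout, for comparison. Everything here (`AtomRecord`, `residue_of`, `atoms_of_residue`, `sample_atoms`) is a hypothetical name made up for illustration and does not come from pdbtbx or from any published code.

```rust
// Hypothetical flat record: each atom carries a copy of its residue's identity.
#[derive(Debug, Clone, PartialEq)]
struct AtomRecord {
    atom_name: String,
    residue_name: String,
    residue_number: isize,
}

/// Answer "which residue does this atom belong to?" without any traversal.
fn residue_of(atom: &AtomRecord) -> (&str, isize) {
    (atom.residue_name.as_str(), atom.residue_number)
}

/// Recover all atoms of one residue by filtering the flat list.
fn atoms_of_residue(atoms: &[AtomRecord], residue_number: isize) -> Vec<&AtomRecord> {
    atoms
        .iter()
        .filter(|atom| atom.residue_number == residue_number)
        .collect()
}

/// Three atoms spread over two residues (hypothetical data).
fn sample_atoms() -> Vec<AtomRecord> {
    let record = |atom: &str, residue: &str, number: isize| AtomRecord {
        atom_name: atom.to_string(),
        residue_name: residue.to_string(),
        residue_number: number,
    };
    vec![
        record("N", "ALA", 1),
        record("CA", "ALA", 1),
        record("N", "GLY", 2),
    ]
}
```

The duplicated residue fields make the atom-to-residue lookup trivial, at the cost of keeping the copies consistent whenever a residue is renamed or renumbered.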

I agree that it can be somewhat hard to grasp the organization of all the levels, especially because most people's initial mental model would not include the Conformer struct. But alas, it is needed to support all the weird edge cases of the PDB format (for more details see: https://docs.rs/pdbtbx/0.5.1/pdbtbx/index.html#pdb-hierarchy).

One thing I want to add is that you can circumvent all those loops by immediately choosing the right level (such as in the example in the docs) in the following way:

for atom in pdb.atoms() {
    // Do the calculations
}

// Or to also get the residues
for residue in pdb.residues() {
    for atom in residue.atoms() {
        // Do the calculations
    }
}
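For intuition, level-skipping iterators like these can be built by flattening the nested collections. The sketch below uses toy types (`Structure`, `Chain`, `Residue`, `Atom`) and a hypothetical `atoms_with_residue` helper; it is an assumption about how such shortcuts could work in general, not a description of how pdbtbx implements them.

```rust
// Toy hierarchy for illustration only. These are NOT pdbtbx's real types.
struct Atom {
    name: String,
}

struct Residue {
    name: String,
    atoms: Vec<Atom>,
}

struct Chain {
    residues: Vec<Residue>,
}

struct Structure {
    chains: Vec<Chain>,
}

impl Structure {
    /// Skip the chain level: iterate over every residue in every chain.
    fn residues(&self) -> impl Iterator<Item = &Residue> {
        self.chains.iter().flat_map(|chain| chain.residues.iter())
    }

    /// Skip two levels: iterate over every atom in the structure.
    fn atoms(&self) -> impl Iterator<Item = &Atom> {
        self.residues().flat_map(|residue| residue.atoms.iter())
    }

    /// Hypothetical helper: pair each atom with the residue that owns it.
    fn atoms_with_residue(&self) -> impl Iterator<Item = (&Residue, &Atom)> {
        self.residues()
            .flat_map(|residue| residue.atoms.iter().map(move |atom| (residue, atom)))
    }
}

/// One chain holding ALA and GLY, each with atoms N and CA (hypothetical data).
fn sample_structure() -> Structure {
    let residue = |name: &str| Residue {
        name: name.to_string(),
        atoms: vec![
            Atom { name: "N".to_string() },
            Atom { name: "CA".to_string() },
        ],
    };
    Structure {
        chains: vec![Chain {
            residues: vec![residue("ALA"), residue("GLY")],
        }],
    }
}
```

In this toy model, `atoms_with_residue` gives the parent alongside each atom in a single flat loop, which covers many of the cases where one might otherwise want a parent pointer.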

If you have any more questions, or a piece of code that you want to get working with the library but that does not immediately work, feel free to reach out.