douweschulte / pdbtbx

A library to open/edit/save (crystallographic) Protein Data Bank (PDB) and mmCIF files in Rust.

Home Page:https://crates.io/crates/pdbtbx

Mutable atoms_with_hierarchy

DocKDE opened this issue · comments

I realized that the AtomWithHierarchy struct has the limitation of containing only immutable references to everything. So if I want to replace nested iteration over, e.g., all residues and then all atoms within those residues with a single iteration over AtomWithHierarchys, I cannot mutate them (which is something I frequently need to do).
Something like:

pub struct AtomWithHierarchyMut<'a> {
    /// The Chain containing this Atom
    pub chain: &'a mut Chain,
    /// The Residue containing this Atom
    pub residue: &'a mut Residue,
    /// The Conformer containing this Atom
    pub conformer: &'a mut Conformer,
    /// This Atom
    pub atom: &'a mut Atom,
}

And a method (or two) to instantiate this?

Would it be feasible to create a method like atoms_with_hierarchy_mut to remedy this? I haven't tried implementing this because I was afraid I had overlooked some obvious problem with borrowing this would raise and I wanted to discuss it first.

I agree that this would be neat to be able to do. However, the mutability rules of Rust do not permit a mut reference to a structure while part of that structure is referenced at the same time. This rules out your example, because it cannot hold a mut reference to every level at once. This is logical in a way: with a mut reference to the Conformer you could delete the Atom that is referenced by the AtomWithHierarchyMut, which would invalidate the Atom reference. To get around this, a structure could be made that hands out a single level of mutability upon request and is itself borrowed for as long as that reference is in use (see example below).

pub struct AtomWithHierarchyMut<'a> {
    // Likely does not work with &mut directly, but essentially it would work like this
    /// The Chain containing this Atom
    chain: &'a mut Chain,
    /// The Residue containing this Atom
    residue: &'a mut Residue,
    /// The Conformer containing this Atom
    conformer: &'a mut Conformer,
    /// This Atom
    atom: &'a mut Atom,
}

impl<'a> AtomWithHierarchyMut<'a> {
    pub fn chain_mut(&mut self) -> &mut Chain {
        // Generate mut ref
        // By needing a mut ref to the struct it is guaranteed to be valid according to the Rust rules:
        //   * There can never be two ref mut at the same time (to the same structure)
        //   * There can never be both a ref and a ref mut at the same time (to the same structure)
        &mut *self.chain
    }

    pub fn chain(&self) -> &Chain {
        // Generate ref
        // By needing a ref to the struct it is guaranteed to never be used at the same time
        // as the other method to generate a mut ref.
        &*self.chain
    }

    // all other structs will also be defined...
}
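
For comparison, the "one level of mutability upon request" idea can also be sketched in fully safe Rust by storing a single exclusive reference to the top level plus an index path, instead of one reference per level. This is only a sketch under assumptions: `Chain`, `Residue` and `Atom` below are simplified stand-ins rather than the pdbtbx types, and `AtomPathMut` is a hypothetical name.

```rust
// Simplified stand-ins for illustration only; not the pdbtbx structs.
#[derive(Debug)]
struct Atom {
    x: f64,
}

#[derive(Debug)]
struct Residue {
    serial_number: isize,
    atoms: Vec<Atom>,
}

#[derive(Debug)]
struct Chain {
    id: String,
    residues: Vec<Residue>,
}

/// Hypothetical view: one exclusive borrow of the chain plus the index path to one atom.
struct AtomPathMut<'a> {
    chain: &'a mut Chain,
    residue_index: usize,
    atom_index: usize,
}

impl<'a> AtomPathMut<'a> {
    /// Shared access; cannot overlap with any of the `_mut` methods below.
    fn chain(&self) -> &Chain {
        &*self.chain
    }

    /// Each `_mut` method borrows the whole view, so only one level is mutable at a time.
    fn chain_mut(&mut self) -> &mut Chain {
        &mut *self.chain
    }

    fn residue_mut(&mut self) -> &mut Residue {
        &mut self.chain.residues[self.residue_index]
    }

    fn atom_mut(&mut self) -> &mut Atom {
        &mut self.chain.residues[self.residue_index].atoms[self.atom_index]
    }
}

fn demo() -> (String, isize, f64) {
    let mut chain = Chain {
        id: "A".to_string(),
        residues: vec![Residue {
            serial_number: 0,
            atoms: vec![Atom { x: 1.0 }],
        }],
    };
    {
        let mut view = AtomPathMut {
            chain: &mut chain,
            residue_index: 0,
            atom_index: 0,
        };
        view.chain_mut().id = "B".to_string();
        view.residue_mut().serial_number = 1;
        view.atom_mut().x *= 1.5;
        assert_eq!(view.chain().id, "B");
    }
    // The view is gone, so the chain can be inspected directly again.
    let x = chain.residues[0].atoms[0].x;
    (chain.id, chain.residues[0].serial_number, x)
}

fn main() {
    let (id, serial, x) = demo();
    println!("{} {} {}", id, serial, x);
}
```

The trade-off is that only one such view can exist per chain at a time, so it cannot be collected into a Vec the way AtomWithHierarchy can, and a stale index panics if a level is removed through `chain_mut`.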

This is indeed some arcane Rust but I would be willing to give it a try if you think it would be useful. Also if you think you are up to the challenge I would be willing to help you along.

Here are some pointers to information for how to build something like this:

Hm, indeed I thought I had overlooked something. I assume a struct with owned fields is not possible either, right? I seem to recall that you did it like this to circumvent ownership issues.
I have in fact read the Rust book and came across interior mutability but I don't have any practical experience with it. I'll see if I get around to it but I'm not sure when that will be or how far I'll get but we'll see. Might be good practice.

Owned fields will also not be possible, because multiple instances of the AtomWithHierarchyMut struct would then have to own the same structures.
I did indeed create a non-mut variant to circumvent these problems, but I agree that having a mut variant would certainly be helpful in certain circumstances.
If you get around to trying your hand at this, feel free to ask any questions if you ever get stuck or just need another pair of eyes to look at a problem. Also, do not feel obliged to work on this; I could also try my hand at it.
Sidenote, if you mark a PR as 'Draft' I can already view it and give comments but I cannot merge it into the master branch until it is finished.

Wow, my head is smoking after diving into the issue a bit, and I have come across a (probably stupid) question. Since the AtomWithHierarchy struct is mainly a way for an Atom to know which superstructures it belongs to, wouldn't something like this be possible:

pub struct AtomWithHierarchyMut<'a> {
    // Likely does not work with &mut directly, but essentially it would work like this
    /// The Chain containing this Atom
    chain: &'a Chain,
    /// The Residue containing this Atom
    residue: &'a Residue,
    /// The Conformer containing this Atom
    conformer: &'a Conformer,
    /// This Atom
    atom: Atom,
}

I know a struct with all fields as owned doesn't work because the same residue/conformer/chain would need to be owned by several atoms but if only the Atom is owned, this problem shouldn't arise. That would of course mean that only the Atom can be modified but in my opinion that's the only goal anyway.
Other than that, I tried to devise a struct that has an atom: RefCell<&'a Atom> but unless I misunderstood something, this will not be able to be mutated because inside the RefCell the Atom is still behind a reference that won't allow this.
However, if a field like atom: RefCell<Atom> is used, one might as well go with a simple owned field.
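
To make the RefCell point concrete, here is a small sketch with a stand-in `Atom` type (not the pdbtbx one). It shows that a `RefCell<&Atom>` only lets you swap which Atom the reference points at, while a `RefCell<Atom>` owns its Atom and can change it in place.

```rust
use std::cell::RefCell;

// Stand-in type for illustration only; not the pdbtbx Atom.
#[derive(Debug, Clone, PartialEq)]
struct Atom {
    x: f64,
}

fn refcell_demo() -> (f64, f64, f64) {
    let original = Atom { x: 1.0 };
    let other = Atom { x: 5.0 };

    // RefCell<&Atom>: the cell's content is the reference itself,
    // so the only thing that can be changed is which Atom it points at.
    let by_ref: RefCell<&Atom> = RefCell::new(&original);
    // by_ref.borrow_mut().x = 2.0; // does not compile: the Atom is behind a `&` reference
    *by_ref.borrow_mut() = &other;
    let seen_through_ref = by_ref.borrow().x;

    // RefCell<Atom>: the cell owns its own Atom, so mutation works,
    // but it is a separate value from `original`.
    let owned: RefCell<Atom> = RefCell::new(original.clone());
    owned.borrow_mut().x = 2.0;
    let owned_x = owned.borrow().x;

    (original.x, seen_through_ref, owned_x)
}

fn main() {
    let (original_x, swapped_x, owned_x) = refcell_demo();
    println!("{} {} {}", original_x, swapped_x, owned_x);
}
```

In both cases `original` itself is never modified, which is why neither variant helps with mutating atoms that live inside the PDB structure.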

So far my thoughts about this. If you have any input to share, I'll be very glad to hear it, the issue is a bit difficult to wrap my head around.

I do agree that it is quite a hard issue. But I do think I found a way to solve it. By using *mut T (mutable raw pointers) you can circumvent the Rust borrow checker (as long as you dereference them in unsafe blocks). So I saved raw pointers in the struct, and also needed raw pointers when creating the struct, to be able to both use a reference to get its children and pass it to the AtomWithHierarchyMut constructor. The PhantomData is needed to let Rust enforce proper lifetimes for the structure; in this way, although unsafe is used, the structure itself follows the Rust borrow rules. The 'b lifetime in the impl block was needed to make sure that a hierarchy can be used mutably and immutably one after the other; if 'b were set to 'a, a given structure could only be borrowed either mutably or immutably for its entire lifetime.

Maybe this approach proves helpful. It can certainly be extended to all parameters (and to rstar support), but I did not write that out for the sake of brevity.

atom_with_hierarchy.rs

use std::marker::PhantomData;

/// A version of the AtomWithHierarchy that allows mutable access to its members in a safe way
#[derive(Debug)]
pub struct AtomWithHierarchyMut<'a> {
    /// The Chain containing this Atom
    chain: *mut Chain,
    /// The Residue containing this Atom
    residue: *mut Residue,
    /// The Conformer containing this Atom
    conformer: *mut Conformer,
    /// This Atom
    atom: *mut Atom,
    phantom: PhantomData<&'a usize>,
}

impl<'a, 'b> AtomWithHierarchyMut<'a> {
    pub(crate) fn new(
        chain: *mut Chain,
        residue: *mut Residue,
        conformer: *mut Conformer,
        atom: *mut Atom,
    ) -> Self {
        AtomWithHierarchyMut {
            chain,
            residue,
            conformer,
            atom,
            phantom: PhantomData,
        }
    }

    /// Create an AtomWithHierarchy from a Tuple containing all needed references
    pub fn from_tuple(
        hierarchy: (*mut Chain, *mut Residue, *mut Conformer, *mut Atom),
    ) -> AtomWithHierarchyMut<'a> {
        AtomWithHierarchyMut {
            chain: hierarchy.0,
            residue: hierarchy.1,
            conformer: hierarchy.2,
            atom: hierarchy.3,
            phantom: PhantomData,
        }
    }

    #[allow(clippy::unwrap_used)]
    /// Get a reference to the chain
    pub fn chain(&'b self) -> &'b Chain {
        unsafe { self.chain.as_ref().unwrap() }
    }

    #[allow(clippy::unwrap_used)]
    /// Get the chain with a mutable reference
    pub fn chain_mut(&'b mut self) -> &'b mut Chain {
        unsafe { self.chain.as_mut().unwrap() }
    }

    #[allow(clippy::unwrap_used)]
    /// Get a reference to the residue
    pub fn residue(&'b self) -> &'b Residue {
        unsafe { self.residue.as_ref().unwrap() }
    }

    #[allow(clippy::unwrap_used)]
    /// Get the residue with a mutable reference
    pub fn residue_mut(&'b mut self) -> &'b mut Residue {
        unsafe { self.residue.as_mut().unwrap() }
    }
}

model.rs

    /// Returns all atoms with their hierarchy struct for each atom in this model.
    #[allow(clippy::unwrap_used)]
    pub fn atoms_with_hierarchy_mut(&'a mut self) -> Vec<AtomWithHierarchyMut<'a>> {
        let mut output = Vec::new();
        unsafe {
            for ch in self.chains_mut() {
                let chain: *mut Chain = ch;
                for r in chain.as_mut().unwrap().residues_mut() {
                    let residue: *mut Residue = r;
                    for c in residue.as_mut().unwrap().conformers_mut() {
                        let conformer: *mut Conformer = c;
                        for atom in conformer.as_mut().unwrap().atoms_mut() {
                            output.push(AtomWithHierarchyMut::new(chain, residue, conformer, atom));
                        }
                    }
                }
            }
        }
        output
    }
    // And the below in the test block
    #[test]
    #[allow(clippy::unwrap_used)]
    fn test_hierarchy_mut() {
        let mut a = Model::new(0);
        a.add_chain(Chain::new("A").unwrap());
        a.add_atom(
            Atom::new(false, 0, "ATOM", 0.0, 0.0, 0.0, 0.0, 0.0, "C", 0).unwrap(),
            "A",
            (0, None),
            ("ALA", None),
        );
        for mut hierarchy in a.atoms_with_hierarchy_mut() {
            hierarchy.residue().serial_number();
            hierarchy.chain_mut().set_id("B");
            hierarchy.residue().serial_number();
            hierarchy.residue_mut().set_serial_number(1);
            hierarchy.chain_mut().set_id("C");
        }
        assert_eq!(a.chain(0).unwrap().id(), "C");
        assert_eq!(a.residue(0).unwrap().serial_number(), 1);
        assert_eq!(a.conformer(0).unwrap().name(), "ALA");
        assert_eq!(a.atom(0).unwrap().serial_number(), 0);
    }
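
The role of PhantomData and of the separate 'b lifetime can be seen in isolation with a single pointer. The sketch below is not the crate's code: `Chain` is a stand-in type, `ChainHandle` is a hypothetical name, and it uses `PhantomData<&'a mut Chain>` where the snippet above uses `PhantomData<&'a usize>`. It deliberately avoids the harder question of several handles pointing into the same hierarchy.

```rust
use std::marker::PhantomData;

// Stand-in type for illustration only; not the pdbtbx Chain.
#[derive(Debug)]
struct Chain {
    id: String,
}

/// Hypothetical single-level handle around a raw pointer.
struct ChainHandle<'a> {
    chain: *mut Chain,
    // Tells the borrow checker that this handle acts like a `&'a mut Chain`.
    phantom: PhantomData<&'a mut Chain>,
}

impl<'a> ChainHandle<'a> {
    fn new(chain: &'a mut Chain) -> Self {
        ChainHandle {
            chain: chain as *mut Chain,
            phantom: PhantomData,
        }
    }

    /// Borrows the handle for 'b only, which can be much shorter than 'a.
    fn chain<'b>(&'b self) -> &'b Chain {
        // SAFETY: the pointer was created from a `&'a mut Chain`, 'a outlives 'b,
        // and `&self` rules out a simultaneous `chain_mut` borrow.
        unsafe { &*self.chain }
    }

    fn chain_mut<'b>(&'b mut self) -> &'b mut Chain {
        // SAFETY: as above; `&mut self` guarantees no other borrow of this handle exists.
        unsafe { &mut *self.chain }
    }
}

fn handle_demo() -> String {
    let mut chain = Chain { id: "A".to_string() };
    {
        let mut handle = ChainHandle::new(&mut chain);
        // Shared and exclusive access can alternate because each call
        // only borrows the handle for the short 'b lifetime.
        let _len = handle.chain().id.len();
        handle.chain_mut().id = "B".to_string();
        // chain.id.clear(); // would not compile here: `chain` stays exclusively
        //                   // borrowed while `handle` is still used below
        let _len = handle.chain().id.len();
        handle.chain_mut().id.push('2');
    }
    chain.id
}

fn main() {
    println!("{}", handle_demo());
}
```

If the accessors took `&'a self` and `&'a mut self` instead, the first call would borrow the handle for all of 'a and the alternating calls in `handle_demo` would be rejected.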

Huh, I'd never dare touch unsafe Rust. At least not yet. I think my Rust is not good enough to really judge the soundness of what you did there, but looking at my thoughts on this (see the PR I just added :D): do you think it would be reasonable to do this with something like:

#[derive(Debug)]
/// A version of the AtomWithHierarchy that allows mutable access to its members in a safe way
pub struct AtomWithHierarchyMut<'a> {
    /// The Chain containing this Atom
    chain: Rc<RefCell<Chain>>,
    /// The Residue containing this Atom
    residue: Rc<RefCell<Residue>>,
    /// The Conformer containing this Atom
    conformer: Rc<RefCell<Conformer>>,
    /// This Atom
    atom: Rc<RefCell<Atom>>,
}

This should make it possible to have shared references for the struct fields one is interested in and then mutate those in place without resorting to unsafe. I haven't gotten very far with this thought yet, but I might give it another try.

Haha yes I agree that unsafe Rust needs a bit of courage. It was also quite interesting to see a PR pop up right when I was writing a comment to this issue. The problem I see with your implementation is the cloning of all levels in the hierarchy. This means that the struct you generate has a fresh copy of all of its levels, so chain will contain a fresh copy of the full chain. This means that any change to any level will not change the same thing on another level, or the original PDB.
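
A minimal sketch of that difference, using a stand-in `Chain` type rather than the pdbtbx one and assuming the wrapper is built from a clone as described: mutating through an `Rc<RefCell<_>>` that wraps a clone leaves the original untouched, while handles cloned from the same Rc do see each other's changes.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Stand-in type for illustration only; not the pdbtbx Chain.
#[derive(Debug, Clone)]
struct Chain {
    id: String,
}

fn clone_vs_share() -> (String, String) {
    // Case 1: the Rc<RefCell<_>> wraps a *clone* of existing data.
    let original = Chain { id: "A".to_string() };
    let wrapped_copy = Rc::new(RefCell::new(original.clone()));
    wrapped_copy.borrow_mut().id = "B".to_string();
    // `original` is a different value and still has id "A".

    // Case 2: the data lives inside the Rc from the start,
    // and further handles are created with Rc::clone (which copies the pointer, not the Chain).
    let shared = Rc::new(RefCell::new(Chain { id: "A".to_string() }));
    let handle = Rc::clone(&shared);
    handle.borrow_mut().id = "B".to_string();
    let shared_id = shared.borrow().id.clone();

    (original.id, shared_id)
}

fn main() {
    let (original_id, shared_id) = clone_vs_share();
    println!("{} {}", original_id, shared_id);
}
```

Making the second case work for the whole hierarchy would mean storing every level as `Rc<RefCell<_>>` inside the PDB struct itself, which is a much larger change than adding one view struct.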

I do think we need unsafe as we 'want' to circumvent the Rust borrow rules, we want (at least internally) mutable references to all levels, and only pass them one at a time to the users of the struct so that from the outside the struct behaves according to the Rust rules.

I added the implementation I showed before (extended a bit) to the crate. For now it only creates the mutable versions via model.atoms_with_hierarchy_mut(), but obviously it could be useful in other places as well. If you want to see it in other places please let me know, as building it in is somewhat hard with the unsafe stuff.

In my code, it would be most useful to have a method for the PDB struct. I think this also applies to the general case because most workflows start with calling an iterator on the PDB. I appreciate you taking the time to do this and I realize it isn't easy to do.

I will take a look at that, I do have some ideas to create it. And I do like the challenge of this piece of code ;-).

I refactored the structure into a more generic set of structs and traits. The end result should be more intuitive for the user. And this made it possible for every level in the hierarchy of a PDB struct to generate some kind of AtomWithHierarchy. On every level these functions are implemented: atoms_with_hierarchy, atoms_with_hierarchy_mut, binary_find_atom, and binary_find_atom_mut which all return some kind of hierarchy. I did not yet recreate any parallel versions for this, but I will look into them.

Do you think this would cover your use cases? Do you miss some functionality?

This is the updated main example:

use pdbtbx;
use pdbtbx::hierarchy::*;
let (mut pdb, _errors) = pdbtbx::open("example-pdbs/1ubq.pdb", pdbtbx::StrictnessLevel::Medium).unwrap();

for hierarchy in pdb.atoms_with_hierarchy() {
    println!("Atom {} in Conformer {} in Residue {} in Chain {} in Model {}",
        hierarchy.atom().serial_number(),
        hierarchy.conformer().name(),
        hierarchy.residue().serial_number(),
        hierarchy.chain().id(),
        hierarchy.model().serial_number(),
    );
}
// Or with mutable access to the members of the hierarchy
for mut hierarchy in pdb.atoms_with_hierarchy_mut() {
    let new_x = hierarchy.atom().x() * 1.5;
    hierarchy.atom_mut().set_x(new_x);
}

This looks like it would cover my use cases nicely. It would mostly serve to simplify constructs like this:

pdb.par_residues_mut().for_each(|res| {
        if list.contains(&res.serial_number()) {
            res.par_atoms_mut().for_each(|atom| match partial {
                None => edit(atom),
                Some(Partial::Sidechain) => {
                    if !atom.is_backbone() {
                        edit(atom)
                    }
                }
                Some(Partial::Backbone) => {
                    if atom.is_backbone() {
                        edit(atom)
                    }
                }
            })
        }
    })

Thanks for the work!

Great to hear!