douweschulte / pdbtbx

A library to open/edit/save (crystallographic) Protein Data Bank (PDB) and mmCIF files in Rust.

Home Page: https://crates.io/crates/pdbtbx

Residue Name and Serial Number in Atom struct

DocKDE opened this issue · comments

Okay, so I have the following situation:

    pub fn find_contacts(&self, level: u8) {
        let mut vdw_radii = HashMap::new();
        vdw_radii.insert("H".to_string(), 1.54);
        vdw_radii.insert("C".to_string(), 1.90);
        vdw_radii.insert("N".to_string(), 1.79);
        vdw_radii.insert("O".to_string(), 1.71);
        vdw_radii.insert("P".to_string(), 2.23);
        vdw_radii.insert("S".to_string(), 2.14);
        vdw_radii.insert("CL".to_string(), 2.06);
        vdw_radii.insert("NA".to_string(), 2.25);
        vdw_radii.insert("CU".to_string(), 2.17);

        let mut table = Table::new();
        table.set_format(*format::consts::FORMAT_NO_LINESEP_WITH_TITLE);
        table.set_titles(row![
            "Atom ID 1",
            "Atom Name 1",
            "Residue Name 1",
            "Atom ID 2",
            "Atom Name 2",
            "Residue Name 2",
            "Distance"
        ]);

        let mut tree = RTree::new();

        for residue in &self.residues {
            for atom in &residue.atoms {
                tree.insert(PointWithData::new(atom, atom.coords))
            }
        }

        for residue in &self.residues {
            for atom in &residue.atoms {
                let radius: f64 = match level {
                    0 => 1.0,
                    1 => {
                        let rad = vdw_radii
                            .get(&atom.element.to_uppercase())
                            .expect("No Radius found for given element.");
                        rad * rad
                    }
                    _ => panic!("Too high level given"),
                };
                let contacts = tree.locate_within_distance(atom.coords, radius);
                for item in contacts {
                    if item.data.atom_id < atom.atom_id
                        && !(item.data.res_id == atom.res_id && item.data.res_name == atom.res_name)
                        && !(item.data.atom_name == "C"
                            && atom.atom_name == "N"
                            && item.data.res_id + 1 == atom.res_id)
                    {
                        let distance = item.data.calc_distance(atom);
                        table.add_row(row![
                            bByFd =>
                            item.data.atom_id,
                            item.data.atom_name,
                            item.data.res_name,
                            atom.atom_id,
                            atom.atom_name,
                            atom.res_name,
                            format!("{:.2}", distance)
                        ]);
                    }
                }
            }
        }
        if table.len() > 1 {
            if level == 0 {
                println!("\nClash Analysis");
            } else if level == 1 {
                println!("\nContact Analysis");
            }
            table.printstd();
        }
    }

This function builds an R-tree from atoms and uses it to search for other points that are spatially nearby. The idea is to find clashes between atoms that are too close together (I had a problem with such a case recently). To filter out results from atoms that belong to the same residue (and are therefore close to each other by construction), I use the residue information that I stored in the Atom structs.
While refactoring my code to use pdbtbx I couldn't think of a straightforward way to do this. In most cases I can get what I need by iterating over residues and the atoms within them, but here the search results are just the points in the R-tree, which are Atom structs. I would have to iterate over the PDB struct separately to determine which residues they belong to and go from there.
So for cases like this I would propose adding residue_serial_number and residue_name fields to the Atom struct (or re-adding them, since I think you had something like this at some point). If the respective fields of the Residue struct are kept, this should not break existing code or cause trouble with ownership, right? I realize this would require changing the instantiation methods and unit tests. I tried implementing it myself but could not get rid of the macro errors that cropped up...
What are your thoughts on this?

Maybe the easiest option would be to add the residue information to your points in the R-tree. Would the following code work for your use case? This way you will have all the information you need while the structures remain as minimal as possible.

    PointWithData::new(((residue.name(), residue.serial_number()), atom), atom.pos())

For your information, the same residue.serial_number() can technically occur multiple times. If this is a problem you should also use residue.insertion_code(). With this you should be able to rewrite your example as follows:

    for residue in &pdb.residues() {
        for atom in &residue.atoms() {
            // The radius code can be simplified (see https://docs.rs/pdbtbx/0.6.1/pdbtbx/struct.Atom.html#method.atomic_radius)
            // Also if you feel the name could be better let me know, I am thinking about renaming it to vdw_radius and including other radii as well
            let contacts = tree.locate_within_distance(atom.pos(), atom.atomic_radius());
            for (other_residue, other_atom) in contacts {
                if other_atom < atom // Internally uses the serial number, available since v0.6.0
                    // Skip pairs from the same residue: both the name and the serial number match
                    && !(other_residue.0 == residue.name()
                        && other_residue.1 == residue.serial_number())
                    // Skip the backbone bond between consecutive residues (C of one, N of the next)
                    && !(other_atom.name() == "C"
                        && atom.name() == "N"
                        && other_residue.1 + 1 == residue.serial_number())
                {
                    table.add_row(row![
                        bByFd =>
                        other_atom.serial_number(),
                        other_atom.name(),
                        other_residue.0,
                        atom.serial_number(),
                        atom.name(),
                        residue.name(),
                        format!("{:.2}", atom.distance(other_atom))]);
                }
            }
        }
    }
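To make the filtering logic easier to check in isolation, here is a minimal sketch that uses only the standard library. It is not pdbtbx or rstar code: `TaggedAtom`, `atom`, `find_contacts` and `demo_atoms` are hypothetical names, and a brute-force double loop stands in for the R-tree query. Each atom carries its residue name, serial number and insertion code, so the same-residue and backbone checks need no second pass over the structure.

```rust
/// Hypothetical stand-in for the data attached to each point in the tree.
#[derive(Debug, Clone)]
struct TaggedAtom {
    serial: usize,
    name: &'static str,
    res_name: &'static str,
    res_serial: isize,
    insertion_code: Option<&'static str>,
    pos: [f64; 3],
}

impl TaggedAtom {
    /// Two atoms share a residue only if name, serial number AND
    /// insertion code all match (serial numbers alone can repeat).
    fn same_residue(&self, other: &TaggedAtom) -> bool {
        self.res_name == other.res_name
            && self.res_serial == other.res_serial
            && self.insertion_code == other.insertion_code
    }

    /// True if `self` is the backbone C bonded to the N of the next residue.
    fn peptide_bond_to(&self, next: &TaggedAtom) -> bool {
        self.name == "C" && next.name == "N" && self.res_serial + 1 == next.res_serial
    }

    /// Euclidean distance between the two atom positions.
    fn distance(&self, other: &TaggedAtom) -> f64 {
        self.pos
            .iter()
            .zip(other.pos.iter())
            .map(|(a, b)| (a - b).powi(2))
            .sum::<f64>()
            .sqrt()
    }
}

/// Convenience constructor for the demo data (no insertion code).
fn atom(
    serial: usize,
    name: &'static str,
    res_name: &'static str,
    res_serial: isize,
    pos: [f64; 3],
) -> TaggedAtom {
    TaggedAtom { serial, name, res_name, res_serial, insertion_code: None, pos }
}

/// Brute-force replacement for the R-tree query: returns
/// (lower serial, higher serial, distance) for every reported pair.
fn find_contacts(atoms: &[TaggedAtom], cutoff: f64) -> Vec<(usize, usize, f64)> {
    let mut hits = Vec::new();
    for atom in atoms {
        for other in atoms {
            let distance = other.distance(atom);
            if other.serial < atom.serial // report each pair only once
                && distance < cutoff
                && !other.same_residue(atom)
                && !other.peptide_bond_to(atom)
            {
                hits.push((other.serial, atom.serial, distance));
            }
        }
    }
    hits
}

/// Four made-up atoms: two in ALA 1, the backbone N of GLY 2, and a water O.
fn demo_atoms() -> Vec<TaggedAtom> {
    vec![
        atom(1, "CA", "ALA", 1, [0.0, 0.0, 0.0]),
        atom(2, "C", "ALA", 1, [1.2, 0.0, 0.0]),
        atom(3, "N", "GLY", 2, [2.5, 0.0, 0.0]),
        atom(4, "O", "HOH", 3, [0.0, 0.9, 0.0]),
    ]
}
```

Calling `find_contacts(&demo_atoms(), 1.4)` skips the CA/C pair (same residue) and the C/N pair (backbone bond) even though both lie within the cutoff, and reports only the CA/O pair. The coordinates are invented for illustration.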

I also took the liberty of replacing some functions with ones I built into pdbtbx. Use whatever you like, but I try to include such functions for general use, so if you find yourself defining a couple of functions like that, please report them so I can include them in the library.
Please let me know if this works for you, otherwise I will think of something else.

Edit: I took the liberty of adding syntax highlighting to your code example (by adding rust after the triple backticks that mark the start of the code block)

Yes, that's a good idea, thanks! Your code doesn't quite compile as written, but I was able to make it work.
Some minor points: to insert a PointWithData into the RTree object, the point needs to be an array, whereas the pos method returns a tuple. That's why I constructed it manually from the x, y and z fields.
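A minimal sketch of that conversion, assuming the position comes back as an `(f64, f64, f64)` tuple; `to_point` is a hypothetical helper and not part of pdbtbx or rstar:

```rust
/// Turn an (x, y, z) tuple into the fixed-size array form
/// that can be used as a point in a spatial index.
fn to_point((x, y, z): (f64, f64, f64)) -> [f64; 3] {
    [x, y, z]
}
```

The result can then be passed wherever an array point is expected, for example as the second argument of `PointWithData::new`.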
I was aware of the atomic_radius method; in fact I somewhat stole the idea (as you may have recognized). However, I saw the need for several different values of the radius variable in order to check for different levels of "closeness" between atoms. What I needed most was a way to find atoms that are so close that their Lennard-Jones potential term will explode in an MM calculation, and the atomic radius currently defined in pdbtbx is too large for this. For this specific use case I used a completely arbitrary value, but implementing several radii for several purposes might be worthwhile somewhere down the road.
Otherwise, I'm currently refactoring what I have to make use of what you wrote as much as possible, so I'll use the methods you provided wherever I can, but it takes a while to change everything. For now I'm trying to get it to work first and then streamline it.
Thanks again for your great work and the help! I really appreciate it!
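One way to organise several "closeness" levels is sketched below, using only the standard library. `Level`, `vdw_radius` and `squared_cutoff` are hypothetical names; the radii are the values from my table above, and the clash cutoff of 1.0 is the arbitrary value mentioned. The function returns a squared distance because, as far as I know, rstar's `locate_within_distance` expects its second argument as a squared radius.

```rust
/// Which kind of proximity check to run (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Level {
    /// Atoms so close that an MM energy term would blow up.
    Clash,
    /// Atoms within the van der Waals radius of the query atom.
    Contact,
}

/// Radii from the table in the original function, keyed by element symbol.
fn vdw_radius(element: &str) -> Option<f64> {
    match element.to_uppercase().as_str() {
        "H" => Some(1.54),
        "C" => Some(1.90),
        "N" => Some(1.79),
        "O" => Some(1.71),
        "P" => Some(2.23),
        "S" => Some(2.14),
        "CL" => Some(2.06),
        "NA" => Some(2.25),
        "CU" => Some(2.17),
        _ => None,
    }
}

/// Squared search radius for the given level and element.
/// Returns None instead of panicking when the element is unknown.
fn squared_cutoff(level: Level, element: &str) -> Option<f64> {
    match level {
        Level::Clash => Some(1.0), // arbitrary value, as in the original code
        Level::Contact => vdw_radius(element).map(|r| r * r),
    }
}
```

Using an enum instead of a `u8` level removes the need for the "Too high level given" panic, and returning `Option` lets the caller decide how to handle elements that are missing from the table.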