rdk / p2rank

P2Rank: Protein-ligand binding site prediction tool based on machine learning. Stand-alone command line program / Java library for predicting ligand binding pockets from protein structure.

Home Page: https://rdk.github.io/p2rank/

ResidueNumberWrapper ignores chain?

skodapetr opened this issue

I've been struggling to get the same results with the external CSV feature as with conservation.

It seems that they handle chains differently. For example, in the joined(mlig) dataset for 1gm8,
there are two conservation files, one for each chain.

A: 469 0.34542 K ...
B 655 0.74410 W ...

Input for my CSV feature is:

"A",,"179","0.34542"
"B",,"179","0.7441"

However, the values assigned by my feature are not the same as those assigned by conservation.

The loaded conservation scores hold information for only one residue with number 179, and its value comes from chain B, which is loaded later.

The conservation feature uses a Map keyed by ResidueNumberWrapper (unfortunately I have no idea why this class is there :( ). From the implementation of equals and hashCode:

    @Override
    public boolean equals(Object o) {
        if (this.is(o)) return true;
        if (o.is(null) || getClass() != o.getClass()) return false;
        ResidueNumberWrapper that = (ResidueNumberWrapper) o;
        return resNum != null ? equalsPositional(resNum,that.resNum) : that.resNum == null;
    }

    @Override
    public int hashCode() {
        if (resNum == null) return 0;
        final int prime = 31;
        int result = 1;
        result = prime * result + ((resNum.getInsCode() == null) ? 0 : resNum.getInsCode().hashCode());
        result = prime * result + ((resNum.getSeqNum() == null) ? 0 : resNum.getSeqNum().hashCode());
        return result;
    }

    public static boolean equalsPositional(ResidueNumber r1, ResidueNumber r2) {
        if (r1 == r2)
            return true;
        if (r2 == null)
            return false;
        if (r1.getInsCode() == null) {
            if (r2.getInsCode() != null)
                return false;
        } else if (!r1.getInsCode().equals(r2.getInsCode()))
            return false;
        if (r1.getSeqNum() == null) {
            if (r2.getSeqNum() != null)
                return false;
        } else if (!r1.getSeqNum().equals(r2.getSeqNum()))
            return false;

        return true;
    }

it seems that the chain is not considered, meaning that this class cannot distinguish residues from different chains that share the same number and insertion code.
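To make the collision concrete, here is a minimal, self-contained Java sketch. It does not use P2Rank or BioJava classes: `PositionalKey` is a hypothetical stand-in for ResidueNumberWrapper that, like the quoted code, leaves the chain out of `equals()` and `hashCode()`. The two values are the ones from the 1gm8 example above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class ChainCollisionDemo {

    // Hypothetical stand-in for ResidueNumberWrapper: the chain is stored,
    // but equals() and hashCode() look only at seqNum and insCode.
    static final class PositionalKey {
        final String chain;
        final Integer seqNum;
        final Character insCode;

        PositionalKey(String chain, Integer seqNum, Character insCode) {
            this.chain = chain;
            this.seqNum = seqNum;
            this.insCode = insCode;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            PositionalKey that = (PositionalKey) o;
            // Chain is deliberately ignored, mirroring the behaviour described above.
            return Objects.equals(seqNum, that.seqNum)
                    && Objects.equals(insCode, that.insCode);
        }

        @Override
        public int hashCode() {
            return Objects.hash(insCode, seqNum);
        }
    }

    // Loads one score per chain for residue 179, chain A first, then chain B.
    static Map<PositionalKey, Double> loadScores() {
        Map<PositionalKey, Double> scores = new HashMap<>();
        scores.put(new PositionalKey("A", 179, null), 0.34542);
        scores.put(new PositionalKey("B", 179, null), 0.7441); // overwrites chain A's entry
        return scores;
    }

    public static void main(String[] args) {
        Map<PositionalKey, Double> scores = loadScores();
        System.out.println(scores.size());                                  // prints 1
        System.out.println(scores.get(new PositionalKey("A", 179, null)));  // prints 0.7441
    }
}
```

Because both keys compare equal, the second `put` replaces the first value, so a lookup for chain A returns the score loaded for chain B.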

If you are interested, I can post the command and a branch with the CSV feature that make this bug visible.

commented

I have no idea why ResidueNumberWrapper is needed or why the chain ID is not considered. You might ask Lukas what the reason for it was. I have never noticed it before... But anyway, the fact that the chain ID is not considered currently just looks like a spectacular bug, with the implication that conservation never worked properly for multi-chain proteins. It could have been introduced by me when merging Lukas's code.

Should be easy to fix, either by delegating equals() and hashCode() to the wrapped ResidueNumber or by getting rid of the wrapper altogether.
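As a rough illustration of what a chain-aware key could look like, here is a sketch in plain Java. It is not the change that was actually committed: `ChainAwareKey` and its fields are hypothetical names, and the only point is that including the chain in `equals()` and `hashCode()` keeps the two 1gm8 entries apart.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class ChainAwareKeyDemo {

    // Hypothetical key type: chain, sequence number, and insertion code
    // all take part in equality and hashing.
    static final class ChainAwareKey {
        final String chain;
        final Integer seqNum;
        final Character insCode;

        ChainAwareKey(String chain, Integer seqNum, Character insCode) {
            this.chain = chain;
            this.seqNum = seqNum;
            this.insCode = insCode;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            ChainAwareKey that = (ChainAwareKey) o;
            return Objects.equals(chain, that.chain)
                    && Objects.equals(seqNum, that.seqNum)
                    && Objects.equals(insCode, that.insCode);
        }

        @Override
        public int hashCode() {
            return Objects.hash(chain, seqNum, insCode);
        }
    }

    // Same input as in the issue: residue 179 in chains A and B.
    static Map<ChainAwareKey, Double> loadScores() {
        Map<ChainAwareKey, Double> scores = new HashMap<>();
        scores.put(new ChainAwareKey("A", 179, null), 0.34542);
        scores.put(new ChainAwareKey("B", 179, null), 0.7441);
        return scores;
    }

    public static void main(String[] args) {
        Map<ChainAwareKey, Double> scores = loadScores();
        System.out.println(scores.size());                                  // prints 2
        System.out.println(scores.get(new ChainAwareKey("A", 179, null)));  // prints 0.34542
        System.out.println(scores.get(new ChainAwareKey("B", 179, null)));  // prints 0.7441
    }
}
```

With the chain in the key, each chain keeps its own score for residue 179, which is the behaviour the CSV feature already shows.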

commented

Fix is on develop branch (6b4a246). Let me know if it works.

commented

Fixed.