rdkit / mmpdb

A package to identify matched molecular pairs and use them to predict property changes.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Can I get a table of all rules, as well as the number of pairs, and statistics for each rule?

chengthefang opened this issue · comments

Hi,

I wonder if there is a way to obtain the information of all rules, as well as the number of pairs and the statistics for each rule for a built fragment(mmpdb) database. A simple output table I expect is like:

            from_smiles(smirks) .   To_smiles(smirks)    # of pairs      Mean    std

rule1 ***** *****
rule2 ***** *****
.....

I am pretty interested in presenting the rules from a database in a similar way to the Tables1-5 & Figure 5 in your publication "J. Med. Chem. 2018, 61, 3277−3292". Would you mind me give some hints how to achieve that through mmpdb codes?

Thanks,
Cheng

There is not. These sorts of reports are not hard to do in SQL - though that requires someone who knows SQL, of course. You might start by taking a look at schema.py:

    def get_property_rule(self, property_name_id, rule_environment_id, is_reversed, cursor=None):
        c = self.mmpa_db.execute(
            "SELECT rule_environment.rule_id, from_smiles.smiles, from_smiles.num_heavies, to_smiles.smiles, to_smiles.num_heavies, "
            "          rule_environment.radius, rule_environment.environment_fingerprint_id, "
            "          environment_fingerprint.fingerprint, "
            "          rule_environment_statistics.id, count, avg, std, kurtosis, skewness, min, q1, median, q3, max, paired_t, p_value "
            "  FROM rule, rule_environment, environment_fingerprint, rule_environment_statistics, "
            "          rule_smiles as from_smiles, rule_smiles as to_smiles "
            " WHERE rule_environment.id = ? "
            "   AND rule_environment_statistics.rule_environment_id = ? "
            "   AND rule_environment_statistics.property_name_id = ? "
            "   AND rule_environment.rule_id = rule.id "
            "   AND rule_environment.environment_fingerprint_id = environment_fingerprint.id "
            "   AND rule.from_smiles_id = from_smiles.id "
            "   AND rule.to_smiles_id = to_smiles.id ",
            (rule_environment_id, rule_environment_id, property_name_id), cursor=cursor)

Bear in mind that each property and environment radius has its own set of rules.

Hi Adalke,

Thank you for your kind comments! I would to give a try.

Best,
Cheng

Hi Cheng,

did this solve your problem? If yes, I woudl like to close the issue.

Best regards,
Christian

@KramerChristian Thanks, Christian. This solved my problem. I will close the issue from my end.

I don't know how to use SQL to get the table of all rules, as well as the number of pairs, and statistics for each rule. Could you please share your steps to achieve this? @adalke @chengthefang

There is not. These sorts of reports are not hard to do in SQL - though that requires someone who knows SQL, of course. You might start by taking a look at schema.py:

    def get_property_rule(self, property_name_id, rule_environment_id, is_reversed, cursor=None):
        c = self.mmpa_db.execute(
            "SELECT rule_environment.rule_id, from_smiles.smiles, from_smiles.num_heavies, to_smiles.smiles, to_smiles.num_heavies, "
            "          rule_environment.radius, rule_environment.environment_fingerprint_id, "
            "          environment_fingerprint.fingerprint, "
            "          rule_environment_statistics.id, count, avg, std, kurtosis, skewness, min, q1, median, q3, max, paired_t, p_value "
            "  FROM rule, rule_environment, environment_fingerprint, rule_environment_statistics, "
            "          rule_smiles as from_smiles, rule_smiles as to_smiles "
            " WHERE rule_environment.id = ? "
            "   AND rule_environment_statistics.rule_environment_id = ? "
            "   AND rule_environment_statistics.property_name_id = ? "
            "   AND rule_environment.rule_id = rule.id "
            "   AND rule_environment.environment_fingerprint_id = environment_fingerprint.id "
            "   AND rule.from_smiles_id = from_smiles.id "
            "   AND rule.to_smiles_id = to_smiles.id ",
            (rule_environment_id, rule_environment_id, property_name_id), cursor=cursor)

Bear in mind that each property and environment radius has its own set of rules.