[Question] How to parse .json, .obo, or .owl to get dictionary of enzymes {id_go:{ec_1, ec_2, ..., ec_n}}
jolespin opened this issue
Josh L. Espinoza commented
I'm trying to understand how I can use GOATOOLS to parse any of the GO files to yield a dictionary that has the following structure:
{id_go: {ec_1, ec_2, ..., ec_n}}
I was able to load the obo file but I couldn't figure out how to get the enzymes:
from goatools.base import get_godag
godag = get_godag('Databases/GO/go-basic.obo', optional_attrs='relationship')
go = godag['GO:0000015']
for id_go, go in godag.items():
    print(id_go, go.get_all_children())
#GO:0000001 set()
#GO:0000002 set()
#GO:0000006 set()
#GO:0000007 set()
#GO:0000009 {'GO:0033164', 'GO:0052917'}
They are definitely in there, I just don't know how to access them:
%%bash
grep -c "EC:" /Users/jolespin/Databases/GO/go-basic.obo
# Databases/GO/go-basic.obo:26098
Haibao Tang commented
You are close: the EC numbers are stored under xref (you can check which field they are under in the .obo file).
Here is some sample code:
from goatools.base import get_godag
godag = get_godag("go-basic.obo", optional_attrs="xref")
for id_go, go in godag.items():
    # Keep only the cross-references that are EC numbers
    ecs = [x for x in go.xref if x.startswith("EC:")]
    if ecs:
        print(id_go, ecs)
This prints out:
...
GO:0008557 ['EC:7.6.2.1']
GO:1901237 ['EC:7.3.2.6']
GO:0090450 ['EC:3.6.1.64']
GO:0043851 ['EC:2.1.1.246']
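To get the {id_go: {ec_1, ec_2, ..., ec_n}} structure asked for in the question, the same filter can collect into a dictionary of sets instead of printing. The following is a minimal sketch, not part of GOATOOLS: `build_go_to_ec` is a hypothetical helper name, and `toy_dag` is a made-up stand-in (with a placeholder non-EC xref) so the snippet runs without the .obo file. It assumes only that each term exposes an `xref` collection of strings, as in the sample above.

```python
from types import SimpleNamespace


def build_go_to_ec(godag, prefix="EC:"):
    """Return {id_go: {ec_1, ..., ec_n}} for terms with at least one EC xref.

    `godag` can be any mapping of GO ID -> object with an `xref` attribute.
    """
    go_to_ec = {}
    for id_go, term in godag.items():
        # getattr guards against terms loaded without the optional xref attribute
        ecs = {x for x in getattr(term, "xref", ()) if x.startswith(prefix)}
        if ecs:
            go_to_ec[id_go] = ecs
    return go_to_ec


# Toy stand-in for a loaded GODag (illustrative values only).
# With GOATOOLS this would instead be:
#   godag = get_godag("go-basic.obo", optional_attrs="xref")
toy_dag = {
    "GO:0008557": SimpleNamespace(xref={"EC:7.6.2.1", "OTHERDB:placeholder"}),
    "GO:0000001": SimpleNamespace(xref=set()),
}

go_to_ec = build_go_to_ec(toy_dag)
print(go_to_ec)
```

Terms without any EC cross-reference are left out of the result, so a lookup such as `go_to_ec.get(id_go, set())` is a safe way to query it. As far as I know, the loaded GODag also contains alternate GO IDs as keys pointing to the same term, so some EC sets may appear under more than one ID.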