Jax CKB disease terms incorrect
ahwagner opened this issue
Currently, 87% of associations from Jax CKB have DOID:162 - Cancer. This isn't correct. Other resources are much more reasonable at 0-7%.
On reloading with the most recent data release, this number is closer to 24%. I'm not sure why it changed so dramatically (possibly an error in my original query?), but it is still far above our other resources.
The following terms are mapped to the generic term Cancer:
$ grep "\tCancer" disease_alias.tsv
Advanced Solid Tumor Cancer
Solid tumor Cancer
All Solid Tumors Cancer
Any cancer type Cancer
Solid tumors Cancer
Malignant neoplastic disease Cancer
All Tumors Cancer
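A stricter way to pull the same list is to compare the mapped term exactly, so that longer terms ending in "Cancer" cannot slip in. This is only a sketch: it assumes `disease_alias.tsv` has two tab-separated columns (alias, then term), which the output above suggests but does not confirm, and `cancer_aliases` is a hypothetical helper name.

```shell
# Hypothetical helper: list every alias whose mapped term is exactly "Cancer".
# Assumes a two-column, tab-separated file: alias <TAB> term.
cancer_aliases() {
  awk -F'\t' '$2 == "Cancer" { print $1 }' "$1"
}

# Usage (file name taken from the grep command above):
#   cancer_aliases disease_alias.tsv
```

The resulting list could then be checked against the `.indication.name` values in `jax.json` to see how many associations fall back to the generic term.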
Regarding jax:
$ cat jax.json | jq '.jax | .indication.name' | grep 'Advanced Solid Tumor' | wc -l
1330
That is 1330 out of 5754 indication names in total:
$ cat jax.json | jq '.jax | .indication.name' | wc -l
5754
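For reference, the share that "Advanced Solid Tumor" alone contributes can be worked out from the two counts above. A minimal sketch of the arithmetic, where `pct` is a hypothetical helper and not part of any existing tooling:

```shell
# Hypothetical helper: print "part" as a percentage of "total", one decimal place.
pct() {
  awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f%%\n", 100 * part / total }'
}

pct 1330 5754   # counts from the two commands above; prints 23.1%
```

So this single indication accounts for about 23% of the Jax CKB entries.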
These numbers are accurate per discussion with Sara Patterson at CKB. Resolving.