foundation29org / RareCrowds

Package to serve public and freely-available data from rare disease patients.

Could not subset

AtoanyFierro opened this issue

I tried to run these lines:

    ## These lines come from the previous code
    ann = dann.data
    del phen
    print(f'# total initial entities: {len(ann)}')

    ## Keep only disorders
    for dis,val in list(ann.items()):
        if val['group'] != 'Disorder':
            del ann[dis]
    print(f'# disases: {len(ann)}')

    ## Keep only those with phenotypic information
    for dis,val in list(ann.items()):
        if not val.get('phenotype'):
            del ann[dis]
    print(f'# disases with phenotype data: {len(ann)}')

    ## Remove clinical syndromes
    for dis,val in list(ann.items()):
        if val['type'].lower() == 'clinical syndrome':
            del ann[dis]
    print(f'# diseases w/o clinical syndromes: {len(ann)}')

    ## Keep only selected prevalences
    valid_prev = ['>1 / 1000', '6-9 / 10 000', '1-5 / 10 000', '1-9 / 100 000', 'Unknown', 'Not yet documented']
    for dis, val in list(ann.items()):
        if 'prevalence' in val:
            classes = [a['class'] for a in val['prevalence'] if a['type'] == 'Point prevalence']
            if not any(x in valid_prev for x in classes):
                del ann[dis]
        else:
            del ann[dis]
    print(f'# disases with valid prevalence: {len(ann)}')

and I get this:

    # total initial entities: 12082

    KeyError                                  Traceback (most recent call last)
    in
          6 ## Keep only disorders
          7 for dis,val in list(ann.items()):
    ----> 8     if val['group'] != 'Disorder':
          9         del ann[dis]
         10 print(f'# disases: {len(ann)}')

    KeyError: 'group'

How can I run these lines in order to perform a subset?

These lines are not intended as official package code. However, the error you are seeing is caused by #15. This is because in this case val does not have the 'group' key. Once #15 is solved it should work.
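Until #15 is resolved, one possible workaround is to read the key with `dict.get`, so that entries lacking `'group'` are skipped instead of raising `KeyError`. This is only a sketch, not official package code. The sample dictionary below is invented for illustration and merely assumes the same shape as `dann.data`; the IDs and values are hypothetical.

```python
# Hypothetical sample mimicking the assumed shape of `dann.data`.
# The IDs and values are made up for illustration only.
ann = {
    'ORPHA:1': {'group': 'Disorder', 'type': 'Disease'},
    'ORPHA:2': {'group': 'Group of disorders', 'type': 'Category'},
    'ORPHA:3': {'type': 'Disease'},  # no 'group' key: the case that raised KeyError
}

# List the entries that lack the key, to help diagnose the data problem.
missing_group = [dis for dis, val in ann.items() if 'group' not in val]
print(f'# entities without "group": {len(missing_group)}')

# Keep only disorders. val.get('group') returns None when the key is missing,
# so those entries are filtered out rather than raising KeyError.
disorders = {dis: val for dis, val in ann.items() if val.get('group') == 'Disorder'}
print(f'# disorders: {len(disorders)}')
```

Note that this drops entries without a `'group'` key silently, which may or may not be the desired behavior; checking `missing_group` first shows how many entries are affected.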

I am closing this as #15 should solve it