Filtering CUI/TUI returned entities?
ddofer opened this issue · comments
When doing NER/NEL to UMLS/CUI entities, is there any way to configure the nlp pipe to exclude candidates by a predefined filtering list of CUIs or TUIs? e.g. to exclude any detected CUIs with TUI: T079 (Temporal Concept)?
Currently I'm doing it by post-hoc filtering, which is inelegant, inefficient, and doesn't help remove noisy detections. For example, if the linker returns only the first detected entity from a text, then post-hoc filtering to remove the TUI means I miss the relevant entities.
Current code extract:
`from itertools import compress

from scispacy.linking import EntityLinker  # noqa: F401 (import registers the "scispacy_linker" factory)

# Assumes `nlp` is an already-loaded scispaCy model and `icu_feature_terms`
# is a DataFrame with a "name" column; both are defined elsewhere.
nlp.add_pipe(
    "scispacy_linker",
    config={
        "resolve_abbreviations": True,
        "linker_name": "umls",
        "max_entities_per_mention": 4,  # 5
        "threshold": 0.87,  # default is 0.8, paper mentions 0.99 as thresh
    },
)
# ...
linker = nlp.get_pipe("scispacy_linker")  # fetch once, not once per term

EXCLUDE_TUIS_LIST = ["T079", "T093"]  # UMLS semantic types (TUIs) to exclude
novel_cols_candidates_names = []
no_entities_list = []
novel_candidate_cuis = []
novel_candidate_cuis_nomenclatures = []
TUIs_list = []

for f in icu_feature_terms["name"]:
    print(f)
    doc = nlp(f)
    if len(doc.ents) > 0:
        for j, entity in enumerate(doc.ents):
            print(f"Entity #{j}:{entity}")
            # kb_ents is a list of (cui, score) pairs
            list_feature_cuis = [cui for cui, _score in entity._.kb_ents]
            # Keep a candidate only if its first TUI is not in the exclusion list
            tui_filter_mask = [
                linker.kb.cui_to_entity[c].types[0] not in EXCLUDE_TUIS_LIST
                for c in list_feature_cuis
            ]
            list_cuis_nomenclatures = [
                linker.kb.cui_to_entity[c].canonical_name for c in list_feature_cuis
            ]
            list_feature_cuis = list(compress(list_feature_cuis, tui_filter_mask))
            list_cuis_nomenclatures = list(compress(list_cuis_nomenclatures, tui_filter_mask))
            num_candidates = len(list_feature_cuis)
            TUIs_list.extend(linker.kb.cui_to_entity[c].types[0] for c in list_feature_cuis)
            # Extend once per entity so all result lists stay the same length
            # (extending inside a per-CUI loop would duplicate every candidate).
            novel_cols_candidates_names.extend([f] * num_candidates)
            novel_candidate_cuis.extend(list_feature_cuis)
            novel_candidate_cuis_nomenclatures.extend(list_cuis_nomenclatures)
    else:
        no_entities_list.append(f)
        print(f"No Entity candidates for {f}")
`
Hi, this is not something that exists right now, although it is a reasonable feature request if you want to give implementing it a go! Otherwise, I recommend continuing with what you are doing: post hoc filtering, with the threshold set low enough that you still get enough candidates after filtering.
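A minimal sketch of that "over-generate, then filter" approach, written as plain Python so it runs without scispaCy installed. The helper name `filter_candidates`, the `cui_to_types` mapping, and the CUIs in the toy data are all hypothetical and made up for illustration; only the TUI codes come from the question. Unlike the code extract above, this version rejects a candidate if any of its TUIs is excluded, not only the first.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def filter_candidates(
    kb_ents: Sequence[Tuple[str, float]],
    cui_to_types: Dict[str, List[str]],
    exclude_tuis: Iterable[str],
    top_k: int = 1,
) -> List[Tuple[str, float]]:
    """Drop candidates that carry any excluded TUI, then keep the top_k by score."""
    excluded = set(exclude_tuis)
    kept = [
        (cui, score)
        for cui, score in kb_ents
        if not excluded.intersection(cui_to_types.get(cui, []))
    ]
    # Highest-scoring candidates first; the sort is stable for tied scores
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Toy data: the CUIs below are placeholders, not real UMLS concepts.
toy_types = {
    "C0000001": ["T079"],          # excluded (Temporal Concept)
    "C0000002": ["T047"],          # kept
    "C0000003": ["T047", "T093"],  # excluded via its second TUI
}
toy_kb_ents = [("C0000001", 0.95), ("C0000002", 0.91), ("C0000003", 0.89)]

best = filter_candidates(toy_kb_ents, toy_types, {"T079", "T093"}, top_k=1)
print(best)
```

With the real linker, `kb_ents` would presumably be `entity._.kb_ents` and the type lookup could be built from `linker.kb.cui_to_entity`. Requesting more candidates than you need (a lower `threshold`, a higher `max_entities_per_mention`) and trimming to `top_k` after filtering is what keeps a relevant entity from being lost when the top-ranked candidate has an excluded type.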