danlou / MedLinker

ECIR 2020 - MedLinker: Medical Entity Linking with Neural Representations and Dictionary Matching


Data is not downloadable

asad1996172 opened this issue · comments

I tried running the piece of code given in the README.md, but I'm running into an error related to the data folder. I wasn't able to download data.zip, as the link takes me to a 404 page. Can you guide me?

Thanks.

Hi,

Sorry for the late reply... Upon closer reading of the terms, I've confirmed that, unfortunately, I'm not allowed to share data derived from UMLS.

Still, as I pointed out in the README, you should be able to re-generate these contents using the create_umls_kb.py script.

As another alternative, you may be able to use/adapt scispacy's UMLS KB as a replacement.
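If you go the scispacy route, the KB is distributed as a JSON Lines file (one concept per line). A minimal sketch of loading it into a CUI-keyed dict for lookup — the field names (`concept_id`, `canonical_name`, `aliases`) follow scispacy's KB export format, but verify them against the file you actually download:

```python
import json

def load_kb_jsonl(path):
    """Load a UMLS KB exported as JSON Lines (one concept per line)
    into a dict keyed by CUI for O(1) concept lookup."""
    kb = {}
    with open(path) as f:
        for line in f:
            concept = json.loads(line)
            kb[concept['concept_id']] = concept
    return kb
```

From there, adapting it as a replacement is mostly a matter of mapping these records onto whatever interface create_umls_kb.py expects.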

In case you're only interested in the adaptations to the MedMentions dataset, I've uploaded that separately here:
https://drive.google.com/file/d/1wJdW3Tcb6VZ0z-d8XQahk2Gm4Cj0BrRu/view?usp=sharing

Best

Hello, I am facing a similar issue. I tried to run the create_umls_kb.py script, but it gives the following error:

"""
Traceback (most recent call last):
  File "scripts/create_umls_kb.py", line 10, in <module>
    umls_tree = construct_umls_tree_from_tsv('data/umls_semantic_type_tree.tsv')  # change to your location
  File "/home/keshav/anaconda3/envs/medlinker/lib/python3.6/site-packages/scispacy/umls_semantic_type_tree.py", line 82, in construct_umls_tree_from_tsv
    for line in open(filepath, "r"):
FileNotFoundError: [Errno 2] No such file or directory: 'data/umls_semantic_type_tree.tsv'
"""
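For what it's worth, this FileNotFoundError only means the script's hardcoded relative path doesn't match where the TSV actually lives on your machine. A small sketch of resolving the path with a clearer failure message (the candidate locations below are hypothetical examples, not paths the repo requires):

```python
from pathlib import Path

def resolve_tree_path(candidates):
    """Return the first candidate path that exists on disk, or raise a
    FileNotFoundError listing every location that was checked."""
    for cand in candidates:
        p = Path(cand)
        if p.is_file():
            return p
    raise FileNotFoundError(
        "umls_semantic_type_tree.tsv not found; place it at one of: "
        + ", ".join(str(c) for c in candidates)
    )
```

You could then pass `str(resolve_tree_path([...]))` to `construct_umls_tree_from_tsv` at the failing line, with your own locations in the candidate list.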

I tried to download the data from the Google Drive link and have requested access as well, but still no luck. Can you please let me know what I have to do?
Thanks,
Regards

Sorry, I didn't realize that file had restricted permissions.
I've now accepted your request and updated the permissions.

Best

In case you're having trouble accessing the umls_semantic_type_tree.tsv file from scispacy, you can also find it here:
https://drive.google.com/file/d/1UGRWvynFmLb5gSF0kc16Bsh4DTCdVMJ2/view?usp=sharing

Hello Danlou,
I wanted to ask one more thing: how do we train from scratch?

The code available in this repo can help you train from scratch.

Check the 'create' methods in the 'matcher' scripts, as well as precompute_contextual.py for extracting embeddings from the NLMs.
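For context, the output precompute_contextual.py needs is essentially one vector per mention, pooled from the per-token contextual vectors an NLM produces. The pooling step can be sketched model-agnostically like this (mean pooling is an illustrative choice here, not necessarily the exact scheme MedLinker uses):

```python
import numpy as np

def span_embedding(token_vectors, start, end):
    """Mean-pool the contextual token vectors covering a mention span
    (tokens start .. end-1) into a single fixed-size embedding."""
    return token_vectors[start:end].mean(axis=0)

# Toy example: 5 tokens with 4-dim "contextual" vectors from some NLM.
vecs = np.arange(20, dtype=float).reshape(5, 4)
mention_vec = span_embedding(vecs, 1, 3)  # pools tokens 1 and 2
```

Precomputing these mention vectors once, and then training the matchers on top of them, is what keeps the 'create' steps cheap.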