danlou / MedLinker

ECIR 2020 - MedLinker: Medical Entity Linking with Neural Representations and Dictionary Matching


Data is not downloadable

asad1996172 opened this issue · comments

I tried running the piece of code given in the README.md, but I'm running into an error related to the data folder. I wasn't able to download data.zip, as the link takes me to a 404 page. Can you guide me?

Thanks.

Hi,

Sorry for the late reply... Upon closer reading of the terms, I've confirmed that, unfortunately, I'm not allowed to share data derived from UMLS.

Still, as I pointed out in the README, you should be able to re-generate these contents using the create_umls_kb.py script.

As another alternative, you may be able to use/adapt scispacy's UMLS KB as a replacement.
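If you go the scispacy route, the KB is distributed as a JSON Lines file (one concept per line). A minimal sketch of loading it into a CUI-keyed dict for lookup — the field names (`concept_id`, `canonical_name`, `aliases`) follow scispacy's KB export format, but verify them against the file you actually download:

```python
import json

def load_kb_jsonl(path):
    """Load a UMLS KB exported as JSON Lines (one concept per line)
    into a dict keyed by CUI for O(1) concept lookup."""
    kb = {}
    with open(path) as f:
        for line in f:
            concept = json.loads(line)
            kb[concept['concept_id']] = concept
    return kb
```

From there, adapting it as a replacement is mostly a matter of mapping these records onto whatever interface create_umls_kb.py expects.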

In case you're only interested in the adaptations to the MedMentions dataset, I've uploaded that separately here:
https://drive.google.com/file/d/1wJdW3Tcb6VZ0z-d8XQahk2Gm4Cj0BrRu/view?usp=sharing

Best

Hello, I am facing a similar issue. I tried to run the create_umls_kb.py script, but it gives the following error:

"""
Traceback (most recent call last):
  File "scripts/create_umls_kb.py", line 10, in <module>
    umls_tree = construct_umls_tree_from_tsv('data/umls_semantic_type_tree.tsv')  # change to your location
  File "/home/keshav/anaconda3/envs/medlinker/lib/python3.6/site-packages/scispacy/umls_semantic_type_tree.py", line 82, in construct_umls_tree_from_tsv
    for line in open(filepath, "r"):
FileNotFoundError: [Errno 2] No such file or directory: 'data/umls_semantic_type_tree.tsv'
"""
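For what it's worth, this FileNotFoundError only means the script's hardcoded relative path doesn't match where the TSV actually lives on your machine. A small sketch of resolving the path with a clearer failure message (the candidate locations below are hypothetical examples, not paths the repo requires):

```python
from pathlib import Path

def resolve_tree_path(candidates):
    """Return the first candidate path that exists on disk, or raise a
    FileNotFoundError listing every location that was checked."""
    for cand in candidates:
        p = Path(cand)
        if p.is_file():
            return p
    raise FileNotFoundError(
        "umls_semantic_type_tree.tsv not found; place it at one of: "
        + ", ".join(str(c) for c in candidates)
    )
```

You could then pass `str(resolve_tree_path([...]))` to `construct_umls_tree_from_tsv` at the failing line, with your own locations in the candidate list.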

I tried to download the data from the Google Drive link and have requested access as well, but still no luck. Can you please let me know what I have to do?
Thanks,
Regards

Sorry, I didn't realize that file had restricted permissions.
I've now accepted your request and updated the permissions.

Best

In case you're having trouble accessing the umls_semantic_type_tree.tsv file from scispacy, you can also find it here:
https://drive.google.com/file/d/1UGRWvynFmLb5gSF0kc16Bsh4DTCdVMJ2/view?usp=sharing

Hello Danlou,
I wanted to ask one more thing: how do we train from scratch?

The code available in this repo can help you train from scratch.

Check the 'create' methods in the 'matcher' scripts, as well as precompute_contextual.py for extracting embeddings from the NLMs.
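For context, the output precompute_contextual.py needs is essentially one vector per mention, pooled from the per-token contextual vectors an NLM produces. The pooling step can be sketched model-agnostically like this (mean pooling is an illustrative choice here, not necessarily the exact scheme MedLinker uses):

```python
import numpy as np

def span_embedding(token_vectors, start, end):
    """Mean-pool the contextual token vectors covering a mention span
    (tokens start .. end-1) into a single fixed-size embedding."""
    return token_vectors[start:end].mean(axis=0)

# Toy example: 5 tokens with 4-dim "contextual" vectors from some NLM.
vecs = np.arange(20, dtype=float).reshape(5, 4)
mention_vec = span_embedding(vecs, 1, 3)  # pools tokens 1 and 2
```

Precomputing these mention vectors once, and then training the matchers on top of them, is what keeps the 'create' steps cheap.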