clips / clicr

Machine reading comprehension on clinical case reports

Cannot run build_json_dataset.py

XiangruiCAI opened this issue

I tried to run python build_json_dataset.py under the dataset-code folder, but encountered the following problem:
Traceback (most recent call last):
  File "build_json_dataset.py", line 6, in <module>
    from describe_data import *
  File "/home/xiangrui/clicr/dataset-code/describe_data.py", line 6, in <module>
    from text import remove_concept_marks
ImportError: No module named 'text'
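To narrow down which of the modules named in the traceback Python can actually locate, a check along these lines may help. It is only a sketch: it assumes it is run from the dataset-code folder with the same interpreter, and the helper name find_missing_modules is hypothetical, not part of the repository.

```python
import importlib.util


def find_missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not found
        # on the current import path (sys.path).
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Module names taken from the traceback; run from the dataset-code folder.
    print(find_missing_modules(["describe_data", "text"]))
```

If text shows up in the result, no file or package with that name is visible from that folder, which would match the ImportError.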

I really appreciate your efforts in opening this dataset. It will promote research on reading comprehension in the medical domain. It would be more helpful if you could provide detailed commands to build/obtain the dataset. Thank you!

As mentioned in the readme, build_json_dataset.py still lacks the code for scraping the articles from the web. Ultimately, building from scratch is not necessary, since it implies that you have access to BMJ Case Reports, in which case I can just share the dataset directly. If you or your organization has a subscription, please send proof of it to my e-mail. Unfortunately, at the moment we cannot share the dataset with non-subscribers, but we may be able to do so in the future, provided that we reach an agreement with the publisher.