asreview / asreview

Active learning for systematic reviews

Home Page: https://asreview.ai

Add Dataset throws an exception

jf29medma opened this issue · comments

Describe the bug
I wanted to try out the Python API by following this simulation example: https://asreview.readthedocs.io/en/latest/simulation_api_example.html

When running the 5th code block, a "BadFileFormatError" exception occurred. I expected this line to add the dataset to the project. I also tried my own data, which works in ASReview LAB, but it throws the same exception as the example dataset.

Lines of code:
project.add_dataset("van_de_Schoot_2017.csv")

The full error:
[Screenshot 2023-08-21 075645]

Thank you for reporting! What version of ASReview are you using?

I am using version 1.2.1

Hello,
I have not found a solution yet. Did it work for you when you followed the sample tutorial?

Thank you!

Hi @jf29medma, the problem is with downloading the dataset! In the code, step 4 downloads a dataset, but it seems the link is broken.

Visiting https://raw.githubusercontent.com/asreview/systematic-review-datasets/master/datasets/van_de_Schoot_2017/output/van_de_Schoot_2017.csv shows a 404: Not Found for me.

The downloaded dataset file contains the same 404: Not Found message instead of CSV data (screenshot omitted).
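To check whether a downloaded file is really a dataset or just a saved error message, something like the following could help. It is only a minimal sketch using the Python standard library: the function name `looks_like_error_page` is made up for illustration, and it assumes the dataset is a comma-separated file whose first line is a header row.

```python
from pathlib import Path


def looks_like_error_page(path, max_bytes=200):
    """Return True if the file looks like a short HTTP error body, not a CSV.

    Hypothetical helper, not part of ASReview. Assumes a comma-delimited
    dataset with a header row on the first line.
    """
    raw = Path(path).read_bytes()[:max_bytes]
    head = raw.decode("utf-8", errors="replace").strip()
    first_line = head.splitlines()[0] if head else ""
    # A comma-separated header contains at least one comma;
    # a saved "404: Not Found" response (or an empty file) does not.
    return first_line.startswith("404") or "," not in first_line


# Example usage (path taken from the tutorial):
# looks_like_error_page("tmp_data/api_simulation/data/van_de_Schoot_2017.csv")
```

If this returns True for the tutorial file, the download step failed and the file should be replaced before adding it to the project.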

You can either select your own dataset instead (by moving it to tmp_data/api_simulation/data/) or use a different link in the curl code.

We do not host datasets in plaintext anymore for legal reasons, but you can use our demo dataset: https://raw.githubusercontent.com/asreview/asreview/master/tests/demo_data/generic_labels.csv

You could also take a look at SYNERGY for getting research datasets.

@PeterLombaers @J535D165 We should update the API example

First of all thank you @jteijema

Update: It worked for me when doing the tutorial locally in VSC! With Jupyter Notebook I run into the issues mentioned above; maybe there is a problem accessing data in my Google Drive. If I find a solution, I will mention it here :D

@jf29medma, glad to hear the tutorial worked for you in VSC. If you find a solution to the issues you encountered with Jupyter Notebook, please do share. It could be helpful for the community. 😊

Thanks for reporting and for the suggestion to use SYNERGY. SYNERGY would be the best solution. I made a quick fix for this example:

curl https://raw.githubusercontent.com/asreview/systematic-review-datasets/metadata-v1-final/datasets/van_de_Schoot_2017/output/van_de_Schoot_2017.csv > tmp_data/api_simulation/data/van_de_Schoot_2017.csv
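For anyone who prefers to do the download from Python, here is a rough equivalent of the curl line above. It is a sketch, not part of the ASReview API: `download_dataset` is a hypothetical helper built only on the standard library. One difference from plain curl is that `urllib.request.urlopen` raises an `HTTPError` on a 404, so a broken link fails right away instead of saving the error message as the dataset.

```python
import shutil
import urllib.request
from pathlib import Path


def download_dataset(url, dest):
    """Download `url` to `dest` and return the destination path.

    Hypothetical helper. urlopen raises urllib.error.HTTPError for
    4xx/5xx responses, and because it is evaluated before the output
    file is opened, no file is written when the link is broken.
    """
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


if __name__ == "__main__":
    # URL and target path copied from the quick fix above.
    download_dataset(
        "https://raw.githubusercontent.com/asreview/systematic-review-datasets/"
        "metadata-v1-final/datasets/van_de_Schoot_2017/output/van_de_Schoot_2017.csv",
        "tmp_data/api_simulation/data/van_de_Schoot_2017.csv",
    )
```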